scQUEST.individuality module

Summary

Classes:

Individuality

Computes the individuality of each observation in the data set according to [Wagner2019].

Reference

class Individuality(n_neighbors=100, radius=1.0, graph_type='knn', graph=None, prior='frequency', metric='minkowski', metric_params=<factory>, nn_params=<factory>)[source]

Bases: object

Computes the individuality of each observation in the data set according to [Wagner2019].

n_neighbors

number of neighbors in kNN graph

Type: Union[None, int]

radius

radius in radius graph

Type: Union[None, float]

graph_type

type of graph to build, either knn or radius

Type: str

graph

if provided, this graph is used instead of constructing a new one

Type: Union[None, numpy.ndarray, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix]

prior

either frequency, uniform, or custom prior probabilities for each class/group/label. If set to frequency, the empirical class/group probabilities are used as the prior. If set to uniform, all classes/groups have the same prior probability.

Type: Union[str, numpy.ndarray]

metric

distance metric to use when constructing the graph topology

Type: Union[Callable[[numpy.ndarray, numpy.ndarray], float], str]

metric_params

additional kwargs passed to metric

Type: dict

nn_params

additional kwargs passed to NearestNeighbors

Type: dict
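To make the difference between the two graph_type options concrete, here is a minimal brute-force sketch in plain Python. It is not the library's implementation: the helper names knn_graph and radius_graph are hypothetical, the metric is fixed to Euclidean distance, and an observation is assumed not to count as its own neighbor.

```python
from math import dist


def knn_graph(points, n_neighbors):
    """Dense 0/1 adjacency: row i marks the n_neighbors points closest to point i."""
    n = len(points)
    graph = [[0] * n for _ in range(n)]
    for i, p in enumerate(points):
        # Rank all other points by Euclidean distance to p (ties broken by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (dist(p, points[j]), j),
        )
        for j in others[:n_neighbors]:
            graph[i][j] = 1
    return graph


def radius_graph(points, radius):
    """Dense 0/1 adjacency: row i marks every other point within `radius` of point i."""
    n = len(points)
    return [
        [int(i != j and dist(points[i], points[j]) <= radius) for j in range(n)]
        for i in range(n)
    ]


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
knn = knn_graph(points, n_neighbors=2)
rad = radius_graph(points, radius=1.5)
```

In the kNN graph every observation has exactly n_neighbors neighbors, whereas in the radius graph the neighbor count varies and can be zero for isolated observations such as the last point here.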

Notes

The posterior class probabilities for each observation are computed as follows [Bishop]:

\[\begin{split}p(c_i | x) &= \frac{p(x | c_i) \, p(c_i)}{p(x)} \\ p(x | c_i) &= \frac{K_i}{N_i V} \\ p(x) &= \sum_{c_j \in C} p(x | c_j) \, p(c_j) \\ p(c_i) &= \frac{N_i}{N} \;\; \texttt{if prior=frequency} \\ p(c_i) &= \frac{1}{|C|} \;\; \texttt{if prior=uniform}\end{split}\]

Where

\(C\) is the set of classes and \(c_i \in C\) a particular class

\(x\) is the feature vector of an observation

\(K_i\) the number of neighbors belonging to class \(c_i\)

\(V\) the volume of the sphere that either

contains n_neighbors or

has radius radius.

\(N\) the total number of observations and \(N_i\) the number of observations belonging to class \(c_i\)

The resulting observation-level class probability estimates (\(N \times |C|\)) are averaged within each class to give class-level individuality estimates (\(|C| \times |C|\)).
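The formulas above can be traced for a single observation with a short sketch. The function name class_posteriors and its arguments are hypothetical and not part of scQUEST. The sketch assumes the observation has at least one neighbor, and it sets \(V = 1\) because the volume is shared by all classes and cancels in the ratio.

```python
def class_posteriors(neighbor_counts, class_sizes, prior="frequency"):
    """Posterior p(c_i | x) for one observation.

    neighbor_counts[i] is K_i and class_sizes[i] is N_i.
    """
    n_classes = len(class_sizes)
    total = sum(class_sizes)  # N
    if prior == "frequency":
        priors = [n_i / total for n_i in class_sizes]
    elif prior == "uniform":
        priors = [1.0 / n_classes] * n_classes
    else:
        priors = list(prior)  # custom prior probabilities, one per class

    # p(x | c_i) = K_i / (N_i * V); V is identical for all classes, so V = 1 here.
    likelihoods = [k / n_i for k, n_i in zip(neighbor_counts, class_sizes)]
    joint = [lik * p for lik, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x)
    return [j / evidence for j in joint]


# 10 neighbors: 6 from a class with 300 observations, 4 from a class with 100.
freq = class_posteriors([6, 4], [300, 100], prior="frequency")
unif = class_posteriors([6, 4], [300, 100], prior="uniform")
```

With prior=frequency the class sizes cancel and the posterior reduces to the neighbor fraction \(K_i / \sum_j K_j\), here roughly 0.6 and 0.4. With prior=uniform the smaller class is up-weighted, giving roughly 1/3 and 2/3.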

Returns: Instance of Individuality.

n_neighbors: Union[None, int] = 100

radius: Union[None, float] = 1.0

graph_type: str = 'knn'

graph: Union[None, numpy.ndarray, scipy.sparse._csr.csr_matrix, scipy.sparse._csc.csc_matrix] = None

prior: Union[str, numpy.ndarray] = 'frequency'

metric: Union[Callable[[numpy.ndarray, numpy.ndarray], float], str] = 'minkowski'

metric_params: dict

nn_params: dict

__post_init__()[source]

predict(ad, labels, layer=None, inplace=True)[source]

Predicts the individuality of each observation and averages the results for each label. To access the posterior probabilities for each observation (cell), use compute_individuality().

Parameters

X – matrix with observations as rows and columns as features.
labels (Union[Iterable, Categorical]) – indicator for the group/sample an observation belongs to.

Return type

DataFrame

Returns

DataFrame with rows as observations and columns holding the estimated probability of belonging to the given group/sample
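The per-label averaging step can be sketched as follows. This is an illustration under stated assumptions, not the method's actual code: aggregate_by_label is a hypothetical helper, labels are assumed to be sequential integer codes, and every class is assumed to have at least one observation.

```python
def aggregate_by_label(posteriors, labels, n_classes):
    """Average observation-level posteriors (N x |C|) within each label (|C| x |C|)."""
    sums = [[0.0] * n_classes for _ in range(n_classes)]
    counts = [0] * n_classes
    for row, label in zip(posteriors, labels):
        counts[label] += 1
        for j, p in enumerate(row):
            sums[label][j] += p
    # Row c holds the mean posterior over all observations labelled c.
    return [[s / counts[c] for s in sums[c]] for c in range(n_classes)]


posteriors = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
labels = [0, 0, 1]
individuality = aggregate_by_label(posteriors, labels, n_classes=2)
```

In this toy case the result is approximately [[0.8, 0.2], [0.2, 0.8]]. A diagonal entry close to 1 means that observations of that class are surrounded mostly by neighbors of the same class.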

static compute_individuality(g, num_seq_labs, prior)[source]

Computes the observation-level individuality based on a given graph structure and labels. See Individuality for an in-depth description.

Parameters

g (Union[ndarray, csr_matrix, csc_matrix]) – graph encoding the observation-level interactions
num_seq_labs (ndarray) – labels of the observations. Must be sequential and numeric, i.e. [0,1,2,…,N]
prior (Union[str, ndarray]) – either frequency, uniform, or custom prior probabilities for each class/group/label. If set to frequency, the empirical class/group probabilities are used as the prior. If set to uniform, all classes/groups have the same prior probability.

Return type

ndarray

Returns

\(N\times |C|\) ndarray with \(N\) observations across \(|C|\) classes.
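Because num_seq_labs must be sequential and numeric, arbitrary group labels need to be encoded before they are passed in. The sketch below shows one way to do that and how the per-class neighbor counts \(K_i\) can be read off a dense adjacency matrix. Both helpers (encode_labels, neighbor_class_counts) are hypothetical, and the graph is assumed to be a 0/1 matrix in which row n marks the neighbors of observation n.

```python
def encode_labels(labels):
    """Map arbitrary labels to sequential integer codes 0..|C|-1 (in sorted order)."""
    classes = sorted(set(labels))
    code_of = {c: i for i, c in enumerate(classes)}
    return [code_of[label] for label in labels], classes


def neighbor_class_counts(graph, codes, n_classes):
    """Row n holds K_i for observation n: how many of its neighbors carry code i."""
    counts = [[0] * n_classes for _ in graph]
    for n, row in enumerate(graph):
        for m, connected in enumerate(row):
            if connected:
                counts[n][codes[m]] += 1
    return counts


labels = ["patient_B", "patient_A", "patient_B", "patient_A"]
codes, classes = encode_labels(labels)

# Toy 2-nearest-neighbor graph over four observations.
graph = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
counts = neighbor_class_counts(graph, codes, len(classes))
```

Keeping the classes list returned by encode_labels makes it possible to map the columns of the resulting \(N \times |C|\) array back to the original group names.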

_build_topology(x)[source]

Return type: Union[ndarray, csr_matrix, csc_matrix]

_check_args()[source]

Return type: None


__eq__(other): Return self==value.

__hash__ = None

__init__(n_neighbors=100, radius=1.0, graph_type='knn', graph=None, prior='frequency', metric='minkowski', metric_params=<factory>, nn_params=<factory>)

__module__ = 'scQUEST.individuality'

__repr__(): Return repr(self).

__weakref__: list of weak references to the object (if defined)