stereo.core.StPipeline.highly_variable_genes

StPipeline.highly_variable_genes(groups=None, method='seurat', n_top_genes=2000, min_disp=0.5, max_disp=inf, min_mean=0.0125, max_mean=3, span=0.3, n_bins=20, res_key='highly_variable_genes')

Annotate highly variable genes, following the implementation in Scanpy. The algorithm applied depends on method; the available flavors are Seurat [Satija15], Cell Ranger [Zheng17] and Seurat v3 [Stuart19].

Parameters:
  • groups (Optional[str]) – if specified, highly variable genes are selected within each batch separately and then merged. This avoids selecting batch-specific genes and acts as a lightweight batch correction method. For all methods, genes are first sorted by the number of batches in which they are a highly variable gene (HVG). For the dispersion-based methods, ties are broken by normalized dispersion. If method='seurat_v3', ties are broken by the median (across batches) rank based on within-batch normalized variance.

  • method (Literal['seurat', 'cell_ranger', 'seurat_v3']) – the flavor used to identify highly variable genes. The dispersion-based methods differ in their default workflows: Seurat applies the cutoffs, whereas Cell Ranger uses n_top_genes.

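  • n_top_genes (Optional[int]) – number of highly variable genes to keep. Mandatory if method='seurat_v3'. As described for the cutoff parameters, any value other than None (including the default of 2000) causes the mean and dispersion cutoffs to be ignored.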

  • min_disp (Optional[float]) – lower cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if method='seurat_v3'.

  • max_disp (Optional[float]) – upper cutoff for the normalized dispersion. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if method='seurat_v3'.

  • min_mean (Optional[float]) – lower cutoff for the mean expression. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if method='seurat_v3'.

  • max_mean (Optional[float]) – upper cutoff for the mean expression. If n_top_genes is not None, this and all other cutoffs for the means and the normalized dispersions are ignored. Ignored if method='seurat_v3'.

  • span (Optional[float]) – the fraction of the data (cells) used when estimating the variance in the Loess model fit. Only used if method='seurat_v3'.

  • n_bins (int) – number of bins for binning the mean gene expression. Normalization is done with respect to each bin. If just a single gene falls into a bin, the normalized dispersion is artificially set to 1.

  • res_key (str) – the key used to retrieve the result from self.result.
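
The interplay between n_bins, n_top_genes and the four cutoffs can be illustrated with a minimal pure-Python sketch. It is not the Stereopy or Scanpy implementation: the helper names normalized_dispersions and select_hvg are hypothetical, equal-width bins and a plain z-score are simplifying assumptions, and so are the strict inequalities and the value 0.0 used when a bin has zero spread. Only the two documented rules are taken from the parameter descriptions above: a bin holding a single gene gets a normalized dispersion of 1, and the cutoffs are ignored whenever n_top_genes is not None.

```python
import math
from statistics import mean, stdev


def normalized_dispersions(means, dispersions, n_bins=20):
    """Z-score each gene's dispersion against genes in the same mean-expression bin.

    Hypothetical helper: equal-width bins are an assumption of this sketch.
    """
    lo, hi = min(means), max(means)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all means are equal
    # Assign each gene to a bin; the maximum mean is clamped into the last bin.
    bin_of = [min(int((m - lo) / width), n_bins - 1) for m in means]
    members = {}
    for gene, b in enumerate(bin_of):
        members.setdefault(b, []).append(gene)

    normalized = [0.0] * len(means)
    for genes in members.values():
        if len(genes) == 1:
            # Documented rule: a single-gene bin is artificially set to 1.
            normalized[genes[0]] = 1.0
            continue
        values = [dispersions[g] for g in genes]
        mu, sd = mean(values), stdev(values)
        for g in genes:
            # Assumption: a bin with zero spread yields 0.0 for all its genes.
            normalized[g] = (dispersions[g] - mu) / sd if sd > 0 else 0.0
    return normalized


def select_hvg(means, norm_disp, n_top_genes=2000, min_disp=0.5,
               max_disp=math.inf, min_mean=0.0125, max_mean=3):
    """Return one boolean per gene (hypothetical helper)."""
    if n_top_genes is not None:
        # Ranking mode: keep the largest normalized dispersions, ignore cutoffs.
        order = sorted(range(len(norm_disp)),
                       key=lambda g: norm_disp[g], reverse=True)
        keep = set(order[:n_top_genes])
        return [g in keep for g in range(len(norm_disp))]
    # Cutoff mode: all four bounds must hold (strictness is an assumption).
    return [min_mean < m < max_mean and min_disp < d < max_disp
            for m, d in zip(means, norm_disp)]


# Toy data: six genes, three bins of width 1.0.
means = [0.0, 0.25, 0.5, 1.5, 1.75, 3.0]
dispersions = [1.0, 3.0, 2.0, 0.5, 1.5, 4.0]
norm = normalized_dispersions(means, dispersions, n_bins=3)

top_two = select_hvg(means, norm, n_top_genes=2)
by_cutoff = select_hvg(means, norm, n_top_genes=None)
```

In this toy data the last gene sits alone in its bin, so its normalized dispersion is 1 regardless of its raw value. Passing n_top_genes=None is what switches the sketch from ranking to the cutoff-based selection.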

Return type:

An object of StereoExpData with the result of highly variable genes.
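
The merging rule described for groups (sort by the number of batches in which a gene is an HVG, then break ties by normalized dispersion) can be sketched as follows. This is an illustration only, not the library's code: merge_batch_hvgs and the input layout are hypothetical, and averaging the normalized dispersion across batches for the tie-break is an assumption. The median-rank tie-break used by 'seurat_v3' is not covered.

```python
from statistics import mean


def merge_batch_hvgs(per_batch, n_top_genes):
    """Merge per-batch HVG calls into one ranked gene list.

    per_batch: list of dicts mapping gene -> (is_hvg, normalized_dispersion).
    Hypothetical helper written for illustration.
    """
    genes = sorted(set().union(*per_batch))
    # Primary key: number of batches in which the gene was called an HVG.
    n_batches = {g: sum(b[g][0] for b in per_batch if g in b) for g in genes}
    # Tie-break: mean normalized dispersion across batches (an assumption).
    avg_disp = {g: mean([b[g][1] for b in per_batch if g in b]) for g in genes}
    ranked = sorted(genes, key=lambda g: (-n_batches[g], -avg_disp[g]))
    return ranked[:n_top_genes]


batch_a = {"G1": (True, 2.0), "G2": (True, 1.0),
           "G3": (False, 0.1), "G4": (True, 0.8)}
batch_b = {"G1": (True, 1.0), "G2": (False, 0.4),
           "G3": (True, 3.0), "G4": (True, 1.2)}

merged = merge_batch_hvgs([batch_a, batch_b], n_top_genes=3)
```

G3 has the highest average dispersion of the four genes but is an HVG in only one batch, so it ranks below G1 and G4, which are HVGs in both. This is how the per-batch selection avoids favoring batch-specific genes.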