stereo.algorithm.community_detection.CommunityDetection.main

CommunityDetection.main(**kwargs)

CCD divides the tissue using sliding windows, accommodates multiple window sizes, and enables the simultaneous analysis of multiple slices from the same tissue. CCD consists of three main steps:

Sliding windows ($w$) of one or multiple sizes are moved across the surface of the tissue with defined horizontal and vertical steps, and the percentages ($[p_1, p_2, \ldots, p_n]$) of each cell type inside each window are calculated. A feature vector ($fv$) with size equal to the number of cell types ($n$) is created for each processed window across all available tissue slices:

\[\begin{equation} \forall w_i\rightarrow (fv_i = [p_1, p_2,...,p_n]) \end{equation}\]
Feature vectors from all windows are fed to a clustering algorithm ($C$), such as Leiden, Spectral or Hierarchical, to obtain community labels ($l$). The desired number of communities ($cn$) can be set explicitly as a parameter (Spectral or Hierarchical clustering) or controlled indirectly by setting the clustering resolution (Leiden):

\[\begin{equation} C(\forall fv_i) \rightarrow l_i, l_i \in \{l_1, l_2, ..., l_{cn}\} \end{equation}\]
A community label is assigned to each cell-spot ($cs$) by majority voting ($MV$) over the community labels of all windows covering it:

\[\begin{equation} MV(\forall l_i)\text{ where } spatial(cs_j) \in w_i \rightarrow l_j, l_j \in \{l_1, l_2, ..., l_{cn}\} \end{equation}\]
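
The three steps can be illustrated with a minimal pure-Python sketch. It is not the CCD implementation: the function names are hypothetical, window origins are assumed to lie on a regular grid starting at 0, empty windows are skipped, and step 2 uses a trivial stand-in (labeling each window by its dominant cell type) instead of Leiden, Spectral or Hierarchical clustering.

```python
from collections import Counter


def window_feature_vectors(cells, cell_types, win_size, step):
    """Step 1: percentage of each cell type inside every non-empty window.

    cells is a list of (x, y, cell_type) with non-negative coordinates.
    Returns {(wx, wy): [p_1, ..., p_n]} keyed by window origin.
    """
    max_x = max(x for x, _, _ in cells)
    max_y = max(y for _, y, _ in cells)
    features = {}
    for wx in range(0, int(max_x) + 1, step):
        for wy in range(0, int(max_y) + 1, step):
            counts = Counter(
                t for x, y, t in cells
                if wx <= x < wx + win_size and wy <= y < wy + win_size
            )
            total = sum(counts.values())
            if total:  # assumption: empty windows produce no feature vector
                features[(wx, wy)] = [100.0 * counts[t] / total for t in cell_types]
    return features


def dominant_type_labels(features):
    """Step 2 stand-in: label = index of the largest percentage.

    A real run would cluster the feature vectors instead.
    """
    return {
        origin: max(range(len(fv)), key=fv.__getitem__)
        for origin, fv in features.items()
    }


def majority_vote(cells, window_labels, win_size):
    """Step 3: each cell-spot takes the most common label of its covering windows."""
    labels = []
    for x, y, _ in cells:
        votes = Counter(
            label for (wx, wy), label in window_labels.items()
            if wx <= x < wx + win_size and wy <= y < wy + win_size
        )
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels


# Toy data: three cells of type "a" on the left, three of type "b" on the right.
cells = [(1, 1, "a"), (2, 3, "a"), (3, 2, "a"),
         (21, 1, "b"), (22, 2, "b"), (23, 3, "b")]
cell_types = ["a", "b"]
features = window_feature_vectors(cells, cell_types, win_size=10, step=5)
window_labels = dominant_type_labels(features)
cell_labels = majority_vote(cells, window_labels, win_size=10)
```

Because the windows overlap (the step is smaller than the window size), a cell-spot is usually covered by several windows, which is what makes the vote in step 3 meaningful.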

The window size and sliding step are optional CCD parameters. If they are not provided, the optimal window size is determined iteratively, with the goal of keeping the average number of cell-spots per window in the range [30, 50]. The sliding step is then set to half of the window size.
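
The automatic choice of window size can be pictured with the following sketch. Only the target range [30, 50] and the half-size sliding step come from the description above; the starting size, the update rule, the iteration cap and the decision to average over non-empty windows are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def mean_cells_per_window(points, win_size):
    """Average number of cell-spots per non-empty window (step = win_size // 2)."""
    step = max(1, win_size // 2)
    counts = Counter()
    for x, y in points:
        # Window origins are multiples of `step`; keep those that cover the point.
        xs = [wx for wx in range(0, int(x) + 1, step) if x < wx + win_size]
        ys = [wy for wy in range(0, int(y) + 1, step) if y < wy + win_size]
        for wx in xs:
            for wy in ys:
                counts[(wx, wy)] += 1
    return sum(counts.values()) / len(counts)


def pick_window_size(points, start=10, low=30, high=50, max_iter=50):
    """Adjust the window size until the mean occupancy falls within [low, high]."""
    win_size = start
    target = (low + high) / 2
    for _ in range(max_iter):
        mean = mean_cells_per_window(points, win_size)
        if low <= mean <= high:
            break
        # Assumed update rule: occupancy grows roughly with window area,
        # so scale the side length by the square root of the ratio.
        new_size = max(2, round(win_size * (target / mean) ** 0.5))
        if new_size == win_size:
            new_size += 1 if mean < low else -1
        win_size = new_size
    return win_size, max(1, win_size // 2)


# Toy tissue: one cell-spot at every integer coordinate of a 60 x 60 grid.
grid = [(x, y) for x in range(60) for y in range(60)]
size, step = pick_window_size(grid)
```

Passing win_sizes and sliding_steps explicitly bypasses this kind of search entirely.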

Note

All parameters are keyword arguments.

Parameters:

annotation – The key in obs that specifies the cell type.
tfile – File path to the AnnData object with calculated cell mixtures for data windows, output of calc_feature_matrix.
out_path – Absolute path to store outputs, default to ‘./results’.
cluster_algo – Clustering algorithm, default to leiden.
resolution – Resolution of leiden clustering algorithm. Ignored for spectral and agglomerative, default to 0.2.
n_clusters – Number of clusters for spectral and agglomerative clustering, ignored for leiden, default to 10.
spot_size – Size of the spot on plot, default to 30.
verbose – Logging verbosity: 0 shows warnings only, values >0 also show info messages, default to 0.
plotting – Save plots flag, default to 5. Each level includes everything saved by the levels below it. Available values:
0 - No plotting and saving.
1 - Save clustering plot.
2 - Additionally save plots of cell type images statistics and cell mixture plots.
3 - Additionally save cell and cluster abundance plots and cell mixture plots for all slices and cluster mixture plots and boxplots for each slice.
4 - Additionally save cell type images, abundance plots and cell percentage table for each slice.
5 - Additionally save color plots.
project_name – Project name that is used to name a directory containing all the slices used, default to community.
skip_stats – Skip statistics calculation on cell community clustering result. A table of cell mixtures and comparative spatial plots of cell types and mixtures will not be created, default to False.
total_cell_norm – Total number of cells per window mixture after normalization, default to 10000.
downsample_rate – Rate by which the binary image of cells is downsampled before calculating the entropy and scatteredness metrics. If no value is provided, downsample_rate will be equal to 1/2 of minimal window size, default to None.
num_threads – Number of threads that will be used to speed up community calling, default to 5.
entropy_thres – Threshold value for spatial cell type entropy for filtering out overdispersed cell types, default to 1.0.
scatter_thres – Threshold value for spatial cell type scatteredness for filtering out overdispersed cell types, default to 1.0.
win_sizes – Comma separated list of window sizes for analyzing the cell community.
sliding_steps – Comma separated list of sliding steps for sliding window.
min_cluster_size – Minimum number of cells for a cluster to be plotted in plot_stats(), default to 200.
min_perc_to_show – Minimum percentage of a cell type in a cluster for the cell type to be plotted in plot_stats(), default to 4.
min_num_celltype – Minimum number of cell types that have more than min_perc_celltype in a cluster, for a cluster to be shown in plot_celltype_table(), default to 1.
min_perc_celltype – Minimum percentage of cells of a cell type which at least min_num_celltype cell types need to have to show a cluster in plot_celltype_table().
min_cells_coeff – Multiple of standard deviations from mean values where the cutoff for m, default to 1.5.
color_plot_system – Color system for display of cluster specific windows, default to rgb.
save_adata – Save adata file with resulting .obs column of cell community labels, default to False.
min_count_per_type – Minimum number of cells per cell type needed to use the cell type for cell communities extraction (in percentages), default to 0.1.
hide_plots – Stop plots from displaying in notebooks or standard output. Used for batch processing, default to True.
dpi – DPI (dots per inch) used for plotting figures, default to 100.

Returns:

Object of CommunityDetection.
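
Example:

A minimal sketch of assembling keyword arguments for a run. Only the parameter names are taken from the list above; the values are hypothetical, win_sizes and sliding_steps are assumed to be passed as comma separated strings, and the element-wise pairing of sizes with steps is an assumption. parse_int_list is a hypothetical helper, not part of the library, and the final call is left as a comment because constructing the CommunityDetection object is not covered here.

```python
def parse_int_list(value):
    """Split a comma separated string such as '150,300' into [150, 300]."""
    return [int(item) for item in value.split(",") if item.strip()]


# Hypothetical values; only the parameter names come from the documentation.
ccd_kwargs = {
    "annotation": "cell_type",   # assumed name of the obs column
    "cluster_algo": "leiden",
    "resolution": 0.2,           # used by leiden, ignored otherwise
    "win_sizes": "150,300",
    "sliding_steps": "50,100",
    "plotting": 1,               # save only the clustering plot
    "hide_plots": True,
}

win_sizes = parse_int_list(ccd_kwargs["win_sizes"])
sliding_steps = parse_int_list(ccd_kwargs["sliding_steps"])
if len(win_sizes) != len(sliding_steps):
    raise ValueError("win_sizes and sliding_steps should have the same length")

# Assumption: the i-th sliding step belongs to the i-th window size.
window_plan = list(zip(win_sizes, sliding_steps))

# `detector` stands for an already-constructed CommunityDetection object:
# detector.main(**ccd_kwargs)
```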