Comparative Analysis#

In scientific research, comparing disturbed or disease samples with control samples allows for the exploration of changes in functional mechanisms at both global and local levels. Stereopy is specifically designed to identify global and local diversities in comparative samples by employing cell-level and gene-level analysis modules, supported by novel algorithms. The cell-level analysis focuses on cell diversity in terms of cell type, cell-cell occurrence, and cell community via multi-sample comparisons. At gene level, Stereopy investigates gene diversity within cell types and cell communities, proposing the concept of constant and conditional markers [Fang23].

Here we use Slide-seq mouse kidney samples to test comparative analysis [Marshall22]. Please download test data, and here also offers the access to full data from the paper.

Preparing data#

[1]:
import stereo as st
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator
[2]:
ms_data = MSData()
data = st.io.read_h5ad('../data/Puck_191223_19_rename.h5ad')
data._ann_data.obsm['spatial'] = data._ann_data.obsm['X_spatial']
ms_data['WT'] = data
data = st.io.read_h5ad('../data/Puck_200104_07_rename.h5ad')
data._ann_data.obsm['spatial'] = data._ann_data.obsm['X_spatial']
ms_data['UMOD_KI'] = data

## merge data
ms_data.integrate()

## merged author_cell_type col to ms_data
ms_data.to_integrate(res_key = 'author_cell_type', scope = slice_generator[:], _from=slice_generator[:], type='obs', item=['author_cell_type']*ms_data.num_slice)
[3]:
ms_data.plt.cluster_scatter(
    mode='integrate',
    scope=slice_generator[:],
    res_key='author_cell_type',
    reorganize_coordinate=2,
    horizontal_offset_additional=500
    )
[2023-11-22 14:08:13][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[3]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_4_3.png

Cell Co-occurrence of Comparative Analysis#

Here we use cell co-occurrence to discover the global cell diversity.

See Co-occurrence for more imformation.

[4]:
ms_data['WT'].tl.co_occurrence(
        method='stereopy',
        cluster_res_key='author_cell_type',
        res_key='co-occurrence',
        dist_thres=180, # max threshold to measure co-occurence
        steps=6,  # step numbers to divide threshold interval evenly
        genelist=None,
        gene_thresh=0, # min threshold for gene expression in a cell
        n_jobs=-1
        )
[2023-11-22 14:08:14][Stereo][192767][MainThread][140261435713344][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f911c1ed430>
[4]:
AnnData object with n_obs × n_vars = 36299 × 21296
    obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'schema_version', 'title', 'sn', 'co-occurrence'
    obsm: 'X_spatial', 'spatial'
[5]:
ms_data['UMOD_KI'].tl.co_occurrence(
        method='stereopy',
        cluster_res_key='author_cell_type',
        res_key='co-occurrence',
        dist_thres=180, # max threshold to measure co-occurence
        steps=6,  # step numbers to divide threshold interval evenly
        genelist=None,
        gene_thresh=0, # min threshold for gene expression in a cell
        n_jobs=-1
        )
[2023-11-22 14:08:26][Stereo][192767][MainThread][140261435713344][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f90de87ad60>
[5]:
AnnData object with n_obs × n_vars = 31194 × 21263
    obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'author_cell_type_colors', 'schema_version', 'title', 'sn', 'co-occurrence'
    obsm: 'X_spatial', 'spatial'

Co-occurrence on multi samples#

[6]:
from stereo.algorithm.co_occurrence import  CoOccurrence
CoOccurrence.ms_co_occur_integrate(
    ms_data=ms_data,
    scope='UMOD_KI|WT',
    use_col='author_cell_type',
    res_key='co-occurrence'
    )
[7]:
ms_data.plt.co_occurrence_heatmap(
    mode='integrate',
    scope=slice_generator[:],
    res_key='co-occurrence',
    cluster_res_key='author_cell_type'
    )
[2023-11-22 14:08:34][Stereo][192767][MainThread][140261435713344][ms_pipeline][128][INFO]: register plot_func co_occurrence_heatmap to <class 'stereo.core.stereo_exp_data.StereoExpData'>-140260185451152
[7]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_10_3.png
[8]:
ms_data.plt.co_occurrence_plot(
    mode='integrate',
    scope=slice_generator[:],
    res_key='co-occurrence',
    groups=['Fibroblast']
    )
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][ms_pipeline][128][INFO]: register plot_func co_occurrence_plot to <class 'stereo.core.stereo_exp_data.StereoExpData'>-140260185451152
[8]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_11_3.png

From Above, we can see the increment of co-occurrence of Fibroblase with other cell types except for MC and CD-IC from WT to UMOD KI samples.

Cell Community Detection of Comparative Analysis#

Here we also used CCD to explore local cell diversity.

See also Cell Community Detection for more information.

[9]:
ms_ccd = ms_data.tl.ms_community_detection(
        annotation='author_cell_type',
        win_sizes='300,150',
        sliding_steps='150,50',
        resolution=0.2,
        # scatter_thres=0.2,
        cluster_algo='leiden',
        n_clusters=7,
        out_path="./ccd_result",
        plotting=3,
        hide_plots=True
        )
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_0
                     window size: 300
                     sliding step: 150
                     cells mean: 150.62
                     cells median: 169.0
                     num horizontal windows: 16
                     num vertical windows: 16


[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_0
                     window size: 150
                     sliding step: 50
                     cells mean: 40.51
                     cells median: 43.0
                     num horizontal windows: 32
                     num vertical windows: 32


[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_1
                     window size: 300
                     sliding step: 150
                     cells mean: 132.18
                     cells median: 150.0
                     num horizontal windows: 16
                     num vertical windows: 16


[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_1
                     window size: 150
                     sliding step: 50
                     cells mean: 36.10
                     cells median: 38.0
                     num horizontal windows: 32
                     num vertical windows: 32


[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][96][INFO]: Downsample rate is not provided by user - proceeding to calculate one based on minimal window size.
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][100][INFO]: donwsample_rate = 75
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 3 cells < 36 (0.1 % of 36299)
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 19 cells < 36 (0.1 % of 36299)
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1142s
set rcParams
set original_rcParams
[2023-11-22 14:08:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_annotation took 0.5254s
[2023-11-22 14:08:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 0.2566s
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 1.0625s
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 2 cells < 31 (0.1 % of 31194)
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 31 cells < 31 (0.1 % of 31194)
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1016s
set rcParams
[2023-11-22 14:08:40][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_annotation took 0.4851s
[2023-11-22 14:08:40][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 0.7578s
[2023-11-22 14:08:41][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 1.0130s
[2023-11-22 14:08:42][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1157s
[2023-11-22 14:08:42][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1031s
[2023-11-22 14:08:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_all_annotation took 0.9576s
[2023-11-22 14:08:43][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run neighbors...
[2023-11-22 14:10:58][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: neighbors end, consume time 135.1017s.
[2023-11-22 14:10:58][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run leiden...
[2023-11-22 14:10:59][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: leiden end, consume time 1.0295s.
[2023-11-22 14:10:59][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function cluster took 136.2322s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.9303s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][758][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_0.csv
... storing 'leiden' as categorical
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_0.h5ad.
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1789s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_clustering took 0.5976s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0247s
set rcParams
[2023-11-22 14:11:08][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_stats took 3.7973s
set rcParams
[2023-11-22 14:11:10][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_table took 1.7224s
[2023-11-22 14:11:10][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1186s
set rcParams
[2023-11-22 14:11:10][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1055s
[2023-11-22 14:11:11][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1095s
[2023-11-22 14:11:11][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1054s
[2023-11-22 14:11:12][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1121s
[2023-11-22 14:11:12][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1027s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1107s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1032s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1124s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1021s
[2023-11-22 14:11:14][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1086s
[2023-11-22 14:11:14][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1128s
[2023-11-22 14:11:16][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.7562s
[2023-11-22 14:11:16][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1040s
[2023-11-22 14:11:16][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cluster_mixtures took 6.3520s
set rcParams
[2023-11-22 14:11:19][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function boxplot_stats took 2.9817s
[2023-11-22 14:11:19][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_0_stats.h5ad.
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.4321s
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][758][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_1.csv
... storing 'leiden' as categorical
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_1.h5ad.
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1434s
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_clustering took 0.5420s
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0211s
set rcParams
[2023-11-22 14:11:28][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_stats took 3.6931s
set rcParams
[2023-11-22 14:11:30][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_table took 1.4127s
[2023-11-22 14:11:30][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1038s
[2023-11-22 14:11:30][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0960s
set rcParams
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1006s
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0952s
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0959s
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0916s
[2023-11-22 14:11:32][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1018s
[2023-11-22 14:11:32][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0967s
[2023-11-22 14:11:33][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0982s
[2023-11-22 14:11:33][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0968s
[2023-11-22 14:11:34][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1004s
[2023-11-22 14:11:34][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0906s
[2023-11-22 14:11:34][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cluster_mixtures took 4.5858s
set rcParams
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function boxplot_stats took 3.4428s
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_1_stats.h5ad.
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1040s
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1003s
[2023-11-22 14:11:39][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_all_clustering took 0.8462s
set rcParams
[2023-11-22 14:11:42][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_mixtures_total took 3.6870s
set rcParams
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cell_abundance_total took 0.3357s
set rcParams
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cluster_abundance_total took 0.2620s
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function generate_report took 0.1031s
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function _main took 185.9941s
reset original_rcParams
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>

Visualization for CCD#

[10]:
ms_ccd.plot('all_clustering')
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1063s
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0925s
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_16_1.png
[2023-11-22 14:11:44][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_all_clustering took 1.4686s
reset original_rcParams
[11]:
ms_ccd.plot('cell_types_table')
set rcParams
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_17_1.png
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_table took 2.1316s
reset original_rcParams
[12]:
## transfer ccd result to ms_data

ms_data.to_integrate(res_key = 'cell_communities', scope = slice_generator[:], _from=slice_generator[:], type='obs', item=['cell_communities']*ms_data.num_slice)

Manually merge cell communities to tissue#

[13]:
anno_dict = {'0':'medulla',
             '1':'cortex',
             '2':'boundary',
             '3':'pelvis',
             '4':'other_immune',
             '5':'cortex',
             '6':'pelvis',
             'unknown':'unknown'}

ms_data.tl.annotation(
    mode='integrate',
    scope=slice_generator[:],
    annotation_information=anno_dict,
    cluster_res_key='cell_communities',
    res_key='tissue'
    )

ms_data.tl.annotation(
    mode='isolated',
    scope=slice_generator[:],
    annotation_information=anno_dict,
    cluster_res_key='cell_communities',
    res_key='tissue'
    )
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run annotation...
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: annotation end, consume time 0.2616s.
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run annotation
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][st_pipeline][41][INFO]: start to run annotation...
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][st_pipeline][41][INFO]: start to run annotation...
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][st_pipeline][44][INFO]: annotation end, consume time 0.1843s.
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.2s
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][st_pipeline][44][INFO]: annotation end, consume time 0.2591s.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.3s finished
[14]:
ms_data.plt.cluster_scatter(
    mode='integrate',
    scope=slice_generator[:],
    res_key='tissue',
    reorganize_coordinate=2,
    horizontal_offset_additional=500
    )
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[14]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_21_3.png

Gene diversity of cell community v.s. cell type#

Taken WT as example

[15]:
ms_data.tl.raw_checkpoint(mode='isolated')
ms_data.tl.raw_checkpoint(mode='integrate')
[2023-11-22 14:11:48][Stereo][192767][Thread-183][140250624255744][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2023-11-22 14:11:48][Stereo][192767][Thread-184][140250632648448][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s finished

Preprocessing for marker#

[16]:
ms_data.tl.normalize_total()
ms_data.tl.log1p()
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run normalize_total...
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: normalize_total end, consume time 0.2144s.
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run log1p...
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: log1p end, consume time 0.5883s.

Marker for tissue#

[17]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False)
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run find_marker_genes...
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][tool_base][151][INFO]: start to run...
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:11:50][Stereo][192767][MainThread][140261435713344][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 1.2010s.
[2023-11-22 14:11:51][Stereo][192767][MainThread][140261435713344][tool_base][153][INFO]: end to run.
[2023-11-22 14:11:51][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: find_marker_genes end, consume time 2.2644s.
[18]:
ms_data.plt.marker_genes_heatmap(res_key = 'marker_genes')
[2023-11-22 14:11:51][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_heatmap
[18]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_28_3.png

Plot medulla markers#

[19]:
medulla_marker = ms_data.tl.result['scope_[0,1]']['marker_genes']['medulla.vs.rest']

medulla_marker = medulla_marker.loc[(medulla_marker['pvalues_adj'] < 0.05) & (medulla_marker['pct_rest'] < 0.2)]

medulla_marker = medulla_marker.sort_values('scores', ascending=False)
medulla_marker_list = medulla_marker['genes'][:8]
medulla_marker
[19]:
scores pvalues pvalues_adj log2fc genes pct pct_rest
16148 114.030865 0.0 0.0 3.427668 Slc12a1 0.541348 0.186195
14214 60.757634 0.0 0.0 2.418859 Ppp1r1a 0.226809 0.070483
19125 54.568371 0.0 0.0 2.082375 Wfdc15b 0.227547 0.089344
18748 54.432050 0.0 0.0 1.929499 Umod 0.259800 0.112661
3704 45.836450 0.0 0.0 2.118043 Ckb 0.148778 0.050720
... ... ... ... ... ... ... ...
16382 -84.253558 0.0 0.0 -4.977401 Slc3a1 0.007048 0.191023
16185 -84.503654 0.0 0.0 -4.917349 Slc17a3 0.007316 0.193013
11186 -84.697973 0.0 0.0 -5.285447 Lrp2 0.005974 0.187442
14545 -84.896155 0.0 0.0 -5.022214 Pter 0.006880 0.192058
4364 -86.060252 0.0 0.0 -5.323331 Cyp2a4 0.005773 0.191713

9126 rows × 7 columns

[20]:
ms_data.plt.marker_genes_scatter(res_key = 'marker_genes', groups='medulla', genes=medulla_marker_list)
[2023-11-22 14:11:53][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[20]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_31_3.png

Marker for author cell type#

[21]:
ms_data.tl.find_marker_genes(cluster_res_key='author_cell_type', use_highly_genes=False, use_raw =False, res_key = 'marker_genes_celltype')
[2023-11-22 14:11:53][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2023-11-22 14:11:53][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run find_marker_genes...
[2023-11-22 14:11:54][Stereo][192767][MainThread][140261435713344][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:11:54][Stereo][192767][MainThread][140261435713344][tool_base][151][INFO]: start to run...
[2023-11-22 14:11:54][Stereo][192767][MainThread][140261435713344][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:11:55][Stereo][192767][MainThread][140261435713344][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 1.1568s.
[2023-11-22 14:11:57][Stereo][192767][MainThread][140261435713344][tool_base][153][INFO]: end to run.
[2023-11-22 14:11:57][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: find_marker_genes end, consume time 3.1933s.
[22]:
ms_data.plt.marker_genes_heatmap(res_key = 'marker_genes_celltype')
[2023-11-22 14:11:57][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_heatmap
[22]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_34_3.png

Plot medulla corresponding celltype markers#

[23]:
ms_data.plt.marker_genes_scatter(res_key='marker_genes_celltype', markers_num=4, groups=['EC', 'Other_Immune', 'TAL' ] )
[2023-11-22 14:11:58][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[23]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_36_3.png

Conditional marker and constant marker#

[24]:
ms_data.tl.raw_checkpoint(mode='isolated')
[2023-11-22 14:12:00][Stereo][192767][Thread-202][140250901083904][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2023-11-22 14:12:00][Stereo][192767][Thread-203][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.3s finished
[25]:
ms_data.tl.normalize_total(mode='isolated')
ms_data.tl.log1p(mode='isolated')
[2023-11-22 14:12:01][Stereo][192767][Thread-207][140250368562944][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2023-11-22 14:12:01][Stereo][192767][Thread-208][140250492430080][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run normalize_total
[2023-11-22 14:12:01][Stereo][192767][Thread-207][140250368562944][st_pipeline][41][INFO]: start to run normalize_total...
[2023-11-22 14:12:01][Stereo][192767][Thread-208][140250492430080][st_pipeline][41][INFO]: start to run normalize_total...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2023-11-22 14:12:01][Stereo][192767][Thread-207][140250368562944][st_pipeline][44][INFO]: normalize_total end, consume time 0.2100s.
[2023-11-22 14:12:01][Stereo][192767][Thread-208][140250492430080][st_pipeline][44][INFO]: normalize_total end, consume time 0.2285s.
[2023-11-22 14:12:01][Stereo][192767][Thread-212][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2023-11-22 14:12:01][Stereo][192767][Thread-213][140250626356992][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run log1p
[2023-11-22 14:12:01][Stereo][192767][Thread-213][140250626356992][st_pipeline][41][INFO]: start to run log1p...
[2023-11-22 14:12:01][Stereo][192767][Thread-212][140250892691200][st_pipeline][41][INFO]: start to run log1p...
[2023-11-22 14:12:01][Stereo][192767][Thread-213][140250626356992][st_pipeline][44][INFO]: log1p end, consume time 0.0751s.
[2023-11-22 14:12:01][Stereo][192767][Thread-212][140250892691200][st_pipeline][44][INFO]: log1p end, consume time 0.0982s.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.2s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s finished
[26]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False, mode = 'isolated')
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run find_marker_genes
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][st_pipeline][41][INFO]: start to run find_marker_genes...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][st_pipeline][41][INFO]: start to run find_marker_genes...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][tool_base][151][INFO]: start to run...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][tool_base][151][INFO]: start to run...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:12:02][Stereo][192767][Thread-218][140253086627584][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 0.4522s.
[2023-11-22 14:12:02][Stereo][192767][Thread-217][140250892691200][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 0.5568s.
[2023-11-22 14:12:03][Stereo][192767][Thread-218][140253086627584][tool_base][153][INFO]: end to run.
[2023-11-22 14:12:03][Stereo][192767][Thread-218][140253086627584][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.8071s.
[2023-11-22 14:12:03][Stereo][192767][Thread-217][140250892691200][tool_base][153][INFO]: end to run.
[2023-11-22 14:12:03][Stereo][192767][Thread-217][140250892691200][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.8774s.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    1.8s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    1.9s finished

UMOD KI conditional marker count as N_top_marker increases.

[27]:
ret = {}
for name in ms_data.merged_data.tl.result['tissue']['group'].cat.categories:
    if name != 'unknown':
        ret[name] = []
        for N_top_marker in range(200):
            ret[name].append(N_top_marker-(len(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
                                               set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))) )
[28]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (5,5))
ax = fig.subplots()

#ct2color =  dict(zip(adata_wt.obs['tissue_domain'].cat.categories, adata_wt.uns['tissue_domain_colors']))

for x, y in ret.items():
    ax.plot(y, label=x)
ax.legend()
ax.set_title('UMOD KI conditional markers amount')
ax.set_xlabel('N top DEGs')
ax.set_ylabel('N UMOD KI conditional DEGs')
[28]:
Text(0, 0.5, 'N UMOD KI conditional DEGs')
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_43_1.png

Plot expression pattern for constant and conditional markers#

[29]:
N_top_marker = 200
name = 'medulla'
[30]:
constant_marker = list(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
                       set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))

WT_conditional_marker = [x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
UMOD_KI_conditional_marker = [x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
constant_marker = constant_marker[:10]
[31]:
all_marker_list = constant_marker
all_marker_list.extend(WT_conditional_marker)
all_marker_list.extend(UMOD_KI_conditional_marker)
[32]:
ms_data.tl.scale()
[2023-11-22 14:12:03][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run scale
[2023-11-22 14:12:03][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run scale...
[2023-11-22 14:12:14][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: scale end, consume time 10.9675s.
[33]:
ms_data.merged_data.tl.result['tissue'].index = ms_data.merged_data.tl.result['tissue']['bins']

celllist = ms_data.obs.loc[(ms_data.merged_data.tl.result['tissue']['group']=='medulla') & (ms_data.obs['batch']=='0')].index
WT_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix

celllist = ms_data.obs.loc[(ms_data.merged_data.tl.result['tissue']['group']=='medulla') & (ms_data.obs['batch']=='1')].index
UMOD_KI_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix
INFO:numba.core.transforms:finding looplift candidates
INFO:numba.core.transforms:finding looplift candidates
[34]:
import pandas as pd
import seaborn as sns
medulla_exp = pd.DataFrame([WT_medulla_exp.mean(axis=0), UMOD_KI_medulla_exp.mean(axis=0)])
medulla_exp.columns = all_marker_list
medulla_exp.index = ['WT', 'UMOD KI']

fig, ax = plt.subplots(1, figsize=(20,1))
sns.heatmap(medulla_exp, center = 0, ax=ax, cmap='bwr', vmin=-0.5, vmax=0.5,)
[34]:
<Axes: >
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_50_1.png