Comparative Analysis

In scientific research, comparing disturbed or disease samples with control samples allows for the exploration of changes in functional mechanisms at both global and local levels. Stereopy is specifically designed to identify global and local diversities in comparative samples by employing cell-level and gene-level analysis modules, supported by novel algorithms. The cell-level analysis focuses on cell diversity in terms of cell type, cell-cell occurrence, and cell community via multi-sample comparisons. At gene level, Stereopy investigates gene diversity within cell types and cell communities, proposing the concept of constant and conditional markers [Fang23].

Here we use Slide-seq mouse kidney samples to test comparative analysis [Marshall22]. Please download test data, and here also offers the access to full data from the paper.

Preparing data

[1]:
import stereo as st
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator
[2]:
ms_data = MSData(_var_type='union')
data = st.io.read_h5ad('../../data/Puck_191223_19.h5ad', spatial_key='X_spatial')
ms_data['WT'] = data
data = st.io.read_h5ad('../../data/Puck_200104_07.h5ad', spatial_key='X_spatial')
ms_data['UMOD_KI'] = data

## merge data
ms_data.integrate()

## merged author_cell_type col to ms_data
ms_data.to_integrate(res_key = 'author_cell_type', scope = slice_generator[:], _from=slice_generator[:], type='obs', item='author_cell_type')
ms_data
[2]:
ms_data: {'WT': (36299, 21296), 'UMOD_KI': (31194, 21263)}
num_slice: 2
names: ['WT', 'UMOD_KI']
merged_data: id(140056957462992)
obs: ['batch', 'author_cell_type']
var: []
relationship: other
var_type: union to 22656
current_mode: integrate
current_scope: scope_[0,1]
scopes_data: ['scope_[0,1]:id(140056957462992)']
mss: ["scope_[0,1]:['author_cell_type']"]
[3]:
ms_data.plt.cluster_scatter(
    mode='integrate',
    scope=slice_generator[:],
    res_key='author_cell_type',
    reorganize_coordinate=2,
    horizontal_offset_additional=500
    )
[2024-08-29 18:01:02][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[3]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_4_3.png

Cell Co-occurrence of Comparative Analysis

Here we use cell co-occurrence to discover the global cell diversity.

See Co-occurrence for more imformation.

[4]:
ms_data['WT'].tl.co_occurrence(
        method='stereopy',
        cluster_res_key='author_cell_type',
        res_key='co-occurrence',
        dist_thres=180, # max threshold to measure co-occurence
        steps=6,  # step numbers to divide threshold interval evenly
        genelist=None,
        gene_thresh=0, # min threshold for gene expression in a cell
        n_jobs=-1
        )
[2024-08-29 18:01:02][Stereo][785302][MainThread][140059044030272][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f618d32a7c0>
[4]:
AnnData object with n_obs × n_vars = 36299 × 21296
    obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'schema_version', 'title', 'sn', 'co-occurrence', 'key_record'
    obsm: 'X_spatial'
[5]:
ms_data['UMOD_KI'].tl.co_occurrence(
        method='stereopy',
        cluster_res_key='author_cell_type',
        res_key='co-occurrence',
        dist_thres=180, # max threshold to measure co-occurence
        steps=6,  # step numbers to divide threshold interval evenly
        genelist=None,
        gene_thresh=0, # min threshold for gene expression in a cell
        n_jobs=-1
        )
[2024-08-29 18:01:12][Stereo][785302][MainThread][140059044030272][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f618cc39fa0>
[5]:
AnnData object with n_obs × n_vars = 31194 × 21263
    obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
    var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
    uns: 'schema_version', 'title', 'sn', 'co-occurrence', 'key_record'
    obsm: 'X_spatial'

Co-occurrence on multi samples

[6]:
from stereo.algorithm.co_occurrence import  CoOccurrence
CoOccurrence.ms_co_occur_integrate(
    ms_data=ms_data,
    scope='UMOD_KI|WT',
    use_col='author_cell_type',
    res_key='co-occurrence'
    )
[7]:
ms_data.plt.co_occurrence_heatmap(
    mode='integrate',
    scope=slice_generator[:],
    res_key='co-occurrence',
    cluster_res_key='author_cell_type'
    )
[2024-08-29 18:01:19][Stereo][785302][MainThread][140059044030272][ms_pipeline][131][INFO]: register plot_func co_occurrence_heatmap to <class 'stereo.core.stereo_exp_data.AnnBasedStereoExpData'>-140056957462992
[7]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_10_3.png
[8]:
ms_data.plt.co_occurrence_plot(
    mode='integrate',
    scope=slice_generator[:],
    res_key='co-occurrence',
    groups=['Fibroblast']
    )
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][ms_pipeline][131][INFO]: register plot_func co_occurrence_plot to <class 'stereo.core.stereo_exp_data.AnnBasedStereoExpData'>-140056957462992
[8]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_11_3.png

From Above, we can see the increment of co-occurrence of Fibroblast with other cell types except for MC and CD-IC from WT to UMOD KI samples.

Cell Community Detection of Comparative Analysis

Here we also used CCD to explore local cell diversity.

See also Cell Community Detection for more information.

[9]:
ms_ccd = ms_data.tl.ms_community_detection(
        annotation='author_cell_type',
        win_sizes='300,150',
        sliding_steps='150,50',
        resolution=0.2,
        # scatter_thres=0.2,
        cluster_algo='leiden',
        n_clusters=7,
        out_path="./ccd_result",
        plotting=3,
        hide_plots=True
        )
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_0
                     window size: 300
                     sliding step: 150
                     cells mean: 150.62
                     cells median: 169.0
                     num horizontal windows: 16
                     num vertical windows: 16


[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_0
                     window size: 150
                     sliding step: 50
                     cells mean: 40.51
                     cells median: 43.0
                     num horizontal windows: 32
                     num vertical windows: 32


[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_1
                     window size: 300
                     sliding step: 150
                     cells mean: 132.18
                     cells median: 150.0
                     num horizontal windows: 16
                     num vertical windows: 16


[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_1
                     window size: 150
                     sliding step: 50
                     cells mean: 36.10
                     cells median: 38.0
                     num horizontal windows: 32
                     num vertical windows: 32


[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][96][INFO]: Downsample rate is not provided by user - proceeding to calculate one based on minimal window size.
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][100][INFO]: donwsample_rate = 75
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 3 cells < 36 (0.1 % of 36299)
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 19 cells < 36 (0.1 % of 36299)
[2024-08-29 18:01:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0639s
[2024-08-29 18:01:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_annotation took 0.4012s
[2024-08-29 18:01:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.1352s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.6845s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 2 cells < 31 (0.1 % of 31194)
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 31 cells < 31 (0.1 % of 31194)
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0699s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_annotation took 0.4077s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.1180s
[2024-08-29 18:01:24][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.6697s
[2024-08-29 18:01:24][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0700s
[2024-08-29 18:01:24][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0667s
[2024-08-29 18:01:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_all_annotation took 0.7282s
[2024-08-29 18:01:25][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run neighbors...
2024-08-29 18:01:28.076298: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-08-29 18:01:28.103217: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-29 18:01:28.907330: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2024-08-29 18:01:42][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: neighbors end, consume time 16.9953s.
[2024-08-29 18:01:42][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run leiden...
[2024-08-29 18:01:45][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: leiden end, consume time 3.2460s.
[2024-08-29 18:01:45][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function cluster took 20.2484s
/root/miniforge3/envs/stereopy/lib/python3.8/site-packages/pandas/core/indexing.py:1728: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  self.obj[key] = value
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.3492s
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][762][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_0.csv
... storing 'leiden' as categorical
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_0.h5ad.
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.1030s
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_clustering took 0.5053s
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0180s
[2024-08-29 18:01:53][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_stats took 2.9624s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_table took 1.3562s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0651s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0727s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0637s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0816s
[2024-08-29 18:01:56][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0745s
[2024-08-29 18:01:56][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0915s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0676s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0889s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0776s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0782s
[2024-08-29 18:01:58][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0846s
[2024-08-29 18:01:58][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.1013s
[2024-08-29 18:01:59][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0723s
[2024-08-29 18:01:59][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0834s
[2024-08-29 18:01:59][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cluster_mixtures took 4.8307s
[2024-08-29 18:02:03][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function boxplot_stats took 3.7169s
[2024-08-29 18:02:03][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_0_stats.h5ad.
/root/miniforge3/envs/stereopy/lib/python3.8/site-packages/pandas/core/indexing.py:1728: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
  self.obj[key] = value
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.8398s
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][762][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_1.csv
... storing 'leiden' as categorical
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_1.h5ad.
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0867s
[2024-08-29 18:02:09][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_clustering took 0.5132s
[2024-08-29 18:02:09][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0364s
[2024-08-29 18:02:12][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_stats took 3.3399s
[2024-08-29 18:02:13][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_table took 1.2943s
[2024-08-29 18:02:13][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0779s
[2024-08-29 18:02:14][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0783s
[2024-08-29 18:02:14][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0673s
[2024-08-29 18:02:14][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0784s
[2024-08-29 18:02:15][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0686s
[2024-08-29 18:02:15][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0763s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0718s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0771s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0727s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0984s
[2024-08-29 18:02:17][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0734s
[2024-08-29 18:02:17][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0755s
[2024-08-29 18:02:18][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cluster_mixtures took 4.2218s
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function boxplot_stats took 3.2157s
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_1_stats.h5ad.
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0816s
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0762s
[2024-08-29 18:02:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_all_clustering took 0.7723s
[2024-08-29 18:02:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_mixtures_total took 2.9256s
[2024-08-29 18:02:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cell_abundance_total took 0.3702s
[2024-08-29 18:02:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cluster_abundance_total took 0.2698s
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function generate_report took 0.3498s
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function _main took 64.2934s
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>

Visualization for CCD

[10]:
ms_ccd.plot('all_clustering')
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0739s
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0775s
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_16_1.png
[2024-08-29 18:02:27][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_all_clustering took 1.2210s
[11]:
ms_ccd.plot('cell_types_table')
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_17_0.png
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_table took 1.6808s
<Figure size 640x480 with 0 Axes>
[12]:
## transfer ccd result to ms_data

ms_data.to_integrate(res_key = 'cell_communities', scope = slice_generator[:], _from=slice_generator[:], type='obs', item=['cell_communities']*ms_data.num_slice)

Manually merge cell communities to tissue

[13]:
anno_dict = {'0':'medulla',
             '1':'cortex',
             '2':'boundary',
             '3':'pelvis',
             '4':'other_immune',
             '5':'cortex',
             '6':'pelvis',
             'unknown':'unknown'}

ms_data.tl.annotation(
    mode='integrate',
    scope=slice_generator[:],
    annotation_information=anno_dict,
    cluster_res_key='cell_communities',
    res_key='tissue'
    )

ms_data.tl.annotation(
    mode='isolated',
    scope=slice_generator[:],
    annotation_information=anno_dict,
    cluster_res_key='cell_communities',
    res_key='tissue'
    )
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run annotation...
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: annotation end, consume time 0.0768s.
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][st_pipeline][41][INFO]: start to run annotation...
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run annotation
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][st_pipeline][41][INFO]: start to run annotation...
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][st_pipeline][44][INFO]: annotation end, consume time 0.0716s.
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][st_pipeline][44][INFO]: annotation end, consume time 0.0712s.
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s finished
[14]:
ms_data.plt.cluster_scatter(
    mode='integrate',
    scope=slice_generator[:],
    res_key='tissue',
    reorganize_coordinate=2,
    horizontal_offset_additional=500
    )
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[14]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_21_3.png

Gene diversity of cell community v.s. cell type

Taken WT as example

[15]:
ms_data.tl.raw_checkpoint(mode='isolated')
ms_data.tl.raw_checkpoint(mode='integrate')
[2024-08-29 18:02:30][Stereo][785302][Thread-63][140051068888640][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2024-08-29 18:02:30][Stereo][785302][Thread-64][140051060495936][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.0s finished

Preprocessing for marker

[16]:
ms_data.tl.normalize_total()
ms_data.tl.log1p()
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run normalize_total...
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: normalize_total end, consume time 0.1039s.
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run log1p...
[2024-08-29 18:02:31][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: log1p end, consume time 0.9183s.

Marker for tissue

[17]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False)
[2024-08-29 18:02:31][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2024-08-29 18:02:31][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run find_marker_genes...
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: find_marker_genes end, consume time 2.3076s.
[18]:
ms_data.plt.marker_genes_heatmap(res_key = 'marker_genes')
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_heatmap
[18]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_28_3.png

Plot medulla markers

[19]:
medulla_marker = ms_data.tl.result['scope_[0,1]']['marker_genes']['medulla.vs.rest']

medulla_marker = medulla_marker.loc[(medulla_marker['pvalues_adj'] < 0.05) & (medulla_marker['pct_rest'] < 0.2)]

medulla_marker = medulla_marker.sort_values('scores', ascending=False)
medulla_marker_list = medulla_marker['genes'][:8]
medulla_marker
[19]:
scores pvalues pvalues_adj log2fc genes pct pct_rest mean_count
0 114.183655 0.0 0.0 3.426115 ENSMUSG00000027202 0.538050 0.184545 1.663084
2 61.116464 0.0 0.0 2.430726 ENSMUSG00000022490 0.225488 0.069651 0.317728
3 54.538624 0.0 0.0 2.075901 ENSMUSG00000018211 0.225950 0.088956 0.345794
4 54.307349 0.0 0.0 1.920096 ENSMUSG00000030963 0.258017 0.112316 0.429489
5 46.074215 0.0 0.0 2.126544 ENSMUSG00000001270 0.147868 0.050264 0.184752
... ... ... ... ... ... ... ... ...
22611 -84.611139 0.0 0.0 -5.021427 ENSMUSG00000024131 0.006942 0.193352 0.007207
22612 -84.693316 0.0 0.0 -4.928562 ENSMUSG00000036083 0.007372 0.195231 0.007736
22614 -84.911546 0.0 0.0 -5.308575 ENSMUSG00000027070 0.005983 0.189646 0.006182
22615 -85.269715 0.0 0.0 -5.068659 ENSMUSG00000026730 0.006777 0.194399 0.007405
22616 -86.208055 0.0 0.0 -5.330188 ENSMUSG00000074254 0.005818 0.193942 0.005884

9120 rows × 8 columns

[20]:
ms_data.plt.marker_genes_scatter(res_key = 'marker_genes', groups='medulla', genes=medulla_marker_list)
[2024-08-29 18:02:34][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[20]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_31_3.png

Marker for author cell type

[21]:
ms_data.tl.find_marker_genes(cluster_res_key='author_cell_type', use_highly_genes=False, use_raw =False, res_key = 'marker_genes_author')
[2024-08-29 18:02:35][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2024-08-29 18:02:35][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run find_marker_genes...
[2024-08-29 18:02:37][Stereo][785302][MainThread][140059044030272][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:37][Stereo][785302][MainThread][140059044030272][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:38][Stereo][785302][MainThread][140059044030272][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:38][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: find_marker_genes end, consume time 3.4135s.
[22]:
ms_data.plt.marker_genes_heatmap(res_key = 'marker_genes_author')
[2024-08-29 18:02:38][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_heatmap
[22]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_34_3.png

Plot medulla corresponding celltype markers

[23]:
ms_data.plt.marker_genes_scatter(res_key='marker_genes_author', markers_num=4, groups=['EC', 'Other_Immune', 'TAL' ] )
[2024-08-29 18:02:40][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[23]:
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_36_3.png

Conditional marker and constant marker

[24]:
ms_data.tl.raw_checkpoint(mode='isolated')
[2024-08-29 18:02:40][Stereo][785302][Thread-82][140050918397504][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2024-08-29 18:02:40][Stereo][785302][Thread-83][140051052103232][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.0s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.0s finished
[25]:
ms_data.tl.normalize_total(mode='isolated')
ms_data.tl.log1p(mode='isolated')
[2024-08-29 18:02:40][Stereo][785302][Thread-87][140050929411648][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2024-08-29 18:02:40][Stereo][785302][Thread-88][140050918397504][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run normalize_total
[2024-08-29 18:02:40][Stereo][785302][Thread-87][140050929411648][st_pipeline][41][INFO]: start to run normalize_total...
[2024-08-29 18:02:40][Stereo][785302][Thread-88][140050918397504][st_pipeline][41][INFO]: start to run normalize_total...
[2024-08-29 18:02:41][Stereo][785302][Thread-87][140050929411648][st_pipeline][44][INFO]: normalize_total end, consume time 0.0457s.
[2024-08-29 18:02:41][Stereo][785302][Thread-88][140050918397504][st_pipeline][44][INFO]: normalize_total end, consume time 0.0673s.
[2024-08-29 18:02:41][Stereo][785302][Thread-92][140051176711744][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2024-08-29 18:02:41][Stereo][785302][Thread-92][140051176711744][st_pipeline][41][INFO]: start to run log1p...
[2024-08-29 18:02:41][Stereo][785302][Thread-93][140050918397504][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run log1p
[2024-08-29 18:02:41][Stereo][785302][Thread-93][140050918397504][st_pipeline][41][INFO]: start to run log1p...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2024-08-29 18:02:41][Stereo][785302][Thread-92][140051176711744][st_pipeline][44][INFO]: log1p end, consume time 0.0785s.
[2024-08-29 18:02:41][Stereo][785302][Thread-93][140050918397504][st_pipeline][44][INFO]: log1p end, consume time 0.0824s.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    0.1s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.1s finished
[26]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False, mode = 'isolated')
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run find_marker_genes
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][st_pipeline][41][INFO]: start to run find_marker_genes...
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][st_pipeline][41][INFO]: start to run find_marker_genes...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:42][Stereo][785302][Thread-97][140051052897856][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:42][Stereo][785302][Thread-98][140051043710528][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:42][Stereo][785302][Thread-97][140051052897856][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.3789s.
[2024-08-29 18:02:42][Stereo][785302][Thread-98][140051043710528][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.3771s.
[Parallel(n_jobs=2)]: Done   1 tasks      | elapsed:    1.4s
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    1.4s finished

UMOD KI conditional marker count as N_top_marker increases.

[27]:
ret = {}
for name in ms_data.merged_data.tl.result['tissue']['group'].cat.categories:
    if name != 'unknown':
        ret[name] = []
        for N_top_marker in range(200):
            ret[name].append(N_top_marker-(len(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
                                               set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))) )
[28]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (5,5))
ax = fig.subplots()

#ct2color =  dict(zip(adata_wt.obs['tissue_domain'].cat.categories, adata_wt.uns['tissue_domain_colors']))

for x, y in ret.items():
    ax.plot(y, label=x)
ax.legend()
ax.set_title('UMOD KI conditional markers amount')
ax.set_xlabel('N top DEGs')
ax.set_ylabel('N UMOD KI conditional DEGs')
[28]:
Text(0, 0.5, 'N UMOD KI conditional DEGs')
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_43_1.png

Plot expression pattern for constant and conditional markers

[29]:
N_top_marker = 200
name = 'medulla'
[30]:
constant_marker = list(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
                       set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))

WT_conditional_marker = [x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
UMOD_KI_conditional_marker = [x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
constant_marker = constant_marker[:10]
[31]:
all_marker_list = constant_marker
all_marker_list.extend(WT_conditional_marker)
all_marker_list.extend(UMOD_KI_conditional_marker)
[32]:
ms_data.tl.scale()
[2024-08-29 18:05:04][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run scale
[2024-08-29 18:05:04][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run scale...
[2024-08-29 18:05:15][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: scale end, consume time 11.0059s.
[33]:
celllist = ms_data.obs.loc[(ms_data.obs['tissue'] == 'medulla') & (ms_data.obs['batch'] == '0')].index
WT_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix

celllist = ms_data.obs.loc[(ms_data.obs['tissue'] == 'medulla') & (ms_data.obs['batch'] == '1')].index
UMOD_KI_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix
[34]:
import pandas as pd
import seaborn as sns
medulla_exp = pd.DataFrame([WT_medulla_exp.mean(axis=0), UMOD_KI_medulla_exp.mean(axis=0)])
medulla_exp.columns = all_marker_list
medulla_exp.index = ['WT', 'UMOD KI']

fig, ax = plt.subplots(1, figsize=(20,1))
sns.heatmap(medulla_exp, center = 0, ax=ax, cmap='bwr', vmin=-0.5, vmax=0.5,)
[34]:
<Axes: >
../_images/Tutorials%28Multi-sample%29_Comparative_Analysis_50_1.png
[ ]: