Comparative Analysis#
In scientific research, comparing disturbed or disease samples with control samples allows for the exploration of changes in functional mechanisms at both global and local levels. Stereopy is specifically designed to identify global and local diversities in comparative samples by employing cell-level and gene-level analysis modules, supported by novel algorithms. The cell-level analysis focuses on cell diversity in terms of cell type, cell-cell occurrence, and cell community via multi-sample comparisons. At gene level, Stereopy investigates gene diversity within cell types and cell communities, proposing the concept of constant and conditional markers [Fang23].
Here we use Slide-seq mouse kidney samples to test comparative analysis [Marshall22]. Please download test data, and here also offers the access to full data from the paper.
Preparing data#
[1]:
import stereo as st
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator
[2]:
ms_data = MSData()
data = st.io.read_h5ad('../data/Puck_191223_19_rename.h5ad')
data._ann_data.obsm['spatial'] = data._ann_data.obsm['X_spatial']
ms_data['WT'] = data
data = st.io.read_h5ad('../data/Puck_200104_07_rename.h5ad')
data._ann_data.obsm['spatial'] = data._ann_data.obsm['X_spatial']
ms_data['UMOD_KI'] = data
## merge data
ms_data.integrate()
## merged author_cell_type col to ms_data
ms_data.to_integrate(res_key = 'author_cell_type', scope = slice_generator[:], _from=slice_generator[:], type='obs', item=['author_cell_type']*ms_data.num_slice)
[3]:
ms_data.plt.cluster_scatter(
mode='integrate',
scope=slice_generator[:],
res_key='author_cell_type',
reorganize_coordinate=2,
horizontal_offset_additional=500
)
[2023-11-22 14:08:13][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[3]:
Cell Co-occurrence of Comparative Analysis#
Here we use cell co-occurrence to discover the global cell diversity.
See Co-occurrence for more imformation.
[4]:
ms_data['WT'].tl.co_occurrence(
method='stereopy',
cluster_res_key='author_cell_type',
res_key='co-occurrence',
dist_thres=180, # max threshold to measure co-occurence
steps=6, # step numbers to divide threshold interval evenly
genelist=None,
gene_thresh=0, # min threshold for gene expression in a cell
n_jobs=-1
)
[2023-11-22 14:08:14][Stereo][192767][MainThread][140261435713344][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f911c1ed430>
[4]:
AnnData object with n_obs × n_vars = 36299 × 21296
obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
uns: 'schema_version', 'title', 'sn', 'co-occurrence'
obsm: 'X_spatial', 'spatial'
[5]:
ms_data['UMOD_KI'].tl.co_occurrence(
method='stereopy',
cluster_res_key='author_cell_type',
res_key='co-occurrence',
dist_thres=180, # max threshold to measure co-occurence
steps=6, # step numbers to divide threshold interval evenly
genelist=None,
gene_thresh=0, # min threshold for gene expression in a cell
n_jobs=-1
)
[2023-11-22 14:08:26][Stereo][192767][MainThread][140261435713344][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f90de87ad60>
[5]:
AnnData object with n_obs × n_vars = 31194 × 21263
obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
uns: 'author_cell_type_colors', 'schema_version', 'title', 'sn', 'co-occurrence'
obsm: 'X_spatial', 'spatial'
Co-occurrence on multi samples#
[6]:
from stereo.algorithm.co_occurrence import CoOccurrence
CoOccurrence.ms_co_occur_integrate(
ms_data=ms_data,
scope='UMOD_KI|WT',
use_col='author_cell_type',
res_key='co-occurrence'
)
[7]:
ms_data.plt.co_occurrence_heatmap(
mode='integrate',
scope=slice_generator[:],
res_key='co-occurrence',
cluster_res_key='author_cell_type'
)
[2023-11-22 14:08:34][Stereo][192767][MainThread][140261435713344][ms_pipeline][128][INFO]: register plot_func co_occurrence_heatmap to <class 'stereo.core.stereo_exp_data.StereoExpData'>-140260185451152
[7]:
[8]:
ms_data.plt.co_occurrence_plot(
mode='integrate',
scope=slice_generator[:],
res_key='co-occurrence',
groups=['Fibroblast']
)
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][ms_pipeline][128][INFO]: register plot_func co_occurrence_plot to <class 'stereo.core.stereo_exp_data.StereoExpData'>-140260185451152
[8]:
From Above, we can see the increment of co-occurrence of Fibroblase with other cell types except for MC and CD-IC from WT to UMOD KI samples.
Cell Community Detection of Comparative Analysis#
Here we also used CCD to explore local cell diversity.
See also Cell Community Detection for more information.
[9]:
ms_ccd = ms_data.tl.ms_community_detection(
annotation='author_cell_type',
win_sizes='300,150',
sliding_steps='150,50',
resolution=0.2,
# scatter_thres=0.2,
cluster_algo='leiden',
n_clusters=7,
out_path="./ccd_result",
plotting=3,
hide_plots=True
)
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_0
window size: 300
sliding step: 150
cells mean: 150.62
cells median: 169.0
num horizontal windows: 16
num vertical windows: 16
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_0
window size: 150
sliding step: 50
cells mean: 40.51
cells median: 43.0
num horizontal windows: 32
num vertical windows: 32
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_1
window size: 300
sliding step: 150
cells mean: 132.18
cells median: 150.0
num horizontal windows: 16
num vertical windows: 16
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][357][INFO]: Window size info for slice: Slice_1
window size: 150
sliding step: 50
cells mean: 36.10
cells median: 38.0
num horizontal windows: 32
num vertical windows: 32
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][96][INFO]: Downsample rate is not provided by user - proceeding to calculate one based on minimal window size.
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_detection][100][INFO]: donwsample_rate = 75
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 3 cells < 36 (0.1 % of 36299)
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 19 cells < 36 (0.1 % of 36299)
[2023-11-22 14:08:37][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1142s
set rcParams
set original_rcParams
[2023-11-22 14:08:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_annotation took 0.5254s
[2023-11-22 14:08:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 0.2566s
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 1.0625s
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 2 cells < 31 (0.1 % of 31194)
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 31 cells < 31 (0.1 % of 31194)
[2023-11-22 14:08:39][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1016s
set rcParams
[2023-11-22 14:08:40][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_annotation took 0.4851s
[2023-11-22 14:08:40][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 0.7578s
[2023-11-22 14:08:41][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calc_feature_matrix took 1.0130s
[2023-11-22 14:08:42][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1157s
[2023-11-22 14:08:42][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1031s
[2023-11-22 14:08:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_all_annotation took 0.9576s
[2023-11-22 14:08:43][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run neighbors...
[2023-11-22 14:10:58][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: neighbors end, consume time 135.1017s.
[2023-11-22 14:10:58][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run leiden...
[2023-11-22 14:10:59][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: leiden end, consume time 1.0295s.
[2023-11-22 14:10:59][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function cluster took 136.2322s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.9303s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][758][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_0.csv
... storing 'leiden' as categorical
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_0.h5ad.
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1789s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_clustering took 0.5976s
[2023-11-22 14:11:04][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0247s
set rcParams
[2023-11-22 14:11:08][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_stats took 3.7973s
set rcParams
[2023-11-22 14:11:10][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_table took 1.7224s
[2023-11-22 14:11:10][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1186s
set rcParams
[2023-11-22 14:11:10][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1055s
[2023-11-22 14:11:11][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1095s
[2023-11-22 14:11:11][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1054s
[2023-11-22 14:11:12][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1121s
[2023-11-22 14:11:12][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1027s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1107s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1032s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1124s
[2023-11-22 14:11:13][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1021s
[2023-11-22 14:11:14][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1086s
[2023-11-22 14:11:14][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1128s
[2023-11-22 14:11:16][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.7562s
[2023-11-22 14:11:16][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1040s
[2023-11-22 14:11:16][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cluster_mixtures took 6.3520s
set rcParams
[2023-11-22 14:11:19][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function boxplot_stats took 2.9817s
[2023-11-22 14:11:19][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_0_stats.h5ad.
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.4321s
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][758][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_1.csv
... storing 'leiden' as categorical
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_1.h5ad.
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1434s
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_clustering took 0.5420s
[2023-11-22 14:11:24][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0211s
set rcParams
[2023-11-22 14:11:28][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_stats took 3.6931s
set rcParams
[2023-11-22 14:11:30][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_table took 1.4127s
[2023-11-22 14:11:30][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1038s
[2023-11-22 14:11:30][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0960s
set rcParams
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1006s
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0952s
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0959s
[2023-11-22 14:11:31][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0916s
[2023-11-22 14:11:32][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1018s
[2023-11-22 14:11:32][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0967s
[2023-11-22 14:11:33][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0982s
[2023-11-22 14:11:33][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0968s
[2023-11-22 14:11:34][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1004s
[2023-11-22 14:11:34][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0906s
[2023-11-22 14:11:34][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cluster_mixtures took 4.5858s
set rcParams
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function boxplot_stats took 3.4428s
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][community_clustering_algorithm][737][INFO]: Saved clustering result tissue_Slice_1_stats.h5ad.
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1040s
[2023-11-22 14:11:38][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1003s
[2023-11-22 14:11:39][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_all_clustering took 0.8462s
set rcParams
[2023-11-22 14:11:42][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_mixtures_total took 3.6870s
set rcParams
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cell_abundance_total took 0.3357s
set rcParams
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_cluster_abundance_total took 0.2620s
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function generate_report took 0.1031s
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function _main took 185.9941s
reset original_rcParams
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
Visualization for CCD#
[10]:
ms_ccd.plot('all_clustering')
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.1063s
[2023-11-22 14:11:43][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_spatial took 0.0925s
[2023-11-22 14:11:44][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_all_clustering took 1.4686s
reset original_rcParams
[11]:
ms_ccd.plot('cell_types_table')
set rcParams
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][utils][27][INFO]: Function plot_celltype_table took 2.1316s
reset original_rcParams
[12]:
## transfer ccd result to ms_data
ms_data.to_integrate(res_key = 'cell_communities', scope = slice_generator[:], _from=slice_generator[:], type='obs', item=['cell_communities']*ms_data.num_slice)
Manually merge cell communities to tissue#
[13]:
anno_dict = {'0':'medulla',
'1':'cortex',
'2':'boundary',
'3':'pelvis',
'4':'other_immune',
'5':'cortex',
'6':'pelvis',
'unknown':'unknown'}
ms_data.tl.annotation(
mode='integrate',
scope=slice_generator[:],
annotation_information=anno_dict,
cluster_res_key='cell_communities',
res_key='tissue'
)
ms_data.tl.annotation(
mode='isolated',
scope=slice_generator[:],
annotation_information=anno_dict,
cluster_res_key='cell_communities',
res_key='tissue'
)
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run annotation...
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: annotation end, consume time 0.2616s.
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run annotation
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][st_pipeline][41][INFO]: start to run annotation...
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][st_pipeline][41][INFO]: start to run annotation...
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2023-11-22 14:11:47][Stereo][192767][Thread-179][140250892691200][st_pipeline][44][INFO]: annotation end, consume time 0.1843s.
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.2s
[2023-11-22 14:11:47][Stereo][192767][Thread-178][140250901083904][st_pipeline][44][INFO]: annotation end, consume time 0.2591s.
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.3s finished
[14]:
ms_data.plt.cluster_scatter(
mode='integrate',
scope=slice_generator[:],
res_key='tissue',
reorganize_coordinate=2,
horizontal_offset_additional=500
)
[2023-11-22 14:11:47][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[14]:
Gene diversity of cell community v.s. cell type#
Taken WT as example
[15]:
ms_data.tl.raw_checkpoint(mode='isolated')
ms_data.tl.raw_checkpoint(mode='integrate')
[2023-11-22 14:11:48][Stereo][192767][Thread-183][140250624255744][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2023-11-22 14:11:48][Stereo][192767][Thread-184][140250632648448][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.1s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.1s finished
Preprocessing for marker#
[16]:
ms_data.tl.normalize_total()
ms_data.tl.log1p()
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run normalize_total...
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: normalize_total end, consume time 0.2144s.
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2023-11-22 14:11:48][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run log1p...
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: log1p end, consume time 0.5883s.
Marker for tissue#
[17]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False)
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run find_marker_genes...
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][tool_base][151][INFO]: start to run...
[2023-11-22 14:11:49][Stereo][192767][MainThread][140261435713344][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:11:50][Stereo][192767][MainThread][140261435713344][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 1.2010s.
[2023-11-22 14:11:51][Stereo][192767][MainThread][140261435713344][tool_base][153][INFO]: end to run.
[2023-11-22 14:11:51][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: find_marker_genes end, consume time 2.2644s.
[18]:
ms_data.plt.marker_genes_heatmap(res_key = 'marker_genes')
[2023-11-22 14:11:51][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_heatmap
[18]:
Plot medulla markers#
[19]:
medulla_marker = ms_data.tl.result['scope_[0,1]']['marker_genes']['medulla.vs.rest']
medulla_marker = medulla_marker.loc[(medulla_marker['pvalues_adj'] < 0.05) & (medulla_marker['pct_rest'] < 0.2)]
medulla_marker = medulla_marker.sort_values('scores', ascending=False)
medulla_marker_list = medulla_marker['genes'][:8]
medulla_marker
[19]:
scores | pvalues | pvalues_adj | log2fc | genes | pct | pct_rest | |
---|---|---|---|---|---|---|---|
16148 | 114.030865 | 0.0 | 0.0 | 3.427668 | Slc12a1 | 0.541348 | 0.186195 |
14214 | 60.757634 | 0.0 | 0.0 | 2.418859 | Ppp1r1a | 0.226809 | 0.070483 |
19125 | 54.568371 | 0.0 | 0.0 | 2.082375 | Wfdc15b | 0.227547 | 0.089344 |
18748 | 54.432050 | 0.0 | 0.0 | 1.929499 | Umod | 0.259800 | 0.112661 |
3704 | 45.836450 | 0.0 | 0.0 | 2.118043 | Ckb | 0.148778 | 0.050720 |
... | ... | ... | ... | ... | ... | ... | ... |
16382 | -84.253558 | 0.0 | 0.0 | -4.977401 | Slc3a1 | 0.007048 | 0.191023 |
16185 | -84.503654 | 0.0 | 0.0 | -4.917349 | Slc17a3 | 0.007316 | 0.193013 |
11186 | -84.697973 | 0.0 | 0.0 | -5.285447 | Lrp2 | 0.005974 | 0.187442 |
14545 | -84.896155 | 0.0 | 0.0 | -5.022214 | Pter | 0.006880 | 0.192058 |
4364 | -86.060252 | 0.0 | 0.0 | -5.323331 | Cyp2a4 | 0.005773 | 0.191713 |
9126 rows × 7 columns
[20]:
ms_data.plt.marker_genes_scatter(res_key = 'marker_genes', groups='medulla', genes=medulla_marker_list)
[2023-11-22 14:11:53][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[20]:
Plot medulla corresponding celltype markers#
[23]:
ms_data.plt.marker_genes_scatter(res_key='marker_genes_celltype', markers_num=4, groups=['EC', 'Other_Immune', 'TAL' ] )
[2023-11-22 14:11:58][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[23]:
Conditional marker and constant marker#
[24]:
ms_data.tl.raw_checkpoint(mode='isolated')
[2023-11-22 14:12:00][Stereo][192767][Thread-202][140250901083904][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2023-11-22 14:12:00][Stereo][192767][Thread-203][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.2s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.3s finished
[25]:
ms_data.tl.normalize_total(mode='isolated')
ms_data.tl.log1p(mode='isolated')
[2023-11-22 14:12:01][Stereo][192767][Thread-207][140250368562944][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2023-11-22 14:12:01][Stereo][192767][Thread-208][140250492430080][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run normalize_total
[2023-11-22 14:12:01][Stereo][192767][Thread-207][140250368562944][st_pipeline][41][INFO]: start to run normalize_total...
[2023-11-22 14:12:01][Stereo][192767][Thread-208][140250492430080][st_pipeline][41][INFO]: start to run normalize_total...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2023-11-22 14:12:01][Stereo][192767][Thread-207][140250368562944][st_pipeline][44][INFO]: normalize_total end, consume time 0.2100s.
[2023-11-22 14:12:01][Stereo][192767][Thread-208][140250492430080][st_pipeline][44][INFO]: normalize_total end, consume time 0.2285s.
[2023-11-22 14:12:01][Stereo][192767][Thread-212][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2023-11-22 14:12:01][Stereo][192767][Thread-213][140250626356992][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run log1p
[2023-11-22 14:12:01][Stereo][192767][Thread-213][140250626356992][st_pipeline][41][INFO]: start to run log1p...
[2023-11-22 14:12:01][Stereo][192767][Thread-212][140250892691200][st_pipeline][41][INFO]: start to run log1p...
[2023-11-22 14:12:01][Stereo][192767][Thread-213][140250626356992][st_pipeline][44][INFO]: log1p end, consume time 0.0751s.
[2023-11-22 14:12:01][Stereo][192767][Thread-212][140250892691200][st_pipeline][44][INFO]: log1p end, consume time 0.0982s.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.2s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.2s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.1s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.1s finished
[26]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False, mode = 'isolated')
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][ms_pipeline][163][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][ms_pipeline][163][INFO]: data_obj(idx=1) in ms_data start to run find_marker_genes
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][st_pipeline][41][INFO]: start to run find_marker_genes...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][st_pipeline][41][INFO]: start to run find_marker_genes...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][tool_base][151][INFO]: start to run...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][tool_base][122][INFO]: read group information, grouping by group column.
[2023-11-22 14:12:01][Stereo][192767][Thread-217][140250892691200][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][tool_base][151][INFO]: start to run...
[2023-11-22 14:12:01][Stereo][192767][Thread-218][140253086627584][time_consume][57][INFO]: start to run calc_pct_and_pct_rest...
[2023-11-22 14:12:02][Stereo][192767][Thread-218][140253086627584][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 0.4522s.
[2023-11-22 14:12:02][Stereo][192767][Thread-217][140250892691200][time_consume][60][INFO]: calc_pct_and_pct_rest end, consume time 0.5568s.
[2023-11-22 14:12:03][Stereo][192767][Thread-218][140253086627584][tool_base][153][INFO]: end to run.
[2023-11-22 14:12:03][Stereo][192767][Thread-218][140253086627584][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.8071s.
[2023-11-22 14:12:03][Stereo][192767][Thread-217][140250892691200][tool_base][153][INFO]: end to run.
[2023-11-22 14:12:03][Stereo][192767][Thread-217][140250892691200][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.8774s.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 1.8s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 1.9s finished
UMOD KI conditional marker count as N_top_marker
increases.
[27]:
ret = {}
for name in ms_data.merged_data.tl.result['tissue']['group'].cat.categories:
if name != 'unknown':
ret[name] = []
for N_top_marker in range(200):
ret[name].append(N_top_marker-(len(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))) )
[28]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (5,5))
ax = fig.subplots()
#ct2color = dict(zip(adata_wt.obs['tissue_domain'].cat.categories, adata_wt.uns['tissue_domain_colors']))
for x, y in ret.items():
ax.plot(y, label=x)
ax.legend()
ax.set_title('UMOD KI conditional markers amount')
ax.set_xlabel('N top DEGs')
ax.set_ylabel('N UMOD KI conditional DEGs')
[28]:
Text(0, 0.5, 'N UMOD KI conditional DEGs')
Plot expression pattern for constant and conditional markers#
[29]:
N_top_marker = 200
name = 'medulla'
[30]:
constant_marker = list(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))
WT_conditional_marker = [x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
UMOD_KI_conditional_marker = [x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
constant_marker = constant_marker[:10]
[31]:
all_marker_list = constant_marker
all_marker_list.extend(WT_conditional_marker)
all_marker_list.extend(UMOD_KI_conditional_marker)
[32]:
ms_data.tl.scale()
[2023-11-22 14:12:03][Stereo][192767][MainThread][140261435713344][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run scale
[2023-11-22 14:12:03][Stereo][192767][MainThread][140261435713344][st_pipeline][41][INFO]: start to run scale...
[2023-11-22 14:12:14][Stereo][192767][MainThread][140261435713344][st_pipeline][44][INFO]: scale end, consume time 10.9675s.
[33]:
ms_data.merged_data.tl.result['tissue'].index = ms_data.merged_data.tl.result['tissue']['bins']
celllist = ms_data.obs.loc[(ms_data.merged_data.tl.result['tissue']['group']=='medulla') & (ms_data.obs['batch']=='0')].index
WT_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix
celllist = ms_data.obs.loc[(ms_data.merged_data.tl.result['tissue']['group']=='medulla') & (ms_data.obs['batch']=='1')].index
UMOD_KI_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix
INFO:numba.core.transforms:finding looplift candidates
INFO:numba.core.transforms:finding looplift candidates
[34]:
import pandas as pd
import seaborn as sns
medulla_exp = pd.DataFrame([WT_medulla_exp.mean(axis=0), UMOD_KI_medulla_exp.mean(axis=0)])
medulla_exp.columns = all_marker_list
medulla_exp.index = ['WT', 'UMOD KI']
fig, ax = plt.subplots(1, figsize=(20,1))
sns.heatmap(medulla_exp, center = 0, ax=ax, cmap='bwr', vmin=-0.5, vmax=0.5,)
[34]:
<Axes: >