Comparative Analysis¶
In scientific research, comparing disturbed or disease samples with control samples allows for the exploration of changes in functional mechanisms at both global and local levels. Stereopy is specifically designed to identify global and local diversities in comparative samples by employing cell-level and gene-level analysis modules, supported by novel algorithms. The cell-level analysis focuses on cell diversity in terms of cell type, cell-cell occurrence, and cell community via multi-sample comparisons. At gene level, Stereopy investigates gene diversity within cell types and cell communities, proposing the concept of constant and conditional markers [Fang23].
Here we use Slide-seq mouse kidney samples to test comparative analysis [Marshall22]. Please download test data, and here also offers the access to full data from the paper.
Preparing data¶
[1]:
import stereo as st
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator
[2]:
ms_data = MSData(_var_type='union')
data = st.io.read_h5ad('../../data/Puck_191223_19.h5ad', spatial_key='X_spatial')
ms_data['WT'] = data
data = st.io.read_h5ad('../../data/Puck_200104_07.h5ad', spatial_key='X_spatial')
ms_data['UMOD_KI'] = data
## merge data
ms_data.integrate()
## merged author_cell_type col to ms_data
ms_data.to_integrate(res_key = 'author_cell_type', scope = slice_generator[:], _from=slice_generator[:], type='obs', item='author_cell_type')
ms_data
[2]:
ms_data: {'WT': (36299, 21296), 'UMOD_KI': (31194, 21263)}
num_slice: 2
names: ['WT', 'UMOD_KI']
merged_data: id(140056957462992)
obs: ['batch', 'author_cell_type']
var: []
relationship: other
var_type: union to 22656
current_mode: integrate
current_scope: scope_[0,1]
scopes_data: ['scope_[0,1]:id(140056957462992)']
mss: ["scope_[0,1]:['author_cell_type']"]
[3]:
ms_data.plt.cluster_scatter(
mode='integrate',
scope=slice_generator[:],
res_key='author_cell_type',
reorganize_coordinate=2,
horizontal_offset_additional=500
)
[2024-08-29 18:01:02][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[3]:
Cell Co-occurrence of Comparative Analysis¶
Here we use cell co-occurrence to discover the global cell diversity.
See Co-occurrence for more imformation.
[4]:
ms_data['WT'].tl.co_occurrence(
method='stereopy',
cluster_res_key='author_cell_type',
res_key='co-occurrence',
dist_thres=180, # max threshold to measure co-occurence
steps=6, # step numbers to divide threshold interval evenly
genelist=None,
gene_thresh=0, # min threshold for gene expression in a cell
n_jobs=-1
)
[2024-08-29 18:01:02][Stereo][785302][MainThread][140059044030272][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f618d32a7c0>
[4]:
AnnData object with n_obs × n_vars = 36299 × 21296
obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
uns: 'schema_version', 'title', 'sn', 'co-occurrence', 'key_record'
obsm: 'X_spatial'
[5]:
ms_data['UMOD_KI'].tl.co_occurrence(
method='stereopy',
cluster_res_key='author_cell_type',
res_key='co-occurrence',
dist_thres=180, # max threshold to measure co-occurence
steps=6, # step numbers to divide threshold interval evenly
genelist=None,
gene_thresh=0, # min threshold for gene expression in a cell
n_jobs=-1
)
[2024-08-29 18:01:12][Stereo][785302][MainThread][140059044030272][st_pipeline][77][INFO]: register algorithm co_occurrence to <stereo.core.st_pipeline.AnnBasedStPipeline object at 0x7f618cc39fa0>
[5]:
AnnData object with n_obs × n_vars = 31194 × 21263
obs: 'assay_ontology_term_id', 'self_reported_ethnicity_ontology_term_id', 'is_primary_data', 'organism_ontology_term_id', 'sample', 'tissue_ontology_term_id', 'disease_state', 'sex_ontology_term_id', 'genotype', 'development_stage_ontology_term_id', 'author_cell_type', 'cell_type_ontology_term_id', 'disease_ontology_term_id', 'donor_id', 'suspension_type', 'cell_type', 'assay', 'disease', 'organism', 'sex', 'tissue', 'self_reported_ethnicity', 'development_stage', 'batch'
var: 'feature_is_filtered', 'feature_name', 'feature_reference', 'feature_biotype'
uns: 'schema_version', 'title', 'sn', 'co-occurrence', 'key_record'
obsm: 'X_spatial'
Co-occurrence on multi samples¶
[6]:
from stereo.algorithm.co_occurrence import CoOccurrence
CoOccurrence.ms_co_occur_integrate(
ms_data=ms_data,
scope='UMOD_KI|WT',
use_col='author_cell_type',
res_key='co-occurrence'
)
[7]:
ms_data.plt.co_occurrence_heatmap(
mode='integrate',
scope=slice_generator[:],
res_key='co-occurrence',
cluster_res_key='author_cell_type'
)
[2024-08-29 18:01:19][Stereo][785302][MainThread][140059044030272][ms_pipeline][131][INFO]: register plot_func co_occurrence_heatmap to <class 'stereo.core.stereo_exp_data.AnnBasedStereoExpData'>-140056957462992
[7]:
[8]:
ms_data.plt.co_occurrence_plot(
mode='integrate',
scope=slice_generator[:],
res_key='co-occurrence',
groups=['Fibroblast']
)
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][ms_pipeline][131][INFO]: register plot_func co_occurrence_plot to <class 'stereo.core.stereo_exp_data.AnnBasedStereoExpData'>-140056957462992
[8]:
From Above, we can see the increment of co-occurrence of Fibroblast with other cell types except for MC and CD-IC from WT to UMOD KI samples.
Cell Community Detection of Comparative Analysis¶
Here we also used CCD to explore local cell diversity.
See also Cell Community Detection for more information.
[9]:
ms_ccd = ms_data.tl.ms_community_detection(
annotation='author_cell_type',
win_sizes='300,150',
sliding_steps='150,50',
resolution=0.2,
# scatter_thres=0.2,
cluster_algo='leiden',
n_clusters=7,
out_path="./ccd_result",
plotting=3,
hide_plots=True
)
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_0
window size: 300
sliding step: 150
cells mean: 150.62
cells median: 169.0
num horizontal windows: 16
num vertical windows: 16
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_0
window size: 150
sliding step: 50
cells mean: 40.51
cells median: 43.0
num horizontal windows: 32
num vertical windows: 32
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_1
window size: 300
sliding step: 150
cells mean: 132.18
cells median: 150.0
num horizontal windows: 16
num vertical windows: 16
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][362][INFO]: Window size info for slice: Slice_1
window size: 150
sliding step: 50
cells mean: 36.10
cells median: 38.0
num horizontal windows: 32
num vertical windows: 32
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][96][INFO]: Downsample rate is not provided by user - proceeding to calculate one based on minimal window size.
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_detection][100][INFO]: donwsample_rate = 75
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 3 cells < 36 (0.1 % of 36299)
[2024-08-29 18:01:21][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 19 cells < 36 (0.1 % of 36299)
[2024-08-29 18:01:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0639s
[2024-08-29 18:01:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_annotation took 0.4012s
[2024-08-29 18:01:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.1352s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.6845s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: MC cell type excluded, due to insufficient cells of that type: 2 cells < 31 (0.1 % of 31194)
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][78][INFO]: Podocyte cell type excluded, due to insufficient cells of that type: 31 cells < 31 (0.1 % of 31194)
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0699s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_annotation took 0.4077s
[2024-08-29 18:01:23][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.1180s
[2024-08-29 18:01:24][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calc_feature_matrix took 0.6697s
[2024-08-29 18:01:24][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0700s
[2024-08-29 18:01:24][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0667s
[2024-08-29 18:01:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_all_annotation took 0.7282s
[2024-08-29 18:01:25][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run neighbors...
2024-08-29 18:01:28.076298: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-08-29 18:01:28.103217: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-08-29 18:01:28.907330: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
[2024-08-29 18:01:42][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: neighbors end, consume time 16.9953s.
[2024-08-29 18:01:42][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run leiden...
[2024-08-29 18:01:45][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: leiden end, consume time 3.2460s.
[2024-08-29 18:01:45][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function cluster took 20.2484s
/root/miniforge3/envs/stereopy/lib/python3.8/site-packages/pandas/core/indexing.py:1728: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
self.obj[key] = value
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.3492s
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][762][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_0.csv
... storing 'leiden' as categorical
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_0.h5ad.
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.1030s
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_clustering took 0.5053s
[2024-08-29 18:01:50][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0180s
[2024-08-29 18:01:53][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_stats took 2.9624s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_table took 1.3562s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0651s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0727s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0637s
[2024-08-29 18:01:55][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0816s
[2024-08-29 18:01:56][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0745s
[2024-08-29 18:01:56][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0915s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0676s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0889s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0776s
[2024-08-29 18:01:57][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0782s
[2024-08-29 18:01:58][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0846s
[2024-08-29 18:01:58][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.1013s
[2024-08-29 18:01:59][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0723s
[2024-08-29 18:01:59][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0834s
[2024-08-29 18:01:59][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cluster_mixtures took 4.8307s
[2024-08-29 18:02:03][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function boxplot_stats took 3.7169s
[2024-08-29 18:02:03][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_0_stats.h5ad.
/root/miniforge3/envs/stereopy/lib/python3.8/site-packages/pandas/core/indexing.py:1728: ImplicitModificationWarning: Trying to modify attribute `.obs` of view, initializing view as actual.
self.obj[key] = value
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function community_calling_multiple_window_sizes_per_cell_multiprocessing took 4.8398s
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][sliding_window][259][INFO]: Sliding window cell mixture calculation done. Added results to adata.obs["tissue_sliding_window"]
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][762][INFO]: Saved community labels after clustering as a part of original anndata file to Slice_1.csv
... storing 'leiden' as categorical
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_1.h5ad.
[2024-08-29 18:02:08][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0867s
[2024-08-29 18:02:09][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_clustering took 0.5132s
[2024-08-29 18:02:09][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function calculate_cell_mixture_stats took 0.0364s
[2024-08-29 18:02:12][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_stats took 3.3399s
[2024-08-29 18:02:13][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_table took 1.2943s
[2024-08-29 18:02:13][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0779s
[2024-08-29 18:02:14][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0783s
[2024-08-29 18:02:14][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0673s
[2024-08-29 18:02:14][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0784s
[2024-08-29 18:02:15][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0686s
[2024-08-29 18:02:15][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0763s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0718s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0771s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0727s
[2024-08-29 18:02:16][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0984s
[2024-08-29 18:02:17][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0734s
[2024-08-29 18:02:17][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0755s
[2024-08-29 18:02:18][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cluster_mixtures took 4.2218s
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function boxplot_stats took 3.2157s
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][community_clustering_algorithm][741][INFO]: Saved clustering result tissue_Slice_1_stats.h5ad.
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0816s
[2024-08-29 18:02:21][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0762s
[2024-08-29 18:02:22][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_all_clustering took 0.7723s
[2024-08-29 18:02:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_mixtures_total took 2.9256s
[2024-08-29 18:02:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cell_abundance_total took 0.3702s
[2024-08-29 18:02:25][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_cluster_abundance_total took 0.2698s
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function generate_report took 0.3498s
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function _main took 64.2934s
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
<Figure size 640x480 with 0 Axes>
Visualization for CCD¶
[10]:
ms_ccd.plot('all_clustering')
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0739s
[2024-08-29 18:02:26][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_spatial took 0.0775s
[2024-08-29 18:02:27][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_all_clustering took 1.2210s
[11]:
ms_ccd.plot('cell_types_table')
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][utils][27][INFO]: Function plot_celltype_table took 1.6808s
<Figure size 640x480 with 0 Axes>
[12]:
## transfer ccd result to ms_data
ms_data.to_integrate(res_key = 'cell_communities', scope = slice_generator[:], _from=slice_generator[:], type='obs', item=['cell_communities']*ms_data.num_slice)
Manually merge cell communities to tissue¶
[13]:
anno_dict = {'0':'medulla',
'1':'cortex',
'2':'boundary',
'3':'pelvis',
'4':'other_immune',
'5':'cortex',
'6':'pelvis',
'unknown':'unknown'}
ms_data.tl.annotation(
mode='integrate',
scope=slice_generator[:],
annotation_information=anno_dict,
cluster_res_key='cell_communities',
res_key='tissue'
)
ms_data.tl.annotation(
mode='isolated',
scope=slice_generator[:],
annotation_information=anno_dict,
cluster_res_key='cell_communities',
res_key='tissue'
)
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run annotation...
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: annotation end, consume time 0.0768s.
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run annotation
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][st_pipeline][41][INFO]: start to run annotation...
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run annotation
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][st_pipeline][41][INFO]: start to run annotation...
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][pipeline_utils][27][INFO]: Can not find raw data, the data which may have been normalized will be used.
[2024-08-29 18:02:29][Stereo][785302][Thread-58][140051311068736][st_pipeline][44][INFO]: annotation end, consume time 0.0716s.
[2024-08-29 18:02:29][Stereo][785302][Thread-59][140051185104448][st_pipeline][44][INFO]: annotation end, consume time 0.0712s.
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.1s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.1s finished
[14]:
ms_data.plt.cluster_scatter(
mode='integrate',
scope=slice_generator[:],
res_key='tissue',
reorganize_coordinate=2,
horizontal_offset_additional=500
)
[2024-08-29 18:02:29][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter
[14]:
Gene diversity of cell community v.s. cell type¶
Taken WT as example
[15]:
ms_data.tl.raw_checkpoint(mode='isolated')
ms_data.tl.raw_checkpoint(mode='integrate')
[2024-08-29 18:02:30][Stereo][785302][Thread-63][140051068888640][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2024-08-29 18:02:30][Stereo][785302][Thread-64][140051060495936][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.0s finished
Preprocessing for marker¶
[16]:
ms_data.tl.normalize_total()
ms_data.tl.log1p()
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run normalize_total...
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: normalize_total end, consume time 0.1039s.
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2024-08-29 18:02:30][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run log1p...
[2024-08-29 18:02:31][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: log1p end, consume time 0.9183s.
Marker for tissue¶
[17]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False)
[2024-08-29 18:02:31][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2024-08-29 18:02:31][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run find_marker_genes...
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: find_marker_genes end, consume time 2.3076s.
[18]:
ms_data.plt.marker_genes_heatmap(res_key = 'marker_genes')
[2024-08-29 18:02:33][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_heatmap
[18]:
Plot medulla markers¶
[19]:
medulla_marker = ms_data.tl.result['scope_[0,1]']['marker_genes']['medulla.vs.rest']
medulla_marker = medulla_marker.loc[(medulla_marker['pvalues_adj'] < 0.05) & (medulla_marker['pct_rest'] < 0.2)]
medulla_marker = medulla_marker.sort_values('scores', ascending=False)
medulla_marker_list = medulla_marker['genes'][:8]
medulla_marker
[19]:
| scores | pvalues | pvalues_adj | log2fc | genes | pct | pct_rest | mean_count | |
|---|---|---|---|---|---|---|---|---|
| 0 | 114.183655 | 0.0 | 0.0 | 3.426115 | ENSMUSG00000027202 | 0.538050 | 0.184545 | 1.663084 |
| 2 | 61.116464 | 0.0 | 0.0 | 2.430726 | ENSMUSG00000022490 | 0.225488 | 0.069651 | 0.317728 |
| 3 | 54.538624 | 0.0 | 0.0 | 2.075901 | ENSMUSG00000018211 | 0.225950 | 0.088956 | 0.345794 |
| 4 | 54.307349 | 0.0 | 0.0 | 1.920096 | ENSMUSG00000030963 | 0.258017 | 0.112316 | 0.429489 |
| 5 | 46.074215 | 0.0 | 0.0 | 2.126544 | ENSMUSG00000001270 | 0.147868 | 0.050264 | 0.184752 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 22611 | -84.611139 | 0.0 | 0.0 | -5.021427 | ENSMUSG00000024131 | 0.006942 | 0.193352 | 0.007207 |
| 22612 | -84.693316 | 0.0 | 0.0 | -4.928562 | ENSMUSG00000036083 | 0.007372 | 0.195231 | 0.007736 |
| 22614 | -84.911546 | 0.0 | 0.0 | -5.308575 | ENSMUSG00000027070 | 0.005983 | 0.189646 | 0.006182 |
| 22615 | -85.269715 | 0.0 | 0.0 | -5.068659 | ENSMUSG00000026730 | 0.006777 | 0.194399 | 0.007405 |
| 22616 | -86.208055 | 0.0 | 0.0 | -5.330188 | ENSMUSG00000074254 | 0.005818 | 0.193942 | 0.005884 |
9120 rows × 8 columns
[20]:
ms_data.plt.marker_genes_scatter(res_key = 'marker_genes', groups='medulla', genes=medulla_marker_list)
[2024-08-29 18:02:34][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[20]:
Plot medulla corresponding celltype markers¶
[23]:
ms_data.plt.marker_genes_scatter(res_key='marker_genes_author', markers_num=4, groups=['EC', 'Other_Immune', 'TAL' ] )
[2024-08-29 18:02:40][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run marker_genes_scatter
[23]:
Conditional marker and constant marker¶
[24]:
ms_data.tl.raw_checkpoint(mode='isolated')
[2024-08-29 18:02:40][Stereo][785302][Thread-82][140050918397504][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint
[2024-08-29 18:02:40][Stereo][785302][Thread-83][140051052103232][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run raw_checkpoint
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.0s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.0s finished
[25]:
ms_data.tl.normalize_total(mode='isolated')
ms_data.tl.log1p(mode='isolated')
[2024-08-29 18:02:40][Stereo][785302][Thread-87][140050929411648][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2024-08-29 18:02:40][Stereo][785302][Thread-88][140050918397504][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run normalize_total
[2024-08-29 18:02:40][Stereo][785302][Thread-87][140050929411648][st_pipeline][41][INFO]: start to run normalize_total...
[2024-08-29 18:02:40][Stereo][785302][Thread-88][140050918397504][st_pipeline][41][INFO]: start to run normalize_total...
[2024-08-29 18:02:41][Stereo][785302][Thread-87][140050929411648][st_pipeline][44][INFO]: normalize_total end, consume time 0.0457s.
[2024-08-29 18:02:41][Stereo][785302][Thread-88][140050918397504][st_pipeline][44][INFO]: normalize_total end, consume time 0.0673s.
[2024-08-29 18:02:41][Stereo][785302][Thread-92][140051176711744][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run log1p
[2024-08-29 18:02:41][Stereo][785302][Thread-92][140051176711744][st_pipeline][41][INFO]: start to run log1p...
[2024-08-29 18:02:41][Stereo][785302][Thread-93][140050918397504][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run log1p
[2024-08-29 18:02:41][Stereo][785302][Thread-93][140050918397504][st_pipeline][41][INFO]: start to run log1p...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.1s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.1s finished
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2024-08-29 18:02:41][Stereo][785302][Thread-92][140051176711744][st_pipeline][44][INFO]: log1p end, consume time 0.0785s.
[2024-08-29 18:02:41][Stereo][785302][Thread-93][140050918397504][st_pipeline][44][INFO]: log1p end, consume time 0.0824s.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 0.1s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 0.1s finished
[26]:
ms_data.tl.find_marker_genes(cluster_res_key='tissue', use_highly_genes=False, use_raw =False, mode = 'isolated')
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][ms_pipeline][155][INFO]: data_obj(idx=0) in ms_data start to run find_marker_genes
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][ms_pipeline][155][INFO]: data_obj(idx=1) in ms_data start to run find_marker_genes
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][st_pipeline][41][INFO]: start to run find_marker_genes...
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][st_pipeline][41][INFO]: start to run find_marker_genes...
[Parallel(n_jobs=2)]: Using backend ThreadingBackend with 2 concurrent workers.
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:41][Stereo][785302][Thread-97][140051052897856][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][tool_base][122][INFO]: read group information, grouping by group column.
[2024-08-29 18:02:41][Stereo][785302][Thread-98][140051043710528][tool_base][151][INFO]: start to run...
[2024-08-29 18:02:42][Stereo][785302][Thread-97][140051052897856][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:42][Stereo][785302][Thread-98][140051043710528][tool_base][153][INFO]: end to run.
[2024-08-29 18:02:42][Stereo][785302][Thread-97][140051052897856][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.3789s.
[2024-08-29 18:02:42][Stereo][785302][Thread-98][140051043710528][st_pipeline][44][INFO]: find_marker_genes end, consume time 1.3771s.
[Parallel(n_jobs=2)]: Done 1 tasks | elapsed: 1.4s
[Parallel(n_jobs=2)]: Done 2 out of 2 | elapsed: 1.4s finished
UMOD KI conditional marker count as N_top_marker increases.
[27]:
ret = {}
for name in ms_data.merged_data.tl.result['tissue']['group'].cat.categories:
if name != 'unknown':
ret[name] = []
for N_top_marker in range(200):
ret[name].append(N_top_marker-(len(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))) )
[28]:
import matplotlib.pyplot as plt
fig = plt.figure(figsize = (5,5))
ax = fig.subplots()
#ct2color = dict(zip(adata_wt.obs['tissue_domain'].cat.categories, adata_wt.uns['tissue_domain_colors']))
for x, y in ret.items():
ax.plot(y, label=x)
ax.legend()
ax.set_title('UMOD KI conditional markers amount')
ax.set_xlabel('N top DEGs')
ax.set_ylabel('N UMOD KI conditional DEGs')
[28]:
Text(0, 0.5, 'N UMOD KI conditional DEGs')
Plot expression pattern for constant and conditional markers¶
[29]:
N_top_marker = 200
name = 'medulla'
[30]:
constant_marker = list(set([x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]) &
set([x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker]]))
WT_conditional_marker = [x for x in ms_data['WT'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
UMOD_KI_conditional_marker = [x for x in ms_data['UMOD_KI'].tl.result['marker_genes'][f'{name}.vs.rest']['genes'][:N_top_marker] if x not in constant_marker][:10]
constant_marker = constant_marker[:10]
[31]:
all_marker_list = constant_marker
all_marker_list.extend(WT_conditional_marker)
all_marker_list.extend(UMOD_KI_conditional_marker)
[32]:
ms_data.tl.scale()
[2024-08-29 18:05:04][Stereo][785302][MainThread][140059044030272][ms_pipeline][134][INFO]: data_obj(idx=0) in ms_data start to run scale
[2024-08-29 18:05:04][Stereo][785302][MainThread][140059044030272][st_pipeline][41][INFO]: start to run scale...
[2024-08-29 18:05:15][Stereo][785302][MainThread][140059044030272][st_pipeline][44][INFO]: scale end, consume time 11.0059s.
[33]:
celllist = ms_data.obs.loc[(ms_data.obs['tissue'] == 'medulla') & (ms_data.obs['batch'] == '0')].index
WT_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix
celllist = ms_data.obs.loc[(ms_data.obs['tissue'] == 'medulla') & (ms_data.obs['batch'] == '1')].index
UMOD_KI_medulla_exp = ms_data.merged_data.sub_by_name(cell_name = celllist, gene_name = all_marker_list).exp_matrix
[34]:
import pandas as pd
import seaborn as sns
medulla_exp = pd.DataFrame([WT_medulla_exp.mean(axis=0), UMOD_KI_medulla_exp.mean(axis=0)])
medulla_exp.columns = all_marker_list
medulla_exp.index = ['WT', 'UMOD KI']
fig, ax = plt.subplots(1, figsize=(20,1))
sns.heatmap(medulla_exp, center = 0, ax=ax, cmap='bwr', vmin=-0.5, vmax=0.5,)
[34]:
<Axes: >
[ ]: