3D Cell-cell Communication#

Cell-cell communication (CCC) refers to the process by which cells interact with each other, using molecular signals. The signals can be transmitted through various mechanisms, such as the release of signaling molecules called ligands by one cell that bind to receptors on another cell, or through direct physical contact between cells. CCC plays a critical role in a wide range of biological processes, including development, immune responses, and tissue repair, among others.

In this section, we will provide a concise overview of the process for conducting the CCC analysis using Stereopy. Note that the CCC analysis, along with the Gene Regulatory Network analysis, form the core of the NicheReg3D pipeline [Fang23].

MSData construction#

The demo data used here is 59 samples of 2D mouse embryo heart stereo-seq data. Download example data and load them as a MSData.

Data loading#

[1]:

import os
from natsort import natsorted
import stereo as st
from stereo.core.ms_data import MSData
from stereo.core.ms_pipeline import slice_generator

import warnings
warnings.filterwarnings('ignore')

[2]:

# prepara for input directory
data_dir = '../test_data/mouse_embryo_heart_slices/'
files = []
for i in os.listdir(data_dir):
    files.append(os.path.join(data_dir, i))

# ensure data order by naming them regularly
files = natsorted(files)

# construct MSData object
ms_data = MSData(_relationship='continuous', _var_type='intersect')

# add all samples into MSData
for sample in files:
    ms_data += st.io.read_h5ad(sample, bin_type='bins', bin_size=1)

ms_data

[2]:

ms_data: {'0': (870, 30254), '1': (344, 30254), '2': (195, 30254), '3': (190, 30254), '4': (34, 30254), '5': (493, 30254), '6': (285, 30254), '7': (753, 30254), '8': (731, 30254), '9': (412, 30254), '10': (541, 30254), '11': (1103, 30254), '12': (1051, 30254), '13': (1293, 30254), '14': (1269, 30254), '15': (1503, 30254), '16': (1872, 30254), '17': (2071, 30254), '18': (2124, 30254), '19': (2221, 30254), '20': (2198, 30254), '21': (1992, 30254), '22': (1837, 30254), '23': (2010, 30254), '24': (1977, 30254), '25': (2078, 30254), '26': (2255, 30254), '27': (664, 30254), '28': (2343, 30254), '29': (2168, 30254), '30': (895, 30254), '31': (1478, 30254), '32': (1691, 30254), '33': (1370, 30254), '34': (2911, 30254), '35': (2501, 30254), '36': (2194, 30254), '37': (2023, 30254), '38': (1443, 30254), '39': (1190, 30254), '40': (1461, 30254), '41': (2760, 30254), '42': (2662, 30254), '43': (2391, 30254), '44': (2490, 30254), '45': (2104, 30254), '46': (1557, 30254), '47': (2253, 30254), '48': (1969, 30254), '49': (2448, 30254), '50': (2487, 30254), '51': (2226, 30254), '52': (2124, 30254), '53': (1282, 30254), '54': (1098, 30254), '55': (873, 30254), '56': (661, 30254), '57': (454, 30254), '58': (423, 30254)}
num_slice: 59
names: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58']
obs: []
var: []
relationship: continuous
var_type: intersect to 0
mss: []

Coordinates calibration#

The 3D analysis requires correct three-dimensional coordinates. If your single samples have not been calibrated, run paste to recalculate the x-coordinate and y-coordinate according to the expression matrix.

[3]:

# ms_data.tl.paste()

Data integration#

After the recalibration of the coordinates, the next step is to perform integration. If the data samples already contain z-coordinate information, it will be used as the primary source for the coordinate calculation. However, if this information is not available, you can set space_between acoording to the biochemical experiment records. The parameter space_between represents the distance between each adjacent two samples, which will be used to assign z-coordinate for each sample.

[4]:

ms_data.integrate()

Note

The ms_data.integrate() function should be executed after loading the data to ensure proper integration of the loaded data. The default method for subsequent multi-sample analysis is set to intersect, which means that it considers only the genes (var) that are common to all samples, taking their intersection. After performing integration, _var_type shows the intersect gene numbers. An alternative option provided is the “union” method, which considers the union of genes (var) across all samples for subsequent multi-sample analysis.

Also integrate the celltype information from single sample into the MSData.

[5]:

from stereo.core.ms_pipeline import slice_generator
ms_data.to_integrate(res_key='celltype', scope=slice_generator[:], _from=slice_generator[:], type='obs', item=['celltype'] * ms_data.num_slice)

Preprocessing#

scope and mode are two crucial parameters in basic multi-sample analysis and correlated funcitons.

scope , similar to a list, determines which samples are included for the analysis.
mode, acts as a switch, controls whether the analysis is performed on single sample (isolated) or multi samples (integrate). By specifying the desired mode, you can easily distinguish and set the processing mode for your analysis.

[6]:

ms_data.tl.cal_qc(scope=slice_generator[:],mode='integrate')

[2023-11-13 16:10:52][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run cal_qc
[2023-11-13 16:10:52][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run cal_qc...
[2023-11-13 16:10:52][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: cal_qc end, consume time 0.4439s.

If necessary, you can run ms_data.tl.filter_cells() function to filter cells based on specific indicators calculated using quality control (QC) metrics. ms_data.plt.genes_count is a good option to observe the gene distribution before applying any filtration.

[7]:

ms_data.tl.filter_cells(
        min_gene=1,
        min_n_genes_by_counts=3,
        pct_counts_mt=5,
        inplace=True
        )
ms_data.tl.raw_checkpoint()

[2023-11-13 16:10:52][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run filter_cells
[2023-11-13 16:10:52][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run filter_cells...
[2023-11-13 16:10:53][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: filter_cells end, consume time 0.6616s.
[2023-11-13 16:10:53][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run raw_checkpoint

Next, we generate the spatial scatter figures that depict the distribution of QC metrics. In Multi-sample analysis, serveral parameters are added here for better visual presentation.

[8]:

ms_data.plt.spatial_scatter(
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=2,        # the width of scale
            reorganize_coordinate=8,           # the number of plots in each row
            # horizontal_offset_additional=200,  # adjustment for horizontal distance
            # vertical_offset_additional=500     # adjustment for vertical distance
            )

[2023-11-13 16:10:53][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run spatial_scatter

[8]:

../_images/Tutorials%28Multi-sample%29_3D_Cell_Cell_Communication_22_3.png

Note

Demo datasets used here are all elaborately processed beforehand, so that filtering and normalization will not be performed in this tutorial.

Cell clustering#

Before proceeding with the cell-cell communication analysis, make sure to perform the necessary cell clustering analysis if it hasn’t been completed yet.

[9]:

# clustering
ms_data.tl.pca(scope=slice_generator[:],mode='integrate', use_highly_genes=False, n_pcs=30, res_key='pca')
ms_data.tl.neighbors(scope=slice_generator[:],mode='integrate', pca_res_key='pca', res_key='neighbors')
ms_data.tl.leiden(scope=slice_generator[:],mode='integrate', neighbors_res_key='neighbors', res_key='leiden')

[2023-11-13 16:10:57][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run pca
[2023-11-13 16:10:57][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run pca...
[2023-11-13 16:10:57][Stereo][15833][MainThread][139891007416128][dim_reduce][78][WARNING]: svd_solver: auto can not be used with sparse input.
Use "arpack" (the default) instead.
[2023-11-13 16:11:15][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: pca end, consume time 17.4799s.
[2023-11-13 16:11:15][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run neighbors
[2023-11-13 16:11:15][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run neighbors...
[2023-11-13 16:15:33][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: neighbors end, consume time 258.4044s.
[2023-11-13 16:15:33][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run leiden
[2023-11-13 16:15:33][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run leiden...
[2023-11-13 16:16:19][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: leiden end, consume time 45.4781s.

[10]:

ms_data.plt.cluster_scatter(
            res_key='leiden',
            scope=slice_generator[:],
            mode='integrate',
            plotting_scale_width=2,        # the width of scale
            reorganize_coordinate=8,           # the number of plots in each row
            # horizontal_offset_additional=200,  # adjustment for horizontal distance
            # vertical_offset_additional=500     # adjustment for vertical distance
            )

[2023-11-13 16:16:19][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run cluster_scatter

[10]:

../_images/Tutorials%28Multi-sample%29_3D_Cell_Cell_Communication_27_3.png

Note

Since the above example data already have cell type labels stored in obs, we can omit the clustering step and proceed directly to the next section.

Spatial information incorperation#

To incorperate the spatial information and ensure the accuracy of the signaling model, we assume that intercellular ligand-receptor (L-R) communications routinely exist among closely neighboring cells.

Before conducting the communication analysis, it is advisable to filter the cells that are located in close proximity to each other. This step ensures that the cells considered for communication analysis are physically close enough to facilitate actual communication. Nevertheless, you can always skip this step and proceed directly to the actual CCC analysis part using the entire dataset.

In Stereopy, we provide two approaches for doing the spatial filteration.

Microenvironment#

The first approach is to input microenvironment information to the communication analysis. This approach treats each cell type as a cohesive unit, and microenvironments are formed by combining two or more closely located cell types. The microenvironments can be calculated with ms_data.tl.gen_ccc_micro_envs. We provide two approaches when calculating the microenvironment: one is using the minimum spanning tree (MST); the other is through splitting the fully connected network into multiple strongly connected components.

This function should be ran twice because it includes two parts:

Calculating how the clusters are divided into microenvironments under diffrent thresholds.
Generating micro environments by choosing an appropriate method and threshold based on the result of first part.

More details refer to API.

[11]:

ms_data.merged_data.tl.result['celltype']

[11]:

	bins	group
0	36648_2-0	endocardial/endothelial (EC)
1	36907_2-0	endocardial/endothelial (EC)
2	36908_2-0	endocardial/endothelial (EC)
3	37308_2-0	endocardial/endothelial (EC)
4	38993_2-0	endocardial/endothelial (EC)
...	...	...
90291	23593_65-58	atrial-specific CM
90292	24057_65-58	atrial-specific CM
90293	24258_65-58	atrial-specific CM
90294	24300_65-58	atrial-specific CM
90295	24681_65-58	atrial-specific CM

90296 rows × 2 columns

[12]:

ms_data.tl.gen_ccc_micro_envs(
            scope=slice_generator[:],
            mode='integrate',
            cluster_res_key='celltype',
            res_key='ccc_micro_envs'
            )

[2023-11-13 16:16:23][Stereo][15833][MainThread][139891007416128][ms_pipeline][121][INFO]: register algorithm gen_ccc_micro_envs to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488

Now, you can choose a appropriate threshold based on this function's result.

[12]:

Note

The coordinates must be stored as x, y and z in obs, or spatial in obsm.

As we have calculated the final bootstrap MST stored in mst_final and pairwise KL-divergences stored in pairwise_kl_divergence of uns['ccc_micro_envs'], you can choose a proper method and threshold to get the final microenvironments.

[13]:

ms_data.tl.gen_ccc_micro_envs(
            scope=slice_generator[:],
            mode='integrate',
            method='split',
            threshold=2,
            res_key='ccc_micro_envs'
            )

[2023-11-13 16:16:31][Stereo][15833][MainThread][139891007416128][ms_pipeline][121][INFO]: register algorithm gen_ccc_micro_envs to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488

Using the split method with theshold 2, we got two microenvironments as follows:

[14]:

ms_data.merged_data.tl.result['ccc_micro_envs']['micro_envs']

[14]:

	cell_type	microenvironment
0	atrial-specific CM	microenv_0
1	epicardial (EP)	microenv_0
2	endocardial/endothelial (EC)	microenv_0
3	ventricular-specific CM	microenv_0
4	blood	microenv_1
5	fibro-mesenchymal (FM)	microenv_1

Niche#

The alternative filteration approach constructs a niche for two given cell types at individual cell level. For each cell of type 1, only cells of type 2 that are within a certain Euclidean distance threshold, denoted as niche_distance, are retained. The niche of cell type 1 and 2 is constructed by including all cells of type 1 that have type 2 neighbors, as well as all their neighboring type 2 cells.

You can run ms_data.tl.get_niche to get a new StereoExpData object based on a pair of cell clusters, which represents a niche.

Afterwards, you can run communication analysis based on this niche. More in API.

[15]:

data_niche = ms_data.tl.get_niche(
    scope=slice_generator[:],
    mode='integrate',
    niche_distance=0.025,
    cluster_1='ventricular-specific CM',
    cluster_2='epicardial (EP)',
    cluster_res_key='celltype'
    )

data_niche

[2023-11-13 16:16:31][Stereo][15833][MainThread][139891007416128][ms_pipeline][121][INFO]: register algorithm get_niche to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488

[15]:

StereoExpData object with n_cells X n_genes = 14792 X 30254
bin_type: bins
bin_size: 1
offset_x = None
offset_y = None
cells: ['cell_name', 'batch', 'celltype', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
genes: ['gene_name', 'n_cells', 'n_counts', 'mean_umi']
cells_matrix = ['pca']
key_record: {'pca': ['pca'], 'neighbors': ['neighbors'], 'cluster': ['leiden'], 'gene_exp_cluster': ['gene_exp_leiden']}
result: ['pca', 'neighbors', 'leiden', 'gene_exp_leiden']

Communication analysis#

We suggest using normalized non-log-transformed data to do the analysis.

analysis_type can be set to simple or statistical. simple does not rely on any statistics and only provides the mean expression values for each interaction for each possible cell type pair, while statistical also estimates the statistical significance of these mean expression values using a permutation approach.

This function currently supports the species of HUMAN and MOUSE. If input the data of other species, you have to translate the genes to homologous genes of human or mouse. You can select a database from cellphonedb, liana and celltalkdb [Efremova20], or input the path of your own database.

Note

HUMAN species can not be used with celltalkdb database for the moment.

You can incorperate the spatial information by specifying microenvironments with parameter micro_envs:

[16]:

ms_data.tl.normalize_total(target_sum=None)
ms_data.tl.cell_cell_communication(
        scope=slice_generator[:],
        mode='integrate',
        analysis_type='statistical',
        cluster_res_key='celltype',
        species='MOUSE',
        database='liana',
        micro_envs='ccc_micro_envs',
        res_key='cell_cell_communication'
        )

[2023-11-13 16:16:45][Stereo][15833][MainThread][139891007416128][ms_pipeline][131][INFO]: data_obj(idx=0) in ms_data start to run normalize_total
[2023-11-13 16:16:45][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run normalize_total...
[2023-11-13 16:16:45][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: normalize_total end, consume time 0.2674s.
[2023-11-13 16:16:45][Stereo][15833][MainThread][139891007416128][ms_pipeline][121][INFO]: register algorithm cell_cell_communication to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488
[2023-11-13 16:16:45][Stereo][15833][MainThread][139891007416128][main][128][INFO]: species: MOUSE
[2023-11-13 16:16:45][Stereo][15833][MainThread][139891007416128][main][129][INFO]: database: liana
[2023-11-13 16:18:09][Stereo][15833][MainThread][139891007416128][main][188][INFO]: [statistical analysis] Threshold:0.1 Precision:3 Iterations:500 Threads:1
[2023-11-13 16:18:31][Stereo][15833][MainThread][139891007416128][main][218][INFO]: Running Real Analysis
[2023-11-13 16:18:31][Stereo][15833][MainThread][139891007416128][main][761][INFO]: Limiting cluster combinations using microenvironments
[2023-11-13 16:18:31][Stereo][15833][MainThread][139891007416128][main][232][INFO]: Running Statistical Analysis
statistical analysis: 100%|███████████████████████████████████████| 500/500 [26:52<00:00,  3.23s/it]
[2023-11-13 16:45:24][Stereo][15833][MainThread][139891007416128][main][1029][INFO]: Building Pvalues result
[2023-11-13 16:45:24][Stereo][15833][MainThread][139891007416128][main][1064][INFO]: Building results

Or you can directly do the analysis on the niche StereoExpData:

[17]:

data_niche.tl.normalize_total(target_sum=None)
data_niche.tl.cell_cell_communication(
        analysis_type='statistical',
        cluster_res_key='celltype',
        species='MOUSE',
        database='liana',
        threshold=0.1,
        res_key='cell_cell_communication'
)

[2023-11-13 16:45:31][Stereo][15833][MainThread][139891007416128][st_pipeline][41][INFO]: start to run normalize_total...
[2023-11-13 16:45:31][Stereo][15833][MainThread][139891007416128][st_pipeline][44][INFO]: normalize_total end, consume time 0.0372s.
[2023-11-13 16:45:31][Stereo][15833][MainThread][139891007416128][st_pipeline][77][INFO]: register algorithm cell_cell_communication to <stereo.core.st_pipeline.StPipeline object at 0x7f38cd623460>
[2023-11-13 16:45:31][Stereo][15833][MainThread][139891007416128][main][128][INFO]: species: MOUSE
[2023-11-13 16:45:31][Stereo][15833][MainThread][139891007416128][main][129][INFO]: database: liana
[2023-11-13 16:45:44][Stereo][15833][MainThread][139891007416128][main][188][INFO]: [statistical analysis] Threshold:0.1 Precision:3 Iterations:500 Threads:1
[2023-11-13 16:45:48][Stereo][15833][MainThread][139891007416128][main][218][INFO]: Running Real Analysis
[2023-11-13 16:45:48][Stereo][15833][MainThread][139891007416128][main][232][INFO]: Running Statistical Analysis
statistical analysis: 100%|███████████████████████████████████████| 500/500 [04:43<00:00,  1.76it/s]
[2023-11-13 16:50:32][Stereo][15833][MainThread][139891007416128][main][1029][INFO]: Building Pvalues result
[2023-11-13 16:50:32][Stereo][15833][MainThread][139891007416128][main][1064][INFO]: Building results

You may also set subsampling=True to enable subsampling of the cells for faster performance.

Result observation#

The results of cell-cell communication are stored in data.tl.result.

The means result shows the mean expression values of each L-R pair for each cell cluster pair.

[18]:

##### mean
ms_data.merged_data.tl.result['cell_cell_communication']['means']
# data_niche.tl.result['cell_cell_communication']['means']

[18]:

	id_cp_interaction	interacting_pair	partner_a	partner_b	gene_a	gene_b	secreted	receptor_a	receptor_b	annotation_strategy	...	endocardial/endothelial (EC)\|endocardial/endothelial (EC)	endocardial/endothelial (EC)\|ventricular-specific CM	ventricular-specific CM\|atrial-specific CM	ventricular-specific CM\|epicardial (EP)	ventricular-specific CM\|endocardial/endothelial (EC)	ventricular-specific CM\|ventricular-specific CM	blood\|blood	blood\|fibro-mesenchymal (FM)	fibro-mesenchymal (FM)\|blood	fibro-mesenchymal (FM)\|fibro-mesenchymal (FM)
1	CPI-SS0A7B487D4	KLRG2_WNT11	simple:A4D1S0	simple:O96014	KLRG2	WNT11	False	False	False	user_curated	...	0.000	0.000	0.003	0.006	0.004	0.002	0.002	0.005	0.002	0.005
2	CPI-SS0C08F5056	FZD9_WNT11	simple:O00144	simple:O96014	FZD9	WNT11	False	False	False	user_curated	...	0.005	0.003	0.004	0.007	0.005	0.003	0.003	0.006	0.003	0.006
3	CPI-SS029839DC3	MUSK_WNT11	simple:O15146	simple:O96014	MUSK	WNT11	False	False	False	user_curated	...	0.004	0.002	0.003	0.006	0.004	0.002	0.000	0.000	0.002	0.005
4	CPI-SS0F653A282	FZD6_WNT11	simple:O60353	simple:O96014	FZD6	WNT11	False	False	False	user_curated	...	0.008	0.007	0.005	0.008	0.006	0.004	0.003	0.006	0.005	0.008
5	CPI-SS070E5DE9E	FZD7_WNT11	simple:O75084	simple:O96014	FZD7	WNT11	False	False	False	user_curated	...	0.007	0.005	0.006	0.009	0.007	0.005	0.010	0.013	0.007	0.010
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4680	CPI-SS050123B31	NPFFR1_NPVF	simple:Q9GZQ6	simple:Q9HCQ7	NPFFR1	NPVF	False	False	False	user_curated	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000
4681	CPI-SS0F30C777C	LRRC4C_NTNG1	simple:Q9HCJ2	simple:Q9Y2I2	LRRC4C	NTNG1	False	False	False	user_curated	...	0.002	0.002	0.003	0.003	0.003	0.003	0.003	0.005	0.002	0.004
4682	CPI-SS0F0C92F50	RTN4_TNFRSF19	simple:Q9NQC3	simple:Q9NS68	RTN4	TNFRSF19	False	False	False	user_curated	...	0.072	0.076	0.076	0.076	0.070	0.074	0.059	0.060	0.081	0.082
4683	CPI-SS0441C0907	IL17RB_IL17B	simple:Q9NRM6	simple:Q9UHF5	IL17RB	IL17B	False	False	False	user_curated	...	0.000	0.000	0.000	0.000	0.001	0.000	0.000	0.000	0.000	0.000
4684	CPI-SS0FF51A757	TNFSF18_TNFRSF18	simple:Q9UNG2	simple:Q9Y5U5	TNFSF18	TNFRSF18	False	False	False	user_curated	...	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000	0.000

3673 rows × 31 columns

If doing statistical analysis, the significant_means results only keeps statistical significant mean values in the mean result (non-significant means have a value of -1).

[19]:

##### significant mean
ms_data.merged_data.tl.result['cell_cell_communication']['significant_means']
# data_niche.tl.result['cell_cell_communication']['significant_means'].iloc[:15]

[19]:

	id_cp_interaction	interacting_pair	partner_a	partner_b	gene_a	gene_b	secreted	receptor_a	receptor_b	annotation_strategy	...	endocardial/endothelial (EC)\|endocardial/endothelial (EC)	endocardial/endothelial (EC)\|ventricular-specific CM	ventricular-specific CM\|atrial-specific CM	ventricular-specific CM\|epicardial (EP)	ventricular-specific CM\|endocardial/endothelial (EC)	ventricular-specific CM\|ventricular-specific CM	blood\|blood	blood\|fibro-mesenchymal (FM)	fibro-mesenchymal (FM)\|blood	fibro-mesenchymal (FM)\|fibro-mesenchymal (FM)
3049	CPI-SS03FCE1B02	APP_APLP2	simple:P05067	simple:Q06481	APP	APLP2	False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
1002	CPI-SS09A44E1CE	COL1A1_CD93	simple:P02452	simple:Q9NPY3	COL1A1	CD93	False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
1003	CPI-SS040FC67D5	COL4A1_CD93	simple:P02462	simple:Q9NPY3	COL4A1	CD93	False	False	False	user_curated	...	0.257	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
1004	CPI-SS07C75B6E2	COL1A2_CD93	simple:P08123	simple:Q9NPY3	COL1A2	CD93	False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
1005	CPI-SS09AFB7612	COL4A2_CD93	simple:P08572	simple:Q9NPY3	COL4A2	CD93	False	False	False	user_curated	...	0.154	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
622	CPI-SC0BCE0B954	COL15A1_integrin_a11b1_complex	simple:P39059	complex:integrin_a11b1_complex	COL15A1		False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
623	CPI-SC0755F1632	COL18A1_integrin_a11b1_complex	simple:P39060	complex:integrin_a11b1_complex	COL18A1		False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
624	CPI-SC037038602	COL7A1_integrin_a11b1_complex	simple:Q02388	complex:integrin_a11b1_complex	COL7A1		False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
626	CPI-SC092A2325E	COL14A1_integrin_a11b1_complex	simple:Q05707	complex:integrin_a11b1_complex	COL14A1		False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0
4684	CPI-SS0FF51A757	TNFSF18_TNFRSF18	simple:Q9UNG2	simple:Q9Y5U5	TNFSF18	TNFRSF18	False	False	False	user_curated	...	-1.000	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0	-1.0

3673 rows × 32 columns

The pvalues result shows the p-values for each mean value in the means result.

[20]:

##### p-value
ms_data.merged_data.tl.result['cell_cell_communication']['pvalues']
# data_niche.tl.result['cell_cell_communication']['pvalues']

[20]:

	id_cp_interaction	interacting_pair	partner_a	partner_b	gene_a	gene_b	secreted	receptor_a	receptor_b	annotation_strategy	...	endocardial/endothelial (EC)\|endocardial/endothelial (EC)	endocardial/endothelial (EC)\|ventricular-specific CM	ventricular-specific CM\|atrial-specific CM	ventricular-specific CM\|epicardial (EP)	ventricular-specific CM\|endocardial/endothelial (EC)	ventricular-specific CM\|ventricular-specific CM	blood\|blood	blood\|fibro-mesenchymal (FM)	fibro-mesenchymal (FM)\|blood	fibro-mesenchymal (FM)\|fibro-mesenchymal (FM)
1	CPI-SS0A7B487D4	KLRG2_WNT11	simple:A4D1S0	simple:O96014	KLRG2	WNT11	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
2	CPI-SS0C08F5056	FZD9_WNT11	simple:O00144	simple:O96014	FZD9	WNT11	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
3	CPI-SS029839DC3	MUSK_WNT11	simple:O15146	simple:O96014	MUSK	WNT11	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
4	CPI-SS0F653A282	FZD6_WNT11	simple:O60353	simple:O96014	FZD6	WNT11	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
5	CPI-SS070E5DE9E	FZD7_WNT11	simple:O75084	simple:O96014	FZD7	WNT11	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
4680	CPI-SS050123B31	NPFFR1_NPVF	simple:Q9GZQ6	simple:Q9HCQ7	NPFFR1	NPVF	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
4681	CPI-SS0F30C777C	LRRC4C_NTNG1	simple:Q9HCJ2	simple:Q9Y2I2	LRRC4C	NTNG1	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
4682	CPI-SS0F0C92F50	RTN4_TNFRSF19	simple:Q9NQC3	simple:Q9NS68	RTN4	TNFRSF19	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
4683	CPI-SS0441C0907	IL17RB_IL17B	simple:Q9NRM6	simple:Q9UHF5	IL17RB	IL17B	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0
4684	CPI-SS0FF51A757	TNFSF18_TNFRSF18	simple:Q9UNG2	simple:Q9Y5U5	TNFSF18	TNFRSF18	False	False	False	user_curated	...	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0	1.0

3673 rows × 31 columns

Set parameters, output, *_filename and output_format, to save results into files, more in API.

Visualization of communication#

We provide dot plot, heatmap, circos plot and sankey plot for visualizing the CCC analysis results.

Note

Currently, only statistical analysis type is supported to visualization.

Note

On this example, you need to replace ms_data to data_niche when running the visualization functions if analysis is done on niche.

Dot plot#

Here we recommend setting interacting_pairs and clusters1 /clusters2before plotting, because the whole result might be too huge to be displayed.

[21]:

# a list of 'gene1_gene2'
interacting_pairs = [
                    'GPC3_CD81',
                    'COL1A1_CD44',
                    'FN1_CD44',
                    'DCN_ERBB4',
                    'VIM_CD44',
                    'ITGB1_VCAN'
                    ]

# interacting_pairs = None

ms_data.plt.ccc_dot_plot(
                    res_key='cell_cell_communication',
                    interacting_pairs=interacting_pairs,
                    clusters1='epicardial (EP)'
                    )

[2023-11-13 16:50:34][Stereo][15833][MainThread][139891007416128][ms_pipeline][128][INFO]: register plot_func ccc_dot_plot to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488
[2023-11-13 16:50:34][Stereo][15833][MainThread][139891007416128][plot_ccc][74][INFO]: Generating dot plot

[21]:

../_images/Tutorials%28Multi-sample%29_3D_Cell_Cell_Communication_64_3.png

Heatmap#

The heatmap displays the number/log-number of significant L-R pairs for each pair of cell types.

[22]:

ms_data.plt.ccc_heatmap(res_key='cell_cell_communication')

[2023-11-13 16:50:34][Stereo][15833][MainThread][139891007416128][ms_pipeline][128][INFO]: register plot_func ccc_heatmap to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488
[2023-11-13 16:50:34][Stereo][15833][MainThread][139891007416128][plot_ccc][160][INFO]: Generating heatmap plot

[22]:

../_images/Tutorials%28Multi-sample%29_3D_Cell_Cell_Communication_66_3.png

Circos plot#

The circos plot shows the number of ligand-receptor pairs (with direction) between each cell cluster.

[23]:

ms_data.plt.ccc_circos_plot(res_key='cell_cell_communication')

[2023-11-13 16:50:35][Stereo][15833][MainThread][139891007416128][ms_pipeline][128][INFO]: register plot_func ccc_circos_plot to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488
[2023-11-13 16:50:35][Stereo][15833][MainThread][139891007416128][plot_ccc][255][INFO]: Generating circos plot

[23]:

../_images/Tutorials%28Multi-sample%29_3D_Cell_Cell_Communication_68_3.png

3D Gene Regulatory Network (3D GRN)#

In order to display the sankey plot, you need to run 3D GRN beforehand.

[24]:

tfs_fn = '../test_data/grn/test_mm_mgi_tfs.txt'
database_fn = '../test_data/grn/mm10_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather'
motif_anno_fn = '../test_data/grn/motifs-v10nr_clust-nr.mgi-m0.001-o0.0.tbl'

ms_data.tl.regulatory_network_inference(
    database_fn,
    motif_anno_fn,
    tfs_fn,
    save_regulons=True,
    fn_prefix='3D',
    num_workers=10,
    method='hotspot',
    ThreeD_slice=True,
    use_raw=True,
    mode='integrate'
    )

[2023-11-13 16:50:36][Stereo][15833][MainThread][139891007416128][ms_pipeline][121][INFO]: register algorithm regulatory_network_inference to <class 'stereo.core.stereo_exp_data.StereoExpData'>-139890412633488
[2023-11-13 16:50:36][Stereo][15833][MainThread][139891007416128][main][93][INFO]: the raw expression matrix will be used.
[2023-11-13 16:51:29][Stereo][15833][MainThread][139891007416128][main][378][INFO]: Loading ranked database...
[2023-11-13 16:51:31][Stereo][15833][MainThread][139891007416128][main][122][INFO]: If data belongs to 3D, it only can be runned as hotspot method now
[2023-11-13 16:51:31][Stereo][15833][MainThread][139891007416128][main][221][INFO]: cached file not found, running hotspot now
[2023-11-13 16:53:05][Stereo][15833][MainThread][139891007416128][main][246][INFO]: compute_autocorrelations()
100%|██████████| 14405/14405 [01:18<00:00, 183.93it/s]
[2023-11-13 16:54:34][Stereo][15833][MainThread][139891007416128][main][248][INFO]: compute_autocorrelations() done
[2023-11-13 16:54:34][Stereo][15833][MainThread][139891007416128][main][251][INFO]: compute_local_correlations

Computing pair-wise local correlation on 2513 features...

100%|██████████| 2513/2513 [00:41<00:00, 60.18it/s]
100%|██████████| 3156328/3156328 [1:20:38<00:00, 652.38it/s]
[2023-11-13 18:17:24][Stereo][15833][MainThread][139891007416128][main][254][INFO]: Network Inference DONE
[2023-11-13 18:17:24][Stereo][15833][MainThread][139891007416128][main][255][INFO]: Hotspot: create 2513 features
[2023-11-13 18:17:24][Stereo][15833][MainThread][139891007416128][main][256][INFO]: (2513, 2513)
[2023-11-13 18:17:24][Stereo][15833][MainThread][139891007416128][main][261][INFO]: detected 63 predefined TF in data

2023-11-13 18:17:54,542 - pyscenic.utils - INFO - Calculating Pearson correlations.
INFO:pyscenic.utils:Calculating Pearson correlations.

2023-11-13 18:17:55,938 - pyscenic.utils - WARNING - Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
        Dropout masking is currently set to [False].
WARNING:pyscenic.utils:Note on correlation calculation: the default behaviour for calculating the correlations has changed after pySCENIC verion 0.9.16. Previously, the default was to calculate the correlation between a TF and target gene using only cells with non-zero expression values (mask_dropouts=True). The current default is now to use all cells to match the behavior of the R verision of SCENIC. The original settings can be retained by setting 'rho_mask_dropouts=True' in the modules_from_adjacencies function, or '--mask_dropouts' from the CLI.
        Dropout masking is currently set to [False].

2023-11-13 18:18:41,324 - pyscenic.utils - INFO - Creating modules.
INFO:pyscenic.utils:Creating modules.
[2023-11-13 18:19:06][Stereo][15833][MainThread][139891007416128][main][431][INFO]: cached file not found, running prune modules now

[########################################] | 100% Completed | 479.93 s

[2023-11-13 18:27:10][Stereo][15833][MainThread][139891007416128][main][475][INFO]: cached file not found, calculating auc_activity_level now

Create regulons from a dataframe of enriched features.
Additional columns saved: []

Sankey plot#

The sankey plot shows the ligand-receptor communications between a pair of cell clusters and the latent regulatory relationship between receptors and downstream TFs in the receiver cells.

The parameter regulons specifies the path of file saving regulons which is output of function of 3D Gene Regulatory Network.

You need to specify the path of file which contains the weighted network infomation by parameter weighted_network_path . By default, you can download the NicheNet-V2 weighted network files from here.

There are two files about weighted network infomations:

Use weighted_network_lr_sig_human.txt when the parameter species on data.tl.cell_cell_communication is set to 'HUMAN'.
Use weighted_network_lr_sig_mouse.txt when the parameter species is set to 'MOUSE'.

[ ]:

ms_data.plt.ccc_sankey_plot(
    sender_cluster='epicardial (EP)',
    receiver_cluster='ventricular-specific CM',
    homo_transfer=True,
    weighted_network_path='../test_data/weighted_network_lr_sig_mouse.txt',
    regulons='./3D_regulon_list.csv'
)

Observation of results in 3D#

You can also observe the analysis results by 3D visualization through an online website.

[ ]:

ms_data.plt.start_vt3d_browser(
    scope=slice_generator[:],
    mode='integrate',
    ccc_res_key='cell_cell_communication',
    # grn_res_key='regulatory_network_inference'
)

Note

If you don’t run the notebook locally, you need to specify the IP of host on which this notebook is ran by parameter ip.

[ ]:

# ms_data.plt.display_3d_ccc(scope=slice_generator[:], mode='integrate', ip='172.16.18.15')

ms_data.plt.display_3d_ccc(scope=slice_generator[:], mode='integrate')