Work with AnnData

AnnData is widely used in bioinformatic software because of its highly compatible design and efficient functions, further information in AnnData Docs

Simple start

To meet the needs of more users, we integrate AnnData functionality into StereoExpData through adapter mode.

Using st.io.read_h5ad to read .h5ad file into AnnBasedStereoExpData.

[2]:
import stereo as st

# spatial_key is the key to get spatial information from AnnData.obsm.
data = st.io.read_h5ad('./mouse_forebrain.anndata_075.h5ad', spatial_key='spatial')

Show AnnData infomation.

[3]:
data.adata
[3]:
AnnData object with n_obs × n_vars = 9092 × 10276
    obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint'
    var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC'
    uns: 'sn'
    obsm: 'spatial'

And we automatically index data.exp_matrix to data.adata.X, which means you can read a .h5ad file and use it as StereoExpData.

[4]:
data.exp_matrix is data.adata.X
[4]:
True

Supported StereoExpData features

  • exp_matrix: adata.X

  • postion adata.uns

  • cells: adata.obs

  • genes: adata.var

Supported StereoExpData functions

Most tools and plot functions are supported.

[5]:
data.tl.cal_qc()
data.tl.raw_checkpoint()
data.tl.normalize_total(target_sum=1e4)
data.tl.log1p()
data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)
data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca_test', svd_solver='arpack')
data.tl.neighbors(pca_res_key='pca_test', n_pcs=30, res_key='neighbors_test', n_jobs=8)
data.tl.umap(pca_res_key='pca_test', neighbors_res_key='neighbors_test', res_key='umap_test', init_pos='spectral')
data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_test')
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run cal_qc...
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: cal_qc end, consume time 0.2945s.
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run normalize_total...
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: normalize_total end, consume time 0.2316s.
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run log1p...
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: log1p end, consume time 0.1042s.
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run highly_variable_genes...
[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: highly_variable_genes end, consume time 0.8514s.
[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run pca...
[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: pca end, consume time 7.2990s.
[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run neighbors...
[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: neighbors end, consume time 56.7120s.
[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run umap...
        completed  0  /  500 epochs
        completed  50  /  500 epochs
        completed  100  /  500 epochs
        completed  150  /  500 epochs
        completed  200  /  500 epochs
        completed  250  /  500 epochs
        completed  300  /  500 epochs
        completed  350  /  500 epochs
        completed  400  /  500 epochs
        completed  450  /  500 epochs
[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: umap end, consume time 49.9396s.
[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...
[2023-11-14 16:46:29][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.6195s.

See what self-defined results have been written to adata.

[6]:
data.adata
[6]:
AnnData object with n_obs × n_vars = 9092 × 10276
    obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'
    var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'
    obsm: 'spatial', 'X_pca_test', 'X_umap_test'
    obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'

Get a new column leiden_test after leiden, which could be displayed through adata.obs or cells.

[7]:
data.adata.obs
[7]:
annotation celltype class leiden timepoint total_counts n_genes_by_counts pct_counts_mt leiden_test
209_102-0 Brain RglDF12 Dorsal forebrain 7 E9.5 12112.0 3527 0.908190 22
209_87-0 Brain RglDF12 Dorsal forebrain 7 E9.5 13954.0 3785 0.716640 22
209_88-0 Brain RglDF12 Dorsal forebrain 7 E9.5 13810.0 3766 0.782042 22
209_89-0 Brain RglDF12 Dorsal forebrain 7 E9.5 15159.0 3925 0.864173 22
209_90-0 Brain RglDF12 Dorsal forebrain 7 E9.5 15711.0 4024 0.668322 22
... ... ... ... ... ... ... ... ... ...
679_180-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 6986.0 2124 0.844546 11
679_181-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 6461.0 2063 0.665532 11
679_182-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 7860.0 2265 0.725191 11
679_183-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 7732.0 2172 1.138127 11
679_184-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 8195.0 2091 0.695546 11

9092 rows × 9 columns

Plot the results of UMAP.

[8]:
data.plt.umap(res_key='umap_test', cluster_key='leiden_test')
[8]:
../_images/Tutorials_Work_with_Anndata_20_2.png

Using result in .h5ad

By glancing at data.adata, we find neighbors in .uns are produced by other bioinformatical softwares. We can use it as input to Leiden clustering.

[9]:
data.adata
[9]:
AnnData object with n_obs × n_vars = 9092 × 10276
    obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'
    var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'
    obsm: 'spatial', 'X_pca_test', 'X_umap_test'
    obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'
[10]:
data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_new')
[2023-11-14 16:46:30][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...
[2023-11-14 16:46:31][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.1456s.
[11]:
data.adata.obs
[11]:
annotation celltype class leiden timepoint total_counts n_genes_by_counts pct_counts_mt leiden_test leiden_new
209_102-0 Brain RglDF12 Dorsal forebrain 7 E9.5 12112.0 3527 0.908190 22 22
209_87-0 Brain RglDF12 Dorsal forebrain 7 E9.5 13954.0 3785 0.716640 22 22
209_88-0 Brain RglDF12 Dorsal forebrain 7 E9.5 13810.0 3766 0.782042 22 22
209_89-0 Brain RglDF12 Dorsal forebrain 7 E9.5 15159.0 3925 0.864173 22 22
209_90-0 Brain RglDF12 Dorsal forebrain 7 E9.5 15711.0 4024 0.668322 22 22
... ... ... ... ... ... ... ... ... ... ...
679_180-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 6986.0 2124 0.844546 11 11
679_181-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 6461.0 2063 0.665532 11 11
679_182-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 7860.0 2265 0.725191 11 11
679_183-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 7732.0 2172 1.138127 11 11
679_184-7 Brain Neur511 Cortical or hippocampal glutamatergic 3 E16.5 8195.0 2091 0.695546 11 11

9092 rows × 10 columns

Writing to .h5ad

Use AnnData’s function write_h5ad to write AnnBasedStereoExpData to .h5ad file.

[12]:
data.adata.write_h5ad('./SS200000135TL_D1.stereo.h5ad')