Work with AnnData#

AnnData is widely used in bioinformatics software because of its highly compatible design and efficient functions. For further information, see the AnnData Docs.

Simple start#

To meet the needs of more users, we integrate AnnData functionality into StereoExpData through the adapter pattern.

Read a .h5ad file into an AnnBasedStereoExpData object.

[2]:
import stereo as st

data = st.io.read_h5ad('./mouse_forebrain.anndata_075.h5ad')

Show the AnnData information.

[3]:
data._ann_data
[3]:
AnnData object with n_obs × n_vars = 9092 × 10276
    obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint'
    var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC'
    uns: 'sn'
    obsm: 'spatial'

data.exp_matrix automatically points to data._ann_data.X, which means you can read a .h5ad file and use it as StereoExpData.

[4]:
data.exp_matrix is data._ann_data.X
[4]:
True
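
The identity check above can be illustrated with a minimal sketch of the adapter idea. The classes `FakeAnnData` and `AnnBackedData` are hypothetical stand-ins written for this example; they are not the real AnnData or AnnBasedStereoExpData implementations, and the data is made up.

```python
class FakeAnnData:
    """Hypothetical stand-in for an AnnData object, with only the fields used here."""

    def __init__(self, X, obs, var):
        self.X = X      # expression matrix (cells x genes)
        self.obs = obs  # per-cell annotations
        self.var = var  # per-gene annotations


class AnnBackedData:
    """Hypothetical adapter exposing a StereoExpData-style name over the wrapped object."""

    def __init__(self, ann_data):
        self._ann_data = ann_data

    @property
    def exp_matrix(self):
        # Return the wrapped matrix itself, not a copy.
        return self._ann_data.X

    @exp_matrix.setter
    def exp_matrix(self, value):
        # Writes go straight through to the wrapped object.
        self._ann_data.X = value


ann = FakeAnnData(X=[[1, 0], [0, 2]], obs={"leiden": ["7", "3"]}, var={"n_cells": [1, 1]})
data = AnnBackedData(ann)

# Both names refer to the same object, so the check is an identity check.
same_object = data.exp_matrix is data._ann_data.X

# Assigning through the adapter updates the wrapped object as well.
data.exp_matrix = [[5, 5], [5, 5]]
write_through = ann.X == [[5, 5], [5, 5]]
```

Because the property returns the wrapped attribute rather than a copy, no data is duplicated and changes made through either name are visible through the other.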

Supported StereoExpData features#

  • exp_matrix: _ann_data.X

  • position: _ann_data.uns

  • cells: _ann_data.obs

  • genes: _ann_data.var

Supported StereoExpData functions#

Most tools and plot functions are supported.

[5]:
data.tl.cal_qc()
data.tl.raw_checkpoint()
data.tl.normalize_total(target_sum=1e4)
data.tl.log1p()
data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)
data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca_test', svd_solver='arpack')
data.tl.neighbors(pca_res_key='pca_test', n_pcs=30, res_key='neighbors_test', n_jobs=8)
data.tl.umap(pca_res_key='pca_test', neighbors_res_key='neighbors_test', res_key='umap_test', init_pos='spectral')
data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_test')
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run cal_qc...
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: cal_qc end, consume time 0.2945s.
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run normalize_total...
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: normalize_total end, consume time 0.2316s.
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run log1p...
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: log1p end, consume time 0.1042s.
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run highly_variable_genes...
[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: highly_variable_genes end, consume time 0.8514s.
[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run pca...
[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: pca end, consume time 7.2990s.
[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run neighbors...
[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: neighbors end, consume time 56.7120s.
[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run umap...
        completed  0  /  500 epochs
        completed  50  /  500 epochs
        completed  100  /  500 epochs
        completed  150  /  500 epochs
        completed  200  /  500 epochs
        completed  250  /  500 epochs
        completed  300  /  500 epochs
        completed  350  /  500 epochs
        completed  400  /  500 epochs
        completed  450  /  500 epochs
[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: umap end, consume time 49.9396s.
[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...
[2023-11-14 16:46:29][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.6195s.
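
To make the normalization steps in the pipeline above concrete, here is a small worked example in plain Python. It assumes that normalize_total follows the common convention of scaling each cell's counts so that they sum to target_sum, and that log1p computes log(1 + x) per value; that convention is an assumption of this sketch, and the helper functions and counts below are hypothetical, not the library's code.

```python
import math


def normalize_total(counts, target_sum=1e4):
    """Scale one cell's counts so they sum to target_sum (hypothetical helper)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays all zeros instead of dividing by zero.
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


def log1p(values):
    """Apply log(1 + x) to every value (hypothetical helper)."""
    return [math.log1p(v) for v in values]


cell = [10, 30, 60]                 # made-up counts for three genes in one cell
normalized = normalize_total(cell)  # each count is scaled by 1e4 / 100
transformed = log1p(normalized)
```

After scaling, the three values sum to 10000, so cells with different sequencing depths become comparable before the log transform compresses the range.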

See which results, stored under the user-defined res_key names, have been written to _ann_data.

[6]:
data._ann_data
[6]:
AnnData object with n_obs × n_vars = 9092 × 10276
    obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'
    var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'
    obsm: 'spatial', 'X_pca_test', 'X_umap_test'
    obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'
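
The output above suggests a naming pattern for where each res_key ends up. The sketch below encodes only what can be read off this one output; `expected_slots` is a hypothetical helper written for illustration, not part of any library, and the pattern may differ for other tools or versions.

```python
def expected_slots(tool, res_key):
    """Guess the AnnData slots a result lands in, based on the output shown above.

    Hypothetical helper: the rules are inferred from this single example.
    """
    slots = {"uns": [res_key]}
    if tool in ("pca", "umap"):
        # Embeddings appear in obsm with an 'X_' prefix.
        slots["obsm"] = [f"X_{res_key}"]
    elif tool == "neighbors":
        # The neighbor graph appears in obsp as two matrices.
        slots["obsp"] = [f"{res_key}_connectivities", f"{res_key}_distances"]
    elif tool == "leiden":
        # Cluster labels become an obs column, plus a 'gene_exp_' entry in uns.
        slots["obs"] = [res_key]
        slots["uns"].append(f"gene_exp_{res_key}")
    return slots


pca_slots = expected_slots("pca", "pca_test")
neighbors_slots = expected_slots("neighbors", "neighbors_test")
leiden_slots = expected_slots("leiden", "leiden_test")
```

Choosing distinct res_key values, as done here with the `_test` suffix, keeps new results from colliding with entries that already exist in the file, such as the original `leiden` column.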

Leiden clustering adds a new column, leiden_test, which can be displayed through _ann_data.obs or cells.

[7]:
data._ann_data.obs
[7]:
           annotation  celltype  class                                  leiden  timepoint  total_counts  n_genes_by_counts  pct_counts_mt  leiden_test
209_102-0  Brain       RglDF12   Dorsal forebrain                       7       E9.5       12112.0       3527               0.908190       22
209_87-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       13954.0       3785               0.716640       22
209_88-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       13810.0       3766               0.782042       22
209_89-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       15159.0       3925               0.864173       22
209_90-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       15711.0       4024               0.668322       22
...        ...         ...       ...                                    ...     ...        ...           ...                ...            ...
679_180-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      6986.0        2124               0.844546       11
679_181-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      6461.0        2063               0.665532       11
679_182-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      7860.0        2265               0.725191       11
679_183-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      7732.0        2172               1.138127       11
679_184-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      8195.0        2091               0.695546       11

9092 rows × 9 columns

Plot the results of UMAP.

[8]:
data.plt.umap(res_key='umap_test', cluster_key='leiden_test')
[8]:
[Figure: UMAP plot of the cells, colored by leiden_test cluster (../_images/Tutorials_Work_with_Anndata_20_2.png)]

Using results in .h5ad#

A glance at data._ann_data shows neighbor results in .uns. Results like these, including ones produced by other bioinformatics software, can be used as input to Leiden clustering.

[9]:
data._ann_data
[9]:
AnnData object with n_obs × n_vars = 9092 × 10276
    obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'
    var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'
    obsm: 'spatial', 'X_pca_test', 'X_umap_test'
    obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'
[10]:
data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_new')
[2023-11-14 16:46:30][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...
[2023-11-14 16:46:31][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.1456s.
[11]:
data._ann_data.obs
[11]:
           annotation  celltype  class                                  leiden  timepoint  total_counts  n_genes_by_counts  pct_counts_mt  leiden_test  leiden_new
209_102-0  Brain       RglDF12   Dorsal forebrain                       7       E9.5       12112.0       3527               0.908190       22           22
209_87-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       13954.0       3785               0.716640       22           22
209_88-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       13810.0       3766               0.782042       22           22
209_89-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       15159.0       3925               0.864173       22           22
209_90-0   Brain       RglDF12   Dorsal forebrain                       7       E9.5       15711.0       4024               0.668322       22           22
...        ...         ...       ...                                    ...     ...        ...           ...                ...            ...          ...
679_180-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      6986.0        2124               0.844546       11           11
679_181-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      6461.0        2063               0.665532       11           11
679_182-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      7860.0        2265               0.725191       11           11
679_183-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      7732.0        2172               1.138127       11           11
679_184-7  Brain       Neur511   Cortical or hippocampal glutamatergic  3       E16.5      8195.0        2091               0.695546       11           11

9092 rows × 10 columns
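
In the rows displayed above, leiden_new matches leiden_test. One way to check how closely two label columns agree is to count label pairs, sketched here in plain Python. `label_agreement` is a hypothetical helper, and the input lists contain only the ten displayed rows, not the full 9092 cells.

```python
from collections import Counter


def label_agreement(labels_a, labels_b):
    """Return the fraction of identical labels and the count of each label pair.

    Hypothetical helper for illustration; it compares label strings directly.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = Counter(zip(labels_a, labels_b))
    matches = sum(n for (a, b), n in pairs.items() if a == b)
    return matches / len(labels_a), pairs


# Values copied from the ten rows shown in the table above.
leiden_test = ["22"] * 5 + ["11"] * 5
leiden_new = ["22"] * 5 + ["11"] * 5

fraction, pairs = label_agreement(leiden_test, leiden_new)
```

A direct string comparison only makes sense when both runs number their clusters the same way. If the labels were permuted between runs, the pair counts in `pairs` would still show which clusters correspond to each other.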

Writing to .h5ad#

Use AnnData’s write_h5ad method to write the AnnBasedStereoExpData to a .h5ad file.

[12]:
data._ann_data.write_h5ad('./SS200000135TL_D1.stereo.h5ad')