Work with AnnData#
AnnData
is widely used in bioinformatic software because of its highly compatible design and efficient functions, further information in AnnData Docs
Simple start#
To meet the needs of more users, we integrate AnnData
functionality into StereoExpData
through adapter mode.
Read .h5ad
file into AnnBasedStereoExpData
.
[2]:
import stereo as st
data = st.io.read_h5ad('./mouse_forebrain.anndata_075.h5ad')
Show AnnData
infomation.
[3]:
data._ann_data
[3]:
AnnData object with n_obs × n_vars = 9092 × 10276
obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint'
var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC'
uns: 'sn'
obsm: 'spatial'
And we automatically index data.exp_matrix
to data._ann_data.X
, which means you can read a .h5ad
file and use it as StereoExpData
.
[4]:
data.exp_matrix is data._ann_data.X
[4]:
True
Supported StereoExpData features#
exp_matrix:
_ann_data.X
postion
_ann.uns
cells:
_ann_data.obs
genes:
_ann_data.var
Supported StereoExpData functions#
Most tools and plot functions are supported.
[5]:
data.tl.cal_qc()
data.tl.raw_checkpoint()
data.tl.normalize_total(target_sum=1e4)
data.tl.log1p()
data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)
data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca_test', svd_solver='arpack')
data.tl.neighbors(pca_res_key='pca_test', n_pcs=30, res_key='neighbors_test', n_jobs=8)
data.tl.umap(pca_res_key='pca_test', neighbors_res_key='neighbors_test', res_key='umap_test', init_pos='spectral')
data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_test')
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run cal_qc...
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: cal_qc end, consume time 0.2945s.
[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run normalize_total...
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: normalize_total end, consume time 0.2316s.
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run log1p...
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: log1p end, consume time 0.1042s.
[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run highly_variable_genes...
[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: highly_variable_genes end, consume time 0.8514s.
[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run pca...
[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: pca end, consume time 7.2990s.
[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run neighbors...
[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: neighbors end, consume time 56.7120s.
[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run umap...
completed 0 / 500 epochs
completed 50 / 500 epochs
completed 100 / 500 epochs
completed 150 / 500 epochs
completed 200 / 500 epochs
completed 250 / 500 epochs
completed 300 / 500 epochs
completed 350 / 500 epochs
completed 400 / 500 epochs
completed 450 / 500 epochs
[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: umap end, consume time 49.9396s.
[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...
[2023-11-14 16:46:29][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.6195s.
See what self-defined results have been written to _ann_data
.
[6]:
data._ann_data
[6]:
AnnData object with n_obs × n_vars = 9092 × 10276
obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'
var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'
obsm: 'spatial', 'X_pca_test', 'X_umap_test'
obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'
Get a new column leiden_test
after leiden
, which could be displayed through _ann_data.obs
or cells
.
[7]:
data._ann_data.obs
[7]:
annotation | celltype | class | leiden | timepoint | total_counts | n_genes_by_counts | pct_counts_mt | leiden_test | |
---|---|---|---|---|---|---|---|---|---|
209_102-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 12112.0 | 3527 | 0.908190 | 22 |
209_87-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13954.0 | 3785 | 0.716640 | 22 |
209_88-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13810.0 | 3766 | 0.782042 | 22 |
209_89-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15159.0 | 3925 | 0.864173 | 22 |
209_90-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15711.0 | 4024 | 0.668322 | 22 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
679_180-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6986.0 | 2124 | 0.844546 | 11 |
679_181-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6461.0 | 2063 | 0.665532 | 11 |
679_182-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7860.0 | 2265 | 0.725191 | 11 |
679_183-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7732.0 | 2172 | 1.138127 | 11 |
679_184-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 8195.0 | 2091 | 0.695546 | 11 |
9092 rows × 9 columns
Plot the results of UMAP.
[8]:
data.plt.umap(res_key='umap_test', cluster_key='leiden_test')
[8]:
Using result in .h5ad
#
By glancing at data._ann_data
, we find neighbors
in .uns
are produced by other bioinformatical softwares. We can use it as input to Leiden clustering.
[9]:
data._ann_data
[9]:
AnnData object with n_obs × n_vars = 9092 × 10276
obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'
var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'
obsm: 'spatial', 'X_pca_test', 'X_umap_test'
obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'
[10]:
data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_new')
[2023-11-14 16:46:30][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...
[2023-11-14 16:46:31][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.1456s.
[11]:
data._ann_data.obs
[11]:
annotation | celltype | class | leiden | timepoint | total_counts | n_genes_by_counts | pct_counts_mt | leiden_test | leiden_new | |
---|---|---|---|---|---|---|---|---|---|---|
209_102-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 12112.0 | 3527 | 0.908190 | 22 | 22 |
209_87-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13954.0 | 3785 | 0.716640 | 22 | 22 |
209_88-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13810.0 | 3766 | 0.782042 | 22 | 22 |
209_89-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15159.0 | 3925 | 0.864173 | 22 | 22 |
209_90-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15711.0 | 4024 | 0.668322 | 22 | 22 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
679_180-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6986.0 | 2124 | 0.844546 | 11 | 11 |
679_181-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6461.0 | 2063 | 0.665532 | 11 | 11 |
679_182-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7860.0 | 2265 | 0.725191 | 11 | 11 |
679_183-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7732.0 | 2172 | 1.138127 | 11 | 11 |
679_184-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 8195.0 | 2091 | 0.695546 | 11 | 11 |
9092 rows × 10 columns
Writing to .h5ad
#
Use AnnData
’s function write_h5ad
to write AnnBasedStereoExpData
to .h5ad
file.
[12]:
data._ann_data.write_h5ad('./SS200000135TL_D1.stereo.h5ad')