{ "cells": [ { "cell_type": "markdown", "id": "d230e8e2", "metadata": {}, "source": [ "# Work with AnnData" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8f04cd10", "metadata": {}, "source": [ "`AnnData` is widely used in bioinformatics software because of its highly compatible design and efficient functions. For further information, see the [AnnData Docs](https://anndata.readthedocs.io/en/latest/tutorials/notebooks/getting-started.html)." ] }, { "cell_type": "markdown", "id": "88205659", "metadata": {}, "source": [ "## Simple start" ] }, { "cell_type": "markdown", "id": "676b59a9", "metadata": {}, "source": [ "To meet the needs of more users, we integrate `AnnData` functionality into `StereoExpData` through the adapter pattern." ] }, { "attachments": {}, "cell_type": "markdown", "id": "ad08c238", "metadata": {}, "source": [ "Use `st.io.read_h5ad` to read an `.h5ad` file into an `AnnBasedStereoExpData` object." ] }, { "cell_type": "code", "execution_count": 2, "id": "04d1f1d2", "metadata": {}, "outputs": [], "source": [ "import stereo as st\n", "\n", "# spatial_key is the key to get spatial information from AnnData.obsm.\n", "data = st.io.read_h5ad('./mouse_forebrain.anndata_075.h5ad', spatial_key='spatial')" ] }, { "cell_type": "markdown", "id": "f6d52b34", "metadata": {}, "source": [ "Show the `AnnData` information."
] }, { "cell_type": "code", "execution_count": 3, "id": "f517a3d8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 9092 × 10276\n", " obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint'\n", " var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC'\n", " uns: 'sn'\n", " obsm: 'spatial'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.adata" ] }, { "cell_type": "markdown", "id": "12399a29", "metadata": {}, "source": [ "`data.exp_matrix` is automatically mapped to `data.adata.X`, so you can read an `.h5ad` file and use it as a `StereoExpData` object." ] }, { "cell_type": "code", "execution_count": 4, "id": "3616a659", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.exp_matrix is data.adata.X" ] }, { "cell_type": "markdown", "id": "90309d79", "metadata": {}, "source": [ "## Supported StereoExpData features" ] }, { "attachments": {}, "cell_type": "markdown", "id": "066fa721", "metadata": {}, "source": [ "* exp_matrix: `adata.X`\n", "* position: `adata.obsm`\n", "* cells: `adata.obs`\n", "* genes: `adata.var`" ] }, { "cell_type": "markdown", "id": "dbde4899", "metadata": {}, "source": [ "## Supported StereoExpData functions" ] }, { "cell_type": "markdown", "id": "cb340d08", "metadata": {}, "source": [ "Most tools and plot functions are supported."
] }, { "cell_type": "code", "execution_count": 5, "id": "b3abdcec", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run cal_qc...\n", "[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: cal_qc end, consume time 0.2945s.\n", "[2023-11-14 16:44:32][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run normalize_total...\n", "[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: normalize_total end, consume time 0.2316s.\n", "[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run log1p...\n", "[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: log1p end, consume time 0.1042s.\n", "[2023-11-14 16:44:33][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run highly_variable_genes...\n", "[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: highly_variable_genes end, consume time 0.8514s.\n", "[2023-11-14 16:44:34][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run pca...\n", "[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: pca end, consume time 7.2990s.\n", "[2023-11-14 16:44:41][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run neighbors...\n", "[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: neighbors end, consume time 56.7120s.\n", "[2023-11-14 16:45:38][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run umap...\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\tcompleted 0 / 500 epochs\n", "\tcompleted 50 / 500 epochs\n", "\tcompleted 100 / 500 epochs\n", "\tcompleted 150 / 500 epochs\n", 
"\tcompleted 200 / 500 epochs\n", "\tcompleted 250 / 500 epochs\n", "\tcompleted 300 / 500 epochs\n", "\tcompleted 350 / 500 epochs\n", "\tcompleted 400 / 500 epochs\n", "\tcompleted 450 / 500 epochs\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: umap end, consume time 49.9396s.\n", "[2023-11-14 16:46:28][Stereo][77692][MainThread][140240360204096][st_pipeline][37][INFO]: start to run leiden...\n", "[2023-11-14 16:46:29][Stereo][77692][MainThread][140240360204096][st_pipeline][40][INFO]: leiden end, consume time 1.6195s.\n" ] } ], "source": [ "data.tl.cal_qc()\n", "data.tl.raw_checkpoint()\n", "data.tl.normalize_total(target_sum=1e4)\n", "data.tl.log1p()\n", "data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)\n", "data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca_test', svd_solver='arpack')\n", "data.tl.neighbors(pca_res_key='pca_test', n_pcs=30, res_key='neighbors_test', n_jobs=8)\n", "data.tl.umap(pca_res_key='pca_test', neighbors_res_key='neighbors_test', res_key='umap_test', init_pos='spectral')\n", "data.tl.leiden(neighbors_res_key='neighbors_test', res_key='leiden_test')" ] }, { "attachments": {}, "cell_type": "markdown", "id": "e245359f", "metadata": {}, "source": [ "See which self-defined results have been written to `adata`."
] }, { "cell_type": "code", "execution_count": 6, "id": "f46697e7", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "AnnData object with n_obs × n_vars = 9092 × 10276\n", " obs: 'annotation', 'celltype', 'class', 'leiden', 'timepoint', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt', 'leiden_test'\n", " var: 'fuzzy_C_result', 'greater_pvalue', 'less_pvalue', 'log1p_mean_counts', 'log1p_total_counts', 'logFC', 'n_cells', 'n_counts', 'mean_umi', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'\n", " uns: 'sn', 'highly_variable_genes', 'pca_test', 'neighbors_test', 'umap_test', 'leiden_test', 'gene_exp_leiden_test'\n", " obsm: 'spatial', 'X_pca_test', 'X_umap_test'\n", " obsp: 'neighbors_test_connectivities', 'neighbors_test_distances'" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.adata" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b6ccf279", "metadata": {}, "source": [ "Running `leiden` adds a new column, `leiden_test`, which can be displayed through `adata.obs` or `cells`." ] }, { "cell_type": "code", "execution_count": 7, "id": "124b08b3", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
|  | annotation | celltype | class | leiden | timepoint | total_counts | n_genes_by_counts | pct_counts_mt | leiden_test |
|---|---|---|---|---|---|---|---|---|---|
| 209_102-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 12112.0 | 3527 | 0.908190 | 22 |
| 209_87-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13954.0 | 3785 | 0.716640 | 22 |
| 209_88-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13810.0 | 3766 | 0.782042 | 22 |
| 209_89-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15159.0 | 3925 | 0.864173 | 22 |
| 209_90-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15711.0 | 4024 | 0.668322 | 22 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 679_180-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6986.0 | 2124 | 0.844546 | 11 |
| 679_181-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6461.0 | 2063 | 0.665532 | 11 |
| 679_182-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7860.0 | 2265 | 0.725191 | 11 |
| 679_183-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7732.0 | 2172 | 1.138127 | 11 |
| 679_184-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 8195.0 | 2091 | 0.695546 | 11 |

9092 rows × 9 columns
|  | annotation | celltype | class | leiden | timepoint | total_counts | n_genes_by_counts | pct_counts_mt | leiden_test | leiden_new |
|---|---|---|---|---|---|---|---|---|---|---|
| 209_102-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 12112.0 | 3527 | 0.908190 | 22 | 22 |
| 209_87-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13954.0 | 3785 | 0.716640 | 22 | 22 |
| 209_88-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 13810.0 | 3766 | 0.782042 | 22 | 22 |
| 209_89-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15159.0 | 3925 | 0.864173 | 22 | 22 |
| 209_90-0 | Brain | RglDF12 | Dorsal forebrain | 7 | E9.5 | 15711.0 | 4024 | 0.668322 | 22 | 22 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 679_180-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6986.0 | 2124 | 0.844546 | 11 | 11 |
| 679_181-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 6461.0 | 2063 | 0.665532 | 11 | 11 |
| 679_182-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7860.0 | 2265 | 0.725191 | 11 | 11 |
| 679_183-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 7732.0 | 2172 | 1.138127 | 11 | 11 |
| 679_184-7 | Brain | Neur511 | Cortical or hippocampal glutamatergic | 3 | E16.5 | 8195.0 | 2091 | 0.695546 | 11 | 11 |

9092 rows × 10 columns