{ "cells": [ { "cell_type": "markdown", "id": "2124624c", "metadata": {}, "source": [ "# Performance" ] }, { "cell_type": "markdown", "id": "27717f7d", "metadata": {}, "source": [ "Here, we run the clustering workflow at several bin sizes to test its performance." ] }, { "cell_type": "markdown", "id": "5e2734dc", "metadata": {}, "source": [ "## System requirements" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6a79baa6", "metadata": {}, "source": [ "### Hardware" ] }, { "cell_type": "markdown", "id": "3b2b541b", "metadata": {}, "source": [ "Tested on an `Intel Core i5-1135G7` CPU with `32GB` of memory." ] }, { "attachments": {}, "cell_type": "markdown", "id": "77fede23", "metadata": {}, "source": [ "### Software" ] }, { "cell_type": "markdown", "id": "0f71876e", "metadata": {}, "source": [ "OS: `WSL (Linux version 4.4.0-19041-Microsoft)`" ] }, { "cell_type": "markdown", "id": "d1d24373", "metadata": {}, "source": [ "Python: `Python 3.8.13`" ] }, { "cell_type": "markdown", "id": "ff3f110b", "metadata": {}, "source": [ "Stereopy: `Stereopy 0.6.0` (installed from conda-forge)" ] }, { "cell_type": "markdown", "id": "dbbfe68d", "metadata": {}, "source": [ "## Test process" ] }, { "attachments": {}, "cell_type": "markdown", "id": "245109c3", "metadata": {}, "source": [ "Download the mouse brain [example data](http://upload.dcs.cloud:8090/share/bb6fab82-2c16-46b2-a95e-6931338f31bf), `SS200000135TL_D1.tissue.gef`."
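] }, { "cell_type": "markdown", "id": "3f9a1c2e", "metadata": {}, "source": [
"The table under *Clustering performance* reports CPU usage, maximum resident set size (Max RSS) and elapsed time for each run. As a minimal sketch, assuming a Linux or WSL host, comparable figures can be collected from inside Python with the standard `time` and `resource` modules. This is not necessarily how the table was produced, and the helper `measure` is hypothetical (it is not part of Stereopy)."
] }, { "cell_type": "code", "execution_count": null, "id": "b7d40e61", "metadata": {}, "outputs": [], "source": [
"import resource\n",
"import time\n",
"\n",
"\n",
"def measure(func, *args, **kwargs):\n",
"    \"\"\"Call func(*args, **kwargs) and return (result, stats).\n",
"\n",
"    Hypothetical helper, not part of Stereopy. Assumes Linux/WSL,\n",
"    where ru_maxrss is reported in kibibytes.\n",
"    \"\"\"\n",
"    wall_start = time.perf_counter()\n",
"    cpu_start = time.process_time()  # user + system CPU time of this process\n",
"    result = func(*args, **kwargs)\n",
"    wall = time.perf_counter() - wall_start\n",
"    cpu = time.process_time() - cpu_start\n",
"    # Peak RSS over the whole process lifetime so far, not only this call\n",
"    peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss\n",
"    minutes, seconds = divmod(wall, 60)\n",
"    stats = {\n",
"        'cpu_percent': 100.0 * cpu / wall if wall > 0 else 0.0,\n",
"        'max_rss_gib': peak_kib / 1024 ** 2,\n",
"        'elapsed': f'{int(minutes)}:{seconds:05.2f}',  # m:ss.ss, like the table\n",
"    }\n",
"    return result, stats\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
"    # Toy workload; any callable, e.g. a clustering function, can be wrapped\n",
"    _, stats = measure(sum, range(10 ** 6))\n",
"    print(stats)"
] }, { "cell_type": "markdown", "id": "c58e2f93", "metadata": {}, "source": [
"`time.process_time()` counts the CPU time of all threads in the current process, so the CPU figure can exceed 100% for multithreaded steps, but it ignores child processes. `ru_maxrss` is the peak RSS of the whole process, so measure each bin size in a fresh process to keep the runs comparable."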
] }, { "cell_type": "code", "execution_count": null, "id": "aa14cd7e", "metadata": {}, "outputs": [], "source": [ "import stereo as st\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "def test_clustering_performance(gef_file, bin_size):\n", "    # Read the GEF file at the requested bin size\n", "    data = st.io.read_gef(gef_file, bin_size=bin_size)\n", "    # Compute QC metrics, then save a raw checkpoint (used later via use_raw=True)\n", "    data.tl.cal_qc()\n", "    data.tl.raw_checkpoint()\n", "    # Normalization, log transform, highly variable genes and scaling\n", "    data.tl.normalize_total(target_sum=1e4)\n", "    data.tl.log1p()\n", "    data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)\n", "    data.tl.scale(zero_center=False)\n", "    # PCA, neighbor graph, UMAP embedding and Leiden clustering\n", "    data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca', svd_solver='arpack')\n", "    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)\n", "    data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')\n", "    data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')\n", "    # Marker genes for each Leiden cluster (t-test on the raw checkpoint)\n", "    data.tl.find_marker_genes(cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True)\n", "    return data\n", "\n", "if __name__ == '__main__':\n", "    gef_file_ = './SS200000135TL_D1.tissue.gef'\n", "    bin_size_ = 50  # or 100 or 200\n", "    print(f'work with path: `{gef_file_}`, bin: {bin_size_}')\n", "    _ = test_clustering_performance(gef_file_, bin_size_)" ] }, { "cell_type": "markdown", "id": "8eaa65ef", "metadata": {}, "source": [ "## Clustering performance" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ee81f17f", "metadata": {}, "source": [ "Clustering performance on the `bin50`, `bin100` and `bin200` GEF data:\n", "\n", "| Bin size | Cells | Genes | CPU usage | Max RSS | Elapsed time (m:ss) |\n", "| ---- | ---- | ---- | ---- | ---- | ---- |\n", "| 50 | 35890 | 20816 | 124% | 10.32 GB | 3:01.20 |\n", "| 100 | 9111 | 20816 | 160% | 3.60 GB | 0:51.45 |\n", "| 200 | 2342 | 20816 | 148% | 1.85 GB | 0:22.56 |" ] }, { "cell_type": "markdown", "id": "2095afa6", 
"metadata": {}, "source": [ "Usually, `find_marker_genes` is the most time-consuming step of the whole workflow." ] }, { "cell_type": "markdown", "id": "98a8c556", "metadata": {}, "source": [ "## Memory use" ] }, { "cell_type": "markdown", "id": "85f03e6d", "metadata": {}, "source": [ "The memory usage of the clustering process at bin size 50 is shown below." ] }, { "attachments": {}, "cell_type": "markdown", "id": "0b340e93", "metadata": {}, "source": [ "**Note**\n", "\n", "`find_marker_genes` is not run in this profile (the call is commented out).\n" ] }, { "cell_type": "markdown", "id": "6c14248d", "metadata": {}, "source": [ "The script is saved as `test_clustering.py` and profiled line by line with the Python module `memory_profiler`. The columns are: line number, memory usage of the process after the line runs, increment caused by the line, number of occurrences, and line contents." ] }, { "cell_type": "raw", "id": "f1ebed07", "metadata": {}, "source": [ " 8 592.4 MiB 592.4 MiB 1 @memory_profiler.profile(stream=open(\"/mnt/d/projects/stereopy_dev/demo_data/SS200000135TL_D1/test_stereopy_mem.log\", \"w+\"))\n", " 9 def test_clustering_performance(gef_file, bin_size):\n", "10 1162.1 MiB 569.7 MiB 1 data = st.io.read_gef(gef_file, bin_size=bin_size)\n", "11 1162.6 MiB 0.5 MiB 1 data.tl.cal_qc()\n", "12 1216.4 MiB 53.8 MiB 1 data.tl.raw_checkpoint()\n", "13 1243.3 MiB 26.9 MiB 1 data.tl.normalize_total(target_sum=1e4)\n", "14 1270.2 MiB 26.9 MiB 1 data.tl.log1p()\n", "15 1274.0 MiB 3.9 MiB 1 data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)\n", "16 1274.1 MiB 0.1 MiB 1 data.tl.scale(zero_center=False)\n", "17 1339.7 MiB 65.6 MiB 1 data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca', svd_solver='arpack')\n", "18 1487.9 MiB 148.2 MiB 1 data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)\n", "19 1492.4 MiB 4.5 MiB 1 data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')\n", "20 1518.5 MiB 26.0 MiB 1 data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')\n", "21 #data.tl.find_marker_genes(cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True)\n", "22 1518.5 MiB 0.0 MiB 1 return data" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", 
"nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]" }, "vscode": { "interpreter": { "hash": "70dbeb2a90198859cd91b6ea0f3adc73d66939fe301617b631d99dfc954c0323" } } }, "nbformat": 4, "nbformat_minor": 5 }