{ "cells": [ { "cell_type": "markdown", "id": "2124624c", "metadata": {}, "source": [ "# Performance" ] }, { "cell_type": "markdown", "id": "27717f7d", "metadata": {}, "source": [ "This notebook measures the performance of the clustering workflow at several bin sizes." ] }, { "cell_type": "markdown", "id": "5e2734dc", "metadata": {}, "source": [ "## System requirements" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6a79baa6", "metadata": {}, "source": [ " Hardware" ] }, { "cell_type": "markdown", "id": "3b2b541b", "metadata": {}, "source": [ "Tested on an `Intel Core i5-1135G7` with `32GB` of memory." ] }, { "attachments": {}, "cell_type": "markdown", "id": "77fede23", "metadata": {}, "source": [ " Software" ] }, { "cell_type": "markdown", "id": "0f71876e", "metadata": {}, "source": [ "OS: `WSL (Linux version 4.4.0-19041-Microsoft)`" ] }, { "cell_type": "markdown", "id": "d1d24373", "metadata": {}, "source": [ "Python: `Python 3.8.13`" ] }, { "cell_type": "markdown", "id": "ff3f110b", "metadata": {}, "source": [ "Stereopy: `Stereopy 0.6.0` (from conda-forge)" ] }, { "cell_type": "markdown", "id": "dbbfe68d", "metadata": {}, "source": [ "## Test process" ] }, { "attachments": {}, "cell_type": "markdown", "id": "245109c3", "metadata": {}, "source": [ "Download the mouse brain [example data](http://upload.dcs.cloud:8090/share/bb6fab82-2c16-46b2-a95e-6931338f31bf), `SS200000135TL_D1.tissue.gef`." 
] }, { "cell_type": "code", "execution_count": null, "id": "aa14cd7e", "metadata": {}, "outputs": [], "source": [ "import stereo as st\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "\n", "def test_clustering_performance(gef_file, bin_size):\n", "    data = st.io.read_gef(gef_file, bin_size=bin_size)\n", "    data.tl.cal_qc()\n", "    data.tl.raw_checkpoint()\n", "    data.tl.normalize_total(target_sum=1e4)\n", "    data.tl.log1p()\n", "    data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)\n", "    data.tl.scale(zero_center=False)\n", "    data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca', svd_solver='arpack')\n", "    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)\n", "    data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')\n", "    data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')\n", "    data.tl.find_marker_genes(cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True)\n", "    return data\n", "\n", "if __name__ == '__main__':\n", "    gef_file_ = './SS200000135TL_D1.tissue.gef'\n", "    bin_size_ = 50  # or 100 or 200\n", "    print(f'work with path: `{gef_file_}`, bin: {bin_size_}')\n", "    _ = test_clustering_performance(gef_file_, bin_size_)" ] }, { "cell_type": "markdown", "id": "8eaa65ef", "metadata": {}, "source": [ "## Clustering performance" ] }, { "attachments": {}, "cell_type": "markdown", "id": "ee81f17f", "metadata": {}, "source": [ "Clustering performance on the GEF file at bin sizes 50, 100, and 200:\n", "\n", "| Bin Size | Number of Cells | Number of Genes | CPU Usage | Max RSS | Elapsed Time (m:ss.ss) |\n", "| ---- | ---- | ---- | ---- | ---- | ---- |\n", "| 50 | 35890 | 20816 | 124% | 10.32 GB | 3:01.20 |\n", "| 100 | 9111 | 20816 | 160% | 3.60 GB | 0:51.45 |\n", "| 200 | 2342 | 20816 | 148% | 1.85 GB | 0:22.56 |" ] }, { "cell_type": "markdown", "id": "2095afa6", 
"metadata": {}, "source": [ "Usually, `find_marker_genes` is the most time-consuming step during the whole task." ] }, { "cell_type": "markdown", "id": "98a8c556", "metadata": {}, "source": [ "## Memory use" ] }, { "cell_type": "markdown", "id": "85f03e6d", "metadata": {}, "source": [ "We show the memory using of the clustering process of which bin size is 50." ] }, { "attachments": {}, "cell_type": "markdown", "id": "0b340e93", "metadata": {}, "source": [ "