{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "2124624c",
   "metadata": {},
   "source": [
    "# Performance"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "27717f7d",
   "metadata": {},
   "source": [
    "This notebook benchmarks the Stereopy clustering workflow at several bin sizes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5e2734dc",
   "metadata": {},
   "source": [
    "## System requirements"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6a79baa6",
   "metadata": {},
   "source": [
    "    Hardware"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b2b541b",
   "metadata": {},
   "source": [
    "The tests ran on an `Intel Core i5-1135G7` with `32GB` of memory."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "77fede23",
   "metadata": {},
   "source": [
    "    Software"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0f71876e",
   "metadata": {},
   "source": [
    "OS: `WSL(Linux version 4.4.0-19041-Microsoft)`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d1d24373",
   "metadata": {},
   "source": [
    "Python: `Python 3.8.13`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ff3f110b",
   "metadata": {},
   "source": [
    "Stereopy: `Stereopy 0.6.0` (from conda-forge)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dbbfe68d",
   "metadata": {},
   "source": [
    "## Test process"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "245109c3",
   "metadata": {},
   "source": [
    "Download the mouse brain [example data](http://upload.dcs.cloud:8090/share/bb6fab82-2c16-46b2-a95e-6931338f31bf), `SS200000135TL_D1.tissue.gef`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "aa14cd7e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import stereo as st\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "def test_clustering_performance(gef_file, bin_size):\n",
    "    \"\"\"Run the full clustering workflow on one GEF file at the given bin size.\"\"\"\n",
    "    # Load the GEF file at the requested bin size.\n",
    "    data = st.io.read_gef(gef_file, bin_size=bin_size)\n",
    "    # Quality control, then keep a copy of the raw counts.\n",
    "    data.tl.cal_qc()\n",
    "    data.tl.raw_checkpoint()\n",
    "    # Normalize and log-transform.\n",
    "    data.tl.normalize_total(target_sum=1e4)\n",
    "    data.tl.log1p()\n",
    "    # Select highly variable genes and scale.\n",
    "    data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)\n",
    "    data.tl.scale(zero_center=False)\n",
    "    # Dimensionality reduction, neighbor graph, embedding, and clustering.\n",
    "    data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca', svd_solver='arpack')\n",
    "    data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)\n",
    "    data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')\n",
    "    data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')\n",
    "    # Marker genes per cluster, computed on the raw counts.\n",
    "    data.tl.find_marker_genes(cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True)\n",
    "    return data\n",
    "\n",
    "if __name__ == '__main__':\n",
    "    gef_file_ = './SS200000135TL_D1.tissue.gef'\n",
    "    bin_size_ = 50  # or 100 or 200\n",
    "    print(f'work with path: `{gef_file_}`, bin: {bin_size_}')\n",
    "    _ = test_clustering_performance(gef_file_, bin_size_)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8eaa65ef",
   "metadata": {},
   "source": [
    "## Clustering performance"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "ee81f17f",
   "metadata": {},
   "source": [
    "Clustering performance on the GEF file at bin sizes 50, 100, and 200:\n",
    "\n",
    "| Bin size | Number of cells | Number of genes | CPU usage | Max RSS | Elapsed time (m:ss) |\n",
    "| ---- | ---- | ---- | ---- | ---- | ---- |\n",
    "| 50 | 35890 | 20816 | 124% | 10.32 GB | 3:01.20 |\n",
    "| 100 | 9111 | 20816 | 160% | 3.60 GB | 0:51.45 |\n",
    "| 200 | 2342 | 20816 | 148% | 1.85 GB | 0:22.56 |"
   ]
  },
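  {
   "cell_type": "markdown",
   "id": "b7c1e4a0",
   "metadata": {},
   "source": [
    "The table does not say how these figures were collected. As a minimal sketch, assuming a Linux host where `resource.getrusage` reports `ru_maxrss` in KiB, the wall-clock time and peak RSS of one run could be captured with the standard library alone. `measure_call` is a hypothetical helper, not part of Stereopy, and `sum` stands in for `test_clustering_performance`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b7c1e4a1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import resource\n",
    "import time\n",
    "\n",
    "\n",
    "def measure_call(func, *args, **kwargs):\n",
    "    # Hypothetical helper: run func once, then report wall time and peak RSS.\n",
    "    start = time.perf_counter()\n",
    "    result = func(*args, **kwargs)\n",
    "    elapsed_s = time.perf_counter() - start\n",
    "    # Assumption: Linux, where ru_maxrss is in KiB. It is the peak for the\n",
    "    # whole process so far, not only for this call.\n",
    "    peak_rss_gib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024 ** 2\n",
    "    return result, elapsed_s, peak_rss_gib\n",
    "\n",
    "\n",
    "# Stand-in workload; swap in test_clustering_performance(gef_file_, bin_size_).\n",
    "total, elapsed_s, peak_rss_gib = measure_call(sum, range(1_000_000))\n",
    "print(f'elapsed: {elapsed_s:.2f} s, peak RSS: {peak_rss_gib:.2f} GiB')"
   ]
  },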
  {
   "cell_type": "markdown",
   "id": "2095afa6",
   "metadata": {},
   "source": [
    "`find_marker_genes` is usually the most time-consuming step of the workflow."
   ]
  },
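  {
   "cell_type": "markdown",
   "id": "c3d9f2b0",
   "metadata": {},
   "source": [
    "To check which step dominates on a given dataset, each call can be wrapped in a timer. The sketch below uses only the standard library. `step_timer` and `timings` are hypothetical names, and the `time.sleep` calls stand in for the `data.tl.*` steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3d9f2b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "from contextlib import contextmanager\n",
    "\n",
    "timings = {}\n",
    "\n",
    "\n",
    "@contextmanager\n",
    "def step_timer(name):\n",
    "    # Record the wall-clock duration of the enclosed block under `name`.\n",
    "    start = time.perf_counter()\n",
    "    try:\n",
    "        yield\n",
    "    finally:\n",
    "        timings[name] = time.perf_counter() - start\n",
    "\n",
    "\n",
    "# Stand-in steps; in the benchmark these would wrap calls such as data.tl.pca(...).\n",
    "with step_timer('fast_step'):\n",
    "    time.sleep(0.01)\n",
    "with step_timer('slow_step'):\n",
    "    time.sleep(0.05)\n",
    "\n",
    "slowest = max(timings, key=timings.get)\n",
    "print(f'slowest step: {slowest} ({timings[slowest]:.2f} s)')"
   ]
  },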
  {
   "cell_type": "markdown",
   "id": "98a8c556",
   "metadata": {},
   "source": [
    "## Memory use"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85f03e6d",
   "metadata": {},
   "source": [
    "The memory usage of the clustering workflow at bin size 50 is shown below."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "0b340e93",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\">\n",
    "\n",
    "**Note**\n",
    "\n",
    "This profile skips the `find_marker_genes` step.\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c14248d",
   "metadata": {},
   "source": [
    "The script is saved as `test_clustering.py` and profiled with the Python module `memory_profiler`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f1ebed07",
   "metadata": {},
   "outputs": [],
   "source": [
    " 8    592.4 MiB    592.4 MiB           1   @memory_profiler.profile(stream=open(\"/mnt/d/projects/stereopy_dev/demo_data/SS200000135TL_D1/test_stereopy_mem.log\", \"w+\"))\n",
    "\n",
    " 9                                         def test_clustering_performance(gef_file, bin_size):\n",
    "\n",
    "10   1162.1 MiB    569.7 MiB           1       data = st.io.read_gef(gef_file, bin_size=bin_size)\n",
    "\n",
    "11   1162.6 MiB      0.5 MiB           1       data.tl.cal_qc()\n",
    "\n",
    "12   1216.4 MiB     53.8 MiB           1       data.tl.raw_checkpoint()\n",
    "\n",
    "13   1243.3 MiB     26.9 MiB           1       data.tl.normalize_total(target_sum=1e4)\n",
    "\n",
    "14   1270.2 MiB     26.9 MiB           1       data.tl.log1p()\n",
    "\n",
    "15   1274.0 MiB      3.9 MiB           1       data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)\n",
    "\n",
    "16   1274.1 MiB      0.1 MiB           1       data.tl.scale(zero_center=False)\n",
    "\n",
    "17   1339.7 MiB     65.6 MiB           1       data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=20, res_key='pca', svd_solver='arpack')\n",
    "\n",
    "18   1487.9 MiB    148.2 MiB           1       data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)\n",
    "\n",
    "19   1492.4 MiB      4.5 MiB           1       data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')\n",
    "\n",
    "20   1518.5 MiB     26.0 MiB           1       data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')\n",
    "\n",
    "21                                             #data.tl.find_marker_genes(cluster_res_key='leiden', method='t_test', use_highly_genes=False, use_raw=True)\n",
    "\n",
    "22   1518.5 MiB      0.0 MiB           1       return data"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2 (v3.6.2:5fd33b5, Jul  8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]"
  },
  "vscode": {
   "interpreter": {
    "hash": "70dbeb2a90198859cd91b6ea0f3adc73d66939fe301617b631d99dfc954c0323"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}