Clustering by GPU

After trying dozens of technologies, we found that GPU acceleration is currently the most efficient way to speed up clustering.

Requirements

CUDA installation

Linux

Linux users should follow the NVIDIA CUDA Installation Guide for Linux to install CUDA.

Windows

Installing CUDA on Windows is a bit more complicated, because Stereopy does not currently support Windows natively. By following the CUDA on WSL User Guide, you can run Stereopy with the GPU option inside WSL.
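Once CUDA is installed, it can help to confirm that the driver is visible from the shell where Stereopy will run (inside WSL on Windows). The sketch below is not part of Stereopy: `gpu_summary` is a hypothetical helper that calls the `nvidia-smi` tool shipped with the NVIDIA driver and returns `None` when the tool is missing or fails.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one line per visible GPU, or None if nvidia-smi is unavailable."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        return None  # driver tools are not on PATH
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name,driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30, check=True,
        )
    except (subprocess.SubprocessError, OSError):
        return None  # the tool exists but could not talk to the driver
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


print(gpu_summary())
```

A result of `None` or an empty list suggests the driver setup should be revisited before installing RAPIDS.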

RAPIDS on Anaconda

Select the versions that match your system on the homepage of the RAPIDS official website. The following command creates a dedicated environment (here with Python 3.8, RAPIDS 23.04.01 and CUDA 11.8):

conda create -y -n stereopy-rapids -c rapidsai -c conda-forge -c nvidia python=3.8 rapids=23.04.01 cuda-version=11.8
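After activating the new environment, a quick check that the RAPIDS libraries can be found may save time later. This is a minimal sketch, assuming that `cudf`, `cuml` and `cugraph` are the packages of interest (they ship with the `rapids` metapackage; which of them Stereopy actually imports is an assumption here). `missing_rapids_packages` is a hypothetical helper name.

```python
import importlib.util


def missing_rapids_packages(packages=("cudf", "cuml", "cugraph")):
    """Return the names in `packages` that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = missing_rapids_packages()
print("RAPIDS looks installed" if not missing else f"Missing: {missing}")
```

The check only looks the packages up without importing them, so it also runs on a machine without a GPU.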

Note

I installed CUDA on WSL successfully with NVIDIA Studio Driver WHQL 522.30, following this bug reporter’s advice. For reference, this is my personal PC environment with CUDA:

* Intel Core i7-7700K
* NVIDIA GeForce RTX 3060 (NVIDIA-SMI 522.30; Driver Version: 522.30; CUDA Version: 11.8)
* WSL2 on Windows 10 (21H2)

Stereopy installation

Installing Stereopy with conda fails in the GPU-accelerated environment; only installation from PyPI with pip succeeds.

pip install stereopy

After installing Stereopy, pip reports some warnings about dependency conflicts. Two of the packages, dask and distributed, must be reinstalled at the correct versions; the other warnings can be ignored.

pip install dask==2023.3.2 distributed==2023.3.2.1
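To confirm that the reinstall took effect, the installed versions can be compared with the pins from the pip command above. This is a small sketch using only the standard library; `check_pins` and `EXPECTED` are hypothetical names introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the pip command above.
EXPECTED = {"dask": "2023.3.2", "distributed": "2023.3.2.1"}


def check_pins(expected=EXPECTED):
    """Map each package to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, pinned in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (installed, installed == pinned)
    return report


for name, (installed, ok) in check_pins().items():
    print(f"{name}: installed={installed} pinned={EXPECTED[name]} ok={ok}")
```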

Clustering with GPU

Start with the common pipeline:

import stereo as st

# reading data
file_path = './stereopy/test/xujunhao/data/SS200000135TL_D1/SS200000135TL_D1.gef'
bin_size = 50
data = st.io.read_gef(file_path, bin_size=bin_size)

# preprocessing
data.tl.cal_qc()
print(data.exp_matrix.shape)
data.tl.normalize_total(target_sum=1e4)
data.tl.log1p()

# get highly variable genes
data.tl.highly_variable_genes(min_mean=0.0125, max_mean=3, min_disp=0.5, res_key='highly_variable_genes', n_top_genes=None)

# PCA (compute 30 components so that neighbors(n_pcs=30) has enough to use)
data.tl.pca(use_highly_genes=True, hvg_res_key='highly_variable_genes', n_pcs=30, res_key='pca', svd_solver='arpack')

Clustering demo on CPU:

data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8)
data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral')
data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')

Clustering demo on GPU:

# note that the parameter `method` is set to 'rapids'

data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8, method='rapids')
data.tl.umap(pca_res_key='pca', neighbors_res_key='neighbors', res_key='umap', init_pos='spectral', method='rapids')
data.tl.leiden(neighbors_res_key='neighbors', res_key='leiden', method='rapids')
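The speed-up is easiest to judge on your own data by timing the CPU and GPU calls side by side. The following is a minimal sketch of such a timer; `timed` is a hypothetical helper, not a Stereopy function, and the `sum(...)` call is only a stand-in for a clustering step.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("example", timings):
    sum(range(100_000))  # stand-in; replace with e.g. data.tl.leiden(...)
print(timings)
```

Wrapping each of the calls above in its own `with timed(...)` block, once without and once with `method='rapids'`, gives a per-step comparison. The first GPU call may include one-off initialisation time, so repeating the measurement is advisable.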

GPU setting

As shown above, GPU acceleration for Neighbors, UMAP and Leiden is enabled by setting the parameter method to 'rapids'.

To cluster with Louvain using GPU acceleration, set flavor to 'rapids' instead, as below:

data.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors', n_jobs=8, method='rapids')
data.tl.louvain(neighbors_res_key='neighbors', res_key='louvain', flavor='rapids', use_weights=True)
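Because the keyword differs between steps (`method` for Neighbors, UMAP and Leiden, `flavor` for Louvain), it can be convenient to keep the mapping in one place. The sketch below does that; `RAPIDS_KEYWORD` and `rapids_kwargs` are hypothetical names, and the mapping only restates what this section shows.

```python
# Which keyword selects the RAPIDS backend for each step, as shown in this section.
RAPIDS_KEYWORD = {
    "neighbors": "method",
    "umap": "method",
    "leiden": "method",
    "louvain": "flavor",
}


def rapids_kwargs(step, use_gpu=True):
    """Return the extra keyword arguments that switch `step` to RAPIDS."""
    if step not in RAPIDS_KEYWORD:
        raise ValueError(f"unknown step: {step!r}")
    # With use_gpu=False nothing is added, so the step keeps its own defaults.
    return {RAPIDS_KEYWORD[step]: "rapids"} if use_gpu else {}


print(rapids_kwargs("leiden"))
print(rapids_kwargs("louvain"))
print(rapids_kwargs("umap", use_gpu=False))
```

A call such as `data.tl.louvain(neighbors_res_key='neighbors', res_key='louvain', use_weights=True, **rapids_kwargs('louvain'))` then picks up the right keyword, and passing `use_gpu=False` leaves the call unchanged.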