scTransform

scTransform is a normalization method for single-cell RNA sequencing data. It uses regularized negative binomial regression to normalize the UMI expression matrix [Hafemeister19].

Differences between methods

Before exploring scTransform, let’s review what classic normalization does.

[1]:
import stereo as st
import pandas as pd
import numpy as np

# read data
data1 = st.io.read_gef('./SS200000135TL_D1.tissue.gef')
data1.sparse2array()

# per-gene geometric mean of the raw counts, computed as exp(mean(log(x + 1))) - 1
gmean = np.exp(np.log(data1.exp_matrix.T + 1).mean(1)) - 1

# preprocessing
data1.tl.raw_checkpoint()
data1.tl.normalize_total(target_sum=1e4)
data1.tl.log1p()

# per-gene summary: geometric mean of raw counts and variance after log-normalization
log_normalize_result = pd.DataFrame(
    [gmean, data1.exp_matrix.T.var(1)],
    index=['gmean', 'log_normalize_variance'],
    columns=data1.gene_names,
).T

from stereo.algorithm.sctransform.plotting import plot_log_normalize_var

fig1 = plot_log_normalize_var(log_normalize_result)
../_images/Tutorials_scTransform_4_0.png

After log-normalization (normalize_total followed by log1p), lowly expressed genes contribute very little variance in this sample.
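The gmean value computed in cell [1] is a geometric mean with a pseudocount of 1, taken per gene over all cells. The following is a minimal NumPy sketch of that formula on a toy matrix. The helper name log1p_geometric_mean and the toy data are assumptions made for illustration and are not part of stereo.

```python
import numpy as np

def log1p_geometric_mean(counts, axis=0):
    """Geometric mean with a pseudocount of 1: exp(mean(log(x + 1))) - 1.

    Hypothetical helper for illustration; not a stereo function.
    """
    counts = np.asarray(counts, dtype=float)
    # log1p/expm1 are the numerically stable forms of log(x + 1) and exp(x) - 1
    return np.expm1(np.log1p(counts).mean(axis=axis))

# toy matrix: 3 cells (rows) x 2 genes (columns)
toy_counts = np.array([[0, 0],
                       [1, 0],
                       [3, 0]])

# axis=0 averages over cells, giving one value per gene
toy_gmean = log1p_geometric_mean(toy_counts, axis=0)
print(toy_gmean)
```

For the first toy gene the counts 0, 1, 3 give log values 0, log 2, log 4, whose mean is log 2, so the result is 1 up to floating-point error. A gene with no counts gives 0.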

[2]:
data2 = st.io.read_gef('./SS200000135TL_D1.tissue.gef')
data2.tl.sctransform(res_key='sctransform', inplace=True, filter_hvgs=True)

from stereo.algorithm.sctransform.plotting import plot_residual_var

fig2 = plot_residual_var(data2.tl.result['sctransform'])
../_images/Tutorials_scTransform_6_0.png

In contrast, scTransform transforms the gene expression matrix from raw counts to Pearson residuals. Unlike log1p normalization, scTransform balances the variance distribution across all genes, so lowly expressed genes carry signal as well as highly expressed ones.
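To make the idea of a Pearson residual concrete, here is a simplified sketch under a negative binomial model, where each residual is (x - mu) / sqrt(mu + mu^2 / theta). This is not the regularized regression that scTransform performs: as assumptions for illustration, the expected count mu is taken from sequencing depth and gene totals only, the dispersion theta is fixed to a single value, and residuals are clipped at sqrt(number of cells). The function name nb_pearson_residuals and the simulated counts are hypothetical.

```python
import numpy as np

def nb_pearson_residuals(counts, theta=100.0, clip=None):
    """Pearson residuals under a negative binomial model (simplified sketch).

    Assumptions, not scTransform itself: mu comes from cell depth and gene
    totals, theta is one fixed value shared by all genes, and every cell and
    every gene has at least one count.
    """
    counts = np.asarray(counts, dtype=float)
    cell_depth = counts.sum(axis=1, keepdims=True)   # shape (cells, 1)
    gene_total = counts.sum(axis=0, keepdims=True)   # shape (1, genes)
    mu = cell_depth * gene_total / counts.sum()      # expected counts
    residuals = (counts - mu) / np.sqrt(mu + mu ** 2 / theta)
    if clip is None:
        clip = np.sqrt(counts.shape[0])              # assumed clipping bound
    return np.clip(residuals, -clip, clip)

# simulated counts: 1000 cells x 200 genes spanning low to high expression
rng = np.random.default_rng(0)
gene_means = np.logspace(-2, 1, 200)
toy_counts = rng.poisson(lam=gene_means, size=(1000, 200))
# drop genes that happen to have no counts, so that mu is never zero
toy_counts = toy_counts[:, toy_counts.sum(axis=0) > 0]

residuals = nb_pearson_residuals(toy_counts)
residual_var = residuals.var(axis=0)                 # one variance per gene
print(residuals.shape, residual_var.min(), residual_var.max())
```

Because each count is divided by the standard deviation the model expects at its own mean, the residual variances of lowly and highly expressed genes end up on a comparable scale, which is the balancing effect described above.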

Let us take a few genes from a real dataset and compare their variance contribution after scTransform with their contribution after log1p normalization.

[3]:
data3 = st.io.read_gef('./SS200000135TL_D1.tissue.gef')
data3.tl.cal_qc()
data3.plt.spatial_scatter_by_gene(gene_name='Th')

from stereo.algorithm.sctransform.plotting import plot_genes_var_contribution

fig3 = plot_genes_var_contribution(data3, gene_names=['Ptgds', 'Hbb-bs', 'Kcnip4', 'Gm28928', 'Trpm3', 'Th'])