High Resolution Matrix Export
In some cases, after reading an expression matrix at low resolution, we need to export a selected area of it at high resolution.
Please download our example data.
Here we take a GEF file as the example; to work with a GEM file, replace st.io.read_gef with st.io.read_gem.
[1]:
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
import stereo as st
# read GEF file in CGEF
# data_path = '../data/SS200000135TL_D1.cellbin.gef'
# data = st.io.read_gef(data_path, bin_type='cell_bins')
# read GEF file in BGEF
data_path = '../data/SS200000135TL_D1.tissue.gef'
data = st.io.read_gef(data_path, bin_type='bins', bin_size=20)
data.tl.cal_qc()
[2024-04-03 10:22:03][Stereo][251729][MainThread][140209575724864][reader][1090][INFO]: read_gef begin ...
path:../data/SS200000135TL_D1.tissue.gef bin:20
generateBinInfo - 30.440000 cpu sec
Warning:your source data file only has gene column,but you specify return gene_names and gene ids which is invalid!so we will return an empty array which has same size with gene_names for the gene_ids
[2024-04-03 10:22:28][Stereo][251729][MainThread][140209575724864][reader][1268][INFO]: the matrix has 223053 cells, and 24302 genes.
[2024-04-03 10:22:28][Stereo][251729][MainThread][140209575724864][reader][1269][INFO]: read_gef end.
[2024-04-03 10:22:28][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run cal_qc...
[2024-04-03 10:22:29][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: cal_qc end, consume time 1.2868s.
There are two selection modes: polygon selection and box selection.
An example of polygon selection:
An example of box selection:
Area Selection
Set poly_select to True to enter polygon selection mode, or to False to enter box selection mode.
In polygon selection mode, you need to click twice to start and end a selection. Afterwards, for each selected area, click it and then click the add button so that it is added to the queue to be exported.
In box selection mode, you only need to click the export button once after completing all selections; all the selected areas are then added to the queue.
[2]:
# plot and show rendering
# polygon selection when poly_select is True
# box selection when poly_select is False
ins = data.plt.interact_spatial_scatter(width=500, height=500, poly_select=True)
ins.show()
Exporting as a GEF/GEM file
The type of the output file is determined by the input file. That is to say, when the input file is a GEM file, the data will still be exported as GEM even if you set the extension of the output file to .gef.
[4]:
ins.export_high_res_area(
data_path,
'./SS200000135TL_D1.tissue.high_res.gef',
# drop=True
)
processing selected 3 area
minx:0 miny:0 maxx:26458 maxy:26458
[4]:
'./SS200000135TL_D1.tissue.high_res.gef'
create bgef file: ./SS200000135TL_D1.tissue.high_res.gef
bin 1 matrix: min_x=0 len_x=26459 min_y=0 len_y=26459 matrix_len=700078681
createRegionGef - elapsed time: 20739.21585 ms
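Because the output format follows the input format regardless of the extension you type, it can help to derive the output name from the input name so the two never disagree. The sketch below is a hypothetical helper, not part of Stereopy: the name matching_output_path, its parameters, and the choice to recognise only the .gef and .gem endings are assumptions made for illustration.

```python
from pathlib import Path


def matching_output_path(input_path, out_dir=".", tag="high_res"):
    """Build an output path whose extension mirrors the input file's extension.

    Hypothetical helper (not a Stereopy function): it only inspects the
    file name and never opens the file.
    """
    name = Path(input_path).name
    for ext in (".gef", ".gem"):
        if name.lower().endswith(ext):
            stem = name[: -len(ext)]           # file name without the extension
            original_ext = name[-len(ext):]    # keep the input's own spelling
            return str(Path(out_dir) / f"{stem}.{tag}{original_ext}")
    raise ValueError(f"unrecognised input format: {name}")


print(matching_output_path("../data/SS200000135TL_D1.tissue.gef"))
```

The returned string could then be passed as the second argument of ins.export_high_res_area in place of a hand-typed path.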
Exporting a sub-image
We can also export the sub-image corresponding to the selected areas.
[5]:
ins.export_roi_image(
origin_file_path='../data/SS200000135TL_D1_regist.tif',
output_path='./SS200000135TL_D1_regist_selected.tif',
# drop=True
)
processing selected 3 area
Note
In polygon selection mode, for both export_high_res_area and export_roi_image, you can also set the parameter drop to True to exclude all selected areas.
In box selection mode, whether to drop selected areas is determined by the options beside the export button.
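To make the effect of drop concrete, here is a minimal, library-independent sketch of keeping or excluding the spots that fall inside selected polygons. It is not Stereopy's implementation: the functions point_in_polygon and filter_spots and the toy coordinates are assumptions for illustration, and the inside test is a plain ray-casting check.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in drawing order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_spots(spots, polygons, drop=False):
    """Keep spots inside any polygon (drop=False) or outside all of them (drop=True)."""
    kept = []
    for x, y in spots:
        selected = any(point_in_polygon(x, y, p) for p in polygons)
        if selected != drop:
            kept.append((x, y))
    return kept


spots = [(1, 1), (5, 5), (9, 9)]
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(filter_spots(spots, [square], drop=False))  # [(1, 1)]
print(filter_spots(spots, [square], drop=True))   # [(5, 5), (9, 9)]
```

With drop=False only the spot inside the square survives; with drop=True the result is the complement.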
Getting a sub-object
In polygon selection mode, after clicking a selected area, you can also use the export button to get a new StereoExpData sub-object corresponding to the clicked area. Note that only one selected area can be exported as a sub-object this way.
In box selection mode, after clicking the export button, all selected areas will be exported as a sub-object.
You can get the sub-object through the property ins.selected_exp_data.
[6]:
selected_exp_data = ins.selected_exp_data
selected_exp_data
If you want to get a sub-object containing all selected areas in polygon selection mode, first add every selected area to the queue by clicking the add button, then run the code below. As with export_high_res_area and export_roi_image, the parameter drop determines whether to drop the selected areas.
[7]:
selected_exp_data_all = ins.get_selected_areas(drop=False)
selected_exp_data_all
processing selected 3 area
[7]:
StereoExpData object with n_cells X n_genes = 29847 X 20193
bin_type: bins
bin_size: 20
offset_x = 0
offset_y = 0
cells: ['cell_name', 'total_counts', 'n_genes_by_counts', 'pct_counts_mt']
genes: ['gene_name', 'n_cells', 'n_counts', 'mean_umi']
result: []
Analysis on exported GEF file
Reading exported GEF file.
[8]:
# read exported BGEF file
path_high_res = './SS200000135TL_D1.tissue.high_res.gef'
data_high_res = st.io.read_gef(path_high_res, bin_type='bins', bin_size=100)
# read exported CGEF file
# path_high_res = './SS200000135TL_D1.cellbin.high_res.gef'
# data_high_res = st.io.read_gef(path_high_res, bin_type='cell_bins')
data_high_res.tl.cal_qc()
[2024-04-03 10:46:42][Stereo][251729][MainThread][140209575724864][reader][1090][INFO]: read_gef begin ...
[2024-04-03 10:46:43][Stereo][251729][MainThread][140209575724864][reader][1268][INFO]: the matrix has 1327 cells, and 20193 genes.
[2024-04-03 10:46:43][Stereo][251729][MainThread][140209575724864][reader][1269][INFO]: read_gef end.
[2024-04-03 10:46:43][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run cal_qc...
[2024-04-03 10:46:43][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: cal_qc end, consume time 0.0669s.
path:./SS200000135TL_D1.tissue.high_res.gef bin:100
generateBinInfo - 2.940000 cpu sec
Warning:your source data file only has gene column,but you specify return gene_names and gene ids which is invalid!so we will return an empty array which has same size with gene_names for the gene_ids
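The export log earlier reports a bin 1 matrix, and the file is read back here with bin_size=100 even though the selection was made at bin_size=20. The sketch below illustrates, under stated assumptions, how high-resolution records can be aggregated into larger square bins at read time. The rebin function and the toy (x, y, count) records are hypothetical, and snapping each coordinate down to a multiple of the bin size is an assumed convention, not a statement about Stereopy's internals.

```python
from collections import Counter


def rebin(spots, bin_size):
    """Aggregate (x, y, count) records into square bins of side `bin_size`."""
    binned = Counter()
    for x, y, count in spots:
        # Snap each coordinate down to the origin of the bin that contains it.
        bx = (x // bin_size) * bin_size
        by = (y // bin_size) * bin_size
        binned[(bx, by)] += count
    return dict(binned)


spots = [(0, 0, 1), (19, 19, 2), (20, 0, 3), (99, 99, 4), (100, 100, 5)]
print(rebin(spots, 20))   # {(0, 0): 3, (20, 0): 3, (80, 80): 4, (100, 100): 5}
print(rebin(spots, 100))  # {(0, 0): 10, (100, 100): 5}
```

The same bin 1 records yield four bins at size 20 and two bins at size 100, which is why a high-resolution export can be re-read at whatever bin size suits the analysis.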
Check the exported area.
[9]:
ins_high_res = data_high_res.plt.interact_spatial_scatter(width=500, height=500, poly_select=True)
ins_high_res.show()
Clustering
[10]:
data_high_res.tl.raw_checkpoint()
data_high_res.tl.normalize_total()
data_high_res.tl.log1p()
data_high_res.tl.pca(use_highly_genes=False, n_pcs=30, res_key='pca')
data_high_res.tl.neighbors(pca_res_key='pca', n_pcs=30, res_key='neighbors')
data_high_res.tl.leiden(neighbors_res_key='neighbors', res_key='leiden')
[2024-04-03 10:46:47][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run normalize_total...
[2024-04-03 10:46:47][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: normalize_total end, consume time 0.0337s.
[2024-04-03 10:46:47][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run log1p...
[2024-04-03 10:46:47][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: log1p end, consume time 0.0097s.
[2024-04-03 10:46:47][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run pca...
[2024-04-03 10:46:47][Stereo][251729][MainThread][140209575724864][dim_reduce][78][WARNING]: svd_solver: auto can not be used with sparse input.
Use "arpack" (the default) instead.
[2024-04-03 10:47:07][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: pca end, consume time 19.2166s.
[2024-04-03 10:47:07][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run neighbors...
[2024-04-03 10:47:48][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: neighbors end, consume time 41.3319s.
[2024-04-03 10:47:48][Stereo][251729][MainThread][140209575724864][st_pipeline][41][INFO]: start to run leiden...
[2024-04-03 10:47:48][Stereo][251729][MainThread][140209575724864][st_pipeline][44][INFO]: leiden end, consume time 0.5397s.
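For readers unfamiliar with the first two preprocessing steps in the cell above, the following plain-Python sketch shows the idea behind total-count normalization followed by a log1p transform on a single cell's counts. It is an illustration only: the functions below are not Stereopy's implementations, and the target sum of 10,000 is an assumed value that may differ from the library's default.

```python
import math


def normalize_total(counts, target_sum=10_000):
    """Scale one cell's counts so that they sum to `target_sum` (assumed value)."""
    total = sum(counts)
    if total == 0:
        # An empty cell stays empty instead of dividing by zero.
        return [0.0 for _ in counts]
    return [c * target_sum / total for c in counts]


def log1p_transform(values):
    """Apply log(1 + x) element-wise to compress large values."""
    return [math.log1p(v) for v in values]


cell = [1, 1, 2]
scaled = normalize_total(cell, target_sum=4)
print(scaled)                   # [1.0, 1.0, 2.0]
print(log1p_transform(scaled))
```

Normalization makes cells with different sequencing depth comparable, and the log transform reduces the influence of a few highly expressed genes before PCA.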
[11]:
data_high_res.plt.cluster_scatter(res_key='leiden')