stereo.algorithm.gen_ccc_micro_envs.GenCccMicroEnvs.main

GenCccMicroEnvs.main(cluster_res_key='cluster', n_boot=20, boot_prop=0.8, dimension=3, fill_rare=True, min_num=30, binsize=2, eps=1e-20, show_dividing_by_thresholds=True, method='split', threshold=None, output_path=None, res_key='ccc_micro_envs')

Generate the micro-environments used for the cell-cell communication (CCC) analysis.

This function should be run twice because it consists of two parts:

  1. Calculating how the different clusters are divided into micro-environments under different thresholds, so that you can choose an appropriate threshold from the resulting divisions. To run this part, set the parameter threshold to None. The output is a dataframe like the one below:

    threshold            subgroup_result
    0.44298617727504136  [{'1'}, {'2'}, {'3'}]
    0.625776310617184    [{'1', '2'}, {'3'}]

    Each value in the column subgroup_result is a list of sets; each set contains one or more groups and represents one micro-environment.

  2. Generating the micro-environments by setting an appropriate method and threshold based on the result of the first part. In this part, all parameters that precede method are ignored. The output is a dataframe like the one below:

    cell_type  microenviroment
    NKcells_1  microenv_0
    NKcells_0  microenv_0
    Tcells     microenv_1
    Myeloid    microenv_2
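The relationship between the two outputs can be sketched in plain Python. This is only an illustration using the rows of the first example table; it does not call the library, and the helper names `pick_threshold` and `to_assignment`, the "smallest threshold giving n micro-environments" rule, and the `microenv_<index>` numbering are assumptions made for the example rather than documented behaviour.

```python
# Rows copied from the first-part example table: (threshold, subgroup_result).
first_part_rows = [
    (0.44298617727504136, [{"1"}, {"2"}, {"3"}]),
    (0.625776310617184, [{"1", "2"}, {"3"}]),
]


def pick_threshold(rows, n_envs):
    """Hypothetical rule: smallest threshold whose division has n_envs sets."""
    candidates = [thr for thr, groups in rows if len(groups) == n_envs]
    return min(candidates) if candidates else None


def to_assignment(groups):
    """Flatten a list of sets into (cell_type, microenviroment) rows."""
    return [
        (cluster, f"microenv_{index}")
        for index, group in enumerate(groups)
        for cluster in sorted(group)
    ]


chosen = pick_threshold(first_part_rows, n_envs=2)
groups = next(g for thr, g in first_part_rows if thr == chosen)
assignment = to_assignment(groups)

for cell_type, env in assignment:
    print(cell_type, env)
```

The value held in `chosen` is what would be passed as threshold when the function is run the second time.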

Parameters:
  • cluster_res_key (str) – the key which specifies the clustering result in data.tl.result.

  • n_boot (int) – number of bootstrap samples, default = 20.

  • boot_prop (float) – proportion of each bootstrap sample, default = 0.8.

  • dimension (int) – spatial dimension of the data, either 2 or 3.

  • fill_rare (bool) – whether to simulate cells for rare cell types when calculating the KDE.

  • min_num (int) – a cell type with fewer than min_num cells is considered rare.

  • binsize (float) – grid spacing used to divide the space for the KDE. For example, a sample from a square chip may be gridded into a mesh with 100 intersections (the number is determined by the given binsize). For each cell type, a KDE is fitted to the coordinates of all cells of that type and evaluated at the 100 intersections. The KL divergence between each pair of cell types is then calculated from these KDE values and used to construct the micro-environments.

  • eps (float) – value substituted for zero KDE values to avoid infinite KL divergence.

  • show_dividing_by_thresholds (bool) – whether to display the result while running the first part of this function.

  • method (str) – the method used to define micro-environments, one of two: 1) a minimum spanning tree, or 2) pruning the fully connected graph based on a given KL threshold, then splitting the graph into multiple strongly connected components.

  • threshold (Optional[float]) – the threshold used to divide the micro-environments. 1) Set it to None to run the first part of this function. 2) Set it to an appropriate value to run the second part.

  • output_path (Optional[str]) – the directory in which to save the result; if set to None, the result is only stored in memory.

  • res_key (str) – the key under which the result is stored in data.tl.result; in the second part, it must be the same as in the first part.
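The roles of binsize, eps, and threshold described above can be illustrated with a small standard-library sketch. It is not the library's implementation: the fixed-bandwidth Gaussian kernel, the "keep an edge when KL is below the threshold" rule, the toy coordinates, and all function names are assumptions chosen to keep the example short.

```python
import math


def kde_on_grid(points, grid, bandwidth=1.0, eps=1e-20):
    """Evaluate a fixed-bandwidth Gaussian KDE at every grid intersection."""
    values = [
        sum(
            math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * bandwidth ** 2))
            for x, y in points
        )
        for gx, gy in grid
    ]
    total = sum(values)
    probs = [v / total for v in values]
    # Replace exact zeros with eps so that log(p / q) stays finite.
    return [p if p > 0 else eps for p in probs]


def kl_divergence(p, q):
    """Discrete KL(p || q) over the grid intersections."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def strongly_connected_components(names, edges):
    """Group nodes that can reach each other (Warshall-style closure)."""
    reach = {a: {a} | {b for b in names if (a, b) in edges} for a in names}
    for k in names:
        for i in names:
            if k in reach[i]:
                reach[i] |= reach[k]
    components, seen = [], set()
    for a in names:
        if a not in seen:
            comp = {b for b in names if b in reach[a] and a in reach[b]}
            components.append(comp)
            seen |= comp
    return components


def split_by_threshold(kl, names, threshold):
    """Assumed pruning rule: keep the directed edge a -> b if KL(a || b) < threshold."""
    edges = {pair for pair, value in kl.items() if value < threshold}
    return strongly_connected_components(names, edges)


# Toy 2D coordinates for three hypothetical cell types.
cells = {
    "A": [(1, 1), (2, 1), (1, 2), (2, 2)],
    "B": [(2, 2), (3, 2), (2, 3), (3, 3)],
    "C": [(8, 8), (9, 8), (8, 9), (9, 9)],
}
binsize = 2
axis = range(0, 11, binsize)             # 6 positions per axis
grid = [(x, y) for x in axis for y in axis]  # 36 intersections

names = list(cells)
dists = {name: kde_on_grid(pts, grid) for name, pts in cells.items()}
kl = {
    (a, b): kl_divergence(dists[a], dists[b])
    for a in names for b in names if a != b
}

for threshold in (0.5, 5.0):
    print(threshold, split_by_threshold(kl, names, threshold))
```

In this toy data A and B overlap while C lies far away, so a larger threshold merges A and B into one micro-environment and leaves C on its own, which mirrors the pattern in the first-part example table.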