rnalysis.filtering.CountFilter.normalize_to_tpm
- CountFilter.normalize_to_tpm(gtf_file: str | Path, feature_type: Literal['gene', 'transcript'] = 'gene', method: Literal['mean', 'median', 'max', 'min', 'geometric_mean', 'merged_exons'] = 'mean', inplace: bool = True, return_scaling_factors: bool = False)
Normalizes the count matrix to Transcripts Per Million (TPM). First, normalizes each gene to Reads Per Kilobase (RPK) by dividing each gene in the count matrix by its length in Kbp (gene length / 1000). Then, divides each column in the RPK matrix by (total RPK in column)*10^-6. This calculation is similar to that of Reads Per Kilobase Million (RPKM), but in the opposite order: the 'per million' normalization factors are calculated after normalizing to gene lengths, not before.
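The two steps described above can be sketched in plain Python. This is a minimal illustration of the arithmetic only, not RNAlysis's implementation; the function name `counts_to_tpm`, the nested-dict layout, and the sample data are all hypothetical.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw counts to TPM.

    counts: {sample: {gene: raw count}}   (hypothetical layout)
    lengths_bp: {gene: length in base pairs}
    """
    tpm = {}
    for sample, gene_counts in counts.items():
        # Step 1: Reads Per Kilobase = count / (gene length / 1000)
        rpk = {
            gene: count / (lengths_bp[gene] / 1000)
            for gene, count in gene_counts.items()
        }
        # Step 2: per-sample scaling factor = (total RPK in column) * 10^-6
        scaling_factor = sum(rpk.values()) * 1e-6
        tpm[sample] = {gene: value / scaling_factor for gene, value in rpk.items()}
    return tpm


# Made-up example data: two samples, two genes.
counts = {
    "sample1": {"geneA": 100, "geneB": 300},
    "sample2": {"geneA": 50, "geneB": 50},
}
lengths_bp = {"geneA": 1000, "geneB": 3000}

tpm = counts_to_tpm(counts, lengths_bp)
```

Because the per-million factor is computed from the length-normalized values, every sample's TPM values sum to one million, which is not generally true of RPKM.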
- Parameters:
gtf_file (str or Path) – Path to a GTF/GFF3 annotation file. This file will be used to determine the length of each gene/transcript. The gene/transcript names in this annotation file should match the ones in the count matrix.
feature_type ('gene' or 'transcript', default='gene') – The type of features in your count matrix. If feature_type is 'transcript', lengths will be calculated per transcript, and the 'method' parameter is ignored. Otherwise, lengths will be aggregated per gene according to the method specified in the 'method' parameter.
method ('mean', 'median', 'min', 'max', 'geometric_mean', or 'merged_exons', default='mean') – If feature_type='gene', this determines the aggregation method used to calculate gene lengths. 'mean', 'median', 'min', and 'max' will calculate the mean/median/min/max of the lengths of all transcripts of the given gene. 'geometric_mean' will calculate the geometric mean of the lengths of all transcripts of the given gene. 'merged_exons' will calculate the total length of all exons of a gene across all of its transcripts, counting overlapping exons/regions exactly once.
inplace (bool, default=True) – If True (default), normalization will be applied to the current CountFilter object. If False, the function will return a new CountFilter instance and the current instance will not be affected.
return_scaling_factors (bool, default=False) – If True, return a DataFrame containing the calculated scaling factors.
- Returns:
If inplace is False, returns a new instance of the Filter object.
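To make the difference between the 'method' options concrete, here is a small sketch of how a single gene's length could be aggregated from its transcripts. It is not RNAlysis's code: the helper names `merged_exon_length` and `gene_length` are hypothetical, exon coordinates are assumed to be 1-based and inclusive as in GTF files, and a transcript's length is assumed to be the sum of its exon lengths.

```python
from statistics import geometric_mean, mean, median


def merged_exon_length(exons):
    """Total bases covered by (start, end) exons, counting overlaps once."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(exons):
        if current_end is None or start > current_end:
            # This exon does not overlap the open interval: close it out.
            if current_end is not None:
                total += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            # Overlapping exon: extend the open interval if needed.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start + 1
    return total


def gene_length(transcripts, method="mean"):
    """transcripts: one list of (start, end) exons per transcript."""
    if method == "merged_exons":
        all_exons = [exon for exons in transcripts for exon in exons]
        return merged_exon_length(all_exons)
    # Assumption: transcript length = sum of its exon lengths.
    lengths = [sum(end - start + 1 for start, end in exons) for exons in transcripts]
    aggregators = {
        "mean": mean,
        "median": median,
        "min": min,
        "max": max,
        "geometric_mean": geometric_mean,
    }
    return aggregators[method](lengths)


# Made-up gene with two transcripts whose exons partially overlap.
transcripts = [
    [(1, 100), (201, 300)],  # transcript 1: 200 bp
    [(51, 150)],             # transcript 2: 100 bp
]
mean_length = gene_length(transcripts, "mean")            # 150
merged_length = gene_length(transcripts, "merged_exons")  # 250
```

In this example 'merged_exons' yields a longer gene than any single transcript, since it covers every exonic base of the gene, whereas the other methods summarize the per-transcript lengths.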