rnalysis.filtering.CountFilter.normalize_to_tpm

CountFilter.normalize_to_tpm(gtf_file: str | Path, feature_type: Literal['gene', 'transcript'] = 'gene', method: Literal['mean', 'median', 'max', 'min', 'geometric_mean', 'merged_exons'] = 'mean', inplace: bool = True, return_scaling_factors: bool = False)

Normalizes the count matrix to Transcripts Per Million (TPM). First, normalizes each gene to Reads Per Kilobase (RPK) by dividing each gene in the count matrix by its length in Kbp (gene length / 1000). Then, divides each column in the RPK matrix by (total RPK in column)*10^-6. This calculation is similar to that of Reads Per Kilobase Million (RPKM), but in the opposite order: the “per million” normalization factors are calculated after normalizing to gene lengths, not before.
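The two steps described above can be sketched in plain Python. This is a minimal illustration of the arithmetic only, not the library's implementation; the function name `tpm_from_counts` and the dictionary-based inputs are hypothetical and chosen for readability.

```python
def tpm_from_counts(counts, lengths_bp):
    """Illustrative TPM calculation (hypothetical helper, not part of rnalysis).

    counts:     {sample_name: {gene_name: read_count}}
    lengths_bp: {gene_name: gene_length_in_base_pairs}
    Assumes every sample has a non-zero total RPK.
    """
    tpm = {}
    for sample, gene_counts in counts.items():
        # Step 1: Reads Per Kilobase = count / (gene length / 1000)
        rpk = {
            gene: count / (lengths_bp[gene] / 1000)
            for gene, count in gene_counts.items()
        }
        # Step 2: per-sample scaling factor = (total RPK in column) * 10^-6
        scaling_factor = sum(rpk.values()) * 1e-6
        tpm[sample] = {gene: value / scaling_factor for gene, value in rpk.items()}
    return tpm


# Made-up counts and lengths, for demonstration only
example_counts = {
    "sample1": {"geneA": 100, "geneB": 300},
    "sample2": {"geneA": 50, "geneB": 450},
}
example_lengths = {"geneA": 1000, "geneB": 3000}
example_tpm = tpm_from_counts(example_counts, example_lengths)
```

Because the per-million factor is computed after the length normalization, the TPM values in every column sum to one million, which is not generally true of RPKM.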

Parameters:

gtf_file (str or Path) – Path to a GTF/GFF3 annotation file. This file will be used to determine the length of each gene/transcript. The gene/transcript names in this annotation file should match the ones in the count matrix.

feature_type (‘gene’ or ‘transcript’ (default=‘gene’)) – The type of features in your count matrix. If feature_type is ‘transcript’, lengths will be calculated per transcript, and the ‘method’ parameter is ignored. Otherwise, lengths will be aggregated per gene according to the method specified in the ‘method’ parameter.

method (‘mean’, ‘median’, ‘min’, ‘max’, ‘geometric_mean’, or ‘merged_exons’ (default=‘mean’)) – If feature_type=‘gene’, this determines the aggregation method used to calculate gene lengths. ‘mean’, ‘median’, ‘min’, and ‘max’ will calculate the mean/median/min/max of the lengths of all transcripts of the given gene. ‘geometric_mean’ will calculate the geometric mean of the lengths of all transcripts of the given gene. ‘merged_exons’ will calculate the total length of all exons of a gene across all of its transcripts, counting overlapping exons/regions exactly once.

inplace (bool (default=True)) – If True (default), the normalization will be applied to the current CountFilter object. If False, the function will return a new CountFilter instance and the current instance will not be affected.

return_scaling_factors (bool (default=False)) – If True, return a DataFrame containing the calculated scaling factors.

Returns:

If inplace is False, returns a new instance of the Filter object.
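To make the difference between the ‘method’ options concrete, here is a small sketch that computes each aggregate for one hypothetical gene with two transcripts. It is an illustration under stated assumptions, not the library's code: the helper names, the example coordinates, and the use of 1-based inclusive exon coordinates (the GTF convention) are all assumptions.

```python
from statistics import geometric_mean, mean, median


def transcript_length(exons):
    """Sum of exon lengths; exons are (start, end), 1-based inclusive."""
    return sum(end - start + 1 for start, end in exons)


def merged_exon_length(exons):
    """Total length of the union of exons, counting overlaps exactly once."""
    total = 0
    block_start = block_end = None
    for start, end in sorted(exons):
        if block_end is None or start > block_end:
            # Exon does not overlap the current block: close it, start a new one
            if block_end is not None:
                total += block_end - block_start + 1
            block_start, block_end = start, end
        else:
            # Overlapping exon: extend the current block if needed
            block_end = max(block_end, end)
    if block_end is not None:
        total += block_end - block_start + 1
    return total


# Hypothetical gene with two transcripts (made-up coordinates)
transcripts = {
    "tx1": [(1, 100), (201, 300)],
    "tx2": [(51, 150), (201, 400)],
}
tx_lengths = [transcript_length(exons) for exons in transcripts.values()]
all_exons = [exon for exons in transcripts.values() for exon in exons]

gene_lengths = {
    "mean": mean(tx_lengths),
    "median": median(tx_lengths),
    "min": min(tx_lengths),
    "max": max(tx_lengths),
    "geometric_mean": geometric_mean(tx_lengths),
    "merged_exons": merged_exon_length(all_exons),
}
```

In this example the two transcripts are 200 and 300 bp long, while the merged exons span 350 bp. A merged-exon length can exceed the length of every individual transcript, so the choice of method changes the RPK values and therefore the resulting TPM.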