rnalysis.filtering.CountFilter.normalize_to_rpkm

CountFilter.normalize_to_rpkm(gtf_file: str | Path, feature_type: Literal['gene', 'transcript'] = 'gene', method: Literal['mean', 'median', 'max', 'min', 'geometric_mean', 'merged_exons'] = 'mean', inplace: bool = True, return_scaling_factors: bool = False)

Normalizes the count matrix to Reads Per Kilobase per Million mapped reads (RPKM). Each count is divided by (total reads in its sample column)*(gene length / 1000)*10^-6.
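The sketch below applies the formula above to a small pandas DataFrame as a rough illustration. It is not RNAlysis's own implementation. It assumes that "total reads" is the column sum of the count matrix and that gene lengths (in base pairs) are already available as a Series indexed by gene name. The function name `rpkm` is hypothetical.

```python
import numpy as np
import pandas as pd


def rpkm(counts: pd.DataFrame, lengths_bp: pd.Series) -> pd.DataFrame:
    """Hypothetical RPKM sketch (not RNAlysis's implementation).

    counts: genes x samples matrix of raw read counts.
    lengths_bp: gene lengths in base pairs, indexed by gene name.
    Assumption: 'total reads' is the column sum of the count matrix.
    """
    # Align lengths to the count matrix rows; every gene needs a length.
    lengths_bp = lengths_bp.reindex(counts.index)
    if lengths_bp.isna().any():
        missing = lengths_bp[lengths_bp.isna()].index.tolist()
        raise KeyError(f"no length found for genes: {missing}")

    total_reads = counts.sum(axis=0)  # per-sample library size

    # divisor[i, j] = total_reads[j] * (length[i] / 1000) * 1e-6
    divisor = np.outer(lengths_bp.to_numpy() / 1000, total_reads.to_numpy() * 1e-6)
    return counts / divisor
```

Dividing by the outer product divides every entry by its own column total and its own row length in a single vectorized step. The result keeps the original gene and sample labels.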

Parameters:

gtf_file (str or Path) – Path to a GTF/GFF3 annotation file. This file is used to determine the length of each gene/transcript. The gene/transcript names in the annotation file should match the ones in the count matrix.

feature_type (‘gene’ or ‘transcript’, default=’gene’) – The type of features in your count matrix. If feature_type is ‘transcript’, lengths are calculated per transcript and the ‘method’ parameter is ignored. Otherwise, lengths are aggregated per gene according to the ‘method’ parameter.

method (‘mean’, ‘median’, ‘min’, ‘max’, ‘geometric_mean’, or ‘merged_exons’, default=’mean’) – If feature_type=’gene’, this determines the aggregation method used to calculate gene lengths. ‘mean’, ‘median’, ‘min’, and ‘max’ calculate the mean/median/min/max of the lengths of all of the gene's transcripts. ‘geometric_mean’ calculates the geometric mean of those transcript lengths. ‘merged_exons’ calculates the total length of all exons of a gene across all of its transcripts, counting overlapping exons/regions exactly once.

inplace (bool, default=True) – If True (default), normalization is applied to the current CountFilter object. If False, the function returns a new CountFilter instance and the current instance is not affected.

return_scaling_factors (bool, default=False) – If True, return a DataFrame containing the calculated scaling factors.

Returns:

If inplace is False, returns a new instance of the Filter object.
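The sketch below shows one way the `method` options could turn per-exon annotation records into a single length per gene. It is a hypothetical pure-Python illustration, not RNAlysis's code. It makes three assumptions: exons have already been parsed from the GTF/GFF3 file into `(gene_id, transcript_id, start, end)` tuples, coordinates are 1-based and inclusive (as in GTF), and a transcript's length is the sum of its exon lengths. All helper names are made up for this example.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical parsed record: (gene_id, transcript_id, start, end), 1-based inclusive.
Exon = Tuple[str, str, int, int]


def transcript_lengths(exons: Iterable[Exon]) -> Dict[str, Dict[str, int]]:
    """Sum exon lengths per transcript, grouped by gene."""
    per_gene: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for gene, tx, start, end in exons:
        per_gene[gene][tx] += end - start + 1
    return per_gene


def merged_exon_length(intervals: List[Tuple[int, int]]) -> int:
    """Length of the union of closed intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # A gap before this interval: close the current merged block.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


AGGREGATORS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "min": min,
    "max": max,
    "geometric_mean": statistics.geometric_mean,
}


def gene_lengths(exons: List[Exon], method: str = "mean") -> Dict[str, float]:
    """Aggregate exon records into one length per gene."""
    if method == "merged_exons":
        by_gene: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for gene, _tx, start, end in exons:
            by_gene[gene].append((start, end))
        return {gene: merged_exon_length(iv) for gene, iv in by_gene.items()}
    if method not in AGGREGATORS:
        raise ValueError(f"unknown method: {method!r}")
    agg = AGGREGATORS[method]
    return {gene: agg(list(txs.values())) for gene, txs in transcript_lengths(exons).items()}
```

Take a gene whose transcript t1 has exons 1–100 and 201–300 (200 bp) and whose transcript t2 has one exon, 51–150 (100 bp). Here ‘mean’ gives 150, while ‘merged_exons’ gives 250. The union of all exonic bases counts positions 51–100 once, so it can exceed the length of any single transcript.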