rnalysis.filtering.CountFilter.normalize_tmm

CountFilter.normalize_tmm(log_ratio_trim: float = 0.3, sum_trim: float = 0.05, a_cutoff: float | None = -10000000000, ref_column: Literal['auto'] | ColumnName = 'auto', inplace: bool = True, return_scaling_factors: bool = False)

Normalizes the count matrix using the ‘trimmed mean of M values’ (TMM) method (Robinson and Oshlack 2010). This is the default normalization method used by R’s edgeR. To calculate the TMM scaling factors, you first calculate, for each gene, the M-value between each sample and the reference sample (log2 of the sample Minus log2 of the reference sample) and the A-value between each sample and the reference sample (log2 of the sample Added to log2 of the reference sample). You then remove genes with extreme values, which are likely to be differentially expressed or uninformative: the top and bottom X% of M-values, the top and bottom Y% of A-values, all genes whose A-value is smaller than the specified cutoff, and all genes with 0 reads (to avoid log2 values of inf or -inf). Next, a weighted mean of the remaining M-values is calculated, with each gene weighted by the inverse of its approximate variance; this yields the scaling factor for each sample. Finally, the scaling factors are adjusted, for symmetry, so that they multiply to 1.
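The steps described above can be sketched in plain Python. This is a minimal illustration, not the RNAlysis implementation: the function names `tmm_factor` and `tmm_factors` are hypothetical, counts are divided by library size before taking logs and the A-value is taken as the mean of the two log2 values (both as in the original TMM paper), ties are trimmed by simple ordinal rank, and the first column is used as the reference.

```python
import math


def tmm_factor(sample, ref, log_ratio_trim=0.3, sum_trim=0.05, a_cutoff=-1e10):
    """Unadjusted TMM factor of `sample` relative to `ref` (lists of raw counts)."""
    n_s, n_r = sum(sample), sum(ref)  # library sizes
    m_vals, a_vals, variances = [], [], []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue  # genes with 0 reads would give log2 values of -inf
        log_s, log_r = math.log2(y_s / n_s), math.log2(y_r / n_r)
        a = 0.5 * (log_s + log_r)  # assumption: A is the mean, not the plain sum
        if a_cutoff is not None and a < a_cutoff:
            continue  # A-value below the lower bound
        m_vals.append(log_s - log_r)
        a_vals.append(a)
        # approximate variance of the M-value; its inverse is the gene's weight
        variances.append((n_s - y_s) / (n_s * y_s) + (n_r - y_r) / (n_r * y_r))

    n = len(m_vals)

    def kept_by_rank(values, trim):
        # drop the lowest and highest `trim` fraction of genes by ordinal rank
        lo = math.floor(n * trim)
        order = sorted(range(n), key=values.__getitem__)
        return set(order[lo:n - lo])

    keep = kept_by_rank(m_vals, log_ratio_trim) & kept_by_rank(a_vals, sum_trim)
    if not keep:
        return 1.0  # nothing left to average: leave the sample unscaled
    weight_sum = sum(1 / variances[i] for i in keep)
    weighted_m = sum(m_vals[i] / variances[i] for i in keep)
    return 2 ** (weighted_m / weight_sum)


def tmm_factors(columns, ref_index=0, **kwargs):
    """TMM factors for all columns, rescaled so that they multiply to 1."""
    raw = [tmm_factor(col, columns[ref_index], **kwargs) for col in columns]
    log_mean = sum(math.log(f) for f in raw) / len(raw)
    return [f / math.exp(log_mean) for f in raw]  # divide by the geometric mean


counts = [
    [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
    [12, 18, 33, 41, 47, 66, 75, 70, 95, 200],
    [5, 25, 28, 44, 0, 58, 77, 85, 80, 110],
]
factors = tmm_factors(counts)
```

Dividing every raw factor by the geometric mean is what makes the final factors multiply to 1, so no single sample is privileged by the choice of reference.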

Parameters:

log_ratio_trim (float between 0 and 0.5 (default=0.3)) – the fraction of M-values that should be trimmed from each direction (top and bottom X%).
sum_trim (float between 0 and 0.5 (default=0.05)) – the fraction of A-values that should be trimmed from each direction (top and bottom Y%).
a_cutoff (float or None (default=-1e10)) – a lower bound on the A-values that should be included in the trimmed mean. If set to None, no lower bound will be used.
ref_column (name of a column or 'auto' (default='auto')) – the column to be used as reference for normalization. If ‘auto’, then the reference column will be chosen automatically to be the column whose upper quartile is closest to the mean upper quartile.
inplace (bool (default=True)) – If True (default), the normalization will be applied to the current CountFilter object. If False, the function will return a new CountFilter instance and the current instance will not be affected.
return_scaling_factors (bool (default=False)) – if True, return a DataFrame containing the calculated scaling factors.
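As an illustration of the 'auto' rule for ref_column, the sketch below picks the column whose upper quartile is closest to the mean upper quartile. It is one plausible reading of that rule, not the library's code: the helper name `pick_reference_column` is hypothetical, and the choice of quartile method and of taking quartiles over the values exactly as passed in (with no prior library-size scaling) are assumptions.

```python
import statistics


def pick_reference_column(columns):
    """Return the name of the column whose upper quartile is closest to the mean.

    `columns` maps column names to lists of counts.
    """
    # 75th percentile of each column (assumption: 'inclusive' quantile method)
    upper_quartiles = {
        name: statistics.quantiles(values, n=4, method="inclusive")[2]
        for name, values in columns.items()
    }
    mean_uq = statistics.mean(list(upper_quartiles.values()))
    # the reference is the column with the smallest distance to that mean
    return min(upper_quartiles, key=lambda name: abs(upper_quartiles[name] - mean_uq))


example = {
    "a": [1, 2, 3, 4, 5],
    "b": [10, 20, 30, 40, 50],
    "c": [2, 4, 6, 8, 10],
}
reference = pick_reference_column(example)
```

Here the upper quartiles are 4, 40 and 8, their mean is about 17.3, and column "c" is the closest, so it is selected.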

Returns:

If inplace is False, returns a new instance of the CountFilter object.

Examples:

>>> from rnalysis import filtering
>>> c = filtering.CountFilter("tests/test_files/counted.csv")
>>> c.normalize_tmm()

Normalized 22 features. Normalized inplace.