rnalysis.filtering.CountFilter.filter_percentile

CountFilter.filter_percentile(percentile: Fraction, column: ColumnName, interpolate: Literal['nearest', 'higher', 'lower', 'midpoint', 'linear'] = 'linear', opposite: bool = False, inplace: bool = True)

Removes all entries whose value in the specified column is above the specified percentile. For example, if the column is ‘pvalue’ and the percentile is 0.5, all features whose pvalue is above the median pvalue will be filtered out.

Parameters:

percentile (float between 0 and 1) – The percentile threshold. All features whose value in the column is above this percentile will be filtered out.
column (str) – Name of the DataFrame column according to which the filtering will be performed.
interpolate ('nearest', 'higher', 'lower', 'midpoint' or 'linear' (default='linear')) – Interpolation method to use when the desired quantile lies between two data points.
opposite (bool (default=False)) – If True, the filtering is inverted: instead of filtering out X, the function will filter out everything BUT X. If False (default), the function filters as described above.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
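To illustrate why the interpolate parameter matters, the sketch below computes a quantile that falls between two data points using each of the five method names. It uses pandas `Series.quantile` directly on made-up values. That filter_percentile computes its threshold in the same way is an assumption here, not something this page states.

```python
import pandas as pd

# Toy values, invented for illustration only.
values = pd.Series([1.0, 2.0, 3.0, 4.0])

# The 0.25 quantile sits at position 0.75, between 1.0 and 2.0,
# so each interpolation method can give a different threshold.
thresholds = {
    method: float(values.quantile(0.25, interpolation=method))
    for method in ("linear", "lower", "higher", "midpoint", "nearest")
}

for method, threshold in thresholds.items():
    print(f"{method}: {threshold}")
```

With 'lower' the threshold snaps to the data point below the quantile position, with 'higher' to the one above, and 'linear' and 'midpoint' return values that need not appear in the data at all.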

Returns:

If inplace is False, returns a new, filtered instance of the Filter object.

Examples:

>>> from rnalysis import filtering
>>> d = filtering.Filter("tests/test_files/test_deseq.csv")
>>> # keep only the rows whose value in the column 'log2FoldChange' is below the 75th percentile
>>> d.filter_percentile(0.75,'log2FoldChange')
Filtered 7 features, leaving 21 of the original 28 features. Filtered inplace.

>>> d = filtering.Filter("tests/test_files/test_deseq.csv")
>>> # keep only the rows whose value in the column 'log2FoldChange' is above the 25th percentile
>>> d.filter_percentile(0.25,'log2FoldChange',opposite=True)
Filtered 7 features, leaving 21 of the original 28 features. Filtered inplace.
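
For readers who want to see the logic outside of RNAlysis, here is a rough pandas sketch of percentile filtering with an `opposite` switch. The helper name `filter_percentile_sketch` and the toy table are hypothetical. Keeping values equal to the threshold (an inclusive `<=` comparison) is also an assumption, since this page does not say how ties are handled.

```python
import pandas as pd


def filter_percentile_sketch(df, percentile, column, interpolate="linear", opposite=False):
    """Hypothetical helper: keep rows at or below the given percentile of `column`."""
    # Threshold value corresponding to the requested percentile.
    threshold = df[column].quantile(percentile, interpolation=interpolate)
    # Assumption: rows exactly at the threshold are kept.
    keep = df[column] <= threshold
    if opposite:
        # Invert the selection: keep only the rows that would have been removed.
        keep = ~keep
    return df[keep]


# Toy table, invented for illustration only.
df = pd.DataFrame(
    {"log2FoldChange": [-2.0, -1.0, 0.0, 1.0, 2.0]},
    index=["a", "b", "c", "d", "e"],
)

below = filter_percentile_sketch(df, 0.5, "log2FoldChange")
above = filter_percentile_sketch(df, 0.5, "log2FoldChange", opposite=True)
print(list(below.index), list(above.index))
```

Unlike the method documented here, this sketch always returns a new DataFrame and leaves the input untouched, which roughly corresponds to calling the method with inplace=False.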