rnalysis.filtering.Filter.filter_biotype_from_ref_table

Filter.filter_biotype_from_ref_table(biotype: Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'] | str | List[str] = 'protein_coding', ref: str | Path | Literal['predefined'] = 'predefined', opposite: bool = False, inplace: bool = True)

Filters out all features that do not match the indicated biotype or biotypes (for example: ‘protein_coding’, ‘ncRNA’, etc.). Feature biotypes are looked up in a Biotype Reference Table supplied by the user.

Parameters:

biotype (string or list of strings) – the biotype or biotypes to keep; features with any other biotype are filtered out. Default is 'protein_coding'.
ref (str, Path, or 'predefined') – Path to the Biotype Reference Table file used to determine biotypes. With the default, 'predefined', the path defined by the user in the settings.yaml file is used.
opposite (bool) – If True, the filter is inverted: features that match the indicated biotypes are filtered out, and all other features are kept. If False (default), only features that match the indicated biotypes are kept.
inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
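
The selection logic described by the biotype and opposite parameters can be illustrated with a minimal standalone sketch. This is not the library's implementation: the helper names (load_biotypes, filter_by_biotype), the in-memory table, and the 'gene' and 'biotype' column names are all assumptions made for illustration, and the treatment of features missing from the table is a choice of this sketch only.

```python
import csv
import io

# Hypothetical reference table contents. The 'gene' and 'biotype'
# column names are an assumption made for this illustration.
REF_TABLE_CSV = """gene,biotype
geneA,protein_coding
geneB,pseudogene
geneC,protein_coding
geneD,miRNA
"""


def load_biotypes(csv_text):
    """Parse the table text into a {feature name: biotype} dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["gene"]: row["biotype"] for row in reader}


def filter_by_biotype(features, biotype_of, biotype="protein_coding", opposite=False):
    """Return the features that survive the filter, as a new list.

    biotype may be a single string or a list of strings.
    With opposite=True the selection is inverted.
    """
    wanted = {biotype} if isinstance(biotype, str) else set(biotype)
    # In this sketch only: a feature absent from the table has biotype None,
    # so it never matches and is dropped unless opposite=True.
    return [f for f in features if (biotype_of.get(f) in wanted) != opposite]


biotype_of = load_biotypes(REF_TABLE_CSV)
features = ["geneA", "geneB", "geneC", "geneD"]

kept = filter_by_biotype(features, biotype_of, "protein_coding")
kept_two = filter_by_biotype(features, biotype_of, ["protein_coding", "pseudogene"])
dropped = filter_by_biotype(features, biotype_of, "protein_coding", opposite=True)
```

Here kept holds geneA and geneC, kept_two additionally holds geneB, and dropped holds the complement of kept (geneB and geneD). The sketch always returns a new list and leaves its input untouched, which loosely corresponds to calling the method with inplace=False.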

Examples:

>>> from rnalysis import filtering
>>> counts = filtering.Filter('tests/test_files/counted.csv')
>>> # keep only rows whose biotype is 'protein_coding'
>>> counts.filter_biotype_from_ref_table('protein_coding', ref='tests/biotype_ref_table_for_tests.csv')
Filtered 9 features, leaving 13 of the original 22 features. Filtered inplace.

>>> counts = filtering.Filter('tests/test_files/counted.csv')
>>> # keep only rows whose biotype is 'protein_coding' or 'pseudogene'
>>> counts.filter_biotype_from_ref_table(['protein_coding', 'pseudogene'], ref='tests/biotype_ref_table_for_tests.csv')
Filtered 0 features, leaving 22 of the original 22 features. Filtered inplace.