rnalysis.filtering.CountFilter.filter_biotype_from_gtf

CountFilter.filter_biotype_from_gtf(gtf_path: Union[str, Path], biotype: Union[Literal['protein_coding', 'pseudogene', 'lincRNA', 'miRNA', 'ncRNA', 'piRNA', 'rRNA', 'snoRNA', 'snRNA', 'tRNA'], str, List[str]] = 'protein_coding', attribute_name: Union[Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'], str] = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', opposite: bool = False, inplace: bool = True)

Filters out all features that do not match the indicated biotype/biotypes (for example: ‘protein_coding’, ‘ncRNA’, etc). The data about feature biotypes is drawn from a GTF (Gene transfer format) file supplied by the user.
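
As a rough illustration of how biotype information can be read from a GTF file, the sketch below parses the attribute column of a small, made-up GTF excerpt and keeps only the rows of a toy count table whose biotype matches. It is not the RNAlysis implementation: the helper names (biotypes_from_gtf_text, keep_biotypes), the sample records, and the choice to drop features missing from the GTF are all assumptions made for this example.

```python
import re

# Hypothetical GTF excerpt: 9 tab-separated columns, attributes in the last one.
GTF_TEXT = (
    'chrI\tsrc\tgene\t100\t900\t.\t+\t.\t'
    'gene_id "WBGene0001"; gene_biotype "protein_coding";\n'
    'chrI\tsrc\ttranscript\t100\t900\t.\t+\t.\t'
    'gene_id "WBGene0001"; transcript_id "T1"; gene_biotype "protein_coding";\n'
    '# comment lines are skipped\n'
    'chrI\tsrc\tgene\t2000\t2100\t.\t-\t.\t'
    'gene_id "WBGene0002"; gene_biotype "miRNA";\n'
    'chrII\tsrc\tgene\t50\t400\t.\t+\t.\t'
    'gene_id "WBGene0003"; gene_biotype "pseudogene";\n'
)

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def biotypes_from_gtf_text(text, attribute_name="gene_biotype", feature_type="gene"):
    """Map feature ID -> biotype for records of the requested feature type."""
    id_key = f"{feature_type}_id"  # 'gene_id' or 'transcript_id'
    mapping = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Column 3 of a GTF record holds the feature type.
        if len(fields) != 9 or fields[2] != feature_type:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        if id_key in attrs and attribute_name in attrs:
            mapping[attrs[id_key]] = attrs[attribute_name]
    return mapping


def keep_biotypes(counts, mapping, biotype="protein_coding"):
    """Keep rows whose biotype is requested (IDs absent from the GTF are dropped here)."""
    wanted = {biotype} if isinstance(biotype, str) else set(biotype)
    return {fid: row for fid, row in counts.items() if mapping.get(fid) in wanted}


# Toy count table: feature ID -> counts per sample.
counts = {
    "WBGene0001": [10, 12],
    "WBGene0002": [0, 3],
    "WBGene0003": [7, 7],
    "WBGene0004": [1, 1],  # not present in the GTF excerpt
}
mapping = biotypes_from_gtf_text(GTF_TEXT)
kept = keep_biotypes(counts, mapping, ["protein_coding", "miRNA"])
```

The sketch also shows why the IDs in the GTF file need to match those in the table: a feature whose ID does not appear in the GTF has no biotype to compare against.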

Parameters
  • gtf_path (str or Path) – Path to your GTF (Gene transfer format) file. The file should match the type of gene names/IDs you use in your table, and should contain an attribute describing biotype.

  • biotype (str or list of strings (default='protein_coding')) – the biotype or biotypes to keep. Features with any other biotype will be filtered out.

  • attribute_name (str (default='gene_biotype')) – name of the attribute in your GTF file that describes feature biotype.

  • feature_type ('gene' or 'transcript' (default='gene')) – determines whether the features/rows in your data table describe individual genes or transcripts.

  • opposite (bool (default=False)) – If True, the filter is inverted: features that match the specified biotype/biotypes are filtered out, and all other features are kept. If False (default), only features that match the specified biotype/biotypes are kept.

  • inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
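
The interaction between opposite and inplace can be sketched with a small stand-in class. ToyFilter and its filter_by_biotype method are hypothetical and exist only for this illustration. They are not part of RNAlysis, and details such as returning None when inplace=True, or how features without a known biotype are treated, are assumptions of the sketch.

```python
class ToyFilter:
    """Hypothetical stand-in for a filter object; not the RNAlysis class."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def filter_by_biotype(self, mapping, biotype="protein_coding",
                          opposite=False, inplace=True):
        wanted = {biotype} if isinstance(biotype, str) else set(biotype)
        # Comparing the match result with `opposite` flips which rows are kept.
        # In this sketch, IDs with no known biotype count as non-matching.
        kept = {
            fid: row
            for fid, row in self.rows.items()
            if (mapping.get(fid) in wanted) != opposite
        }
        if inplace:
            # Modify the current object and return nothing (assumed behavior).
            self.rows = kept
            return None
        # Leave the current object untouched and return a new one.
        return ToyFilter(kept)


mapping = {"g1": "protein_coding", "g2": "miRNA", "g3": "pseudogene"}
f = ToyFilter({"g1": 5, "g2": 1, "g3": 9})

# inplace=False: a new object is returned, `f` keeps all three rows.
only_coding = f.filter_by_biotype(mapping, inplace=False)
non_coding = f.filter_by_biotype(mapping, opposite=True, inplace=False)

# inplace=True (default): `f` itself is filtered.
result = f.filter_by_biotype(mapping, ["miRNA"])
```

Using inplace=False is convenient when several differently filtered versions of the same table are needed, since the original object stays available for further filtering.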