rnalysis.filtering.CountFilter.biotypes_from_gtf

CountFilter.biotypes_from_gtf(gtf_path: Union[str, Path], attribute_name: Union[Literal['biotype', 'gene_biotype', 'transcript_biotype', 'gene_type', 'transcript_type'], str] = 'gene_biotype', feature_type: Literal['gene', 'transcript'] = 'gene', long_format: bool = False) → DataFrame

Returns a DataFrame describing the biotypes in the table and their count. The data about feature biotypes is drawn from a GTF (Gene transfer format) file supplied by the user.

Parameters
  • gtf_path (str or Path) – Path to your GTF (Gene transfer format) file. The file should match the type of gene names/IDs you use in your table, and should contain an attribute describing biotype.

  • attribute_name (str (default='gene_biotype')) – name of the attribute in your GTF file that describes feature biotype.

  • feature_type ('gene' or 'transcript' (default='gene')) – determines whether the features/rows in your data table describe individual genes or transcripts.

  • long_format (bool (default=False)) – if True, returns a long-form DataFrame, which states the biotypes in the Filter object and their count, and also provides descriptive statistics of each column per biotype. Otherwise, returns a short-form DataFrame, which states only the biotypes and their count.

Returns
  a pandas DataFrame showing the number of values belonging to each biotype, as well as additional descriptive statistics if long_format is True.

Return type
  pandas.DataFrame
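
To make the roles of attribute_name and feature_type more concrete, here is a minimal standard-library sketch of how biotype counts could be derived from GTF text. It is not the RNAlysis implementation. The helper count_biotypes, the embedded GTF excerpt, and the IDs G1, G2, G3 and T1 are all hypothetical and exist only for illustration. The sketch assumes the standard nine-column, tab-separated GTF layout, with the feature type in column 3 and key "value"; attribute pairs in column 9.

```python
import re
from collections import Counter

# Hypothetical GTF excerpt (tab-separated, 9 columns); not taken from a real annotation.
GTF_TEXT = (
    'chrI\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_biotype "protein_coding";\n'
    'chrI\tsrc\ttranscript\t1\t100\t.\t+\t.\t'
    'gene_id "G1"; transcript_id "T1"; gene_biotype "protein_coding";\n'
    'chrI\tsrc\tgene\t200\t300\t.\t-\t.\tgene_id "G2"; gene_biotype "piRNA";\n'
    'chrI\tsrc\tgene\t400\t500\t.\t+\t.\tgene_id "G3"; gene_biotype "protein_coding";\n'
)

# Matches attribute pairs of the form: key "value"
ATTRIBUTE_PATTERN = re.compile(r'(\S+) "([^"]*)"')


def count_biotypes(gtf_text, table_ids, attribute_name="gene_biotype", feature_type="gene"):
    """Count biotypes of the features listed in table_ids.

    Illustrative helper only (an assumption of this sketch, not an RNAlysis function).
    """
    id_key = f"{feature_type}_id"  # 'gene_id' or 'transcript_id'
    counts = Counter()
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != feature_type:
            continue  # keep only rows of the requested feature type
        attributes = dict(ATTRIBUTE_PATTERN.findall(fields[8]))
        # Count the feature only if it appears in the table and carries the attribute.
        if attributes.get(id_key) in table_ids and attribute_name in attributes:
            counts[attributes[attribute_name]] += 1
    return counts


counts = count_biotypes(GTF_TEXT, table_ids={"G1", "G2", "G3"})
print(dict(counts))  # {'protein_coding': 2, 'piRNA': 1}
```

In this sketch, features whose IDs are not in the table, or that lack the requested attribute, are simply left out of the count. This is why the GTF file should use the same gene names/IDs as your table.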