rnalysis.fastq.featurecounts_single_end

rnalysis.fastq.featurecounts_single_end(input_folder: Union[str, Path], output_folder: Union[str, Path], gtf_file: Union[str, Path], gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, report_read_assignment: Optional[Literal['bam', 'sam', 'core']] = None, threads: PositiveInt = 1) → Tuple[CountFilter, DataFrame, DataFrame]

Assign mapped single-end sequencing reads to specified genomic features using RSubread featureCounts. Returns a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.

Parameters
  • input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to quantify.

  • output_folder (str or Path) – Path to a folder in which the quantified results, as well as the log files, will be saved.

  • gtf_file (str or Path) – Path to a GTF annotation file. This file will be used to map reads to features. The chromosome names in the GTF files should match the ones in the index file with which you aligned the reads.

  • gtf_feature_type (str (default='exon')) – the feature type or types used to select rows in the GTF annotation which will be used for read summarization.

  • gtf_attr_name (str (default='gene_id')) – the attribute type in the GTF annotation which will be used to group features (e.g. exons) into meta-features (e.g. genes).

  • r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’

  • new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the files in the directory.

  • stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the reads align to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the reads align to the reverse strand of a transcript.

  • min_mapping_quality (int >= 0 (default=0)) – the minimum mapping quality score a read must satisfy in order to be counted.

  • count_multi_mapping_reads (bool (default=False)) – indicating if multi-mapping reads/fragments should be counted (‘NH’ tag in BAM/SAM files).

  • count_multi_overlapping_reads (bool (default=False)) – indicating if a read is allowed to be assigned to more than one feature (or meta-feature) if it is found to overlap with more than one feature (or meta-feature).

  • ignore_secondary (bool (default=True)) – indicating if only primary alignments should be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. If True, all primary alignments in a dataset will be counted, regardless of whether they come from multi-mapping reads.

  • count_fractionally (bool (default=False)) – indicating if fractional counts are produced for multi-mapping reads and/or multi-overlapping reads.

  • is_long_read (bool (default=False)) – indicating if input data contain long reads. This option should be set to True if counting Nanopore or PacBio long reads.

  • report_read_assignment ('bam', 'sam', 'core', or None (default=None)) – if not None, featureCounts will generate detailed read assignment results for each read. These results can be saved in one of three formats: BAM, SAM, or CORE.

  • threads (int > 0 (default=1)) – number of threads to run featureCounts on. More threads will generally make read counting faster.
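
Because new_sample_names must follow the alphabetical order of the files in input_folder, it can help to list the files in that order before writing the names. The following is a minimal sketch using only the standard library. The helper name sample_names_in_file_order and the choice of the '.sam' and '.bam' suffixes are assumptions made for illustration; they are not part of RNAlysis, and the library's own ordering may differ in details such as case handling.

```python
from pathlib import Path
from typing import List, Tuple


def sample_names_in_file_order(
    input_folder, suffixes: Tuple[str, ...] = (".sam", ".bam")
) -> List[str]:
    """Return one name per SAM/BAM file, ordered alphabetically by file name.

    Hypothetical helper: it only illustrates the ordering rule described for
    new_sample_names and does not call RNAlysis.
    """
    files = sorted(
        p
        for p in Path(input_folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )
    # Use the file name without its extension as the sample name.
    return [p.stem for p in files]
```

The resulting list can be inspected, edited by hand, and then passed as new_sample_names.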

Returns

a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.

Return type

(filtering.CountFilter, pd.DataFrame, pd.DataFrame)
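
A call to this function and the unpacking of its three return values might look like the sketch below. The parameter names come from the signature above. The helper functions build_featurecounts_kwargs and run_counting, and the value min_mapping_quality=10, are illustrative assumptions. Running run_counting requires RNAlysis and a working R installation.

```python
from pathlib import Path


def build_featurecounts_kwargs(input_folder, output_folder, gtf_file, threads=1):
    """Collect keyword arguments for featurecounts_single_end.

    Hypothetical helper; the keys mirror the documented parameter names.
    """
    return dict(
        input_folder=Path(input_folder),
        output_folder=Path(output_folder),
        gtf_file=Path(gtf_file),
        gtf_feature_type="exon",   # documented default
        gtf_attr_name="gene_id",   # documented default
        stranded="no",             # documented default
        min_mapping_quality=10,    # illustrative threshold, not a recommendation
        threads=threads,
    )


def run_counting(**kwargs):
    """Run featureCounts through RNAlysis and return its three results."""
    # Imported lazily so this sketch can be loaded without RNAlysis or R.
    from rnalysis import fastq

    # The function returns (CountFilter, DataFrame, DataFrame).
    counts, annotation_summary, stats_summary = fastq.featurecounts_single_end(**kwargs)
    return counts, annotation_summary, stats_summary
```

For example, run_counting(**build_featurecounts_kwargs('aligned', 'counts', 'genes.gtf', threads=4)) would quantify every SAM/BAM file in a hypothetical 'aligned' folder and write the results and logs to 'counts'.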