rnalysis.fastq.featurecounts_paired_end

rnalysis.fastq.featurecounts_paired_end(input_folder: Union[str, Path], output_folder: Union[str, Path], gtf_file: Union[str, Path], gtf_feature_type: str = 'exon', gtf_attr_name: str = 'gene_id', r_installation_folder: Union[str, Path, Literal['auto']] = 'auto', new_sample_names: Union[List[str], Literal['auto']] = 'auto', stranded: Literal['no', 'forward', 'reverse'] = 'no', min_mapping_quality: int = 0, count_multi_mapping_reads: bool = False, count_multi_overlapping_reads: bool = False, ignore_secondary: bool = True, count_fractionally: bool = False, is_long_read: bool = False, require_both_mapped: bool = True, count_chimeric_fragments: bool = False, min_fragment_length: NonNegativeInt = 50, max_fragment_length: Optional[PositiveInt] = 600, report_read_assignment: Optional[Literal['bam', 'sam', 'core']] = None, threads: PositiveInt = 1) → Tuple[CountFilter, DataFrame, DataFrame]

Assign mapped paired-end sequencing reads to specified genomic features using Rsubread featureCounts. Returns a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.

Parameters
  • input_folder (str or Path) – Path to the folder containing the SAM/BAM files you want to quantify.

  • output_folder (str or Path) – Path to a folder in which the quantified results, as well as the log files and R script used to generate them, will be saved.

  • gtf_file (str or Path) – Path to a GTF annotation file. This file will be used to map reads to features. The chromosome names in the GTF file should match the ones in the index file with which you aligned the reads.

  • gtf_feature_type (str (default='exon')) – the feature type or types used to select rows in the GTF annotation which will be used for read summarization.

  • gtf_attr_name (str (default='gene_id')) – the attribute type in the GTF annotation which will be used to group features (e.g. exons) into meta-features (e.g. genes).

  • r_installation_folder (str, Path, or 'auto' (default='auto')) – Path to the installation folder of R. For example: ‘C:/Program Files/R/R-4.2.1’

  • new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each quantified sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the input SAM/BAM files.

  • stranded ('no', 'forward', 'reverse' (default='no')) – Indicates the strandedness of the data. ‘no’ indicates the data is not stranded. ‘forward’ indicates the data is stranded, where the first read in the pair aligns to the forward strand of a transcript. ‘reverse’ indicates the data is stranded, where the first read in the pair aligns to the reverse strand of a transcript.

  • min_mapping_quality (int >= 0 (default=0)) – the minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criterion.

  • count_multi_mapping_reads (bool (default=False)) – indicating if multi-mapping reads/fragments should be counted. Multi-mapping reads are identified using the ‘NH’ tag in BAM/SAM files.

  • count_multi_overlapping_reads (bool (default=False)) – indicating if a read is allowed to be assigned to more than one feature (or meta-feature) if it is found to overlap with more than one feature (or meta-feature).

  • ignore_secondary (bool (default=True)) – indicating if only primary alignments should be counted. Primary and secondary alignments are identified using bit 0x100 in the Flag field of SAM/BAM files. If True, all primary alignments in a dataset will be counted, regardless of whether they come from multi-mapping reads.

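  • count_fractionally (bool (default=False)) – indicating if fractional counts should be produced for multi-mapping reads and/or multi-overlapping reads. This only has an effect when count_multi_mapping_reads and/or count_multi_overlapping_reads is True.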

  • is_long_read (bool (default=False)) – indicating if input data contain long reads. This option should be set to True if counting Nanopore or PacBio long reads.

  • report_read_assignment ('bam', 'sam', 'core', or None (default=None)) – if not None, featureCounts will generate detailed read assignment results for each read pair. These results can be saved in one of three formats: BAM, SAM, or CORE.

  • require_both_mapped (bool (default=True)) – indicating if both ends from the same fragment are required to be successfully aligned before the fragment can be assigned to a feature or meta-feature.

  • count_chimeric_fragments (bool (default=False)) – indicating whether a chimeric fragment, which has its two reads mapped to different chromosomes, should be counted or not.

  • min_fragment_length (int >= 0 (default=50)) – The minimum fragment length for valid paired-end alignments. Read pairs with shorter fragments will not be counted.

  • max_fragment_length (int > 0 or None (default=600)) – The maximum fragment length for valid paired-end alignments. Read pairs with longer fragments will not be counted.

  • threads (int > 0 (default=1)) – number of threads to run featureCounts on. More threads will generally make quantification faster.
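To illustrate what gtf_feature_type and gtf_attr_name control, here is a minimal pure-Python sketch that selects GTF rows of one feature type and groups them into meta-features by an attribute value. It is not how RNAlysis or featureCounts parse annotations; the function name group_features and the toy GTF rows are hypothetical, made up for illustration only.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int, str]  # (chromosome, start, end, strand)


def group_features(gtf_lines: Iterable[str], feature_type: str = "exon",
                   attr_name: str = "gene_id") -> Dict[str, List[Interval]]:
    """Hypothetical helper: keep rows of one feature type, group them by an attribute."""
    # Match e.g. gene_id "G1" at the start of the attribute column or after a ';'.
    pattern = re.compile(r'(?:^|;)\s*' + re.escape(attr_name) + r' "([^"]*)"')
    meta_features: Dict[str, List[Interval]] = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        chrom, _source, ftype, start, end, _score, strand, _frame, attrs = line.rstrip("\n").split("\t")
        if ftype != feature_type:
            continue  # row is not of the requested feature type (gtf_feature_type)
        match = pattern.search(attrs)
        if match is not None:
            # All rows sharing the attribute value form one meta-feature (gtf_attr_name).
            meta_features[match.group(1)].append((chrom, int(start), int(end), strand))
    return dict(meta_features)


# Toy annotation: two exons and one gene row for G1, one exon for G2.
toy_gtf = [
    "\t".join(["chr1", "toy", "exon", "100", "200", ".", "+", ".", 'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["chr1", "toy", "exon", "300", "400", ".", "+", ".", 'gene_id "G1"; transcript_id "T1";']),
    "\t".join(["chr1", "toy", "gene", "100", "400", ".", "+", ".", 'gene_id "G1";']),
    "\t".join(["chr2", "toy", "exon", "50", "150", ".", "-", ".", 'gene_id "G2"; transcript_id "T2";']),
]
print(group_features(toy_gtf))
# {'G1': [('chr1', 100, 200, '+'), ('chr1', 300, 400, '+')], 'G2': [('chr2', 50, 150, '-')]}
```

With the defaults, the 'gene' row is ignored and the two G1 exons are counted together as one gene; passing gtf_attr_name='transcript_id' would instead group exons into transcripts.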
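Rsubread's featureCounts describes strandedness with an integer (isStrandSpecific: 0 = unstranded, 1 = stranded, 2 = reversely stranded). The sketch below shows how the three values of the stranded parameter line up with those codes. It is an assumption that RNAlysis performs exactly this translation internally, and strand_code is a hypothetical name.

```python
from typing import Literal

# featureCounts' isStrandSpecific codes: 0 = unstranded, 1 = stranded, 2 = reversely stranded.
STRAND_CODES = {"no": 0, "forward": 1, "reverse": 2}


def strand_code(stranded: Literal["no", "forward", "reverse"]) -> int:
    """Hypothetical helper: translate `stranded` into an isStrandSpecific value."""
    try:
        return STRAND_CODES[stranded]
    except KeyError:
        raise ValueError(f"stranded must be one of {sorted(STRAND_CODES)}, got {stranded!r}") from None
```

For example, data from a dUTP-style library, where the first read of each pair aligns to the reverse strand of the transcript, would use stranded='reverse', which corresponds to code 2.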
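For paired-end data, min_mapping_quality is satisfied when at least one end of the pair reaches the threshold. The one-line check below is a sketch of that rule as the parameter description states it; pair_passes_mapq is a hypothetical name, not part of RNAlysis.

```python
def pair_passes_mapq(mapq_read1: int, mapq_read2: int, min_mapping_quality: int = 0) -> bool:
    """Hypothetical helper: True if at least one end of the pair meets the MAPQ threshold."""
    if min_mapping_quality < 0:
        raise ValueError("min_mapping_quality must be >= 0")
    # Only the better-scoring end needs to reach the threshold.
    return max(mapq_read1, mapq_read2) >= min_mapping_quality
```

A pair with MAPQ values 3 and 40 therefore passes a threshold of 10, while a pair with values 3 and 5 does not.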
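ignore_secondary relies on bit 0x100 of the SAM FLAG field. The sketch below shows that bit test on plain integers: FLAG 99 (paired, proper pair, mate reverse, first in pair) is a primary alignment, and 355 is the same value with 0x100 added. The helper names are hypothetical, and the sketch models only the flag check; other options such as count_multi_mapping_reads still decide what happens to secondary alignments when ignore_secondary is False.

```python
SECONDARY_ALIGNMENT = 0x100  # SAM FLAG bit marking a secondary alignment


def is_primary(flag: int) -> bool:
    """True if the SAM FLAG value does not have the secondary bit (0x100) set."""
    return not (flag & SECONDARY_ALIGNMENT)


def keep_alignment(flag: int, ignore_secondary: bool = True) -> bool:
    """Hypothetical helper: drop secondary alignments only when ignore_secondary is True."""
    return is_primary(flag) or not ignore_secondary


print(is_primary(99), is_primary(355))  # True False
```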
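As the featureCounts documentation describes fractional counting, a multi-mapping read with x reported alignments (taken from the NH tag) contributes 1/x per alignment, a read overlapping y features contributes 1/y to each of them, and a read that is both contributes 1/(x·y). The sketch below computes that per-assignment weight with exact fractions; fractional_count is a hypothetical name, and the sketch is an illustration of the scheme rather than featureCounts' code.

```python
from fractions import Fraction


def fractional_count(n_alignments: int = 1, n_overlapping_features: int = 1,
                     count_fractionally: bool = False) -> Fraction:
    """Hypothetical helper: count credited to each feature a single alignment is assigned to."""
    if n_alignments < 1 or n_overlapping_features < 1:
        raise ValueError("numbers of alignments and overlapping features must be >= 1")
    if not count_fractionally:
        return Fraction(1)  # every counted alignment/feature pairing receives a full count
    return Fraction(1, n_alignments * n_overlapping_features)


print(fractional_count(2, 3, count_fractionally=True))  # 1/6
```

Without fractional counting, a read that maps to 4 locations, with count_multi_mapping_reads=True, adds a full count at each location; with count_fractionally=True, each location receives 1/4, so the read contributes 1 in total.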
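require_both_mapped, count_chimeric_fragments, min_fragment_length and max_fragment_length together decide whether a read pair is eligible for counting. The sketch below combines them in one simplified decision function, assuming inclusive length bounds (as "shorter" and "longer" fragments are the ones rejected). ReadPair and pair_is_countable are hypothetical, this is not featureCounts' exact decision logic, and because the documentation does not say how max_fragment_length=None is handled, the sketch accepts only an integer upper bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Hypothetical, simplified view of one aligned read pair."""
    chrom1: Optional[str]  # chromosome of read 1, or None if it did not align
    chrom2: Optional[str]  # chromosome of read 2, or None if it did not align
    fragment_length: Optional[int] = None  # meaningful only when both ends share a chromosome


def pair_is_countable(pair: ReadPair, require_both_mapped: bool = True,
                      count_chimeric_fragments: bool = False,
                      min_fragment_length: int = 50, max_fragment_length: int = 600) -> bool:
    mapped = [c for c in (pair.chrom1, pair.chrom2) if c is not None]
    if not mapped:
        return False  # neither end aligned
    if len(mapped) == 1:
        return not require_both_mapped  # singleton: countable only if one end is enough
    if pair.chrom1 != pair.chrom2:
        return count_chimeric_fragments  # chimeric: ends on different chromosomes
    if pair.fragment_length is None:
        return False  # no fragment length to check
    return min_fragment_length <= pair.fragment_length <= max_fragment_length


print(pair_is_countable(ReadPair("chr1", "chr1", 300)))  # True
```

With the defaults, a 30 bp or 700 bp fragment is rejected, a pair with only one aligned end is rejected unless require_both_mapped=False, and a pair split across chr1 and chr2 is rejected unless count_chimeric_fragments=True.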

Returns

a count matrix (CountFilter) containing feature counts for all input files, a DataFrame summarizing the features reads were aligned to, and a DataFrame summarizing the alignment statistics.

Return type

(filtering.CountFilter, pd.DataFrame, pd.DataFrame)