rnalysis.fastq.trim_adapters_paired_end

rnalysis.fastq.trim_adapters_paired_end(r1_files: List[str | Path], r2_files: List[str | Path], output_folder: str | Path, three_prime_adapters_r1: None | str | List[str], three_prime_adapters_r2: None | str | List[str], five_prime_adapters_r1: None | str | List[str] = None, five_prime_adapters_r2: None | str | List[str] = None, any_position_adapters_r1: None | str | List[str] = None, any_position_adapters_r2: None | str | List[str] = None, new_sample_names: List[str] | Literal['auto'] = 'auto', quality_trimming: NonNegativeInt | None = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: PositiveInt | None = None, discard_untrimmed_reads: bool = True, pair_filter_if: Literal['both', 'any', 'first'] = 'both', error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False, return_new_filenames: bool = False)

Trim adapters from paired-end reads using CutAdapt.
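
A minimal sketch of a call, built only from the signature above. The file names, the output folder, and the adapter sequences (the widely used Illumina TruSeq read-through adapters) are placeholders and should be replaced with the values that match your own library preparation. The call is guarded so that it runs only when the placeholder inputs actually exist.

```python
from pathlib import Path

# Hypothetical file names. The two lists are sorted in tandem,
# so that index i of each list forms one R1/R2 pair.
r1_files = [Path("sample1_R1.fastq"), Path("sample2_R1.fastq")]
r2_files = [Path("sample1_R2.fastq"), Path("sample2_R2.fastq")]
output_folder = Path("trimmed")  # must be an existing folder

trim_kwargs = dict(
    r1_files=r1_files,
    r2_files=r2_files,
    output_folder=output_folder,
    # Placeholder adapters (Illumina TruSeq); swap in your kit's sequences.
    three_prime_adapters_r1="AGATCGGAAGAGCACACGTCTGAACTCCAGTCA",
    three_prime_adapters_r2="AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT",
    quality_trimming=20,
    minimum_read_length=10,
    pair_filter_if="any",
    gzip_output=True,
)

# Only call the library when the placeholder inputs are really present.
inputs_exist = output_folder.is_dir() and all(
    path.is_file() for path in r1_files + r2_files
)
if inputs_exist:
    from rnalysis import fastq

    fastq.trim_adapters_paired_end(**trim_kwargs)
```

The two three_prime_adapters_* arguments have no default in the signature, so they are always passed explicitly; pass None for a read that needs no 3’ adapter trimming.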

Parameters:
  • r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.

  • r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.

  • output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.

  • three_prime_adapters_r1 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#1 files.

  • three_prime_adapters_r2 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#2 files.

  • five_prime_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#1 files.

  • five_prime_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#2 files.

  • any_position_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#1 files.

  • any_position_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#2 files.

  • quality_trimming (int or None (default=20)) – if specified, trim low-quality 3’ end from the reads. Any bases with quality score below the specified value will be trimmed from the 3’ end of the read.

  • trim_n (bool (default=True)) – if True, remove flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.

  • minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.

  • maximum_read_length (int or None (default=None)) – if specified, discard processed reads that are longer than maximum_read_length.

  • discard_untrimmed_reads (bool (default=True)) – if True, discards reads in which no adapter was found.

  • pair_filter_if ('both', 'any', or 'first' (default='both')) – Cutadapt always discards both reads of a pair if it determines that the pair should be discarded. This parameter determines how to combine the filters for Read#1 and Read#2 into a single decision about the read pair. When the value is ‘both’, the filtering criteria must apply to both reads in order for the read pair to be discarded. When the value is ‘any’, the pair is discarded if at least one of the reads (R1 or R2) fulfills the filtering criteria. When the value is ‘first’, only the first read in each pair determines whether to discard the pair or not.

  • error_tolerance (float between 0 and 1 (default=0.1)) – The level of error tolerance permitted when searching for adapters, with the lowest value being 0 (no error tolerance) and the maximum being 1 (100% error tolerance). Allowed errors are mismatches, insertions and deletions.

  • minimum_overlap (int >= 0 (default=3)) – the minimum number of nucleotides that must match exactly to the adapter sequence in order to trim it.

  • allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.

  • parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, runs CutAdapt on a single processor only.

  • gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.

  • new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
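
The interaction between trim_n, the read-length limits, and pair_filter_if can be illustrated with a small pure-Python sketch. This is not the library's or CutAdapt's implementation; the helper names (strip_flanking_n, fails_length_filter, discard_pair) are hypothetical and only mirror the behavior described in the parameter list above.

```python
from typing import Optional


def strip_flanking_n(seq: str) -> str:
    """Remove N bases from both ends of a read (the described trim_n=True behavior)."""
    return seq.strip("N")


def fails_length_filter(
    seq: str,
    minimum_read_length: int = 10,
    maximum_read_length: Optional[int] = None,
) -> bool:
    """Return True if the processed read is too short or too long."""
    if len(seq) < minimum_read_length:
        return True
    return maximum_read_length is not None and len(seq) > maximum_read_length


def discard_pair(r1_fails: bool, r2_fails: bool, pair_filter_if: str = "both") -> bool:
    """Combine the per-read filter results into one decision for the pair."""
    if pair_filter_if == "both":
        return r1_fails and r2_fails
    if pair_filter_if == "any":
        return r1_fails or r2_fails
    if pair_filter_if == "first":
        return r1_fails
    raise ValueError(f"unknown pair_filter_if value: {pair_filter_if!r}")


# R1 shrinks to 8 bases after N-trimming and fails the default minimum of 10;
# R2 keeps its 16 bases and passes.
r1 = strip_flanking_n("NNACGTACGTNNNN")
r2 = strip_flanking_n("ACGTACGTACGTACGT")
r1_fails = fails_length_filter(r1)
r2_fails = fails_length_filter(r2)

decisions = {
    mode: discard_pair(r1_fails, r2_fails, mode)
    for mode in ("both", "any", "first")
}
print(r1, decisions)
```

In this example the pair is kept under ‘both’ (only R1 fails the filter) and discarded under ‘any’ and ‘first’, which is why ‘both’ is the most permissive of the three settings.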