rnalysis.fastq.trim_adapters_paired_end

rnalysis.fastq.trim_adapters_paired_end(r1_files: List[Union[str, Path]], r2_files: List[Union[str, Path]], output_folder: Union[str, Path], three_prime_adapters_r1: Union[None, str, List[str]], three_prime_adapters_r2: Union[None, str, List[str]], five_prime_adapters_r1: Union[None, str, List[str]] = None, five_prime_adapters_r2: Union[None, str, List[str]] = None, any_position_adapters_r1: Union[None, str, List[str]] = None, any_position_adapters_r2: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, pair_filter_if: Literal['both', 'any', 'first'] = 'both', error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False, return_new_filenames: bool = False)

Trim adapters from paired-end reads using CutAdapt.
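A minimal usage sketch, assuming RNAlysis and CutAdapt are installed and that the FASTQ files and output folder already exist. The helper names `build_trim_kwargs` and `run_trimming`, the file names, and the choice of Illumina TruSeq adapter sequences are illustrative assumptions, not part of the RNAlysis API; only the keyword names are taken from the signature above.

```python
from pathlib import Path

# Illumina TruSeq adapter sequences, used here only as placeholder values;
# substitute the adapters that match your own library preparation.
TRUSEQ_R1 = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"
TRUSEQ_R2 = "AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT"


def build_trim_kwargs(r1_files, r2_files, output_folder):
    """Hypothetical helper: check that the file lists pair up and
    assemble keyword arguments for trim_adapters_paired_end."""
    # Assumption: paired lists must have equal length, since file i of
    # r1_files is paired with file i of r2_files.
    if len(r1_files) != len(r2_files):
        raise ValueError("r1_files and r2_files must contain the same number of files")
    return {
        "r1_files": [Path(f) for f in r1_files],
        "r2_files": [Path(f) for f in r2_files],
        "output_folder": Path(output_folder),
        "three_prime_adapters_r1": TRUSEQ_R1,
        "three_prime_adapters_r2": TRUSEQ_R2,
        "quality_trimming": 20,
        "minimum_read_length": 10,
        "pair_filter_if": "both",
        "gzip_output": True,
    }


def run_trimming(kwargs):
    """Hypothetical wrapper that performs the actual call."""
    # Imported lazily so build_trim_kwargs can be used without RNAlysis installed.
    from rnalysis import fastq

    return fastq.trim_adapters_paired_end(**kwargs)


# File names below are placeholders.
kwargs = build_trim_kwargs(
    ["sample1_R1.fastq", "sample2_R1.fastq"],
    ["sample1_R2.fastq", "sample2_R2.fastq"],
    "trimmed_output",
)
# run_trimming(kwargs)  # uncomment once the files and folder exist
```

Keeping the argument assembly separate from the call makes it easy to inspect the pairing of R1 and R2 files before starting a potentially long trimming run.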

Parameters
  • r1_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#1 files. The files should be sorted in tandem with r2_files, so that they line up to form pairs of R1 and R2 files.

  • r2_files (list of str/Path to existing FASTQ files) – a list of paths to your Read#2 files. The files should be sorted in tandem with r1_files, so that they line up to form pairs of R1 and R2 files.

  • output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.

  • three_prime_adapters_r1 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#1 files.

  • three_prime_adapters_r2 (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads in Read#2 files.

  • five_prime_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#1 files.

  • five_prime_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads in Read#2 files.

  • any_position_adapters_r1 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#1 files.

  • any_position_adapters_r2 (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or the middle) of the reads in Read#2 files.

  • quality_trimming (int or None (default=20)) – if specified, trim the low-quality 3’ end of each read. Any bases with a quality score below the specified value will be trimmed from the 3’ end of the read.

  • trim_n (bool (default=True)) – if True, removes flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.

  • minimum_read_length (int >= 0 (default=10)) – discard processed reads that are shorter than minimum_read_length.

  • maximum_read_length (int or None (default=None)) – if specified, discard processed reads that are longer than maximum_read_length.

  • discard_untrimmed_reads (bool (default=True)) – if True, discards reads in which no adapter was found.

  • pair_filter_if ('both', 'any', or 'first' (default='both')) – Cutadapt always discards both reads of a pair if it determines that the pair should be discarded. This parameter determines how the filters for Read#1 and Read#2 are combined into a single decision about the read pair. When the value is ‘both’, the filtering criteria must apply to both reads for the pair to be discarded. When the value is ‘any’, the pair is discarded if at least one of the reads (R1 or R2) meets the filtering criteria. When the value is ‘first’, only the first read in each pair determines whether the pair is discarded.

  • error_tolerance (float between 0 and 1 (default=0.1)) – The level of error tolerance permitted when searching for adapters, with the lowest value being 0 (no error tolerance) and the maximum being 1 (100% error tolerance). Allowed errors are mismatches, insertions and deletions.

  • minimum_overlap (int >= 0 (default=3)) – the minimum number of nucleotides that must match exactly to the adapter sequence in order to trim it.

  • allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.

  • parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, runs CutAdapt on a single processor only.

  • gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.

  • new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names=’auto’, sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the order of the file pairs.
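
The filtering semantics described for trim_n, minimum_read_length, maximum_read_length and pair_filter_if can be illustrated with a small pure-Python sketch. This is not how CutAdapt or RNAlysis implement these steps; the function names `trim_flanking_n`, `fails_length_filter` and `discard_pair` are hypothetical and exist only to make the described behaviour concrete.

```python
def trim_flanking_n(read):
    """Remove N bases from both ends of a read; internal N bases are kept."""
    return read.strip("N")


def fails_length_filter(length, minimum_read_length=10, maximum_read_length=None):
    """Return True if a processed read of this length would be filtered out."""
    too_short = length < minimum_read_length
    too_long = maximum_read_length is not None and length > maximum_read_length
    return too_short or too_long


def discard_pair(r1_fails, r2_fails, pair_filter_if="both"):
    """Combine the per-read filter results into one decision for the pair."""
    if pair_filter_if == "both":
        return r1_fails and r2_fails
    if pair_filter_if == "any":
        return r1_fails or r2_fails
    if pair_filter_if == "first":
        return r1_fails
    raise ValueError("pair_filter_if must be 'both', 'any', or 'first'")


# A pair in which only Read#2 is shorter than the default minimum of 10 bases.
r1 = trim_flanking_n("NNACGTACGTACGTNN")   # 12 bases remain
r2 = trim_flanking_n("NACGTN")             # 4 bases remain
r1_fails = fails_length_filter(len(r1))
r2_fails = fails_length_filter(len(r2))

kept_with_both = not discard_pair(r1_fails, r2_fails, "both")
kept_with_any = not discard_pair(r1_fails, r2_fails, "any")
```

In this sketch the pair is kept under 'both', because only one of the two reads fails the length filter, and discarded under 'any'.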