rnalysis.fastq.trim_adapters_single_end

rnalysis.fastq.trim_adapters_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], three_prime_adapters: Union[None, str, List[str]], five_prime_adapters: Union[None, str, List[str]] = None, any_position_adapters: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False)

Trim adapters from single-end reads using CutAdapt.

Parameters
  • fastq_folder (str/Path to an existing folder) – Path to the folder containing your untrimmed FASTQ files.

  • output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.

  • three_prime_adapters (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads.

  • five_prime_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads.

  • any_position_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or from the middle) of the reads.

  • quality_trimming (int or None (default=20)) – if specified, trim the low-quality 3’ end of each read. Any bases with a quality score below the specified value will be trimmed from the 3’ end of the read.

  • trim_n (bool (default=True)) – if True, removes flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.

  • minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.

  • maximum_read_length (int or None (default=None)) – if specified, discard processed reads that are longer than maximum_read_length.

  • discard_untrimmed_reads (bool (default=True)) – if True, discards reads in which no adapter was found.

  • error_tolerance (float between 0 and 1 (default=0.1)) – The level of error tolerance permitted when searching for adapters, with the lowest value being 0 (no error tolerance) and the maximum being 1 (100% error tolerance). Allowed errors are mismatches, insertions and deletions.

  • minimum_overlap (int >= 0 (default=3)) – the minimum number of nucleotides that must match exactly to the adapter sequence in order to trim it.

  • allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.

  • parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, runs CutAdapt on a single core only.

  • gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.

  • new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names='auto', sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the input files.
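
Example

The following is a minimal sketch, not part of the library, that illustrates a few of the parameters described above. The helper functions (strip_flanking_n, keep_read, match_names_to_files, trim_example) are hypothetical names introduced only for illustration. They mimic the documented behaviour of trim_n, the read-length filters, and the new_sample_names ordering in plain Python, and the way CutAdapt implements these steps internally may differ. In particular, Python's sorted() is assumed here to stand in for "alphabetical order". The adapter sequence 'AGATCGGAAGAGC' is only an example value, so substitute the adapter used in your own library preparation.

```python
from typing import Dict, List, Optional


def strip_flanking_n(read: str) -> str:
    """Mimic the documented trim_n behaviour: drop N bases from both ends."""
    return read.strip("N")


def keep_read(read: str,
              minimum_read_length: Optional[int] = 10,
              maximum_read_length: Optional[int] = None) -> bool:
    """Return True if a processed read passes both documented length filters."""
    if minimum_read_length is not None and len(read) < minimum_read_length:
        return False  # shorter than minimum_read_length: discarded
    if maximum_read_length is not None and len(read) > maximum_read_length:
        return False  # longer than maximum_read_length: discarded
    return True


def match_names_to_files(file_names: List[str],
                         new_sample_names: List[str]) -> Dict[str, str]:
    """Pair new names with input files taken in sorted order.

    Assumption: sorted() approximates the alphabetical order the docs describe.
    """
    if len(file_names) != len(new_sample_names):
        raise ValueError("one new name is needed for each input file")
    return dict(zip(sorted(file_names), new_sample_names))


def trim_example(fastq_folder: str, output_folder: str) -> None:
    """Call trim_adapters_single_end with keyword arguments from its signature.

    Requires RNAlysis and CutAdapt to be installed; both folders must exist.
    """
    from rnalysis import fastq  # imported lazily so the helpers above work alone

    fastq.trim_adapters_single_end(
        fastq_folder,
        output_folder,
        three_prime_adapters="AGATCGGAAGAGC",  # example adapter, an assumption
        quality_trimming=20,
        minimum_read_length=10,
        discard_untrimmed_reads=False,  # keep reads in which no adapter was found
        gzip_output=True,
    )
```

Calling trim_example('path/to/fastq_folder', 'path/to/output_folder') with two existing folders (placeholder paths) would run the trimming itself. The other helpers can be used beforehand to check which name each input file is expected to receive.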