rnalysis.fastq.trim_adapters_single_end
- rnalysis.fastq.trim_adapters_single_end(fastq_folder: Union[str, Path], output_folder: Union[str, Path], three_prime_adapters: Union[None, str, List[str]], five_prime_adapters: Union[None, str, List[str]] = None, any_position_adapters: Union[None, str, List[str]] = None, new_sample_names: Union[List[str], Literal['auto']] = 'auto', quality_trimming: Optional[NonNegativeInt] = 20, trim_n: bool = True, minimum_read_length: NonNegativeInt = 10, maximum_read_length: Optional[PositiveInt] = None, discard_untrimmed_reads: bool = True, error_tolerance: Fraction = 0.1, minimum_overlap: NonNegativeInt = 3, allow_indels: bool = True, parallel: bool = True, gzip_output: bool = False)
Trim adapters from single-end reads using CutAdapt.
- Parameters
fastq_folder (str/Path to an existing folder) – Path to the folder containing your untrimmed FASTQ files
output_folder (str/Path to an existing folder) – Path to a folder in which the trimmed FASTQ files, as well as the log files, will be saved.
three_prime_adapters (str, list of str, or None) – the sequence of the adapter/adapters to trim from the 3’ end of the reads.
five_prime_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from the 5’ end of the reads.
any_position_adapters (str, list of str, or None (default=None)) – the sequence of the adapter/adapters to trim from either end (or from the middle) of the reads.
quality_trimming (int or None (default=20)) – if specified, trim low-quality bases from the 3’ end of the reads. Any bases with quality score below the specified value will be trimmed from the 3’ end of the read.
trim_n (bool (default=True)) – if True, removes flanking N bases from each read. For example, a read with the sequence ‘NNACGTACGTNNNN’ will be trimmed down to ‘ACGTACGT’. This occurs after adapter trimming.
minimum_read_length (int or None (default=10)) – if specified (default), discard processed reads that are shorter than minimum_read_length.
maximum_read_length (int or None (default=None)) – if specified, discard processed reads that are longer than maximum_read_length.
discard_untrimmed_reads (bool (default=True)) – if True, discards reads in which no adapter was found.
error_tolerance (float between 0 and 1 (default=0.1)) – The level of error tolerance permitted when searching for adapters, with the lowest value being 0 (no error tolerance) and the maximum being 1 (100% error tolerance). Allowed errors are mismatches, insertions and deletions.
minimum_overlap (int >= 0 (default=3)) – the minimum number of nucleotides that must match exactly to the adapter sequence in order to trim it.
allow_indels (bool (default=True)) – if False, insertions and deletions in the adapter sequence are not allowed - only mismatches.
parallel (bool (default=True)) – if True, runs CutAdapt on all available cores in parallel. Otherwise, runs CutAdapt on a single core only.
gzip_output (bool (default=False)) – if True, gzips the output FASTQ files.
new_sample_names (list of str or 'auto' (default='auto')) – Give a new name to each trimmed sample (optional). If new_sample_names='auto', sample names will be given automatically. Otherwise, new_sample_names should be a list of new names, with the order of the names matching the alphabetical order of the input files.
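
The parameters above can be combined as in the following minimal sketch. The folder names, the adapter sequence, and the helper functions build_trim_kwargs and run_trimming are hypothetical and serve only as illustration; only the keyword names are taken from the signature above. The sketch assumes RNAlysis and CutAdapt are installed, and that the input folder already exists and contains single-end FASTQ files.

```python
from pathlib import Path

# Hypothetical paths and adapter sequence, chosen only for illustration.
FASTQ_FOLDER = Path("raw_fastq")
OUTPUT_FOLDER = Path("trimmed_fastq")
ADAPTER_3P = "AGATCGGAAGAGC"  # placeholder; replace with your library's adapter


def build_trim_kwargs(fastq_folder, output_folder, adapter):
    """Collect keyword arguments for trim_adapters_single_end (hypothetical helper)."""
    return dict(
        fastq_folder=fastq_folder,
        output_folder=output_folder,
        three_prime_adapters=adapter,   # required: no default in the signature
        quality_trimming=20,            # same as the documented default
        minimum_read_length=10,         # same as the documented default
        discard_untrimmed_reads=False,  # keep reads in which no adapter was found
        gzip_output=True,               # write gzipped FASTQ files
    )


def run_trimming(kwargs):
    """Call RNAlysis with the collected arguments (hypothetical helper)."""
    # Imported lazily so the sketch can be inspected without RNAlysis installed.
    from rnalysis import fastq

    # The parameter list describes output_folder as an existing folder,
    # so create it up front if it is missing.
    Path(kwargs["output_folder"]).mkdir(parents=True, exist_ok=True)
    fastq.trim_adapters_single_end(**kwargs)
```

Calling run_trimming(build_trim_kwargs(FASTQ_FOLDER, OUTPUT_FOLDER, ADAPTER_3P)) would then trim every FASTQ file in the input folder. Setting discard_untrimmed_reads=False is a deliberate deviation from the default here, since with the default of True any read without a detected adapter is dropped from the output.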