rnalysis.filtering.CountFilter.from_folder_htseqcount

classmethod CountFilter.from_folder_htseqcount(folder_path: str, norm_to_rpm: bool = False, save_csv: bool = False, counted_fname: str = None, uncounted_fname: str = None, input_format: str = '.txt') → CountFilter

Iterates over HTSeq-count output files (.txt by default) in a given folder and combines them into a single CountFilter table. Can also save the count data table and the uncounted data table to .csv files, and normalize the CountFilter table to reads per million (RPM). Note that the saved data will always be count data, not normalized data, regardless of whether the CountFilter table was normalized.
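As a rough illustration of the combining step, here is a minimal stdlib-only sketch. It is not RNAlysis's implementation: the helper names `read_htseq_count` and `combine_htseq_folder` are hypothetical, and the sketch assumes the standard HTSeq-count layout of one tab-separated feature/count pair per line, with special counters such as `__no_feature` and `__ambiguous` marked by a leading double underscore. Those special counters go into a separate "uncounted" table.

```python
from pathlib import Path


def read_htseq_count(path):
    """Hypothetical helper: split one HTSeq-count file into counted and uncounted rows.

    Assumes one 'feature<TAB>count' pair per line; special counters
    (e.g. __no_feature, __ambiguous) start with a double underscore.
    """
    counted, uncounted = {}, {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            feature, count = line.split('\t')
            target = uncounted if feature.startswith('__') else counted
            target[feature] = int(count)
    return counted, uncounted


def combine_htseq_folder(folder_path, input_format='.txt'):
    """Hypothetical helper: combine every matching file in a folder.

    Returns (counted_table, uncounted_table), each mapping
    sample name (file name without the suffix) -> {feature: count}.
    """
    counted_table, uncounted_table = {}, {}
    for path in sorted(Path(folder_path).iterdir()):
        if not path.is_file() or not path.name.endswith(input_format):
            continue  # ignore subfolders and files in other formats
        sample = path.name[: len(path.name) - len(input_format)]
        counted_table[sample], uncounted_table[sample] = read_htseq_count(path)
    return counted_table, uncounted_table
```

Keeping the special counters in their own table mirrors the separation between count data and uncounted data described above; a real implementation would typically build a DataFrame with one column per sample instead of nested dicts.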

Parameters:
  • folder_path – str or pathlib.Path. Full path of the folder that contains the individual HTSeq-count .txt files.

  • norm_to_rpm – bool. If True, the CountFilter table will be automatically normalized to reads per million (RPM). If False (default), the CountFilter object will not be normalized, and will instead contain absolute count data (as in the original HTSeq-count .txt files). Note that if save_csv is True, the saved .csv file will contain ABSOLUTE COUNT DATA, as in the original HTSeq-count .txt files, and NOT normalized data.

  • save_csv – bool. If True, the combined count data and the combined uncounted data will be saved to two separate .csv files. The files will be saved in ‘folder_path’, and named according to the parameters ‘counted_fname’ for the count data, and ‘uncounted_fname’ for the uncounted data (unaligned, alignment not unique, etc.).

  • counted_fname – str. Name under which to save the combined count data table. Does not need to include the ‘.csv’ suffix.

  • uncounted_fname – str. Name under which to save the combined uncounted data table. Does not need to include the ‘.csv’ suffix.

  • input_format – str. The file format (suffix) of the input files. Default is ‘.txt’.
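The norm_to_rpm option can be pictured with the following minimal sketch, which scales each sample so that its counts sum to one million. The function name `normalize_to_rpm` is hypothetical, and the choice of library size is an assumption: here it is the sum of the counted features only, whereas RNAlysis may define the denominator differently (for example, by also including some of the special counters). The sketch returns a new table and leaves the raw counts untouched, consistent with the note that saved .csv files always hold absolute counts.

```python
def normalize_to_rpm(counted_table):
    """Hypothetical helper: return a new {sample: {feature: rpm}} table.

    ASSUMPTION: library size = sum of counted features in that sample;
    RNAlysis may use a different denominator.
    """
    normalized = {}
    for sample, counts in counted_table.items():
        total = sum(counts.values())
        if total == 0:
            # Avoid dividing by zero for an empty library.
            normalized[sample] = {feature: 0.0 for feature in counts}
            continue
        normalized[sample] = {
            feature: count / total * 1_000_000 for feature, count in counts.items()
        }
    return normalized  # the input table is not modified
```

For a sample with counts g1 = 1 and g2 = 3, the library size is 4, so g1 becomes 250,000 RPM and g2 becomes 750,000 RPM.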

Returns:

A CountFilter object containing the combined count data from all individual HTSeq-count .txt files in the specified folder.

Examples:
>>> from rnalysis import filtering
>>> c = filtering.CountFilter.from_folder_htseqcount('tests/test_files/test_count_from_folder')
>>> c = filtering.CountFilter.from_folder_htseqcount('tests/test_files/test_count_from_folder', norm_to_rpm=True) # This will also normalize the CountFilter to reads-per-million (RPM).

Normalized 10 features. Normalized inplace.

>>> c = filtering.CountFilter.from_folder_htseqcount('tests/test_files/test_count_from_folder', save_csv=True, counted_fname='name_for_reads_csv_file', uncounted_fname='name_for_uncounted_reads_csv_file') # This will also save the counted reads and uncounted reads as separate .csv files
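The save_csv=True example above writes the counted and uncounted tables to two .csv files in folder_path. The sketch below shows one way such a table could be written with the standard csv module, appending the ‘.csv’ suffix only when the given name lacks it. The function `save_table_csv` is hypothetical, and filling a feature that is missing from a sample with 0 is an assumption of this sketch, not documented RNAlysis behavior.

```python
import csv
from pathlib import Path


def save_table_csv(table, folder_path, fname):
    """Hypothetical helper: write a {sample: {feature: count}} table to CSV.

    Features become rows and samples become columns. The '.csv' suffix
    is appended only if fname does not already end with it.
    """
    if not fname.endswith('.csv'):
        fname += '.csv'
    samples = list(table)
    # Collect features in first-seen order across all samples.
    features = list(dict.fromkeys(f for counts in table.values() for f in counts))
    out_path = Path(folder_path) / fname
    with open(out_path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow([''] + samples)  # header row: blank corner cell, then sample names
        for feature in features:
            # ASSUMPTION: a feature absent from a sample is written as 0.
            writer.writerow([feature] + [table[s].get(feature, 0) for s in samples])
    return out_path
```

Calling it once for the counted table and once for the uncounted table, with counted_fname and uncounted_fname respectively, would yield the two separate files.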