rnalysis.filtering.DESeqFilter

class rnalysis.filtering.DESeqFilter(fname: str | Path | tuple, drop_columns: str | List[str] = None, log2fc_col: str | Literal['log2FoldChange', 'logFC'] = 'log2FoldChange', padj_col: str | Literal['padj', 'adj.P.Val'] = 'padj', pval_col: str | Literal['pvalue', 'P.Value'] = 'pvalue', suppress_warnings: bool = False)

A class that receives a DESeq output file and can filter it according to various characteristics.

Attributes

df: pandas DataFrame: A DataFrame that contains the DESeq output file contents. The DataFrame is modified upon usage of filter operations.
shape: tuple (rows, columns): The dimensions of df.
columns: list: The columns of df.
fname: pathlib.Path: The path and filename for the purpose of saving df as a csv file. Updates automatically when filter operations are applied.
index_set: set: All of the indices in the current DataFrame (which were not removed by previously used filter methods) as a set.
index_string: string: A string of all feature indices in the current DataFrame separated by newline.
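
The relationship between df and the derived attributes can be pictured with plain pandas. This is a minimal sketch and not RNAlysis code: the table contents and gene IDs are invented, and the expressions only mimic what the attribute descriptions above say.

```python
import pandas as pd

# Hypothetical DESeq2-style table; gene IDs and values are invented for illustration.
df = pd.DataFrame(
    {
        "log2FoldChange": [2.1, -0.3, -1.8],
        "pvalue": [0.001, 0.40, 0.004],
        "padj": [0.004, 0.60, 0.012],
    },
    index=pd.Index(["geneA", "geneB", "geneC"], name="gene_id"),
)

shape = df.shape                    # (rows, columns)
columns = list(df.columns)          # column names as a list
index_set = set(df.index)           # feature indices as a set
index_string = "\n".join(df.index)  # feature indices, one per line

print(shape)
print(columns)
print(index_string)
```

Because these values are derived from df, they would change whenever a filter operation removes rows or columns.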

__init__(fname: str | Path | tuple, drop_columns: str | List[str] = None, log2fc_col: str | Literal['log2FoldChange', 'logFC'] = 'log2FoldChange', padj_col: str | Literal['padj', 'adj.P.Val'] = 'padj', pval_col: str | Literal['pvalue', 'P.Value'] = 'pvalue', suppress_warnings: bool = False)

Load a differential expression table. A valid differential expression table should have a column containing log2(fold change) values for each gene, and another column containing adjusted p-values for each gene.

Parameters:

fname (Union[str, Path]) – full path/filename of the .csv file to be loaded into the Filter object
drop_columns (str, list of str, or None (default=None)) – if a string or list of strings is specified, the columns with the matching name(s) will be dropped from the loaded table.
log2fc_col (str (default='log2FoldChange')) – name of the table column containing log2(fold change) values.
padj_col (str (default='padj')) – name of the table column containing adjusted p-values.
pval_col (str (default='pvalue')) – name of the table column containing unadjusted p-values.
suppress_warnings (bool (default=False)) – if True, RNAlysis will not issue warnings about the loaded table’s structure or content.
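
To make the column-name parameters concrete, here is a rough pandas sketch of loading a table whose columns use the alternative names listed in the signature ('logFC', 'adj.P.Val', 'P.Value'). The helper load_table is hypothetical and is not part of RNAlysis; the sample data is invented, and the real loader may warn rather than raise when a column is missing.

```python
import io

import pandas as pd

# Hypothetical table using the alternative column names; in practice this would be a .csv file.
csv_text = """gene_id,logFC,P.Value,adj.P.Val,AveExpr
geneA,2.1,0.001,0.004,8.2
geneB,-0.3,0.40,0.60,5.1
geneC,-1.8,0.004,0.012,6.7
"""


def load_table(source, drop_columns=None, log2fc_col="log2FoldChange", padj_col="padj"):
    """Hypothetical helper: read a table (first column = feature index),
    drop unwanted columns, and check that the two required columns exist."""
    table = pd.read_csv(source, index_col=0)
    if drop_columns is not None:
        # Accept either a single column name or a list of names.
        to_drop = [drop_columns] if isinstance(drop_columns, str) else list(drop_columns)
        table = table.drop(columns=to_drop)
    missing = [col for col in (log2fc_col, padj_col) if col not in table.columns]
    if missing:
        raise ValueError(f"table is missing required columns: {missing}")
    return table


table = load_table(
    io.StringIO(csv_text),
    drop_columns="AveExpr",
    log2fc_col="logFC",
    padj_col="adj.P.Val",
)
print(list(table.columns))
```

With RNAlysis itself, the corresponding step would be to pass log2fc_col='logFC' and padj_col='adj.P.Val' to the DESeqFilter constructor, as the signature above allows.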

`DESeqFilter.biotypes_from_gtf`(gtf_path[, ...])	Returns a DataFrame describing the biotypes in the table and their count.
`DESeqFilter.biotypes_from_ref_table`([...])	Returns a DataFrame describing the biotypes in the table and their count.
`DESeqFilter.concatenate`(other)
`DESeqFilter.describe`([percentiles])	Generate descriptive statistics that summarize the central tendency, dispersion and shape of the dataset's distribution, excluding NaN values.
`DESeqFilter.difference`(*others[, ...])	Keep only the features that exist in the first Filter object/set but NOT in the others.
`DESeqFilter.drop_columns`(columns[, inplace])	Drop specific columns from the table.
`DESeqFilter.filter_abs_log2_fold_change`([...])	Filters out all features whose absolute log2 fold change is below the indicated threshold.
`DESeqFilter.filter_biotype_from_gtf`(gtf_path)	Filters out all features that do not match the indicated biotype/biotypes (for example: 'protein_coding', 'ncRNA', etc).
`DESeqFilter.filter_biotype_from_ref_table`([...])	Filters out all features that do not match the indicated biotype/biotypes (for example: 'protein_coding', 'ncRNA', etc).
`DESeqFilter.filter_by_attribute`([...])	Filters features according to user-defined attributes from an Attribute Reference Table.
`DESeqFilter.filter_by_go_annotations`(go_ids)	Filters genes according to GO annotations, keeping only genes that are annotated with a specific GO term.
`DESeqFilter.filter_by_kegg_annotations`(kegg_ids)	Filters genes according to KEGG pathways, keeping only genes that belong to a specific KEGG pathway.
`DESeqFilter.filter_by_row_name`(row_names[, ...])	Filter out specific rows from the table by their name (index).
`DESeqFilter.filter_duplicate_ids`([keep, ...])	Filter out rows with duplicate names/IDs (index).
`DESeqFilter.filter_fold_change_direction`([...])	Filters out features according to the direction in which they changed between the two conditions.
`DESeqFilter.filter_missing_values`([columns, ...])	Remove all rows whose values in the specified columns are missing (NaN).
`DESeqFilter.filter_percentile`(percentile, column)	Removes all entries above the specified percentile in the specified column.
`DESeqFilter.filter_significant`([alpha, ...])	Removes all features which did not change significantly, according to the provided alpha.
`DESeqFilter.filter_top_n`(by[, n, ascending, ...])	Sort the rows by the values of specified column or columns, then keep only the top 'n' rows.
`DESeqFilter.find_paralogs_ensembl`([...])	Find paralogs within the same species using the Ensembl database.
`DESeqFilter.find_paralogs_panther`([...])	Find paralogs within the same species using the PantherDB database.
`DESeqFilter.from_dataframe`(df, name[, ...])
`DESeqFilter.head`([n])	Return the first n rows of the Filter object.
`DESeqFilter.histogram`(column[, bins, ...])
`DESeqFilter.intersection`(*others[, ...])	Keep only the features that exist in ALL of the given Filter objects/sets.
`DESeqFilter.majority_vote_intersection`(*others)	Returns a set/string of the features that appear in at least (majority_threshold * 100)% of the given Filter objects/sets.
`DESeqFilter.map_orthologs_ensembl`(...[, ...])	Map genes to their nearest orthologs in a different species using the Ensembl database.
`DESeqFilter.map_orthologs_orthoinspector`(...)	Map genes to their nearest orthologs in a different species using the OrthoInspector database.
`DESeqFilter.map_orthologs_panther`(...[, ...])	Map genes to their nearest orthologs in a different species using the PantherDB database.
`DESeqFilter.map_orthologs_phylomedb`(...[, ...])	Map genes to their nearest orthologs in a different species using the PhylomeDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.
`DESeqFilter.number_filters`(column, operator, ...)	Apply a number filter (greater than, equal, less than) on a particular column in the Filter object.
`DESeqFilter.print_features`()	Print the feature indices in the Filter object, sorted by their current order in the Filter object, and separated by newline.
`DESeqFilter.pval_histogram`([adjusted_pvals, ...])	Plots a histogram of the p-values in the DESeqFilter object.
`DESeqFilter.save_csv`([alt_filename])	Saves the current filtered data to a .csv file.
`DESeqFilter.save_parquet`([alt_filename])	Saves the current filtered data to a .parquet file.
`DESeqFilter.save_table`([suffix, alt_filename])	Save the current filtered data table.
`DESeqFilter.sort`(by[, ascending, ...])	Sort the rows by the values of specified column or columns.
`DESeqFilter.split_by_attribute`(attributes[, ref])	Splits the features in the Filter object into multiple Filter objects, each corresponding to one of the specified Attribute Reference Table attributes.
`DESeqFilter.split_by_percentile`(percentile, ...)	Splits the features in the Filter object into two non-overlapping Filter objects: one containing features below the specified percentile in the specified column, and the other containing features above the specified percentile in the specified column.
`DESeqFilter.split_fold_change_direction`()	Splits the features in the DESeqFilter object into two non-overlapping DESeqFilter objects, based on the direction of their log2foldchange.
`DESeqFilter.symmetric_difference`(other[, ...])	Returns a set/string of the WBGene indices that exist either in the first Filter object/set OR the second, but NOT in both (set symmetric difference).
`DESeqFilter.tail`([n])	Return the last n rows of the Filter object.
`DESeqFilter.text_filters`(column, operator, value)	Apply a text filter (equals, contains, starts with, ends with) on a particular column in the Filter object.
`DESeqFilter.transform`(function[, columns, ...])	Transform the values in the Filter object with the specified function.
`DESeqFilter.translate_gene_ids`(translate_to)	Translates gene names/IDs from one type to another.
`DESeqFilter.union`(*others[, return_type])	Returns a set/string of the union of features between multiple Filter objects/sets (the features that exist in at least one of the Filter objects/sets).
`DESeqFilter.volcano_plot`([alpha, ...])	Plots a volcano plot (log2(fold change) vs -log10(adjusted p-value)).
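
The filtering and set-operation methods listed above can be pictured with plain pandas and Python sets. The following is a sketch under stated assumptions and not the library's implementation: the data, the thresholds, and the boundary handling (<= versus <) are all assumptions chosen for illustration.

```python
import pandas as pd

# Invented example table with the default column names.
df = pd.DataFrame(
    {
        "log2FoldChange": [2.1, -0.3, -1.8, 0.9],
        "padj": [0.004, 0.60, 0.012, 0.03],
    },
    index=["geneA", "geneB", "geneC", "geneD"],
)

alpha = 0.05       # significance threshold (assumed value)
min_abs_lfc = 1.0  # absolute log2 fold change threshold (assumed value)

# Analogous to filter_significant: keep features with adjusted p-value at or below alpha.
significant = df[df["padj"] <= alpha]

# Analogous to filter_abs_log2_fold_change: keep features with a large enough |log2FC|.
strong = significant[significant["log2FoldChange"].abs() >= min_abs_lfc]

# Analogous to split_fold_change_direction: two non-overlapping tables.
up = strong[strong["log2FoldChange"] > 0]
down = strong[strong["log2FoldChange"] < 0]

# Set operations on feature indices, mirroring intersection, difference,
# union and symmetric_difference.
sig_ids = set(significant.index)
strong_ids = set(strong.index)
common = sig_ids & strong_ids
only_significant = sig_ids - strong_ids
either_direction = set(up.index) | set(down.index)
sym_diff = set(up.index) ^ sig_ids

print(sorted(common), sorted(only_significant), sorted(either_direction), sorted(sym_diff))
```

Working on index sets in this way is what makes it straightforward to compare the results of several filtering pipelines applied to the same table.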