
class rnalysis.filtering.DESeqFilter(fname: Union[str, Path, tuple], drop_columns: Union[str, List[str]] = None, log2fc_col: str = 'log2FoldChange', padj_col: str = 'padj', suppress_warnings: bool = False)

A class that receives a DESeq output file and can filter it according to various characteristics.


df: pandas DataFrame

A DataFrame that contains the DESeq output file contents. The DataFrame is modified when filter operations are applied.

shape: tuple (rows, columns)

The dimensions of df.

columns: list

The columns of df.

fname: pathlib.Path

The path and filename for the purpose of saving df as a csv file. Updates automatically when filter operations are applied.

index_set: set

All of the indices in the current DataFrame (which were not removed by previously used filter methods) as a set.

index_string: string

A string of all feature indices in the current DataFrame, separated by newlines.

__init__(fname: Union[str, Path, tuple], drop_columns: Union[str, List[str]] = None, log2fc_col: str = 'log2FoldChange', padj_col: str = 'padj', suppress_warnings: bool = False)

Load a differential expression table. A valid differential expression table should have a column containing log2(fold change) values for each gene, and another column containing adjusted p-values for each gene.

  • fname (Union[str, Path]) – full path/filename of the .csv file to be loaded into the Filter object

  • drop_columns (str, list of str, or None (default=None)) – if a string or a list of strings is specified, the columns with those names will be dropped from the loaded table.

  • log2fc_col (str (default='log2FoldChange')) – name of the table column containing log2(fold change) values.

  • padj_col (str (default='padj')) – name of the table column containing adjusted p-values.

  • suppress_warnings (bool (default=False)) – if True, RNAlysis will not issue warnings about the loaded table’s structure or content.
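The loading and validation steps described above can be approximated in plain Python with the standard `csv` module. This is a minimal sketch, not RNAlysis's implementation: the helper `load_de_table` is hypothetical, it assumes the first CSV column holds the feature IDs, and it keeps all values as strings.

```python
import csv
import io
import warnings
from typing import Dict, List, Optional, Union

Table = Dict[str, Dict[str, str]]


def load_de_table(text: str,
                  drop_columns: Optional[Union[str, List[str]]] = None,
                  log2fc_col: str = 'log2FoldChange',
                  padj_col: str = 'padj',
                  suppress_warnings: bool = False) -> Table:
    """Hypothetical helper: parse CSV text into {feature_id: {column: value}}."""
    reader = csv.DictReader(io.StringIO(text))
    fieldnames = reader.fieldnames or []
    id_col = fieldnames[0]  # assumption: the first column holds the feature IDs
    if isinstance(drop_columns, str):
        drop_columns = [drop_columns]
    drop = set(drop_columns or [])
    table: Table = {}
    for row in reader:
        feature_id = row.pop(id_col)
        # Remove the requested columns from every row
        table[feature_id] = {col: val for col, val in row.items() if col not in drop}
    # Warn (unless suppressed) when an expected column is missing after dropping
    kept_columns = [col for col in fieldnames[1:] if col not in drop]
    if not suppress_warnings:
        for required in (log2fc_col, padj_col):
            if required not in kept_columns:
                warnings.warn(f"column '{required}' was not found in the table")
    return table
```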

DESeqFilter.biotypes_from_gtf(gtf_path[, ...])

Returns a DataFrame describing the biotypes in the table and their count.


Returns a DataFrame describing the biotypes in the table and their count.


Generate descriptive statistics that summarize the central tendency, dispersion and shape of the dataset’s distribution, excluding NaN values.

DESeqFilter.difference(*others[, ...])

Keep only the features that exist in the first Filter object/set but NOT in the others.

DESeqFilter.drop_columns(columns[, inplace])

Drop specific columns from the table.


Filters out all features whose absolute log2 fold change is below the indicated threshold.
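As a rough illustration of the absolute log2 fold change filter, the sketch below works on a plain dictionary mapping feature IDs to rows. The function name `filter_abs_log2fc` is hypothetical, and treating the threshold as inclusive (`>=`) is an assumption.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]


def filter_abs_log2fc(table: Table, threshold: float,
                      log2fc_col: str = 'log2FoldChange') -> Table:
    """Keep features whose |log2 fold change| is at least `threshold`."""
    # Assumption: the threshold is inclusive (>=). Comparisons with NaN are
    # always False, so features with a missing fold change are removed as well.
    return {fid: row for fid, row in table.items()
            if abs(row[log2fc_col]) >= threshold}
```

Because the absolute value is used, strongly down-regulated features survive alongside strongly up-regulated ones.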


Filters out all features that do not match the indicated biotype/biotypes (for example: 'protein_coding', 'ncRNA', etc).


Filters out all features that do not match the indicated biotype/biotypes (for example: 'protein_coding', 'ncRNA', etc).


Filters features according to user-defined attributes from an Attribute Reference Table.


Filters genes according to GO annotations, keeping only genes that are annotated with a specific GO term.


Filters genes according to KEGG pathways, keeping only genes that belong to specific KEGG pathways.

DESeqFilter.filter_by_row_name(row_names[, ...])

Filter out specific rows from the table by their name (index).

DESeqFilter.filter_duplicate_ids([keep, ...])

Filter out rows with duplicate names/IDs (index).
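Duplicate IDs cannot coexist as dictionary keys, so this sketch of duplicate removal uses a list of `(id, row)` pairs instead. The helper name `drop_duplicate_ids` is hypothetical, and only `'first'` and `'last'` are modeled as values of `keep`; the actual method may accept other options.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, Dict[str, float]]


def drop_duplicate_ids(rows: List[Row], keep: str = 'first') -> List[Row]:
    """Keep one row per ID (first or last occurrence), preserving table order."""
    # Assumption: only 'first' and 'last' are modeled here
    if keep not in ('first', 'last'):
        raise ValueError("keep must be 'first' or 'last'")
    # Scanning in reverse makes "first seen" equal to "last in the table"
    ordered = rows if keep == 'first' else list(reversed(rows))
    seen = set()
    kept = []
    for fid, row in ordered:
        if fid not in seen:
            seen.add(fid)
            kept.append((fid, row))
    return kept if keep == 'first' else list(reversed(kept))
```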


Filters out features according to the direction in which they changed between the two conditions.
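Filtering by direction amounts to a sign test on the log2 fold change column. In this sketch, the direction labels `'pos'` and `'neg'` and the function name are hypothetical, and features with a log2 fold change of exactly 0 are assumed to belong to neither direction.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]


def filter_direction(table: Table, direction: str,
                     log2fc_col: str = 'log2FoldChange') -> Table:
    """Keep features that went up ('pos') or down ('neg') between conditions."""
    # Hypothetical labels; a log2 fold change of 0 matches neither direction
    if direction == 'pos':
        return {fid: row for fid, row in table.items() if row[log2fc_col] > 0}
    if direction == 'neg':
        return {fid: row for fid, row in table.items() if row[log2fc_col] < 0}
    raise ValueError("direction must be 'pos' or 'neg'")
```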

DESeqFilter.filter_missing_values([columns, ...])

Remove all rows whose values in the specified columns are missing (NaN).
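A minimal sketch of missing-value filtering over a dictionary of rows. Checking every column when `columns` is None is an assumption made for illustration, as is the helper name `drop_missing`.

```python
import math
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, float]]


def drop_missing(table: Table, columns: Optional[List[str]] = None) -> Table:
    """Drop rows with a missing value in any of `columns`."""
    def is_missing(value) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = {}
    for fid, row in table.items():
        # Assumption: when no columns are given, every column is checked
        cols = row.keys() if columns is None else columns
        if not any(is_missing(row[c]) for c in cols):
            kept[fid] = row
    return kept
```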

DESeqFilter.filter_percentile(percentile, column)

Removes all entries above the specified percentile in the specified column.
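Percentile filtering first converts the percentile into a cutoff value for the column, then discards rows above it. The sketch below assumes the percentile is given as a fraction between 0 and 1, that rows exactly at the cutoff are kept, and that the column has no missing values; it uses linear interpolation between ranks, which is also the default of pandas' `Series.quantile`. The function names are hypothetical.

```python
import math
from typing import Dict, List

Table = Dict[str, Dict[str, float]]


def quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def filter_percentile(table: Table, percentile: float, column: str) -> Table:
    """Remove every row whose value in `column` lies above the percentile cutoff."""
    cutoff = quantile([row[column] for row in table.values()], percentile)
    # Assumption: values equal to the cutoff are kept
    return {fid: row for fid, row in table.items() if row[column] <= cutoff}
```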

DESeqFilter.filter_significant([alpha, ...])

Removes all features which did not change significantly, according to the provided alpha.
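Significance filtering compares each feature's adjusted p-value against alpha. This sketch assumes that features with padj equal to alpha count as significant; rows with a missing (NaN) adjusted p-value, which DESeq2 reports for some genes, fail the comparison and are removed. The function name `keep_significant` is hypothetical.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]


def keep_significant(table: Table, alpha: float, padj_col: str = 'padj') -> Table:
    """Keep features whose adjusted p-value is at most `alpha`."""
    # Assumption: inclusive comparison; NaN <= alpha is False, so NaN rows drop out
    return {fid: row for fid, row in table.items() if row[padj_col] <= alpha}
```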

DESeqFilter.filter_top_n(by[, n, ascending, ...])

Sort the rows by the values of specified column or columns, then keep only the top 'n' rows.
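Keeping the top n rows is a sort followed by a slice. This is a sketch under the assumption that rows are ordered by a single column; multi-column sorting and tie handling are not modeled, and `top_n` is a hypothetical name.

```python
from typing import Dict

Table = Dict[str, Dict[str, float]]


def top_n(table: Table, by: str, n: int, ascending: bool) -> Table:
    """Sort rows by column `by`, then keep only the first `n` rows."""
    ordered = sorted(table.items(), key=lambda item: item[1][by], reverse=not ascending)
    # dict preserves insertion order, so the result stays sorted
    return dict(ordered[:n])
```

For example, sorting by `padj` in ascending order and keeping the first rows yields the most significant features.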


Find paralogs within the same species using the Ensembl database.


Find paralogs within the same species using the PantherDB database.

DESeqFilter.from_dataframe(df, name[, ...])


Return the first n rows of the Filter object.

DESeqFilter.intersection(*others[, ...])

Keep only the features that exist in ALL of the given Filter objects/sets.


Returns a set/string of the features that appear in at least (majority_threshold * 100)% of the given Filter objects/sets.
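The majority-vote rule can be sketched by counting, for each feature, how many of the given sets contain it. The function `majority_vote` below is a hypothetical illustration that returns a plain set; `majority_threshold` is a fraction between 0 and 1, so a threshold of 1.0 reduces to an ordinary intersection.

```python
from collections import Counter
from typing import Iterable, Set


def majority_vote(*sets: Iterable[str], majority_threshold: float) -> Set[str]:
    """Return features present in at least majority_threshold * 100% of the sets."""
    counts: Counter = Counter()
    for s in sets:
        counts.update(set(s))  # count each feature at most once per set
    needed = majority_threshold * len(sets)
    return {feature for feature, c in counts.items() if c >= needed}
```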

DESeqFilter.map_orthologs_ensembl(...[, ...])

Map genes to their nearest orthologs in a different species using the Ensembl database.


Map genes to their nearest orthologs in a different species using the OrthoInspector database.

DESeqFilter.map_orthologs_panther(...[, ...])

Map genes to their nearest orthologs in a different species using the PantherDB database.

DESeqFilter.map_orthologs_phylomedb(...[, ...])

Map genes to their nearest orthologs in a different species using the PhylomeDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.

DESeqFilter.number_filters(column, operator, ...)

Apply a number filter (greater than, equal, less than) on a particular column in the Filter object.
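A number filter boils down to applying one comparison operator to every value in a column. The operator names `'gt'`, `'eq'` and `'lt'` in this sketch are hypothetical placeholders, not necessarily the strings the method accepts, and `number_filter` is likewise a hypothetical name.

```python
import operator
from typing import Dict

Table = Dict[str, Dict[str, float]]

# Hypothetical operator names mapped to the standard comparison functions
NUMBER_OPS = {'gt': operator.gt, 'eq': operator.eq, 'lt': operator.lt}


def number_filter(table: Table, column: str, op: str, value: float) -> Table:
    """Keep rows where `row[column] <op> value` holds."""
    compare = NUMBER_OPS[op]
    return {fid: row for fid, row in table.items() if compare(row[column], value)}
```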


Print the feature indices in the Filter object, sorted by their current order in the Filter object and separated by newlines.


Saves the current filtered data to a .csv file.


Saves the current filtered data to a .parquet file.

DESeqFilter.save_table([suffix, alt_filename])

Save the current filtered data table.

DESeqFilter.sort(by[, ascending, ...])

Sort the rows by the values of specified column or columns.

DESeqFilter.split_by_attribute(attributes[, ref])

Splits the features in the Filter object into multiple Filter objects, each corresponding to one of the specified Attribute Reference Table attributes.

DESeqFilter.split_by_percentile(percentile, ...)

Splits the features in the Filter object into two non-overlapping Filter objects: one containing features below the specified percentile in the specified column, and the other containing features above the specified percentile in the specified column.


Splits the features in the DESeqFilter object into two non-overlapping DESeqFilter objects, based on the direction of their log2foldchange.

DESeqFilter.symmetric_difference(other[, ...])

Returns a set/string of the feature indices that exist either in the first Filter object/set OR the second, but NOT in both (set symmetric difference).
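The set-style methods in this class correspond to Python's built-in set operators. The snippet below shows them on two plain sets of hypothetical gene IDs; it illustrates the semantics only and does not model options such as `return_type`.

```python
# Two hypothetical sets of feature IDs
a = {'g1', 'g2', 'g3'}
b = {'g3', 'g4'}

sym_diff = a ^ b       # in exactly one of the two sets (symmetric difference)
union = a | b          # in at least one set
intersection = a & b   # in both sets
difference = a - b     # in the first set but not in the second
```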


Return the last n rows of the Filter object.

DESeqFilter.text_filters(column, operator, value)

Apply a text filter (equals, contains, starts with, ends with) on a particular column in the Filter object.
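Text filters can likewise be sketched as a lookup from an operator name to a string test. The operator names and the function name `text_filter` below are hypothetical placeholders, and matching is assumed to be case-sensitive.

```python
from typing import Callable, Dict

Table = Dict[str, Dict[str, str]]

# Hypothetical operator names mapped to case-sensitive string tests
TEXT_OPS: Dict[str, Callable[[str, str], bool]] = {
    'equals': lambda text, value: text == value,
    'contains': lambda text, value: value in text,
    'starts_with': lambda text, value: text.startswith(value),
    'ends_with': lambda text, value: text.endswith(value),
}


def text_filter(table: Table, column: str, op: str, value: str) -> Table:
    """Keep rows whose text in `column` satisfies the chosen operator."""
    match = TEXT_OPS[op]
    return {fid: row for fid, row in table.items() if match(row[column], value)}
```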

DESeqFilter.transform(function[, columns, ...])

Transform the values in the Filter object with the specified function.


Translates gene names/IDs from one type to another.

DESeqFilter.union(*others[, return_type])

Returns a set/string of the union of features between multiple Filter objects/sets (the features that exist in at least one of the Filter objects/sets).

DESeqFilter.volcano_plot([alpha, ...])

Plots a volcano plot (log2(fold change) vs. -log10(adjusted p-value)).
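The coordinates of a volcano plot can be computed as sketched below: x is the log2 fold change and y is -log10 of the adjusted p-value, so more significant features sit higher. The sketch only prepares the points without drawing them; skipping rows with a missing or zero adjusted p-value and flagging significance with `padj <= alpha` are assumptions, and `volcano_points` is a hypothetical name.

```python
import math
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]


def volcano_points(table: Table, alpha: float,
                   log2fc_col: str = 'log2FoldChange',
                   padj_col: str = 'padj') -> List[Tuple[str, float, float, bool]]:
    """Return (feature, x, y, significant) with x = log2FC and y = -log10(padj)."""
    points = []
    for fid, row in table.items():
        padj = row[padj_col]
        # Assumption: skip rows where -log10 is undefined (missing or zero padj)
        if math.isnan(padj) or padj <= 0:
            continue
        points.append((fid, row[log2fc_col], -math.log10(padj), padj <= alpha))
    return points
```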