rnalysis.filtering.FoldChangeFilter

class rnalysis.filtering.FoldChangeFilter(fname: Union[str, Path, tuple], numerator_name: str, denominator_name: str, suppress_warnings: bool = False)
A class that contains a single column, representing the gene-specific fold change between two conditions.

This class does not support ‘inf’ and ‘0’ values; importing a file that contains such values could lead to incorrect filtering and statistical analyses.
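One way to see why such values are troublesome is that a ratio of 0 has no finite log2, and a division by a zero denominator has no finite ratio at all. The sketch below uses only the Python standard library and is not RNAlysis code; the helper name `log2_fold_change` and the example values are hypothetical.

```python
import math


def log2_fold_change(numerator_mean, denominator_mean):
    """Hypothetical helper: return log2(numerator / denominator),
    or None when the result would not be a finite number."""
    if numerator_mean == 0 or denominator_mean == 0:
        # A ratio of 0 has no finite log2; dividing by 0 is undefined.
        return None
    ratio = numerator_mean / denominator_mean
    if math.isinf(ratio):
        return None
    return math.log2(ratio)


# Illustrative (made-up) mean expression values per gene: (numerator, denominator)
examples = {"geneA": (8.0, 2.0), "geneB": (0.0, 5.0), "geneC": (3.0, 0.0)}
results = {gene: log2_fold_change(n, d) for gene, (n, d) in examples.items()}
print(results)
```

Screening a table for such entries before loading it is one possible precaution; how RNAlysis itself reacts to them is not described here beyond the warning above.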

Attributes

df: pandas Series

A Series that contains the fold change values. The Series is modified upon usage of filter operations.

shape: tuple (rows, columns)

The dimensions of df.

columns: list

The columns of df.

fname: pathlib.Path

The path and filename for the purpose of saving df as a csv file. Updates automatically when filter operations are applied.

index_set: set

All of the indices currently in df (those not removed by previously used filter methods), as a set.

index_string: string

A string of all feature indices currently in df, separated by newlines.

numerator: str

Name of the numerator used to calculate the fold change.

denominator: str

Name of the denominator used to calculate the fold change.

__init__(fname: Union[str, Path, tuple], numerator_name: str, denominator_name: str, suppress_warnings: bool = False)

Load a fold-change table. Valid fold-change tables should contain exactly two columns: the first column containing gene names/indices, and the second column containing log2(fold change) values.

Parameters
  • fname (Union[str, Path]) – full path/filename of the .csv file to be loaded into the Filter object

  • numerator_name (str) – name of the numerator condition in the fold-change table

  • denominator_name (str) – name of the denominator condition in the fold-change table

  • suppress_warnings (bool (default=False)) – if True, RNAlysis will not issue warnings about the loaded table’s structure or content.

FoldChangeFilter.biotypes_from_gtf(gtf_path)

Returns a DataFrame describing the biotypes in the table and their count.

FoldChangeFilter.biotypes_from_ref_table([...])

Returns a DataFrame describing the biotypes in the table and their count.

FoldChangeFilter.describe([percentiles])

Generate descriptive statistics that summarize the central tendency, dispersion and shape of the dataset’s distribution, excluding NaN values.

FoldChangeFilter.difference(*others[, ...])

Keep only the features that exist in the first Filter object/set but NOT in the others.

FoldChangeFilter.drop_columns(columns[, inplace])

Drop specific columns from the table.

FoldChangeFilter.filter_abs_log2_fold_change([...])

Filters out all features whose absolute log2 fold change is below the indicated threshold.
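The filtering rule described above can be sketched with a plain dictionary. This is an illustration of the idea only, not the RNAlysis implementation; it assumes the values are already log2(fold change), and the gene names and threshold are made up.

```python
# Made-up log2(fold change) values keyed by feature name.
log2_fc = {"geneA": 2.3, "geneB": -0.4, "geneC": -1.7, "geneD": 0.9}
threshold = 1.0

# Keep features whose absolute log2 fold change is NOT below the threshold.
kept = {gene: value for gene, value in log2_fc.items() if abs(value) >= threshold}
print(sorted(kept))
```

Note that both strongly up-regulated and strongly down-regulated features survive, because the comparison is made on the absolute value.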

FoldChangeFilter.filter_biotype_from_gtf(...)

Filters out all features that do not match the indicated biotype/biotypes (for example: 'protein_coding', 'ncRNA', etc).

FoldChangeFilter.filter_biotype_from_ref_table([...])

Filters out all features that do not match the indicated biotype/biotypes (for example: 'protein_coding', 'ncRNA', etc).

FoldChangeFilter.filter_by_attribute([...])

Filters features according to user-defined attributes from an Attribute Reference Table.

FoldChangeFilter.filter_by_go_annotations(go_ids)

Filters genes according to GO annotations, keeping only genes that are annotated with a specific GO term.

FoldChangeFilter.filter_by_kegg_annotations(...)

Filters genes according to KEGG pathways, keeping only genes that belong to a specific KEGG pathway.

FoldChangeFilter.filter_by_row_name(row_names)

Filter out specific rows from the table by their name (index).

FoldChangeFilter.filter_duplicate_ids([...])

Filter out rows with duplicate names/IDs (index).

FoldChangeFilter.filter_fold_change_direction([...])

Filters out features according to the direction in which they changed between the two conditions.

FoldChangeFilter.filter_missing_values([...])

Remove all rows with missing values.

FoldChangeFilter.filter_percentile(percentile)

Removes all entries above the specified percentile.

FoldChangeFilter.filter_top_n(by[, n, ...])

Sort the rows by the values of the specified column or columns, then keep only the top 'n' rows.

FoldChangeFilter.find_paralogs_ensembl([...])

Find paralogs within the same species using the Ensembl database.

FoldChangeFilter.find_paralogs_panther([...])

Find paralogs within the same species using the PantherDB database.

FoldChangeFilter.from_dataframe(df, name[, ...])

FoldChangeFilter.head([n])

Return the first n rows of the Filter object.

FoldChangeFilter.intersection(*others[, ...])

Keep only the features that exist in ALL of the given Filter objects/sets.

FoldChangeFilter.majority_vote_intersection(*others)

Returns a set/string of the features that appear in at least (majority_threshold * 100)% of the given Filter objects/sets.
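A minimal sketch of the majority-vote idea using plain Python sets follows. It is not the RNAlysis implementation; the function `majority_vote`, its default threshold of 0.5, and the feature names are assumptions chosen for illustration.

```python
from collections import Counter


def majority_vote(feature_sets, majority_threshold=0.5):
    """Hypothetical helper: return the features present in at least
    (majority_threshold * 100)% of the given sets."""
    # Count in how many sets each feature appears.
    counts = Counter(feature for s in feature_sets for feature in set(s))
    needed = majority_threshold * len(feature_sets)
    return {feature for feature, count in counts.items() if count >= needed}


sets = [{"g1", "g2", "g3"}, {"g2", "g3", "g4"}, {"g3", "g5"}]
majority = majority_vote(sets)
print(sorted(majority))
```

With three sets and a threshold of 0.5, a feature must appear in at least 1.5 sets, that is, in two or more of them.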

FoldChangeFilter.map_orthologs_ensembl(...)

Map genes to their nearest orthologs in a different species using the Ensembl database.

FoldChangeFilter.map_orthologs_orthoinspector(...)

Map genes to their nearest orthologs in a different species using the OrthoInspector database.

FoldChangeFilter.map_orthologs_panther(...)

Map genes to their nearest orthologs in a different species using the PantherDB database.

FoldChangeFilter.map_orthologs_phylomedb(...)

Map genes to their nearest orthologs in a different species using the PhylomeDB database. This function generates a table describing all matching discovered ortholog pairs (both unique and non-unique) and returns it, and can also translate the genes in this data table into their nearest ortholog, as well as remove unmapped genes.

FoldChangeFilter.number_filters(column, ...)

Apply a number filter (greater than, equal to, less than) on a particular column in the Filter object.

FoldChangeFilter.print_features()

Print the feature indices in the Filter object, sorted by their current order in the Filter object, and separated by newline.

FoldChangeFilter.randomization_test(ref[, ...])

Perform a randomization test to examine whether the fold change of a group of specific genomic features is significantly different than the fold change of a background set of genomic features.
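The general shape of a randomization test can be sketched as follows. This is a generic one-sided version using the mean as the test statistic, written with the standard library only; the statistic, the number of repetitions, and the p-value formula used by RNAlysis are not stated here, so everything below (including the function name and its parameters) is an assumption.

```python
import random
from statistics import mean


def randomization_test(values, test_ids, reps=1000, seed=0):
    """Hypothetical sketch: estimate how often a random set of the same size
    has a mean at least as large as the observed mean of the test set."""
    rng = random.Random(seed)
    observed = mean(values[g] for g in test_ids)
    background = list(values.values())
    hits = 0
    for _ in range(reps):
        sample = rng.sample(background, len(test_ids))
        if mean(sample) >= observed:
            hits += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (hits + 1) / (reps + 1)


# Made-up background of 100 features; the test set is the 10 largest values.
data_rng = random.Random(42)
values = {f"gene{i}": data_rng.gauss(0, 1) for i in range(100)}
top_genes = sorted(values, key=values.get, reverse=True)[:10]
p_value = randomization_test(values, top_genes)
print(p_value)
```

Because the test set was deliberately chosen as the most extreme features, the estimated p-value comes out small; a randomly chosen test set would typically give a much larger one.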

FoldChangeFilter.save_csv([alt_filename])

Saves the current filtered data to a .csv file.

FoldChangeFilter.save_parquet([alt_filename])

Saves the current filtered data to a .parquet file.

FoldChangeFilter.save_table([suffix, ...])

Save the current filtered data table.

FoldChangeFilter.sort(by[, ascending, ...])

Sort the rows by the values of the specified column or columns.

FoldChangeFilter.split_by_attribute(attributes)

Splits the features in the Filter object into multiple Filter objects, each corresponding to one of the specified Attribute Reference Table attributes.

FoldChangeFilter.split_by_percentile(percentile)

Splits the features in the table into two non-overlapping tables: one containing features below the specified percentile, and the other containing features above the specified percentile.
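A percentile split can be sketched with the standard library as shown below. This is illustrative only: it assumes the percentile is given as a fraction between 0 and 1, uses linear interpolation to find the cut-off value, and places values equal to the cut-off in the lower group. None of these choices is confirmed by the description above, and `percentile_value` is a hypothetical helper.

```python
def percentile_value(values, percentile):
    """Hypothetical helper: value at the given percentile (0 to 1),
    using linear interpolation between the two closest ranks."""
    ordered = sorted(values)
    position = percentile * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Made-up fold change values keyed by feature name.
table = {"g1": 1.0, "g2": 2.0, "g3": 3.0, "g4": 4.0, "g5": 5.0}
cutoff = percentile_value(table.values(), 0.5)

# Two non-overlapping groups: at or below the cut-off, and above it.
below = {g: v for g, v in table.items() if v <= cutoff}
above = {g: v for g, v in table.items() if v > cutoff}
print(cutoff, sorted(below), sorted(above))
```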

FoldChangeFilter.split_fold_change_direction()

Splits the features in the FoldChangeFilter object into two non-overlapping FoldChangeFilter objects, based on the direction of their log2(fold change).
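Splitting by direction amounts to checking the sign of each log2(fold change) value. The sketch below is not RNAlysis code; the feature names are made up, and the handling of values that are exactly zero (left out of both groups here) is an assumption.

```python
# Made-up log2(fold change) values keyed by feature name.
log2_fc = {"geneA": 1.8, "geneB": -2.1, "geneC": 0.6, "geneD": -0.3}

# Positive log2 values: higher in the numerator condition.
up = {gene: value for gene, value in log2_fc.items() if value > 0}
# Negative log2 values: higher in the denominator condition.
down = {gene: value for gene, value in log2_fc.items() if value < 0}
print(sorted(up), sorted(down))
```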

FoldChangeFilter.symmetric_difference(other)

Returns a set/string of the WBGene indices that exist either in the first Filter object/set OR the second, but NOT in both (set symmetric difference).

FoldChangeFilter.tail([n])

Return the last n rows of the Filter object.

FoldChangeFilter.text_filters(column, ...[, ...])

Apply a text filter (equals, contains, starts with, ends with) on a particular column in the Filter object.

FoldChangeFilter.transform(function[, inplace])

Transform the values in the Filter object with the specified function.

FoldChangeFilter.translate_gene_ids(translate_to)

Translates gene names/IDs from one type to another.

FoldChangeFilter.union(*others[, return_type])

Returns a set/string of the union of features between multiple Filter objects/sets (the features that exist in at least one of the Filter objects/sets).
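The set operations listed on this page (difference, intersection, symmetric difference, union) follow ordinary set semantics, which can be illustrated with Python's built-in `set` type. The sketch operates on plain sets of made-up WBGene-style identifiers rather than on Filter objects.

```python
# Made-up feature index sets, such as those exposed by index_set.
first = {"WBGene00000001", "WBGene00000002", "WBGene00000003"}
second = {"WBGene00000002", "WBGene00000003", "WBGene00000004"}
third = {"WBGene00000003", "WBGene00000005"}

in_all = first & second & third        # intersection: present in ALL sets
in_any = first | second | third        # union: present in at least one set
only_first = first - second - third    # difference: in the first, not in the others
either_not_both = first ^ second       # symmetric difference of two sets
print(sorted(in_all), sorted(only_first), sorted(either_not_both))
```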