API Reference

pairwisedist.pairwisedist

pairwisedist.pairwisedist.jackknife_distance(data: ndarray, rowvar: bool = True, similarity: bool = False) → ndarray

Calculates the pairwise Jackknife-correlation distance matrix for a given array of n samples by p features, as described in (Heyer et al. 1999, Genome Res.). The Jackknife-correlation distance ranges between 0 and 1. The Jackknife-correlation coefficient is meant to reduce the number of false positives observed in Pearson linear correlation. This reduction is achieved by calculating the Pearson correlation coefficient p times, leaving out a single feature every time, and picking the minimal Pearson coefficient as the Jackknife coefficient. The Jackknife correlation coefficient for X,Y is formally defined as min(Pearson(X[idx != i],Y[idx != i]) for i in range(p)).

Parameters

data (np.ndarray) – an n-by-p numpy array of n samples by p features, to calculate pairwise distance on.
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of ‘data’. If False, calculates the pairwise distance between the columns of ‘data’.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).

Returns

an n-by-n numpy array of pairwise Jackknife dissimilarity scores.

Return type

np.ndarray
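The leave-one-out definition above can be sketched for a single pair of vectors using plain NumPy. This is an illustrative sketch only, not the library's implementation: the helper name jackknife_corr is hypothetical, and it returns the Jackknife coefficient itself, not the rescaled distance.

```python
import numpy as np

def jackknife_corr(x, y):
    """Hypothetical helper: minimum leave-one-out Pearson coefficient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    p = x.size
    idx = np.arange(p)
    # Recompute Pearson p times, dropping feature i each time, and keep the minimum.
    return min(np.corrcoef(x[idx != i], y[idx != i])[0, 1] for i in range(p))

# The last feature is an outlier that inflates the plain Pearson coefficient.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 25.0])

pearson = np.corrcoef(x, y)[0, 1]   # about 0.99, driven by the outlier
jackknife = jackknife_corr(x, y)    # 0.6, reached when the outlier is left out
```

In this example the high Pearson coefficient depends on one feature, which is the kind of false positive the Jackknife coefficient is meant to suppress.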

pairwisedist.pairwisedist.pearson_distance(data: ndarray, rowvar: bool = True, similarity: bool = False) → ndarray

Calculates the pairwise Pearson-correlation distance matrix for a given array of n samples by p features. The Pearson-correlation distance ranges between 0 (linear correlation coefficient is 1) and 1 (linear correlation coefficient is -1).

Parameters

data (np.ndarray) – an n-by-p numpy array of n samples by p features, to calculate pairwise distance on.
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of ‘data’. If False, calculates the pairwise distance between the columns of ‘data’.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).

Returns

an n-by-n numpy array of pairwise Pearson-correlation dissimilarity scores.

Return type

np.ndarray
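As a rough illustration of the range described for pearson_distance, one mapping that sends a coefficient of 1 to distance 0 and a coefficient of -1 to distance 1 is the linear rescaling (1 - r) / 2. The sketch below assumes that mapping; it is not confirmed to be what the library does, and the function name pearson_distance_sketch is hypothetical.

```python
import numpy as np

def pearson_distance_sketch(data, rowvar=True, similarity=False):
    """Hypothetical sketch: rescale Pearson r from [-1, 1] to [0, 1]."""
    r = np.corrcoef(np.asarray(data, dtype=float), rowvar=rowvar)
    sim = (r + 1) / 2          # assumed mapping: r = 1 -> 1, r = -1 -> 0
    return sim if similarity else 1 - sim

data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with row 0
                 [4.0, 3.0, 2.0, 1.0]])   # perfectly anti-correlated with row 0

dist = pearson_distance_sketch(data)
sim = pearson_distance_sketch(data, similarity=True)
```

Here dist[0, 1] is (numerically) 0 and dist[0, 2] is 1, matching the two endpoints of the documented range.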

pairwisedist.pairwisedist.sharpened_cosine_distance(data: ndarray, sharpen_exponent: float = 16, exp_noise_floor: float = 0.1, rowvar: bool = True, similarity: bool = False) → ndarray

Calculates the pairwise sharpened cosine distance matrix for a given array of n samples by p features, as described in a since-deleted tweet by Brandon Rohrer. You can read more about sharpened cosine distance here: https://github.com/brohrer/sharpened-cosine-similarity

The sharpened cosine distance ranges between 0 (highest similarity) and 1 (highest dissimilarity).

Parameters

data (np.ndarray) – an n-by-p numpy array of n samples by p features, to calculate pairwise distance on.
sharpen_exponent (float (default=16))
exp_noise_floor (float (default=0.1))
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of ‘data’. If False, calculates the pairwise distance between the columns of ‘data’.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).

Returns

an n-by-n numpy array of pairwise sharpened cosine distance scores.

Return type

np.ndarray

pairwisedist.pairwisedist.spearman_distance(data: ndarray, rowvar: bool = True, similarity: bool = False) → ndarray

Calculates the pairwise Spearman-correlation distance matrix for a given array of n samples by p features. The Spearman-correlation distance ranges between 0 (correlation coefficient is 1) and 1 (correlation coefficient is -1).

Parameters

data (np.ndarray) – an n-by-p numpy array of n samples by p features, to calculate pairwise distance on.
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of ‘data’. If False, calculates the pairwise distance between the columns of ‘data’.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).

Returns

an n-by-n numpy array of pairwise Spearman-correlation dissimilarity scores.

Return type

np.ndarray
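The Spearman coefficient is the Pearson coefficient of the ranked data, so the range described for spearman_distance can be illustrated with NumPy alone. The sketch below makes two assumptions that are not confirmed by this reference: the data contain no tied values (so simple ordinal ranks suffice), and the coefficient is mapped to a distance with (1 - rho) / 2. The names rank_rows and spearman_distance_sketch are hypothetical.

```python
import numpy as np

def rank_rows(data):
    """Ordinal ranks (0..p-1) within each row; assumes no tied values."""
    return np.argsort(np.argsort(data, axis=1), axis=1).astype(float)

def spearman_distance_sketch(data, similarity=False):
    """Hypothetical sketch: Pearson on row ranks, rescaled to [0, 1]."""
    rho = np.corrcoef(rank_rows(np.asarray(data, dtype=float)))
    sim = (rho + 1) / 2        # assumed mapping: rho = 1 -> 1, rho = -1 -> 0
    return sim if similarity else 1 - sim

data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [1.0, 10.0, 100.0, 1000.0],  # same ordering as row 0, non-linear
                 [9.0, 7.0, 5.0, 0.0]])       # reversed ordering

dist = spearman_distance_sketch(data)
```

Rows 0 and 1 are monotonically related but not linearly related, so their Spearman distance is 0 even though their Pearson coefficient is below 1.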

pairwisedist.pairwisedist.yr1_distance(data, omega1: float = 0.5, omega2: float = 0.25, omega3: float = 0.25, rowvar: bool = True, similarity: bool = False) → ndarray

Calculates the pairwise YR1 distance matrix for a given array of n samples by p features, as described in (Son YS, Baek J 2008, Pattern Recognition Letters). The YR1 dissimilarity ranges between 0 and 1. The YR1 dissimilarity is a metric that takes into account the Pearson linear correlation between the samples (R* i,j), the position of the minimal and maximal values of each sample (M i,j), and the agreement of their slopes (A i,j). The final score (YR1 i,j) is a weighted average of these three parameters: YR1 i,j = omega1 * (R* i,j) + omega2 * (A i,j) + omega3 * (M i,j)

Parameters

data (np.ndarray) – an n-by-p numpy array of n samples by p features, to calculate pairwise distance on.
omega1 (float between 0 and 1) – Relative weight of the correlation (R* i,j) component of the YR1 distance. All three relative weights (omega1-3) must add up to exactly 1.0.
omega2 (float between 0 and 1) – Relative weight of the slope concordance (A i,j) component of the YR1 distance. All three relative weights (omega1-3) must add up to exactly 1.0.
omega3 (float between 0 and 1) – Relative weight of the minimum-maximum similarity (M i,j) component of the YR1 distance. All three relative weights (omega1-3) must add up to exactly 1.0.
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of ‘data’. If False, calculates the pairwise distance between the columns of ‘data’.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).

Returns

an n-by-n numpy array of pairwise YR1 dissimilarity scores.

Return type

np.ndarray
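The weighted average in the yr1_distance description can be sketched for one pair of samples. The component definitions used here are assumptions based on a reading of Son and Baek (2008), not taken from this reference: R* is the Pearson coefficient rescaled to [0, 1], A is the fraction of consecutive steps whose slopes have the same sign, and M is 1 when both the minimum and maximum positions agree, 0.5 when only one agrees, and 0 otherwise. Under those assumptions the weighted score behaves as a similarity (1 means identical shape); how the library turns it into a distance is not shown. The name yr1_score_sketch is hypothetical.

```python
import numpy as np

def yr1_score_sketch(x, y, omega1=0.5, omega2=0.25, omega3=0.25):
    """Hypothetical sketch of the weighted YR1 score for two samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # R*: Pearson coefficient rescaled from [-1, 1] to [0, 1] (assumed).
    r_star = (np.corrcoef(x, y)[0, 1] + 1) / 2
    # A: fraction of consecutive steps with the same slope sign (assumed).
    a = float(np.mean(np.sign(np.diff(x)) == np.sign(np.diff(y))))
    # M: half a point each for matching argmax and matching argmin (assumed).
    m = 0.5 * float(np.argmax(x) == np.argmax(y)) \
        + 0.5 * float(np.argmin(x) == np.argmin(y))
    return omega1 * r_star + omega2 * a + omega3 * m

x = np.array([1.0, 2.0, 3.0, 4.0])
same = yr1_score_sketch(x, np.array([2.0, 4.0, 6.0, 8.0]))      # all components 1
opposite = yr1_score_sketch(x, np.array([4.0, 3.0, 2.0, 1.0]))  # all components 0
partial = yr1_score_sketch(x, np.array([1.0, 3.0, 2.0, 4.0]))   # R*=0.9, A=2/3, M=1
```

With the default weights, the third pair scores 0.5 * 0.9 + 0.25 * (2/3) + 0.25 * 1, roughly 0.867: the correlation and extreme positions agree, but one of the three slopes does not.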

pairwisedist.pairwisedist.ys1_distance(data: ndarray, omega1: float = 0.5, omega2: float = 0.25, omega3: float = 0.25, rowvar: bool = True, similarity: bool = False) → ndarray

Calculates the pairwise YS1 distance matrix for a given array of n samples by p features, as described in (Son YS, Baek J 2008, Pattern Recognition Letters). The YS1 dissimilarity ranges between 0 and 1. The YS1 dissimilarity is a metric that takes into account the Spearman rank correlation between the samples (S* i,j), the position of the minimal and maximal values of each sample (M i,j), and the agreement of their slopes (A i,j). The final score (YS1 i,j) is a weighted average of these three parameters: YS1 i,j = omega1 * (S* i,j) + omega2 * (A i,j) + omega3 * (M i,j)

Parameters

data (np.ndarray) – an n-by-p numpy array of n samples by p features, to calculate pairwise distance on.
omega1 (float between 0 and 1) – Relative weight of the correlation (S* i,j) component of the YS1 distance. All three relative weights (omega1-3) must add up to exactly 1.0.
omega2 (float between 0 and 1) – Relative weight of the slope concordance (A i,j) component of the YS1 distance. All three relative weights (omega1-3) must add up to exactly 1.0.
omega3 (float between 0 and 1) – Relative weight of the minimum-maximum similarity (M i,j) component of the YS1 distance. All three relative weights (omega1-3) must add up to exactly 1.0.
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of ‘data’. If False, calculates the pairwise distance between the columns of ‘data’.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).

Returns

an n-by-n numpy array of pairwise YS1 dissimilarity scores.

Return type

np.ndarray
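The YS1 score differs from YR1 only in its correlation term, which uses ranks. The sketch below illustrates this for one pair of samples and also enforces the documented requirement that the three weights sum to 1.0. It rests on assumptions not confirmed by this reference: S* is the Spearman coefficient rescaled to [0, 1], A and M are computed on the raw values as the slope-sign agreement fraction and the min/max position match, the data contain no ties, and raising ValueError for invalid weights is a choice made for this sketch. The names ordinal_ranks and ys1_score_sketch are hypothetical.

```python
import numpy as np

def ordinal_ranks(v):
    """Ranks from 0 to p-1; assumes no tied values."""
    return np.argsort(np.argsort(v)).astype(float)

def ys1_score_sketch(x, y, omega1=0.5, omega2=0.25, omega3=0.25):
    """Hypothetical sketch of the weighted YS1 score for two samples."""
    if not np.isclose(omega1 + omega2 + omega3, 1.0):
        raise ValueError("omega1, omega2 and omega3 must sum to 1.0")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # S*: Spearman coefficient (Pearson on ranks) rescaled to [0, 1] (assumed).
    rho = np.corrcoef(ordinal_ranks(x), ordinal_ranks(y))[0, 1]
    s_star = (rho + 1) / 2
    # A: fraction of consecutive steps with the same slope sign (assumed).
    a = float(np.mean(np.sign(np.diff(x)) == np.sign(np.diff(y))))
    # M: half a point each for matching argmax and matching argmin (assumed).
    m = 0.5 * float(np.argmax(x) == np.argmax(y)) \
        + 0.5 * float(np.argmin(x) == np.argmin(y))
    return omega1 * s_star + omega2 * a + omega3 * m

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 10.0, 100.0, 1000.0])  # same ordering, non-linear growth
score = ys1_score_sketch(x, y)            # every component equals 1
```

Because the two samples share the same ordering, the rank-based term is 1 even though the relationship is not linear, so the score reaches its maximum.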