pairwisedist.pairwisedist.jackknife_distance
- pairwisedist.pairwisedist.jackknife_distance(data: ndarray, rowvar: bool = True, similarity: bool = False) → ndarray
Calculates the pairwise Jackknife-correlation distance matrix for a given array of n samples by p features, as described in Heyer et al. (1999, Genome Res.). The Jackknife-correlation distance ranges between 0 and 1. The Jackknife correlation coefficient is meant to reduce the number of false positives observed with Pearson linear correlation. It does so by calculating the Pearson correlation coefficient p times, leaving out a single feature each time, and taking the minimal Pearson coefficient as the Jackknife coefficient. Formally, the Jackknife correlation coefficient for X, Y is defined as min(Pearson(X[idx != i], Y[idx != i]) for i in range(p)), where idx holds the feature indices 0 to p-1.
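The formal definition above can be sketched for a single pair of vectors as follows. This is a minimal illustration using numpy.corrcoef, not the library's own implementation; the helper name jackknife_coefficient and the sample vectors are hypothetical, and the sketch assumes at least three features so that every leave-one-out correlation is defined.

```python
import numpy as np


def jackknife_coefficient(x, y):
    """Minimum leave-one-out Pearson correlation between two 1-D vectors.

    Hypothetical helper for illustration only; not part of pairwisedist.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1-D vectors of equal length")
    p = x.size
    idx = np.arange(p)
    # Pearson correlation with feature i left out, for every i in range(p)
    coefficients = [
        np.corrcoef(x[idx != i], y[idx != i])[0, 1] for i in range(p)
    ]
    return min(coefficients)


# Made-up data: y tracks x mostly because of one outlying feature.
x = np.array([1.0, 2.0, 3.0, 4.0, 20.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 19.0])

full = np.corrcoef(x, y)[0, 1]      # plain Pearson, dominated by the outlier
jack = jackknife_coefficient(x, y)  # minimum is reached when the outlier is dropped
print(f"Pearson: {full:.3f}, Jackknife: {jack:.3f}")
```

In this made-up example the plain Pearson coefficient is inflated by the single outlying feature, while the Jackknife coefficient falls back to the correlation of the remaining features, which is the false-positive reduction the description refers to.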
- Parameters
data (np.ndarray) – an n-by-p numpy array of n samples by p features, on which to calculate pairwise distances.
rowvar (bool (default=True)) – If True, calculates the pairwise distance between the rows of 'data'. If False, calculates the pairwise distance between the columns of 'data'.
similarity (bool (default=False)) – If False, returns a pairwise distance matrix (0 means closest, 1 means furthest). If True, returns a pairwise similarity matrix (1 means most similar, 0 means most different).
- Returns
an n-by-n numpy array of pairwise Jackknife dissimilarity scores (or similarity scores if 'similarity' is True).
- Return type
np.ndarray