rnalysis.filtering.CountFilter.split_by_principal_components

CountFilter.split_by_principal_components(components: PositiveInt | List[PositiveInt], gene_fraction: Fraction = 0.1, power_transform: bool = True) → Tuple[CountFilter, CountFilter] | Tuple[Tuple[CountFilter, CountFilter], ...]

Performs Principal Component Analysis (PCA) and splits the table based on the contribution (loadings) of genes to specific Principal Components. For each Principal Component specified, RNAlysis finds the X% most influential genes on that Principal Component based on their loadings (where X is gene_fraction): (X/2)% from the top of the loadings ranking and (X/2)% from the bottom. This type of analysis can help you understand which genes contribute the most to each principal component.
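The selection step described above can be illustrated with a minimal pure-Python sketch. This is not the RNAlysis implementation: the function name `split_by_loadings`, the example gene names and loading values, and the choice to round the per-side gene count down are all assumptions made for illustration. It takes the loadings of genes on a single principal component and returns the (X/2)% of genes with the highest loadings and the (X/2)% with the lowest.

```python
def split_by_loadings(loadings, gene_fraction=0.1):
    """Hypothetical helper: pick the most influential genes on one component.

    loadings: dict mapping gene name -> loading on a single principal component.
    Returns (top, bottom): genes with the highest and the lowest loadings.
    """
    if not 0 < gene_fraction <= 1:
        raise ValueError("gene_fraction must be in the interval (0, 1]")

    # Rank genes from the lowest loading to the highest.
    ranked = sorted(loadings, key=loadings.get)

    # Half of the requested fraction comes from each end of the ranking.
    # Rounding down is an assumption of this sketch.
    n_each = int(len(ranked) * gene_fraction / 2)
    if n_each == 0:
        return [], []

    bottom = ranked[:n_each]          # most negative loadings first
    top = ranked[-n_each:][::-1]      # most positive loadings first
    return top, bottom


# Made-up loadings for eight genes on one principal component.
example_loadings = {
    "geneA": 0.9, "geneB": -0.8, "geneC": 0.1, "geneD": -0.05,
    "geneE": 0.7, "geneF": -0.6, "geneG": 0.02, "geneH": 0.3,
}

# gene_fraction=0.5 selects 4 of 8 genes in total: 2 from each end.
top, bottom = split_by_loadings(example_loadings, gene_fraction=0.5)
print(top)     # ['geneA', 'geneE']
print(bottom)  # ['geneB', 'geneF']
```

Genes with loadings near zero (here geneC, geneD, geneG, geneH) contribute little to the component and end up in neither group.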

Parameters:

components (int or list of integers) – the Principal Components the table should be filtered by. Each Principal Component will be analyzed separately.
gene_fraction (float between 0 and 1 (default=0.1)) – the total fraction of top influential genes that will be returned. For example, if gene_fraction=0.1, RNAlysis will return the top 5% and the bottom 5% of genes, ranked by their loadings, for each specified principal component.
power_transform (bool (default=True)) – if True, RNAlysis will apply a power transform (Box-Cox) to the data prior to standardization and principal component analysis.
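A possible call pattern, based only on the signature above, is sketched below. The wrapper function `split_example` and its `counts_path` argument are hypothetical, and the sketch assumes a count table that can be loaded with `filtering.CountFilter`. The text above does not state which element of each returned pair holds the top-loading genes and which holds the bottom-loading genes, so the sketch leaves the pairs unpacked.

```python
def split_example(counts_path):
    """Hypothetical wrapper showing how the method might be called."""
    # Imported here so that defining this function does not require RNAlysis.
    from rnalysis import filtering

    # counts_path is a placeholder for the path to your own count table.
    counts = filtering.CountFilter(counts_path)

    # A single component returns one pair of CountFilter objects.
    pc1_pair = counts.split_by_principal_components(1, gene_fraction=0.1)

    # A list of components returns a tuple of pairs, one pair per component.
    pc_pairs = counts.split_by_principal_components(
        [1, 2], gene_fraction=0.2, power_transform=False
    )
    return pc1_pair, pc_pairs
```

Because each Principal Component is analyzed separately, the pairs returned for a list of components can be inspected independently of one another.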