rnalysis.filtering.CountFilter.map_orthologs_orthoinspector

CountFilter.map_orthologs_orthoinspector(map_to_organism: str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'], map_from_organism: Literal['auto'] | str | int | Literal['Amborella trichopoda', 'Anolis carolinensis', 'Anopheles gambiae', 'Aquifex aeolicus', 'Arabidopsis thaliana', 'Bacillus cereus', 'Bacillus subtilis', 'Bacteroides thetaiotaomicron', 'Batrachochytrium dendrobatidis', 'Bos taurus', 'Brachypodium distachyon', 'Bradyrhizobium diazoefficiens', 'Branchiostoma 
floridae', 'Brassica campestris', 'Brassica napus', 'Caenorhabditis briggsae', 'Caenorhabditis elegans', 'Candida albicans', 'Canis lupus familiaris', 'Capsicum annuum', 'Chlamydia trachomatis', 'Chlamydomonas reinhardtii', 'Chloroflexus aurantiacus', 'Ciona intestinalis', 'Citrus sinensis', 'Clostridium botulinum', 'Coxiella burnetii', 'Cryptococcus neoformans', 'Cucumis sativus', 'Danio rerio', 'Daphnia pulex', 'Deinococcus radiodurans', 'Dictyoglomus turgidum', 'Dictyostelium discoideum', 'Dictyostelium purpureum', 'Drosophila melanogaster', 'Emericella nidulans', 'Entamoeba histolytica', 'Equus caballus', 'Eremothecium gossypii', 'Erythranthe guttata', 'Escherichia coli', 'Eucalyptus grandis', 'Felis catus', 'Fusobacterium nucleatum', 'Gallus gallus', 'Geobacter sulfurreducens', 'Giardia intestinalis', 'Gloeobacter violaceus', 'Glycine max', 'Gorilla gorilla gorilla', 'Gossypium hirsutum', 'Haemophilus influenzae', 'Halobacterium salinarum', 'Helianthus annuus', 'Helicobacter pylori', 'Homo sapiens', 'Hordeum vulgare subsp. vulgare', 'Ixodes scapularis', 'Juglans regia', 'Klebsormidium nitens', 'Korarchaeum cryptofilum', 'Lactuca sativa', 'Leishmania major', 'Leptospira interrogans', 'Listeria monocytogenes', 'Macaca mulatta', 'Manihot esculenta', 'Marchantia polymorpha', 'Medicago truncatula', 'Methanocaldococcus jannaschii', 'Methanosarcina acetivorans', 'Monodelphis domestica', 'Monosiga brevicollis', 'Mus musculus', 'Musa acuminata subsp. 
malaccensis', 'Mycobacterium tuberculosis', 'Neisseria meningitidis serogroup b', 'Nelumbo nucifera', 'Nematostella vectensis', 'Neosartorya fumigata', 'Neurospora crassa', 'Nicotiana tabacum', 'Nitrosopumilus maritimus', 'Ornithorhynchus anatinus', 'Oryza sativa', 'Oryzias latipes', 'Pan troglodytes', 'Paramecium tetraurelia', 'Phaeosphaeria nodorum', 'Physcomitrella patens', 'Phytophthora ramorum', 'Plasmodium falciparum', 'Populus trichocarpa', 'Pristionchus pacificus', 'Prunus persica', 'Pseudomonas aeruginosa', 'Puccinia graminis', 'Pyrobaculum aerophilum', 'Rattus norvegicus', 'Rhodopirellula baltica', 'Ricinus communis', 'Saccharomyces cerevisiae', 'Salmonella typhimurium', 'Schizosaccharomyces japonicus', 'Schizosaccharomyces pombe', 'Sclerotinia sclerotiorum', 'Selaginella moellendorffii', 'Setaria italica', 'Shewanella oneidensis', 'Solanum lycopersicum', 'Solanum tuberosum', 'Sorghum bicolor', 'Spinacia oleracea', 'Staphylococcus aureus', 'Streptococcus pneumoniae', 'Streptomyces coelicolor', 'Strongylocentrotus purpuratus', 'Sulfolobus solfataricus', 'Sus scrofa', 'Synechocystis', 'Thalassiosira pseudonana', 'Theobroma cacao', 'Thermococcus kodakaraensis', 'Thermodesulfovibrio yellowstonii', 'Thermotoga maritima', 'Tribolium castaneum', 'Trichomonas vaginalis', 'Trichoplax adhaerens', 'Triticum aestivum', 'Trypanosoma brucei', 'Ustilago maydis', 'Vibrio cholerae', 'Vitis vinifera', 'Xanthomonas campestris', 'Xenopus laevis', 'Xenopus tropicalis', 'Yarrowia lipolytica', 'Yersinia pestis', 'Zea mays', 'Zostera marina', 'helobdella robusta', 'lepisosteus oculatus', 'mycoplasma genitalium'] = 'auto', gene_id_type: str | Literal['auto'] | Literal['UniProtKB AC/ID', 'UniParc', 'UniRef50', 'UniRef90', 'UniRef100', 'Gene Name', 'CRC64', 'Ensembl', 'Ensembl Genomes', 'Ensembl Genomes Protein', 'Ensembl Genomes Transcript', 'Ensembl Protein', 'Ensembl Transcript', 'GeneID', 'KEGG', 'PATRIC', 'UCSC', 'WBParaSite', 'WBParaSite Transcript/Protein', 'ArachnoServer', 
'Araport', 'CGD', 'ConoServer', 'dictyBase', 'EchoBASE', 'euHCVdb', 'FlyBase', 'GeneCards', 'GeneReviews', 'HGNC', 'LegioList', 'Leproma', 'MaizeGDB', 'MGI', 'MIM', 'neXtProt', 'OpenTargets', 'Orphanet', 'PharmGKB', 'PomBase', 'PseudoCAP', 'RGD', 'SGD', 'TubercuList', 'VEuPathDB', 'VGNC', 'WormBase', 'WormBase Protein', 'WormBase Transcript', 'Xenbase', 'ZFIN', 'eggNOG', 'GeneTree', 'HOGENOM', 'OMA', 'OrthoDB', 'TreeFam', 'CCDS', 'EMBL/GenBank/DDBJ', 'EMBL/GenBank/DDBJ CDS', 'GI number', 'PIR', 'RefSeq Nucleotide', 'RefSeq Protein', 'ChiTaRS', 'GeneWiki', 'GenomeRNAi', 'PHI-base', 'CollecTF', 'BioCyc', 'PlantReactome', 'Reactome', 'UniPathway', 'CPTAC', 'ProteomicsDB'] = 'auto', non_unique_mode: Literal['first', 'last', 'random', 'none'] = 'first', remove_unmapped_genes: bool = False, inplace: bool = True)

Map genes to their nearest orthologs in a different species using the OrthoInspector database. This function builds a table of all ortholog pairs it discovers (both unique and non-unique) and returns it. It can also translate the genes in this data table into their nearest orthologs and remove genes that could not be mapped.

Parameters:
  • map_to_organism (str or int) – organism name or NCBI taxon ID of the target species for ortholog mapping.

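  • map_from_organism (str, int, or 'auto' (default='auto')) – organism name or NCBI taxon ID of the input genes’ source species. If ‘auto’ (default), RNAlysis will attempt to determine the source organism automatically.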

  • gene_id_type (str or 'auto' (default='auto')) – the identifier type of the genes/features in the Filter object (for example: ‘UniProtKB AC/ID’, ‘WormBase’, ‘Ensembl’, ‘GeneID’). If the ortholog data fetched from the OrthoInspector database does not match your gene_id_type, RNAlysis will attempt to map its gene IDs to your identifier type. For a full list of legal ‘gene_id_type’ names, see the UniProt website: https://www.uniprot.org/help/api_idmapping

  • non_unique_mode ('first', 'last', 'random', or 'none' (default='first')) – How to handle non-unique mappings. ‘first’ will keep the first mapping found for each gene; ‘last’ will keep the last; ‘random’ will keep a random mapping; and ‘none’ will discard all non-unique mappings.

  • remove_unmapped_genes (bool (default=False)) – if True, rows with gene names/IDs that could not be mapped to an ortholog will be dropped from the table. Otherwise, they will remain in the table with their original gene name/ID.

  • inplace (bool (default=True)) – If True (default), filtering will be applied to the current Filter object. If False, the function will return a new Filter instance and the current instance will not be affected.
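The interaction between non_unique_mode and remove_unmapped_genes described above can be illustrated with a minimal, standard-library sketch. This is not the RNAlysis implementation: the function resolve_orthologs, its seed argument, and the gene and ortholog IDs are all hypothetical, and the sketch assumes that a gene whose non-unique mappings are discarded under ‘none’ is then treated like any other unmapped gene.

```python
import random
from typing import Dict, List, Optional


def resolve_orthologs(mappings: Dict[str, List[str]],
                      non_unique_mode: str = 'first',
                      remove_unmapped_genes: bool = False,
                      seed: Optional[int] = None) -> Dict[str, str]:
    """Toy stand-in (hypothetical) showing how each option picks one ortholog per gene."""
    if non_unique_mode not in {'first', 'last', 'random', 'none'}:
        raise ValueError(f"Invalid non_unique_mode: {non_unique_mode!r}")
    rng = random.Random(seed)
    translated: Dict[str, str] = {}
    for gene, orthologs in mappings.items():
        if len(orthologs) == 0:
            chosen = None                      # no ortholog found at all
        elif len(orthologs) == 1:
            chosen = orthologs[0]              # unique mapping: nothing to resolve
        elif non_unique_mode == 'first':
            chosen = orthologs[0]
        elif non_unique_mode == 'last':
            chosen = orthologs[-1]
        elif non_unique_mode == 'random':
            chosen = rng.choice(orthologs)
        else:
            chosen = None                      # 'none': discard non-unique mappings

        if chosen is None:
            if remove_unmapped_genes:
                continue                       # drop the row entirely
            chosen = gene                      # keep the row under its original ID
        translated[gene] = chosen
    return translated


# Hypothetical input: one unique mapping, one non-unique mapping, one unmapped gene.
example = {
    'geneA': ['orthoA1'],
    'geneB': ['orthoB1', 'orthoB2'],
    'geneC': [],
}

kept_first = resolve_orthologs(example, non_unique_mode='first')
kept_last = resolve_orthologs(example, non_unique_mode='last')
strict = resolve_orthologs(example, non_unique_mode='none', remove_unmapped_genes=True)
```

With the defaults (‘first’, remove_unmapped_genes=False), geneB is translated to its first ortholog and geneC stays in the result under its original ID. Combining ‘none’ with remove_unmapped_genes=True leaves only the uniquely mapped geneA.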

Returns:

DataFrame describing all discovered mappings (unique and otherwise). If inplace=False, returns a new, filtered instance of the Filter object as well.