Predicting functions of uncharacterized gene products from microbial communities

Armet, A. M. et al. Rethinking healthy eating in light of the gut microbiome. Cell Host Microbe 30, 764–785 (2022).
Sharon, G. et al. Specialized metabolites from the microbiome in health and disease. Cell Metab. 20, 719–730 (2014).
Baldrian, P. et al. Active and total microbial communities in forest soil are largely different and highly stratified during decomposition. ISME J. 6, 248–258 (2012).
Singleton, C. M. et al. Methanotrophy across a natural permafrost thaw environment. ISME J. 12, 2544–2558 (2018).
Salazar, G. et al. Gene expression changes and community turnover differentially shape the global ocean metatranscriptome. Cell 179, 1068–1083 (2019).
Joice, R., Yasuda, K., Shafquat, A., Morgan, X. C. & Huttenhower, C. Determining microbial products and identifying molecular targets in the human microbiome. Cell Metab. 20, 731–741 (2014).
Zhang, Y. et al. Discovery of bioactive microbial gene products in inflammatory bowel disease. Nature 606, 754–760 (2022).
Pasolli, E. et al. Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle. Cell 176, 649–662 (2019).
Almeida, A. et al. A unified catalog of 204,938 reference genomes from the human gut microbiome. Nat. Biotechnol. 39, 105–114 (2021).
Peisl, B. Y. L., Schymanski, E. L. & Wilmes, P. Dark matter in host–microbiome metabolomics: tackling the unknowns—a review. Anal. Chim. Acta 1037, 13–27 (2018).
Vanni, C. et al. Unifying the known and unknown microbial coding sequence space. eLife 11, e67667 (2022).
Pavlopoulos, G. A. et al. Unraveling the functional dark matter through global metagenomics. Nature 622, 594–602 (2023).
Browne, H. P. et al. Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation. Nature 533, 543–546 (2016).
Lagier, J. C. et al. Culture of previously uncultured members of the human gut microbiota by culturomics. Nat. Microbiol. 1, 16203 (2016).
Almeida, A. et al. A new genomic blueprint of the human gut microbiota. Nature 568, 499–504 (2019).
Schnoes, A. M., Ream, D. C., Thorman, A. W., Babbitt, P. C. & Friedberg, I. Biases in the experimental annotations of protein function and their effect on our understanding of protein function space. PLoS Comput. Biol. 9, e1003063 (2013).
Rost, B., Liu, J., Nair, R., Wrzeszczynski, K. O. & Ofran, Y. Automatic prediction of protein function. Cell. Mol. Life Sci. 60, 2637–2650 (2003).
Lee, D., Redfern, O. & Orengo, C. Predicting protein function from sequence and structure. Nat. Rev. Mol. Cell Biol. 8, 995–1005 (2007).
Xin, F. & Radivojac, P. Computational methods for identification of functional residues in protein structures. Curr. Protein Pept. Sci. 12, 456–469 (2011).
Radivojac, P. et al. A large-scale evaluation of computational protein function prediction. Nat. Methods 10, 221–227 (2013).
Jiang, Y. et al. An expanded evaluation of protein function prediction methods shows an improvement in accuracy. Genome Biol. 17, 184 (2016).
Zhou, N. et al. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol. 20, 244 (2019).
Jensen, L. J. et al. Prediction of human protein function from post-translational modifications and localization features. J. Mol. Biol. 319, 1257–1265 (2002).
Wass, M. N. & Sternberg, M. J. ConFunc—functional annotation in the twilight zone. Bioinformatics 24, 798–806 (2008).
Clark, W. T. & Radivojac, P. Analysis of protein function and its prediction from amino acid sequence. Proteins 79, 2086–2096 (2011).
You, R. et al. GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank. Bioinformatics 34, 2465–2473 (2018).
Korbel, J. O., Jensen, L. J., von Mering, C. & Bork, P. Analysis of genomic context: prediction of functional associations from conserved bidirectionally transcribed gene pairs. Nat. Biotechnol. 22, 911–917 (2004).
Enault, F., Suhre, K. & Claverie, J. M. Phydbac ‘Gene Function Predictor’: a gene annotation tool based on genomic context analysis. BMC Bioinformatics 6, 247 (2005).
Pellegrini, M., Marcotte, E. M., Thompson, M. J., Eisenberg, D. & Yeates, T. O. Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc. Natl Acad. Sci. USA 96, 4285–4288 (1999).
Engelhardt, B. E., Jordan, M. I., Muratore, K. E. & Brenner, S. E. Protein molecular function prediction by Bayesian phylogenomics. PLoS Comput. Biol. 1, e45 (2005).
Pazos, F. & Sternberg, M. J. Automated prediction of protein function and detection of functional sites from structure. Proc. Natl Acad. Sci. USA 101, 14754–14759 (2004).
Deng, M., Zhang, K., Mehta, S., Chen, T. & Sun, F. Prediction of protein function using protein–protein interaction data. J. Comput. Biol. 10, 947–960 (2003).
Nabieva, E., Jim, K., Agarwal, A., Chazelle, B. & Singh, M. Whole-proteome prediction of protein function via graph-theoretic analysis of interaction maps. Bioinformatics 21, i302–i310 (2005).
Wells, J. A. & McClendon, C. L. Reaching for high-hanging fruit in drug discovery at protein–protein interfaces. Nature 450, 1001–1009 (2007).
Brown, M. P. et al. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl Acad. Sci. USA 97, 262–267 (2000).
van Noort, V., Snel, B. & Huynen, M. A. Predicting gene function by conserved co-expression. Trends Genet. 19, 238–242 (2003).
Guan, Y. et al. Predicting gene function in a hierarchical context with an ensemble of classifiers. Genome Biol. 9, S3 (2008).
Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C. & Morris, Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 9, S4 (2008).
Piovesan, D. & Tosatto, S. C. E. INGA 2.0: improving protein function prediction for the dark proteome. Nucleic Acids Res. 47, W373–W378 (2019).
Bashiardes, S., Zilberman-Schapira, G. & Elinav, E. Use of metatranscriptomics in microbiome research. Bioinform. Biol. Insights 10, 19–25 (2016).
Franzosa, E. A. et al. Sequencing and beyond: integrating molecular ‘omics’ for microbial community profiling. Nat. Rev. Microbiol. 13, 360–372 (2015).
Franzosa, E. A. et al. Relating the metatranscriptome and metagenome of the human gut. Proc. Natl Acad. Sci. USA 111, E2329–E2338 (2014).
Heintz-Buschart, A. et al. Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes. Nat. Microbiol. 2, 16180 (2016).
Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
Coolen, M. J. & Orsi, W. D. The transcriptional response of microbial communities in thawing Alaskan permafrost soils. Front. Microbiol. 6, 197 (2015).
Vorobev, A. et al. Transcriptome reconstruction and functional analysis of eukaryotic marine plankton communities via high-throughput metagenomics and metatranscriptomics. Genome Res. 30, 647–659 (2020).
Lee, H. K., Hsu, A. K., Sajdak, J., Qin, J. & Pavlidis, P. Coexpression analysis of human genes across many microarray data sets. Genome Res. 14, 1085–1094 (2004).
Gaiteri, C., Ding, Y., French, B., Tseng, G. C. & Sibille, E. Beyond modules and hubs: the potential of gene coexpression networks for investigating molecular mechanisms of complex brain disorders. Genes Brain Behav. 13, 13–24 (2014).
Stuart, J. M., Segal, E., Koller, D. & Kim, S. K. A gene-coexpression network for global discovery of conserved genetic modules. Science 302, 249–255 (2003).
van Dam, S., Vosa, U., van der Graaf, A., Franke, L. & de Magalhaes, J. P. Gene co-expression analysis for functional classification and gene–disease predictions. Brief. Bioinform. 19, 575–592 (2018).
Ashburner, M. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 25, 25–29 (2000).
Zhang, Y. & Maharjan, S. biobakery/fugassem: FUGAsseM v0.3.8. Zenodo https://doi.org/10.5281/zenodo.16477039 (2025).
Hvidsten, T. R., Komorowski, J., Sandvik, A. K. & Laegreid, A. Predicting gene function from gene expressions and ontologies. In Proceedings of the Pacific Symposium on Biocomputing (eds Altman, R. B., Dunker, A. K., Hunter, L., Lauderdale, K. & Klein, T. E.) (World Scientific, 2001).
Zhou, X., Kao, M. C. & Wong, W. H. Transitive functional annotation by shortest-path analysis of gene expression data. Proc. Natl Acad. Sci. USA 99, 12783–12788 (2002).
Myers, C. L., Barrett, D. R., Hibbs, M. A., Huttenhower, C. & Troyanskaya, O. G. Finding function: evaluation methods for functional genomic data. BMC Genomics 7, 187 (2006).
Mitchell, A. et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43, D213–D221 (2015).
von Mering, C. et al. STRING: known and predicted protein–protein associations, integrated and transferred across organisms. Nucleic Acids Res. 33, D433–D437 (2005).
Suzek, B. E., Huang, H., McGarvey, P., Mazumder, R. & Wu, C. H. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 1282–1288 (2007).
Yellaboina, S., Tasneem, A., Zaykin, D. V., Raghavachari, B. & Jothi, R. DOMINE: a comprehensive collection of known and predicted domain-domain interactions. Nucleic Acids Res. 39, D730–D735 (2011).
Kanehisa, M. & Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30 (2000).
Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases. Nucleic Acids Res. 42, D459–D471 (2014).
Yao, S. et al. NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information. Nucleic Acids Res. 49, W469–W475 (2021).
Kulmanov, M. & Hoehndorf, R. DeepGOPlus: improved protein function prediction from sequence. Bioinformatics 37, 1187 (2021).
Rodríguez Del Río, Á. et al. Functional and evolutionary significance of unknown genes from uncultivated taxa. Nature 626, 377–384 (2024).
Mirdita, M. et al. ColabFold: making protein folding accessible to all. Nat. Methods 19, 679–682 (2022).
Varadi, M. et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Res. 50, D439–D444 (2022).
van Kempen, M. et al. Fast and accurate protein structure search with Foldseek. Nat. Biotechnol. 42, 243–246 (2024).
Koppel, N., Maini Rekdal, V. & Balskus, E. P. Chemical transformation of xenobiotics by the human gut microbiota. Science 356, eaag2770 (2017).
Das, N. K. et al. Microbial metabolite signaling is required for systemic iron homeostasis. Cell Metab. 31, 115–130.e116 (2020).
Seyoum, Y., Baye, K. & Humblot, C. Iron homeostasis in host and gut bacteria—a complex interrelationship. Gut Microbes 13, 1–19 (2021).
Chen, Z. et al. The role of intestinal bacteria and gut–brain axis in hepatic encephalopathy. Front. Cell. Infect. Microbiol. 10, 595759 (2020).
Galdiero, S. et al. Microbe–host interactions: structure and role of gram-negative bacterial porins. Curr. Protein Pept. Sci. 13, 843–854 (2012).
Hogbom, M. & Ihalin, R. Functional and structural characteristics of bacterial proteins that bind host cytokines. Virulence 8, 1592–1601 (2017).
Jaehme, M. & Slotboom, D. J. Diversity of membrane transport proteins for vitamins in bacteria and archaea. Biochim. Biophys. Acta 1850, 565–576 (2015).
Fujita, M. et al. A TonB-dependent receptor constitutes the outer membrane transport system for a lignin-derived aromatic compound. Commun. Biol. 2, 432 (2019).
Connors, J., Dawe, N. & van Limbergen, J. The role of succinate in the regulation of intestinal inflammation. Nutrients 11, 25 (2018).
Boudreau, M. A., Fisher, J. F. & Mobashery, S. Messenger functions of the bacterial cell wall-derived muropeptides. Biochemistry 51, 2974–2990 (2012).
Hosaka, H., Kawamura, M., Hirano, T., Hakamata, W. & Nishio, T. Utilization of sucrose and analog disaccharides by human intestinal bifidobacteria and lactobacilli: search of the bifidobacteria enzymes involved in the degradation of these disaccharides. Microbiol. Res. 240, 126558 (2020).
Rawat, P. S., Li, Y., Zhang, W., Meng, X. & Liu, W. Hungatella hathewayi, an efficient glycosaminoglycan-degrading Firmicutes from human gut and its chondroitin ABC exolyase with high activity and broad substrate specificity. Appl. Environ. Microbiol. 88, e0154622 (2022).
Cullender, T. C. et al. Innate and adaptive immunity interact to quench microbiome flagellar motility in the gut. Cell Host Microbe 14, 571–581 (2013).
Lopez-Siles, M., Duncan, S. H., Garcia-Gil, L. J. & Martinez-Medina, M. Faecalibacterium prausnitzii: from microbiology to diagnostics and prognostics. ISME J. 11, 841–852 (2017).
Cornuault, J. K. et al. Phages infecting Faecalibacterium prausnitzii belong to novel viral genera that help to decipher intestinal viromes. Microbiome 6, 65 (2018).
Bai, Z. et al. Comprehensive analysis of 84 Faecalibacterium prausnitzii strains uncovers their genetic diversity, functional characteristics, and potential risks. Front. Cell. Infect. Microbiol. 12, 919701 (2022).
Koropatkin, N. M. & Smith, T. J. SusG: a unique cell-membrane-associated α-amylase from a prominent human gut symbiont targets complex starch molecules. Structure 18, 200–215 (2010).
Martens, E. C. et al. Recognition and degradation of plant cell wall polysaccharides by two human gut symbionts. PLoS Biol. 9, e1001221 (2011).
Wu, M. et al. Genetic determinants of in vivo fitness and diet responsiveness in multiple human gut Bacteroides. Science 350, aac5992 (2015).
Terrapon, N. et al. PULDB: the expanded database of polysaccharide utilization loci. Nucleic Acids Res. 46, D677–D683 (2018).
Pavarina, G. C., Lemos, E. G. M., Lima, N. S. M. & Pizauro, J. M. Jr. Characterization of a new bifunctional endo-1,4-β-xylanase/esterase found in the rumen metagenome. Sci. Rep. 11, 10440 (2021).
Carneiro, L. et al. Selective xyloglucan oligosaccharide hydrolysis by a GH31 α-xylosidase from Escherichia coli. Carbohydr. Polym. 284, 119150 (2022).
Lin, H. et al. Multiomics study reveals Enterococcus and Subdoligranulum are beneficial to necrotizing enterocolitis. Front. Microbiol. 12, 752102 (2021).
Shi, T. T. et al. Comparative assessment of gut microbial composition and function in patients with Graves’ disease and Graves’ orbitopathy. J. Endocrinol. Invest. 44, 297–310 (2021).
Girardin, S. E. et al. Nod1 detects a unique muropeptide from gram-negative bacterial peptidoglycan. Science 300, 1584–1587 (2003).
Hasegawa, M. et al. Differential release and distribution of Nod1 and Nod2 immunostimulatory molecules among bacterial species and environments. J. Biol. Chem. 281, 29054–29063 (2006).
Elshorbagy, A. et al. Amino acid changes during transition to a vegan diet supplemented with fish in healthy humans. Eur. J. Nutr. 56, 1953–1962 (2017).
Dong, Z., Sinha, R. & Richie, J. P. Jr. Disease prevention and delayed aging by dietary sulfur amino acid restriction: translational implications. Ann. N. Y. Acad. Sci. 1418, 44–55 (2018).
Whisstock, J. C. & Lesk, A. M. Prediction of protein function from protein sequence and structure. Q. Rev. Biophys. 36, 307–340 (2003).
Sleator, R. D. & Walsh, P. An overview of in silico protein function prediction. Arch. Microbiol. 192, 151–155 (2010).
Overbeek, R., Fonstein, M., D’Souza, M., Pusch, G. D. & Maltsev, N. The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA 96, 2896–2901 (1999).
Teichmann, S. A. & Babu, M. M. Conservation of gene co-regulation in prokaryotes and eukaryotes. Trends Biotechnol. 20, 407–410 (2002).
Eisenberg, D., Marcotte, E. M., Xenarios, I. & Yeates, T. O. Protein function in the post-genomic era. Nature 405, 823–826 (2000).
Sharan, R., Ulitsky, I. & Shamir, R. Network-based prediction of protein function. Mol. Syst. Biol. 3, 88 (2007).
Wang, P. I. & Marcotte, E. M. It’s the machine that matters: predicting gene function and phenotype from protein networks. J. Proteomics 73, 2277–2289 (2010).
Ryan, C. J. et al. High-resolution network biology: connecting sequence with function. Nat. Rev. Genet. 14, 865–879 (2013).
Costanzo, M. et al. The genetic landscape of a cell. Science 327, 425–431 (2010).
Costanzo, M. et al. Environmental robustness of the global yeast genetic interaction network. Science 372, eabf8424 (2021).
Serin, E. A., Nijveen, H., Hilhorst, H. W. & Ligterink, W. Learning from co-expression networks: possibilities and challenges. Front. Plant Sci. 7, 444 (2016).
Southard, J. N. Protein analysis using real-time PCR instrumentation: incorporation in an integrated, inquiry-based project. Biochem. Mol. Biol. Educ. 42, 142–151 (2014).
Cantalapiedra, C. P., Hernández-Plaza, A., Letunic, I., Bork, P. & Huerta-Cepas, J. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
Schirmer, M. et al. Dynamics of metatranscription in the inflammatory bowel disease gut microbiome. Nat. Microbiol. 3, 337–346 (2018).
Zhang, Y., Thompson, K. N., Huttenhower, C. & Franzosa, E. A. Statistical approaches for differential expression analysis in metatranscriptomics. Bioinformatics 37, i34–i41 (2021).
Parrow, N. L., Fleming, R. E. & Minnick, M. F. Sequestration and scavenging of iron in infection. Infect. Immun. 81, 3503–3514 (2013).
Sanchez-Jimenez, A., Marcos-Torres, F. J. & Llamas, M. A. Mechanisms of iron homeostasis in Pseudomonas aeruginosa and emerging therapeutics directed to disrupt this vital process. Microb. Biotechnol. 16, 1475–1491 (2023).
Kim, C. S. et al. Seasonal and spatial environmental influence on Opisthorchis viverrini intermediate hosts, abundance, and distribution: insights on transmission dynamics and sustainable control. PLoS Negl. Trop. Dis. 10, e0005121 (2016).
Isobe, K. & Ohte, N. Ecological perspectives on microbes involved in N-cycling. Microbes Environ. 29, 4–16 (2014).
Yi, M. et al. Temporal changes of microbial community structure and nitrogen cycling processes during the aerobic degradation of phenanthrene. Chemosphere 286, 131709 (2022).
Davila, A. M. et al. Intestinal luminal nitrogen metabolism: role of the gut microbiota and consequences for the host. Pharmacol. Res. 68, 95–107 (2013).
Hou, K. et al. Microbiota in health and diseases. Signal. Transduct. Target. Ther. 7, 135 (2022).
Fitzgerald, C. B. et al. Comparative analysis of Faecalibacterium prausnitzii genomes shows a high level of genome plasticity and warrants separation into new species-level taxa. BMC Genomics 19, 931 (2018).
Silas, S. et al. Type III CRISPR–Cas systems can provide redundancy to counteract viral escape from type I systems. eLife 6, e27601 (2017).
Cuiv, P. O. et al. Isolation of genetically tractable most-wanted bacteria by metaparental mating. Sci. Rep. 5, 13282 (2015).
Deutscher, M. P. Degradation of RNA in bacteria: comparison of mRNA and stable RNA. Nucleic Acids Res. 34, 659–666 (2006).
Reck, M. et al. Stool metatranscriptomics: a technical guideline for mRNA stabilisation and isolation. BMC Genomics 16, 494 (2015).
Sczyrba, A. et al. Critical assessment of metagenome interpretation—a benchmark of metagenomics software. Nat. Methods 14, 1063–1071 (2017).
Yue, Q. et al. Functional operons in secondary metabolic gene clusters in Glarea lozoyensis (Fungi, Ascomycota, Leotiomycetes). mBio 6, e00703 (2015).
Friedberg, I. Automated protein function prediction—the genomic challenge. Brief. Bioinform. 7, 225–242 (2006).
Jeffery, C. J. Current successes and remaining challenges in protein function prediction. Front. Bioinform. 3, 1222182 (2023).
Abu-Ali, G. S. et al. Metatranscriptome of human faecal microbial communities in a cohort of adult men. Nat. Microbiol. 3, 356–366 (2018).
Qin, J. et al. A human gut microbial gene catalogue established by metagenomic sequencing. Nature 464, 59–65 (2010).
Li, J. et al. An integrated catalog of reference genes in the human gut microbiome. Nat. Biotechnol. 32, 834–841 (2014).
Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018).
Beghini, F. et al. Integrating taxonomic, functional, and strain-level profiling of diverse microbial communities with bioBakery 3. eLife 10, e65088 (2021).
Huerta-Cepas, J. et al. eggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 47, D309–D314 (2019).
Thomas, A. M. et al. Metagenomic analysis of colorectal cancer datasets identifies cross-cohort microbial diagnostic signatures and a link with choline degradation. Nat. Med. 25, 667–678 (2019).
Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Hyatt, D. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010).
Li, W., Jaroszewski, L. & Godzik, A. Clustering of highly homologous sequences to reduce the size of large protein databases. Bioinformatics 17, 282–283 (2001).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Zhang, Y. et al. Metatranscriptomics for the human microbiome and microbial community functional profiling. Annu. Rev. Biomed. Data Sci. 4, 279–311 (2021).
Klingenberg, H. & Meinicke, P. How to normalize metatranscriptomic count data for differential expression analysis. PeerJ 5, e3859 (2017).
Yu, G., Wang, L. G., Han, Y. & He, Q. Y. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS 16, 284–287 (2012).