The Aggregated Gut Viral Catalogue (AVrC): A unified resource for exploring the viral diversity of the human gut / Galperina, A.; Lugli, G. A.; Milani, C.; De Vos, W. M.; Ventura, M.; Salonen, A.; Hurwitz, B.; Ponsero, A. J. - In: PLOS COMPUTATIONAL BIOLOGY. - ISSN 1553-734X. - 21:5(2025). [10.1371/journal.pcbi.1012268]
The Aggregated Gut Viral Catalogue (AVrC): A unified resource for exploring the viral diversity of the human gut
Lugli G. A.;Milani C.;Ventura M.;
2025-01-01
Abstract
The growing interest in the role of the gut virome in human health and disease has led to several recent large-scale viral catalogue projects mining human gut metagenomes, each using varied computational tools and quality control criteria. Importantly, there has been to date no consistent comparison of these catalogues’ quality, diversity, and overlap. In this project, we therefore systematically surveyed nine previously published human gut viral catalogues. While these catalogues collectively screened >40,000 human fecal metagenomes, 82% of the recovered 345,613 viral sequences were unique to a single catalogue, highlighting limited redundancy between the resources and suggesting the need for an aggregated resource bringing these viral sequences together. We further expanded these viral catalogues by mining 7,867 infant gut metagenomes from 12 large-scale infant studies collected in 9 different countries. From these datasets, we constructed the Aggregated Gut Viral Catalogue (AVrC), a unified modular resource containing 1,018,941 dereplicated viral sequences (449,859 species-level vOTUs). Using computational inference tools, each vOTU representative sequence was annotated for quality, viral taxonomy, predicted viral lifestyle, and putative host. This project aims to facilitate the reuse of previously published viral catalogues by the research community and follows a modular framework to enable future expansions as new data become available.


