Anthony, S. J. et al. A strategy to estimate unknown viral diversity in mammals. mBio 4, e00598-13 (2013).
Jones, K. E. et al. Global trends in emerging infectious diseases. Nature 451, 990–993 (2008).
Mollentze, N., Babayan, S. A. & Streicker, D. G. Identifying and prioritizing potential human-infecting viruses from their genome sequences. PLoS Biol. 19, e3001390 (2021).
Schiller, J. T. & Lowy, D. R. An introduction to virus infections and human cancer. Recent Results Cancer Res. 217, 1–11 (2021).
Martens, C. R. & Accornero, F. Viruses in the heart: direct and indirect routes to myocarditis and heart failure. Viruses 13, 1924 (2021).
Bjornevik, K. et al. Longitudinal analysis reveals high prevalence of Epstein–Barr virus associated with multiple sclerosis. Science 375, 296–301 (2022).
Levine, K. S. et al. Virus exposure and neurodegenerative disease risk across national biobanks. Neuron 111, 1086–1093.e2 (2023).
Cairns, D. M., Itzhaki, R. F. & Kaplan, D. L. Potential involvement of varicella zoster virus in Alzheimer’s disease via reactivation of quiescent herpes simplex virus type 1. J. Alzheimers Dis. 88, 1189–1200 (2022).
Camargo, A. P. et al. IMG/VR v4: an expanded database of uncultivated virus genomes within a framework of extensive functional, taxonomic, and ecological metadata. Nucleic Acids Res. 51, D733–D743 (2023).
Edgar, R. C. et al. Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).
Babaian, A. & Edgar, R. Ribovirus classification by a polymerase barcode sequence. PeerJ 10, e14055 (2022).
Chang, J.-T., Liu, L.-B., Wang, P.-G. & An, J. Single-cell RNA sequencing to understand host–virus interactions. Virol. Sin. 39, 1–8 (2024).
Hill, V. et al. Toward a global virus genomic surveillance network. Cell Host Microbe 31, 861–873 (2023).
Ren, J. et al. Identifying viruses from metagenomic data using deep learning. Quant. Biol. 8, 64–77 (2020).
Tithi, S. S., Aylward, F. O., Jensen, R. V. & Zhang, L. FastViromeExplorer: a pipeline for virus and phage identification and abundance profiling in metagenomics data. PeerJ 6, e4227 (2018).
Camargo, A. P. et al. Identification of mobile genetic elements with geNomad. Nat. Biotechnol. 42, 1303–1312 (2024).
Amgarten, D., Braga, L. P. P., da Silva, A. M. & Setubal, J. C. MARVEL, a tool for prediction of bacteriophage sequences in metagenomic bins. Front. Genet. 9, 304 (2018).
Starikova, E. V. et al. Phigaro: high-throughput prophage sequence annotation. Bioinformatics 36, 3882–3884 (2020).
Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020).
Antipov, D., Raiko, M., Lapidus, A. & Pevzner, P. A. Metaviral SPAdes: assembly of viruses from metagenomic data. Bioinformatics 36, 4126–4129 (2020).
Ren, J., Ahlgren, N. A., Lu, Y. Y., Fuhrman, J. A. & Sun, F. VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 5, 69 (2017).
Guo, J. et al. VirSorter2: a multi-classifier, expert-guided approach to detect diverse DNA and RNA viruses. Microbiome 9, 37 (2021).
Xia, Y., Liu, Y., Deng, M. & Xi, R. Detecting virus integration sites based on multiple related sequencing data by VirTect. BMC Med. Genomics 12, 19 (2019).
Bost, P. et al. Host–viral infection maps reveal signatures of severe COVID-19 patients. Cell 181, 1475–1488.e12 (2020).
Lee, C. Y. et al. Venus: an efficient virus infection detection and fusion site discovery method using single-cell and bulk RNA-seq data. PLoS Comput. Biol. 18, e1010636 (2022).
Yasumizu, Y., Hara, A., Sakaguchi, S. & Ohkura, N. VIRTUS: a pipeline for comprehensive virus analysis from conventional RNA-seq data. Bioinformatics 37, 1465–1467 (2021).
Lu, J. & Salzberg, S. L. Ultrafast and accurate 16S rRNA microbial community analysis using Kraken 2. Microbiome 8, 124 (2020).
Wood, D. E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol. 20, 257 (2019).
Hou, X. et al. Using artificial intelligence to document the hidden RNA virosphere. Cell https://doi.org/10.1016/j.cell.2024.09.027 (2024).
Melsted, P. et al. Modular, efficient and constant-memory single-cell RNA-seq preprocessing. Nat. Biotechnol. 39, 813–818 (2021).
Sullivan, D. K. et al. kallisto, bustools and kb-python for quantifying bulk, single-cell and single-nucleus RNA-seq. Nat. Protoc. https://doi.org/10.1038/s41596-024-01057-0 (2024).
Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).
Ramsköld, D. et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).
Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).
Gierahn, T. M. et al. Seq-Well: portable, low-cost RNA sequencing of single cells at high throughput. Nat. Methods 14, 395–398 (2017).
Desai, N. et al. Temporal and spatial heterogeneity of host response to SARS-CoV-2 pulmonary infection. Nat. Commun. 11, 6319 (2020).
Viloria Winnett, A. et al. Morning SARS-CoV-2 testing yields better detection of infection due to higher viral loads in saliva and nasal swabs upon waking. Microbiol. Spectr. 10, e0387322 (2022).
Viloria Winnett, A. et al. Extreme differences in SARS-CoV-2 viral loads among respiratory specimen types during presumed pre-infectious and infectious periods. PNAS Nexus 2, pgad033 (2023).
Kotliar, D. et al. Single-cell profiling of Ebola virus disease in vivo reveals viral and host dynamics. Cell 183, 1383–1401.e19 (2020).
Sharma, A. et al. Human iPSC-derived cardiomyocytes are susceptible to SARS-CoV-2 infection. Cell Rep. Med. 1, 100052 (2020).
Peck, K. M. & Lauring, A. S. Complexities of viral mutation rates. J. Virol. 92, e01031-17 (2018).
Gihawi, A. et al. Major data analysis errors invalidate cancer microbiome findings. mBio 14, e0160723 (2023).
Breitwieser, F. P., Pertea, M., Zimin, A. V. & Salzberg, S. L. Human contamination in bacterial genomes has created thousands of spurious proteins. Genome Res. 29, 954–960 (2019).
Steinegger, M. & Salzberg, S. L. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 21, 115 (2020).
Wang, J. & Han, G.-Z. Genome mining shows that retroviruses are pervasively invading vertebrate genomes. Nat. Commun. 14, 4968 (2023).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Warren, W. C. et al. Sequence diversity analyses of an improved rhesus macaque genome enhance its biomedical utility. Science 370, eabc6617 (2020).
Traag, V. A., Waltman, L. & van Eck, N. J. From Louvain to Leiden: guaranteeing well-connected communities. Sci. Rep. 9, 5233 (2019).
Wachtman, L. & Mansfield, K. Viral diseases of nonhuman primates. In Nonhuman Primates in Biomedical Research 2nd edn (eds. Abee, C. R. et al.) Ch. 1 (Academic Press, 2012).
Porter, A. F., Cobbin, J., Li, C.-X., Eden, J.-S. & Holmes, E. C. Metagenomic identification of viral sequences in laboratory reagents. Viruses 13, 2122 (2021).
Callanan, J. et al. Expansion of known ssRNA phage genomes: from tens to over a thousand. Sci. Adv. 6, eaay5981 (2020).
Cohen, J. I. Herpesvirus latency. J. Clin. Invest. 130, 3361–3369 (2020).
Woźniakowski, G. & Samorek-Salamonowicz, E. Animal herpesviruses and their zoonotic potential for cross-species infection. Ann. Agric. Environ. Med. 22, 191–194 (2015).
Yao, X. et al. In vitro infection dynamics of Wuxiang virus in different cell lines. Viruses 14, 2383 (2022).
Melsted, P., Ntranos, V. & Pachter, L. The barcode, UMI, set format and BUStools. Bioinformatics 35, 4472–4473 (2019).
Sakaguchi, S., Nakano, T. & Nakagawa, S. NeoRdRp2 with improved seed data, annotations, and scoring. Front. Virol. 4, 1378695 (2024).
Sakaguchi, S. et al. NeoRdRp: a comprehensive dataset for identifying RNA-dependent RNA polymerases of various RNA viruses from metatranscriptomic data. Microbes Environ. 37, ME22001 (2022).
Pirtskhalava, M. et al. DBAASP v3: database of antimicrobial/cytotoxic activity and structure of peptides as a resource for development of new therapeutics. Nucleic Acids Res. 49, D288–D297 (2021).
Abdill, R. J. et al. Integration of 168,000 samples reveals global patterns of the human gut microbiome. Cell 188, 1100–1118.e17 (2025).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res. 49, W293–W296 (2021).
Buchfink, B., Reuter, K. & Drost, H.-G. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat. Methods 18, 366–368 (2021).
Kühl, M. A., Stich, B. & Ries, D. C. Mutation-simulator: fine-grained simulation of random mutations in any genome. Bioinformatics 37, 568–569 (2021).
Golomb, S. W., Gordon, B. & Welch, L. R. Comma-free codes. Canad. J. Math. 10, 202–209 (1958).
Hauser, M., Steinegger, M. & Söding, J. MMseqs software suite for fast and deep clustering and searching of large protein sequence sets. Bioinformatics 32, 1323–1330 (2016).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
Lu, J. et al. Metagenome analysis using the Kraken software suite. Nat. Protoc. 17, 2815–2839 (2022).
Kuznetsov, A. & Bollin, C. J. NCBI Genome Workbench: desktop software for comparative genomics, visualization, and GenBank data submission. Methods Mol. Biol. 2231, 261–295 (2021).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Luebbert, L. & Pachter, L. Efficient querying of genomic reference databases with gget. Bioinformatics 39, btac836 (2023).
Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).
Zulkower, V. & Rosser, S. DNA Chisel, a versatile sequence optimizer. Bioinformatics 36, 4508–4509 (2020).
Hughes, T. K. et al. Second-strand synthesis-based massively parallel scRNA-seq reveals cellular states and molecular features of human inflammatory skin pathologies. Immunity 53, 878–894.e7 (2020).
Gálvez-Merchán, Á., Min, K. H. J., Pachter, L. & Booeshaghi, A. S. Metadata retrieval from sequence databases with ffq. Bioinformatics 39, btac667 (2023).
Wolf, F. A., Angerer, P. & Theis, F. J. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 19, 15 (2018).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://doi.org/10.48550/arXiv.1303.3997 (2013).
Svensson, V., da Veiga Beltrame, E. & Pachter, L. Quantifying the tradeoff between sequencing depth and cell number in single-cell RNA-seq. Preprint at https://doi.org/10.1101/762773 (2019).
Booeshaghi, A. S. & Pachter, L. Normalization of single-cell RNA-seq counts by log(x + 1) or log(1 + x). Bioinformatics 37, 2223–2224 (2021).
Ondov, B. D., Bergman, N. H. & Phillippy, A. M. Interactive metagenomic visualization in a web browser. BMC Bioinformatics 12, 385 (2011).
Kuleshov, M. V. et al. Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Res. 44, W90–W97 (2016).
Gene Ontology Consortium et al. The Gene Ontology knowledgebase in 2023. Genetics 224, iyad031 (2023).
Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57, 289–300 (1995).
Ostendorf, B. N. et al. Common human genetic variants of APOE impact murine COVID-19 mortality. Nature 611, 346–351 (2022).
Edgar, R., Domrachev, M. & Lash, A. E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
Luebbert, L. & Pachter, L. Efficient and accurate detection of viral sequences at single-cell resolution reveals novel viruses perturbing host gene expression. CaltechDATA https://doi.org/10.22002/KRQMP-5HY81 (2024).
Luebbert, L. & Pachter, L. Efficient and accurate detection of viral sequences at single-cell resolution reveals novel viruses perturbing host gene expression (continued). CaltechDATA https://doi.org/10.22002/K7XQW-88D74 (2023).
Luebbert, L. et al. GitHub repository containing the source code for the manuscript ‘Detection of viral sequences at single-cell resolution identifies novel viruses associated with host gene expression changes’. GitHub https://github.com/pachterlab/LSCHWCP_2023 (2023).
Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).