New Data Portal Empowers Microbiome Researchers

The NMDC Data Portal applies FAIR principles to microbiome data

Sarah Wong
The National Microbiome Data Collaborative supports multi-omics data exploration across diverse microbiomes. (Image by sdecoret | shutterstock.com)

Microorganisms exist across all environments. They can be found in the greatest depths of the oceans, frozen in ancient permafrost, and inhabiting volcanic lakes. They even thrive within the human body: each of us harbors roughly 10–100 trillion helpful microbes.

With such diversity among organisms, it’s no wonder that the research around them is also varied. Scientists may treat their biological samples differently, store their data in different formats, or analyze the data using different methods. This variation can make it difficult for one researcher to understand or interpret the microbiome data produced by another.

Researchers from the Environmental Molecular Sciences Laboratory (EMSL), the Joint Genome Institute (JGI), Los Alamos National Laboratory, and Oak Ridge National Laboratory created the National Microbiome Data Collaborative (NMDC) Data Portal to address this problem. Details of its development were published in Nucleic Acids Research.

“We wanted to democratize microbiome data,” said Lee Ann McCue, computational scientist and the chief data and analytics officer at EMSL. “We realized we needed a cross-cutting and integrative solution to share our microbiome data with the scientific community at large.”

The data portal provides ways to standardize and integrate microbiome data. By doing so, it accelerates researchers’ efforts to understand how microbiomes respond to and modify their environments and how they can be harnessed for sustainable bioenergy solutions.

Microbes exist in almost all environments across the globe. Some form symbiotic relationships with plants. (Illustration by Cortland Johnson and Natalie Sadler | Pacific Northwest National Laboratory)

Cross-laboratory collaboration

As user facilities, EMSL (located at Pacific Northwest National Laboratory) and JGI (located at Lawrence Berkeley National Laboratory) share their resources with the scientific community through a competitive peer-reviewed process. The two facilities synergized their research capabilities by creating the Facilities Integrating Collaborations for User Science (FICUS) program. FICUS allows users to pursue a research project across multiple Department of Energy user facilities.

EMSL has exceptional strengths in proteomics, metabolomics, and chemical characterization of organic matter. JGI specializes in genomics and transcriptomics. The JGI-EMSL FICUS program brings the strengths of both user facilities to bear on a single proposed research project.

Each facility maintains its data through different online portals. These include the JGI Genome Portal, the Integrated Microbial Genomes and Microbiomes (IMG/M) system, and EMSL’s NEXUS data repository. Additionally, published research arising from the FICUS program may distribute data through a suite of public data repositories, such as PRIDE, MetaboLights, and GenBank.

The authors of a particular study are typically deeply knowledgeable about their data and know exactly how to access and analyze it. However, when data is dispersed across these different repositories, it may be difficult for others to find and access. Additionally, when datasets are analyzed using different methods, the results may be hard for others to interpret.

The NMDC Data Portal solves this problem by integrating and standardizing the data while making it accessible from a single location. By doing so, the data portal brings the FAIR principles—findability, accessibility, interoperability, and reusability—to microbiome data.

The NMDC Data Portal brings together metagenome, metatranscriptome, proteomics, metabolomics, and organic matter data in one convenient location. (Image from data.microbiomedata.org)

Making microbiome data FAIR

Using FICUS projects as a starting point, the team created the NMDC Data Portal as a resource for users to access all the microbiome data for a particular biological sample in one place. As the researchers developed the data portal, they realized that they also needed to establish protocols around sample management and data housing. These protocols provide consistency between the two user facilities. As a result, the data portal supports the tracking, integration, and reuse of samples between EMSL and JGI.

The NMDC Data Portal was designed to directly engage and support the scientific user base. The NMDC team conducted extensive interviews with the scientific community to uncover the best ways to design the portal and display the data. “We wanted to make sure that the features of the data portal met our goal: making microbiome data findable, accessible, interoperable, and reusable,” said McCue.

As a result, the NMDC Data Portal features an easy-to-use interface that enables users to search, access, analyze, and download a wealth of microbiome data. This includes metagenome, metatranscriptome, proteomics, metabolomics, and organic matter chemistry data.

The team also recruited ambassadors to spread the word among their respective research communities. The current 2021–2022 NMDC ambassadors include PhD students, postdoctoral researchers, and assistant professors. “It was especially important for us to reach out to early career researchers regarding this data portal,” said McCue. “This resource empowers researchers to go in and start making queries immediately. It removes some of the barriers to working with microbiomes, such as needing to know how to process ’omics data, or how to first find and then combine different data types.”

Though the NMDC Data Portal currently contains data only from EMSL and JGI, the researchers behind it plan to expand it. Data from the Earth Microbiome Project will be included in a future release, and the team plans to continually add data from other resources.

“This is an evolving process,” said McCue. “We will continue to refine the data portal to meet the needs of researchers. We want this resource to be developed both with and for the community.”

This work is supported by the Genomic Science Program in the Department of Energy, Office of Science, Biological and Environmental Research program.