Integrated Data Models

Full Campaign Name:

Enabling AI-Ready Data on the Anaerobic Microbial Phenotyping Platform

Rows of workstations, each containing 18 devices housed in a modular anaerobic chamber to process and analyze microbial samples.

Gaps in the quality, consistency, and framework of standardized metadata present major challenges for supporting the development of biotechnology. These gaps need to be resolved to maximize the scientific discovery possible on the Environmental Molecular Sciences Laboratory's (EMSL's) new Anaerobic Microbial Phenotyping Platform (AMP2), an automated phenotyping platform for anaerobic microbiology research.

EMSL's Laboratory Information Management System (LIMS) baseline capabilities for AMP2 are scheduled for completion in January 2026. These capabilities will require extensions to existing EMSL data services to include detailed sample provenance, anaerobe-specific experimental conditions, and standardized metadata enabling integration with external ontologies and legacy or landmark datasets.

Through this Computing, Analytics, and Modeling Community Science Campaign, EMSL is partnering with invited research community members to identify additional needed extensions and use cases for analysis-API to support AMP2. EMSL will implement the extensions expected to have the highest impacts.

After these extensions and use cases are developed, they will be made available to the research community for further feedback, including through ongoing use of the AMP2 platform.

By participating in this campaign, you will:

Participation

How researchers have been invited to participate

A panel of researchers (comprising both users and researchers from external organizations) was invited and selected to participate in a campaign planning meeting, based on their domain expertise and experience.

Required participant background
  • Possession of AMP2-relevant sample and experimental metadata, especially metadata that requires detailed provenance, anaerobe-specific conditions, and ontology alignments, and/or
  • Familiarity with the collection and stewardship of experimental metadata from high-throughput, autonomous laboratories.
How will you contribute?
  • Feedback through Community Science Meetings         
    Participants in the AMP2 campaign contributed by providing vital feedback during a kick-off community science meeting, which took place in December 2025. Discussions in that meeting highlighted the gaps in the current AMP2 metadata schema and suggested key extensions. Their input has led to additional priorities, helped refine workflows, clarified plans to validate the schema against representative use cases, and informed EMSL's capability development for broader scientific impact.         
     
  • Draft AMP2 Sample and Experimental Metadata Schema         
    Participants actively contribute to developing and testing a draft AMP2 metadata schema by ensuring its alignment with EMSL's LIMS, Molecular Observation Network (MONet)/analysis-API frameworks, and the Biological and Environmental Research (BER) program's Biological and enviRonmental Infrastructure for Data manaGement and Exploration (BRIDGE) data lakehouse.
     
  • Ongoing Collaboration         
    Participants will collaborate through future presentations and targeted outreach efforts to provide critical feedback, which will be integrated into the development process. Additionally, they will help shape user guides, schema definitions, and other documentation to ensure broad adoption and support ongoing advancements.

About the Campaign

EMSL's baseline data modeling capabilities for AMP2 are currently being developed. These capabilities will require metadata schema extensions to meet needs and opportunities (identified by AMP2 workshop participants) to include detailed sample provenance, anaerobe-specific experimental conditions, and standardized metadata enabling integration with external ontologies and legacy or landmark datasets. The objective of this campaign is to identify additional needed metadata schema extensions and use cases through community engagement and to implement those expected to have the highest impacts.

This campaign will then deliver a draft schema compatible with existing EMSL systems and validate it against a suite of representative use cases.

Researchers with AMP2-relevant sample and experimental metadata, especially those requiring detailed provenance, anaerobe-specific conditions, and ontology alignments, will indicate requirements/needs/objectives for the schema, which form the basis for the interoperability and accessibility of relevant data.  
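
As a rough illustration of the three extension areas named above (sample provenance, anaerobe-specific conditions, and ontology alignment), a metadata record might be sketched as follows. Every class name, field name, and value here is a hypothetical placeholder chosen for illustration; none is taken from the actual AMP2 schema, EMSL's LIMS, or any real ontology.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, Optional


@dataclass
class Provenance:
    """Where a sample came from (hypothetical fields)."""
    collected_by: str
    collection_date: str                    # ISO 8601 date string
    parent_sample_id: Optional[str] = None  # None for an original sample


@dataclass
class AnaerobicConditions:
    """Anaerobe-specific experimental conditions (hypothetical fields)."""
    headspace_gas: str
    incubation_temp_c: float
    redox_potential_mv: Optional[float] = None


@dataclass
class SampleRecord:
    sample_id: str
    organism: str
    provenance: Provenance
    conditions: AnaerobicConditions
    # Maps a field name to an external ontology term (placeholder CURIEs).
    ontology_terms: Dict[str, str] = field(default_factory=dict)


# Placeholder values only; not real AMP2 data.
record = SampleRecord(
    sample_id="S-0002",
    organism="example anaerobe",
    provenance=Provenance("example user", "2026-04-01", parent_sample_id="S-0001"),
    conditions=AnaerobicConditions(headspace_gas="N2/CO2", incubation_temp_c=37.0),
    ontology_terms={"organism": "EX:0000001"},
)
record_dict = asdict(record)  # nested plain dict, ready for JSON export
```

Keeping provenance and conditions as separate nested groups is one possible design choice; it would let each group be extended as requirements emerge without changing the top-level record.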

EMSL staff and participants will meet virtually on a regular basis to discuss task progress, technical details, and related topics.

This campaign supports the recent presidential memo on FY 2027 national research and development priorities (PDF) and demonstrates how AI accelerates scientific discovery within Earth sciences. Schema development for this campaign will support AI-ready data workflows tailored to complex biological and environmental datasets. Workflow chaining within the data model will allow users to filter using bidirectional graph traversal (e.g., "show me the temperature at which samples with this characteristic were incubated"), further facilitating autonomous science and rapid experimental design. Improved schema design will expand ontological frameworks and will accelerate data-driven decision-making and predictive modeling for BER domains like biotechnology and environmental sciences, especially as it is later incorporated into the BER BRIDGE data lakehouse.
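
The kind of query quoted above can be sketched with a small in-memory graph that indexes every edge in both directions. This is a minimal illustration only: the `MetadataGraph` class, the relation names (`has_characteristic`, `incubated_in`), and the node identifiers are all assumptions made for the example, not part of the AMP2 data model or the analysis-API.

```python
from collections import defaultdict


class MetadataGraph:
    """Toy graph that stores each edge twice so it can be walked both ways."""

    def __init__(self):
        self.attrs = {}
        self.out_edges = defaultdict(list)  # node -> [(relation, target)]
        self.in_edges = defaultdict(list)   # node -> [(relation, source)]

    def add_node(self, node_id, **attrs):
        self.attrs[node_id] = attrs

    def add_edge(self, source, relation, target):
        self.out_edges[source].append((relation, target))
        self.in_edges[target].append((relation, source))

    def forward(self, node_id, relation):
        return [t for r, t in self.out_edges[node_id] if r == relation]

    def backward(self, node_id, relation):
        return [s for r, s in self.in_edges[node_id] if r == relation]


def incubation_temps_for(graph, characteristic):
    """Walk backward from a characteristic to samples, then forward to steps."""
    temps = {}
    for sample in graph.backward(characteristic, "has_characteristic"):
        for step in graph.forward(sample, "incubated_in"):
            temps[sample] = graph.attrs[step]["temp_c"]
    return temps


# Hypothetical example data.
g = MetadataGraph()
g.add_node("trait:example")
g.add_node("step:inc1", temp_c=37.0)
g.add_node("step:inc2", temp_c=55.0)
for sample, step, has_trait in [("S1", "step:inc1", True),
                                ("S2", "step:inc2", False),
                                ("S3", "step:inc2", True)]:
    g.add_node(sample)
    g.add_edge(sample, "incubated_in", step)
    if has_trait:
        g.add_edge(sample, "has_characteristic", "trait:example")

result = incubation_temps_for(g, "trait:example")
```

Here `result` contains entries only for `S1` and `S3`, the two samples linked to the characteristic. The backward hop (characteristic to samples) followed by a forward hop (sample to incubation step) is what makes the traversal bidirectional.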

Campaign Timeline

  • DECEMBER 2025 – KICK-OFF MEETING        
    An initial campaign kick-off meeting identified targeted experimental metadata and the user community's needs from the campaign's first draft of the data schema.        
     
  • MARCH 2026 – PRESENT THE PROPOSED DATA MODEL EXTENSIONS        
    Present the proposed data model extensions at the EMSL-JGI Joint User Meeting (Seattle, WA) for a second round of community feedback.        
     
  • APRIL 2026 – PROTOTYPE THE DATA MODEL        
    Prototype the data model and mock up the experimental data for validation. Synchronize with the BER BRIDGE data lakehouse.
     
  • AUGUST 2026 – REFINE THE DATA MODEL        
    Refine the data model and accommodate AMP2 first science data.        
     
  • OCTOBER 2026 – COMPLETION OF CAMPAIGN        
    Complete the campaign and deliver the schema documentation to the community.

Campaign Methods

  • IDENTIFY GAPS       
    Systematically identify the gaps between EMSL's current data model (MONet-derived and proteomics plus AMP2 extended) and AMP2 workshop requirements using user feedback and schema comparison tools.       
     
  • IMPLEMENT THE METADATA SCHEMA       
    Implement a draft AMP2 sample and experimental metadata schema, ensuring compatibility with EMSL's LIMS, MONet/analysis-API data structures, and the BER BRIDGE data lakehouse, potentially leveraging existing large language model-based conversion tools.       
     
  • MAP USE CASES TO THE DRAFT SCHEMA       
    Select representative AMP2 use cases and map them to the draft schema, validating with test datasets and automated schema validation tools.  
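
The gap-identification and validation steps above can be pictured with a minimal sketch, assuming schemas are represented as simple mappings from field name to expected Python type. The field names and the helper functions `find_gaps` and `validate` are hypothetical; the campaign's actual schema comparison and validation tools are not described in this document.

```python
# Hypothetical schemas: field name -> expected Python type.
CURRENT_SCHEMA = {"sample_id": str, "organism": str}
REQUIRED_SCHEMA = {
    "sample_id": str,
    "organism": str,
    "headspace_gas": str,
    "incubation_temp_c": float,
    "parent_sample_id": str,
}


def find_gaps(current, required):
    """Return required fields that the current schema does not define."""
    return sorted(set(required) - set(current))


def validate(record, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for name, expected in schema.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            errors.append(
                f"{name}: expected {expected.__name__}, "
                f"got {type(record[name]).__name__}"
            )
    return errors


gaps = find_gaps(CURRENT_SCHEMA, REQUIRED_SCHEMA)

# A made-up test record standing in for a representative use case.
test_record = {
    "sample_id": "S-0002",
    "organism": "example anaerobe",
    "headspace_gas": "N2/CO2",
    "incubation_temp_c": "37",  # wrong type on purpose: string, not float
}
problems = validate(test_record, REQUIRED_SCHEMA)
```

In this sketch `gaps` lists the three fields missing from the current schema, and `problems` reports the mistyped temperature and the absent `parent_sample_id`. A production workflow would more likely rely on an established schema language and validator than on hand-written type checks.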

Solve a Big Challenge

This campaign addresses the scientific challenge of advancing data and metadata standards to enhance interoperability, accessibility, and scientific discovery on large-scale automated platforms, starting with EMSL's revolutionary AMP2 automated platform.

The current gaps in standardized metadata, infrastructure usability, and adherence to FAIR (findable, accessible, interoperable, reusable) principles impede collaborative research and biotechnology advancements. By leveraging community engagement, this campaign will prioritize and implement high-impact extensions to the AMP2 metadata schema, focusing on critical needs such as detailed sample provenance, anaerobe-specific experimental conditions, and ontology alignments. These extensions will ensure compatibility with existing EMSL systems and maximize the utility of AMP2-generated data, supporting integration with legacy datasets and external ontologies. By resolving these challenges, the campaign will deliver a validated metadata schema and foundational infrastructure that empowers researchers to conduct transformative studies in sustainable biotechnology development and other multidisciplinary domains.

Expected Campaign Outcomes

  • VALIDATED AMP2 SAMPLE AND EXPERIMENTAL METADATA SCHEMA      
    After community-prioritized extensions are incorporated into the AMP2 data model and metadata schema, the model and schema will be validated against two to three representative AMP2 use cases (also decided with community input). Documentation and example templates will be provided to users upon the conclusion of the campaign.
     
  • PUBLICATION OF RESULTS      
    Key stakeholders will be engaged through workshops and targeted outreach. Feedback will be systematically incorporated, and comprehensive documentation (including user guides and schema definitions) will be published to support adoption and future development. 

Advance Your Research

  • ACCELERATE YOUR SCIENCE     
    Participating in the campaign will ultimately accelerate participants' own science by providing access to a validated AMP2 metadata schema that meets the needs of their experimental workflows, enabling improved data interoperability, enhanced AI-driven workflows, and streamlined integration with BER data infrastructure to support predictive modeling and collaborative biotechnology research.     
     
  • ACCESS TO CUTTING-EDGE TECHNOLOGY     
    Participants in this campaign will gain access to cutting-edge technologies and capabilities, including the validated AMP2 sample and experimental metadata schema, schema comparison tools, and AI-ready workflows for integrating complex biological and environmental datasets. These resources will enable users to facilitate data interoperability, enhance collaboration, and utilize advanced predictive modeling tools, such as those tailored for BER-focused domains like microbial phenotyping and bioeconomic applications.     
     
  • GAIN EXPERIENCE AND KEY KNOWLEDGE     
    Participants will gain valuable experience by contributing to the development and validation of the AMP2 metadata schema, enhancing their understanding of data interoperability, ontology alignment, and schema compatibility across platforms like EMSL’s LIMS and the BER BRIDGE data lakehouse. Through collaboration, they will acquire key knowledge of integrating complex experimental workflows, AI-ready data frameworks, and predictive modeling approaches tailored to advancing their scientific domains, such as microbial phenotyping and biotechnology-related research.     
     
  • SCIENTIFIC PUBLICATIONS     
    Participants will have the opportunity to be coauthors on scientific papers about the campaign’s activities and outcomes, gaining recognition for their contributions to advancing metadata schema development and data integration, while showcasing their expertise to drive future collaborations and discoveries.

Why "Community" Science Campaigns?

Each community science campaign is intended to bring together researchers with a wide variety of expertise to tackle the same strategically identified challenges that are bigger than what an individual principal investigator or small team research effort can accomplish alone.

Contributions from the scientific community are essential to this campaign because they ensure the metadata schema reflects diverse user needs, aligns with real-world scientific workflows, and addresses high-priority challenges. Collaborative input helps refine the schema’s design, making it more impactful, interoperable, and widely applicable across BER-relevant domains.

Contacts