Integrated Data Models
Full Campaign Name:
Enabling AI-Ready Data on the Anaerobic Microbial Phenotyping Platform
The quality, consistency, and framework of standardized metadata are critical for supporting the development of biotechnology. Extending the existing metadata framework at the Environmental Molecular Sciences Laboratory (EMSL) to support the new Anaerobic Microbial Phenotyping Platform (AMP2) was recognized as imperative at EMSL’s AMP2 Science Workshop. This extension would maximize the scientific discovery possible with this automated phenotyping platform for anaerobic microbiology research.
EMSL’s existing data services ecosystem includes detailed sample provenance and standardized metadata. However, for anaerobic phenotyping, scientists must be able to capture anaerobe-specific experimental conditions and data on strain and media development. This campaign will build on existing EMSL data resources to enable researchers to leverage multi-omics datasets and experimental data in relation to the observed phenotype.
Through this Computing, Analytics, and Modeling Community Science Campaign, EMSL is partnering with invited research community members to identify high-impact use cases for AMP2 so that EMSL can ensure the scientific data model supports metadata capture for them. The refined data model and metadata framework will be made available to the research community, both for further feedback and through ongoing use of the AMP2 platform.
By participating in this campaign, you will:
- Help solve a big science problem
- Drive important outcomes
- Advance your own research
Participation
How researchers have been invited to participate
A panel of researchers (comprising both EMSL AMP2 Science Workshop participants and additional researchers from external organizations) was invited to participate in the initial community science meeting based on their domain expertise, experience, and overlapping interests with AMP2.
Required participant background
- Possession of AMP2-relevant sample and experimental metadata, especially metadata that requires detailed provenance, anaerobe-specific conditions, and ontology alignments, and/or
- Familiarity with the collection and stewardship of experimental metadata from high-throughput, autonomous laboratories.
How will you contribute?
Feedback Through Community Science Campaigns
Participants in the initial community science meeting provided vital feedback highlighting specific priority needs for the AMP2 metadata schema and suggesting key use cases.
Help Inform AMP2 Sample and Experimental Metadata Schema
Participants’ input will help ensure that relevant use cases are readily captured in EMSL’s data infrastructure and the Biological and Environmental Research (BER) program’s Biological and enviRonmental Infrastructure for Data manaGement and Exploration (BRIDGE) data lakehouse.
About the Campaign
EMSL’s existing data services ecosystem was developed with the objective of being able to represent any process that could occur in a laboratory. This campaign will expand that ecosystem for EMSL’s new AMP2 capabilities. The objective of this campaign is to identify, through community engagement, a refined list of AMP2 metadata that should be captured in our data systems, so that EMSL staff can implement the items expected to have the highest impact and share them with BER partners and AMP2 users. Specifically, this requires extending the existing metadata schema to meet the following needs (identified by AMP2 workshop participants): detailed sample provenance and anaerobe-specific experimental conditions associated with strain acquisition, media preparation, culturing, plate-based growth measurements, and high-performance liquid chromatography. These workflows are core to AMP2, hence their inclusion in this campaign, though further extensions in the coming years will be needed to accommodate additional workflows for AMP2 and later Microbial Molecular Phenotyping Capability (M2PC) platforms as they come online.
This campaign will deliver a data schema focused on the metadata for culture-to-plating workflows specific to AMP2. This schema will be compatible with existing EMSL systems and developed in parallel with efforts to represent the same workflows within EMSL’s laboratory information management system (LIMS), ensuring alignment in how AMP2 experimental conditions are recorded and passed through the EMSL data ecosystem.
This campaign supports the recent presidential memorandum on fiscal year 2027 national research and development priorities and demonstrates how AI accelerates scientific discovery within Earth sciences. Schema development for this campaign will support AI-ready data workflows tailored to complex biological and environmental datasets. Workflow chaining within the data model will allow users to filter using bidirectional graph traversal (e.g., “show me the temperature at which samples with this characteristic were incubated”), further facilitating autonomous science and rapid experimental design. This expanded schema will accelerate data-driven decision-making and predictive modeling for BER domains like biotechnology and environmental sciences, especially as it is later connected to EMSL’s LIMS and incorporated into the BER BRIDGE data lakehouse.
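The kind of query quoted above can be illustrated with a minimal Python sketch. It is not EMSL's data model: the class name `ProvenanceGraph`, the node kinds (`strain`, `culturing`, `sample`), and the attributes `temperature_c` and `max_od600` are all hypothetical, chosen only to show how storing each link in both directions lets a query start from a sample characteristic and walk back to the incubation step.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Directed graph of workflow steps, indexed in both directions (hypothetical)."""

    def __init__(self):
        self.nodes = {}                      # node_id -> attribute dict
        self.downstream = defaultdict(set)   # source -> targets
        self.upstream = defaultdict(set)     # target -> sources

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def link(self, source, target):
        # Record the edge twice so traversal works in either direction.
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def reachable(self, start, direction):
        """Breadth-first search from start; direction is 'upstream' or 'downstream'."""
        edges = self.upstream if direction == "upstream" else self.downstream
        seen, queue = set(), deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in edges.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen


def incubation_temperatures(graph, matches):
    """For each sample satisfying matches, collect temperatures of upstream culturing steps."""
    result = {}
    for node_id, attrs in graph.nodes.items():
        if attrs["kind"] == "sample" and matches(attrs):
            result[node_id] = sorted(
                graph.nodes[step]["temperature_c"]
                for step in graph.reachable(node_id, "upstream")
                if graph.nodes[step]["kind"] == "culturing"
            )
    return result


# Illustrative records only; identifiers and values are made up.
graph = ProvenanceGraph()
graph.add_node("strain-A", "strain")
graph.add_node("culture-1", "culturing", temperature_c=37.0)
graph.add_node("culture-2", "culturing", temperature_c=30.0)
graph.add_node("sample-1", "sample", max_od600=0.9)
graph.add_node("sample-2", "sample", max_od600=0.2)
graph.link("strain-A", "culture-1")
graph.link("strain-A", "culture-2")
graph.link("culture-1", "sample-1")
graph.link("culture-2", "sample-2")

# Upstream query: at what temperature were high-growth samples incubated?
high_growth = incubation_temperatures(graph, lambda s: s["max_od600"] > 0.5)
# Downstream query: everything derived from one strain.
descendants = graph.reachable("strain-A", "downstream")
print(high_growth)
```

Keeping a reverse index alongside the forward one is a simple way to make both query directions equally cheap; a production system would more likely rely on a graph database or the lakehouse's own query layer.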
Campaign Timeline
OCTOBER 2025 – CAMPAIGN TOPICS AND DESCRIPTIONS DRAFTED
Identify community science campaign topics aimed at solving a significant scientific challenge or filling current gaps in knowledge.
NOVEMBER 2025 – POTENTIAL CAMPAIGN PARTICIPANTS IDENTIFIED
Strategically identify researchers with ideal domain expertise and experience to invite participation in the upcoming community science meeting.
DECEMBER 2025 – COMMUNITY SCIENCE MEETING
Host a community science meeting to identify targeted experimental metadata and the user community’s needs for the campaign’s first draft of the data schema.
MARCH 2026 – PRESENT PROPOSED DATA MODEL EXTENSIONS
Present proposed data model extensions at the EMSL-Joint Genome Institute Joint User Meeting (Seattle, WA) for a second round of community feedback.
MAY 2026 – PROTOTYPE DATA MODEL
Prototype the data model and mock up experimental data for validation. Synchronize with the BER BRIDGE data lakehouse at a virtual two-day hackathon.
AUGUST 2026 – TEST AND REFINE DATA MODEL
Refine the data model with metadata for all targeted workflows. Identify or create validation datasets for testing.
SEPTEMBER 2026 – COMPLETE CAMPAIGN
Complete the campaign and deliver schema documentation to the community.
Campaign Methods
IDENTIFY REQUIREMENTS
Systematically identify the highest-priority AMP2 workflows that EMSL’s current data model does not yet represent.
IMPLEMENT THE METADATA SCHEMA
Implement a draft AMP2 sample and experimental metadata schema, ensuring compatibility with EMSL’s data services ecosystem and the BER BRIDGE federated data resource, potentially leveraging existing large language model-based conversion tools.
MAP USE CASES TO THE SCHEMA
Select representative AMP2 use cases and map them to the draft schema, validating with test datasets and automated schema validation tools.
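As a rough illustration of what mapping a use case to a draft schema and checking it automatically might look like, here is a minimal Python sketch. The field names (`strain_id`, `medium_id`, `headspace_gas`, `incubation_temperature_c`) and the `validate_record` helper are assumptions made for this example; they are not the AMP2 schema or an EMSL tool.

```python
# Hypothetical draft schema: field name -> accepted Python type(s).
DRAFT_SCHEMA = {
    "strain_id": str,
    "medium_id": str,
    "headspace_gas": str,
    "incubation_temperature_c": (int, float),
}


def validate_record(record, schema=DRAFT_SCHEMA):
    """Return a list of human-readable problems; an empty list means the record conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(
                f"wrong type for {field}: {type(record[field]).__name__}"
            )
    # Flag fields the draft schema does not know about (candidate extensions).
    for field in sorted(record.keys() - schema.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


# Illustrative test records; values are made up.
complete = {
    "strain_id": "strain-A",
    "medium_id": "medium-7",
    "headspace_gas": "N2/CO2",
    "incubation_temperature_c": 37.0,
}
incomplete = {"strain_id": "strain-A", "incubation_temperature_c": "37"}

complete_problems = validate_record(complete)
incomplete_problems = validate_record(incomplete)
print(complete_problems)
print(incomplete_problems)
```

Reporting unexpected fields rather than silently dropping them is a deliberate choice here: when a use case carries metadata the draft schema lacks, that mismatch is exactly the signal needed to prioritize an extension.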
Solve a Big Challenge
This campaign addresses the scientific challenge of advancing data and metadata standards to enhance interoperability, accessibility, and scientific discovery on large-scale automated platforms, starting with EMSL’s new AMP2 automated platform.
Across the broader biotechnology field, insufficient metadata standardization, limited infrastructure usability, and inconsistent adoption of FAIR principles—findability, accessibility, interoperability, and reusability—impede collaborative research and advances. By leveraging community engagement, this campaign will prioritize and implement high-impact extensions to the existing metadata schema, focusing on critical AMP2 needs such as detailed sample provenance, pre-culturing steps, and anaerobe-specific experimental conditions. Building on the existing data model and metadata framework at EMSL will ensure compatibility with existing EMSL systems and maximize the utility of AMP2-generated data. The campaign will deliver a validated metadata schema and foundational infrastructure that empowers researchers to conduct transformative studies in sustainable biotechnology development and other multidisciplinary domains.
Expected Campaign Outcomes
VALIDATED AMP2 SAMPLE AND EXPERIMENTAL METADATA SCHEMA
After community-prioritized extensions are incorporated, the AMP2 data model and metadata schema will be validated against representative AMP2 use cases identified with community input.
DISSEMINATION OF RESULTS
Key stakeholders will be engaged through workshops and ongoing targeted outreach (for example, via hackathons with our BER partners at BRIDGE, or virtual exchanges with those developing similar schemata in parallel). Feedback will be systematically incorporated, and comprehensive documentation, including user guides and schema definitions, will be made public to support others’ ability to understand and access AMP2 data.
Advance Your Research
Accelerate your science
Human- and agent-legible data access
Why "Community" Science Campaigns?
Each community science campaign brings together researchers with a wide variety of expertise to tackle strategically identified challenges that are bigger than what an individual principal investigator or small research team can accomplish alone.
Input from the scientific community is essential to this campaign because it ensures that the metadata schema reflects the full spectrum of BER science, aligns with real-world scientific workflows, and addresses high-priority challenges. Collaborative input helps refine the schema’s design, making it more impactful, interoperable, and widely applicable across BER-relevant domains.
Contacts
Campaign leader (science domain expert): Maia Kapur | Website bio
EMSL user program contact (logistics): Rick Washburn | Proposal calls
