
Mass Spec Molecular Annotation

Full Campaign Name:

Advancing High-Resolution Mass Spectrometry Data Processing with Artificial Intelligence


Characterizing and identifying the molecular components of metal-binding metabolites and proteins can improve understanding of the microbial processes that involve critical minerals and support the development of new biotechnologies.

To support these efforts, EMSL is developing agentic workflows that coordinate high-resolution mass spectrometry (HRMS) data processing, molecular annotation, and the integration of complementary experimental data for metal-binding metabolites, metalloproteins, and protein–metal complexes.

Through this Computing, Analytics, and Modeling Community Science Campaign, EMSL will partner with invited research community members to develop AI-driven molecular identification workflows using CoreMS as the underlying framework. After these workflows are developed, they will be made available to the research community through the open-source CoreMS repository.

By participating in this campaign, you will:

  • Help solve a big science problem
  • Drive positive outcomes
  • Advance your own research

Participation

How researchers have been invited to participate
  • A panel of researchers was invited to participate in the initial community science meeting based on their domain expertise, experience, and overlapping interests with the campaign topic. After this meeting, which identified high-priority campaign objectives, a call for proposals was opened and shared with the meeting attendees.
Required participant background
  • Possess HRMS data related to metal complexes of proteins and metabolites that are important in critical minerals and materials (CMMs) or biotechnology.
  • Have subject matter expertise in current methodologies, molecular simulation techniques, and AI frameworks that improve the interpretation of analytical results and/or compound identification by mass spectrometry (MS) and complementary techniques.
How will you contribute?
  • Identify and prioritize knowledge gaps and needs
    • Participants will help identify current barriers and Biological and Environmental Research (BER) program needs in characterizing classes of molecules and metal complexes important to CMMs and biotechnology. Together, campaign participants will help develop strategies for elucidating metal binding.
  • Supply data
    • Participants identify and help supply existing datasets related to CMMs, biotechnology, or other BER-relevant fields that can be readily used in the campaign.
  • Feedback through community science meetings
    • Participants in the community science meeting provide vital feedback on campaign progress, highlighting how the campaign contributes to ongoing advancements in CMMs, biotechnology development, or other BER-relevant fields.
  • Expert input
    • Offer expertise on current MS methods, data-processing tools, molecular annotation strategies, and agentic AI frameworks. Help refine goals for integrating HRMS with agentic workflows and advanced annotation tools, with the aim of developing more automated, reproducible, and context-aware approaches for identifying and prioritizing candidate molecules.
  • Looking forward
    • Contribute to shaping ideas around innovations and future advancements in this area.

About the Campaign

The intent of the Mass Spec Molecular Annotation campaign is to advance the Department of Energy’s priorities in biotechnology and CMM applications through innovative uses of AI, graph/network methods, and metadata for molecular annotation in HRMS. In particular, the campaign will leverage EMSL’s 21 Tesla hybrid Orbitrap/Fourier transform ion cyclotron resonance (FTICR) mass spectrometer in combination with AI-enabled data processing to transform how metal-binding metabolites, protein–metal complexes, and related microbial processes are characterized. The instrument’s exceptional mass accuracy and resolution generate rich, complex datasets that can reveal endogenous metabolite profiles, critical metal-binding metabolites, and intact protein–metal complexes linked to CMMs and biotechnology. However, these experiments require new, agile workflows and tools that can keep pace with the instrument’s performance and the complexity of the resulting data.

To address this, the campaign will develop AI-driven workflows centered on the CoreMS framework, integrating agentic AI, retrieval-augmented generation (RAG), graph-based molecular networks, and external code (e.g., BioPython, OpenMS). These workflows will automate signal processing, mass recalibration, molecular formula assignment, data-independent acquisition (DIA) deconvolution, and molecular annotation, while using contextual metadata and external databases (e.g., EMSL, Joint Genome Institute, UniProt, PubChem) to guide and constrain molecular identification.
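
Two of the processing steps named above, mass recalibration and molecular formula assignment, can be illustrated with a small sketch. This is a hypothetical, dependency-free example and not the CoreMS API: the function names (fit_linear_recalibration, assign_formulas, formula_mass, hill, ppm_error), the [M+H]+ ion assumption, the CHNOS element limits, the 1 ppm window, and the calibrant values and +2 ppm drift in the usage lines are all assumptions. Metal-containing formulas would need additional elements and adduct rules.

```python
"""Hypothetical sketch: linear mass recalibration, then brute-force molecular
formula assignment within a ppm window. Not the CoreMS API."""
from itertools import product

# Monoisotopic masses (u) of the most abundant isotope of each element.
MONO = {
    "C": 12.0,
    "H": 1.00782503223,
    "N": 14.00307400443,
    "O": 15.99491461957,
    "S": 31.9720711744,
}
PROTON = 1.007276466621  # assumption: all ions are [M+H]+


def formula_mass(counts: dict) -> float:
    """Neutral monoisotopic mass of a formula given as {element: count}."""
    return sum(MONO[el] * k for el, k in counts.items())


def hill(counts: dict) -> str:
    """Render counts as a formula string in C, H, N, O, S order (e.g. C8H10N4O2)."""
    return "".join(f"{el}{k if k > 1 else ''}" for el, k in counts.items() if k > 0)


def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def fit_linear_recalibration(observed, reference):
    """Least-squares fit of reference = a * observed + b from calibrant pairs."""
    n = len(observed)
    mx, my = sum(observed) / n, sum(reference) / n
    sxx = sum((x - mx) ** 2 for x in observed)
    sxy = sum((x - mx) * (y - my) for x, y in zip(observed, reference))
    a = sxy / sxx
    b = my - a * mx
    return lambda mz: a * mz + b


def assign_formulas(mz, ppm_tol=1.0, max_c=40, max_n=4, max_o=15, max_s=2):
    """Return (formula, ppm error) pairs whose [M+H]+ lies within ppm_tol of mz."""
    neutral = mz - PROTON
    hits = []
    for c, n, o, s in product(range(1, max_c + 1), range(max_n + 1),
                              range(max_o + 1), range(max_s + 1)):
        heavy = formula_mass({"C": c, "N": n, "O": o, "S": s})
        h = round((neutral - heavy) / MONO["H"])  # solve for the best-fitting H count
        if h < 0:
            continue
        # Rings plus double bonds must be a non-negative integer for a closed-shell neutral.
        dbe = c - h / 2 + n / 2 + 1
        if dbe < 0 or (h - n) % 2:
            continue
        counts = {"C": c, "H": h, "N": n, "O": o, "S": s}
        err = ppm_error(mz, formula_mass(counts) + PROTON)
        if abs(err) <= ppm_tol:
            hits.append((hill(counts), err))
    return sorted(hits, key=lambda hit: abs(hit[1]))


# Usage with made-up numbers: caffeine [M+H]+ measured with a +2 ppm systematic drift.
true_mz = formula_mass({"C": 8, "H": 10, "N": 4, "O": 2}) + PROTON
drift = 1 + 2e-6
calibrants = [150.0, 300.0, 600.0, 900.0]  # hypothetical reference m/z values
recal = fit_linear_recalibration([m * drift for m in calibrants], calibrants)
print(assign_formulas(recal(true_mz * drift))[:1])
```

Solving for the hydrogen count instead of looping over it keeps the search small. Additional filters, such as isotopic fine structure and element-ratio heuristics, are commonly layered on top of the ppm and ring-plus-double-bond checks shown here.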

In parallel, the campaign will create user-accessible toolkits for visualization and statistical analysis, ensuring that researchers can readily turn high-resolution data into actionable insights. The goal is to lay the groundwork for autonomous molecular identification and discovery that is tightly aligned with BER objectives in CMM management and biotechnology but could be easily adapted to other applications.

Through this campaign, participants will share HRMS data, identify knowledge gaps and research needs, and provide valuable insights into methodologies and AI frameworks for molecular annotation. They will also help refine campaign objectives, explore applications in areas like critical mineral management and biotechnology, and collaborate on innovative approaches to accelerate the discovery of metal-binding compounds and proteins.

This campaign supports the recent presidential memorandum on fiscal year 2027 national research and development priorities and the Genesis Mission, demonstrating how AI accelerates scientific discovery within the Earth sciences. In addition, this campaign aligns with the Department of Energy’s focus on CMMs.

Campaign Timeline

OCTOBER 2025 – CAMPAIGN TOPICS AND DESCRIPTIONS DRAFTED

  • Identify community science campaign topics aimed at solving a significant scientific challenge or filling current gaps in knowledge.

NOVEMBER 2025 – POTENTIAL CAMPAIGN PARTICIPANTS IDENTIFIED

  • Strategically identify researchers with ideal domain expertise and experience to invite participation in the upcoming community science meeting.

DECEMBER 2025 – COMMUNITY SCIENCE MEETING

  • Host a community science meeting to identify and establish the core components needed for autonomous molecular identification and discovery in BER priority science areas.

DECEMBER 2025 – CALL FOR PROPOSALS

  • Invite community science meeting attendees to submit proposals for projects contributing to overall campaign goals.

JANUARY 2026 – CAMPAIGN PROPOSAL DEADLINE

  • Deadline for invited campaign participants to submit proposals for projects contributing to overall campaign goals.

JANUARY 2026 – INITIATE WORK ON ACCEPTED PROPOSALS

  • Work begins on accepted proposals for projects contributing to overall campaign goals.

FEBRUARY–SEPTEMBER 2026 – DEVELOP AND APPLY WORKFLOWS

  • Apply collaboratively developed workflows to address key science questions refined by campaign participants.

SEPTEMBER 2026 – COMPLETE CAMPAIGN

  • Complete the campaign and deliver workflows and documentation to the community.

Campaign Methods

AI-ENABLED MOLECULAR ANNOTATION OF HIGH-RESOLUTION MS DATA
  • Develop AI- and agent-based workflows within CoreMS to process Orbitrap/FTICR data, integrating graph/network methods, RAG, and external code to enable the robust annotation of metal-binding metabolites, protein–metal complexes, and low-abundance heteroatom-containing compounds. Emphasis will be on iterative prototyping, internal validation, and scalable, autonomous workflows aligned with BER biotechnology and CMM objectives.
AUTOMATE ADVANCED HRMS DATA PROCESSING WITHIN CoreMS
  • Use agentic workflows to automate foundational HRMS processing steps in CoreMS, including signal processing, mass recalibration, molecular formula assignment, and DIA deconvolution. Workflows will leverage contextual metadata and EMSL resources to standardize high-quality processing, improve reproducibility, and lower barriers to adoption through agentic workflows, documentation, and tutorials.
BUILD RAG- AND GRAPH-BASED PIPELINES FOR CONTEXT-AWARE ANNOTATION
  • Create RAG-enabled, graph-based pipelines that construct molecular networks from mass differences, isotopologues, adducts, and homologous series, and combine these networks with external knowledge (EMSL, Joint Genome Institute, UniProt, PubChem, structural/motif databases) to constrain candidate formulas and structures. Incorporate BioPython and OpenMS to enhance DigiPhen workflows for intact proteins and protein–metal complexes, improving the identification of metal-binding domains and enabling quantitative confidence scoring and cross-technique validation (liquid chromatography–inductively coupled plasma mass spectrometry [LC-ICP-MS] and liquid chromatography–mass spectrometry [LC-MS] combined with FTICR).
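
A mass-difference network of the kind described in these methods can be sketched as a graph whose nodes are observed ions and whose edges connect pairs separated by a known mass shift. The example below is a hypothetical, dependency-free illustration rather than a CoreMS or DigiPhen interface: the transformation list, the 2 ppm tolerance, the function names build_network and components, and the synthetic peak values are all assumptions. It includes one metal-specific edge, the shift from a ligand's [M+H]+ ion to the [M-2H+Fe]+ ion of its ferric complex, which works out to m(56Fe) - 3m(1H), about 52.9115 u.

```python
"""Hypothetical sketch: build a mass-difference network and group peaks into
connected components. Not a CoreMS or DigiPhen API."""
from collections import defaultdict, deque

# Illustrative mass shifts (u) between related ions; the selection is an assumption.
TRANSFORMS = {
    "CH2": 12.0 + 2 * 1.00782503223,           # homologous series
    "H2O": 2 * 1.00782503223 + 15.99491461957,  # water loss/gain
    "13C": 1.00335483507,                       # 13C isotopologue
    "Fe(III) - 3H": 55.93493633 - 3 * 1.00782503223,  # [M+H]+ -> [M-2H+Fe]+
}


def build_network(masses, ppm_tol=2.0):
    """Return edges (i, j, label, error_ppm) where masses[j] - masses[i] matches a shift."""
    edges = []
    order = sorted(range(len(masses)), key=lambda k: masses[k])
    for pos, i in enumerate(order):
        for j in order[pos + 1:]:  # only heavier partners, so delta >= 0
            delta = masses[j] - masses[i]
            tol = ppm_tol * 1e-6 * masses[j]  # absolute tolerance at the heavier ion
            for label, shift in TRANSFORMS.items():
                if abs(delta - shift) <= tol:
                    edges.append((i, j, label, (delta - shift) / masses[j] * 1e6))
    return edges


def components(n_nodes, edges):
    """Connected components (as sorted index lists) via breadth-first search."""
    adj = defaultdict(set)
    for i, j, *_ in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, groups = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        groups.append(sorted(group))
    return groups


# Synthetic peak list: a ligand ion, its CH2 homolog, its 13C isotopologue,
# its ferric complex, and one unrelated ion.
base = 561.36
peaks = [base, base + TRANSFORMS["CH2"], base + TRANSFORMS["13C"],
         base + TRANSFORMS["Fe(III) - 3H"], 301.14]
edges = build_network(peaks)
for edge in edges:
    print(edge)
print(components(len(peaks), edges))
```

Grouping features into connected components is one simple way to let a confident annotation constrain its neighbors. Which edges to trust, and how to weight them against database and metadata evidence, is a design choice this sketch leaves open.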

Solve a Big Challenge

This campaign helps address the scientific challenge of efficiently identifying and characterizing metal-binding metabolites and protein–metal complexes using HRMS data. The complexity of analyzing these datasets, particularly for microbial processes related to CMMs and biotechnology, demands the development of advanced AI-driven workflows and tailored molecular annotation tools. By overcoming barriers in data processing, accessibility, and scalability, the campaign aims to enable faster discovery, delivering improved predictive modeling and more reproducible molecular characterization workflows.

Expected Campaign Outcomes

ENHANCED IDENTIFICATION OF CRITICAL METAL-BINDING METABOLITES
  • Agentic workflows will automate the processing of HRMS data to rapidly identify and characterize microbially produced metabolites that bind to critical metals.
RAPID DEPLOYMENT OF USER-ACCESSIBLE TOOLKITS
  • The campaign will highlight AI’s role in enabling rapid testing and deployment of software for data visualization, statistical analysis, and molecular annotation.
INSIGHTS INTO CMMs
  • By integrating AI-generated workflows with the spectrometer’s capabilities, the campaign expects to produce rich, multidimensional datasets that illuminate microbial contributions to CMMs. Benefits include identifying novel metal-binding compounds and complexes.

Advance Your Research

  • Accelerate your science
  • Access cutting-edge technology
  • Gain experience and key knowledge
  • Co-author scientific publications

Why "Community" Science Campaigns?

Each community science campaign is intended to bring together researchers with a wide variety of expertise to tackle strategically identified challenges that are too large for an individual principal investigator or a small research team to address alone.

With the help of the scientific community, EMSL can ensure that the AI workflows it develops address high-priority issues and use the right resources to meet the scientific community’s most pressing needs.

Contacts

Campaign leader (science domain expert): Yuri Corilo

EMSL user program contact (logistics): Rick Washburn