
New Combo of Spectroscopy, Machine Learning, and MONet Data Unearths Microbial Soil Predictions at Continental Scale

Researchers from University of Wisconsin-Madison partnered with EMSL for new insights

Maegan Murray

Soni Ghimire, a PhD student at the University of Wisconsin-Madison (UW-Madison), is part of a UW-Madison research team that combined mid-infrared spectroscopy (MIR), machine learning methods, and data from EMSL's Molecular Observation Network (MONet) to see whether they could make new and reliable predictions about microbial communities on a continental scale. (Photo courtesy of Soni Ghimire)

Insights into microbial community composition, diversity, and behavior can be used to make important predictions about soils. These communities act as indicators for soil health, influence how nutrients flow in an environment, and govern soil carbon sequestration, making them important players in biogeochemical processes.

While scientists understand how important microbial communities are for predicting the future fate of carbon, the microbial functional data needed to properly calibrate biogeochemical models for accurate large-scale predictions are in limited supply. Collecting these data involves extensive sampling and analysis across a variety of sites, which demands considerable time, resources, and funding.

To help combat this issue, a team of researchers from the University of Wisconsin-Madison (UW-Madison) led by principal investigator Zachary Freedman experimented with using a combination of mid-infrared spectroscopy (MIR), machine learning methods, and data from the Environmental Molecular Sciences Laboratory (EMSL) Molecular Observation Network (MONet) to see if they could make new and reliable predictions about microbial communities on a continental scale.

This new process could save time and reduce the costs of sampling and analysis, said Soni Ghimire, a PhD student at UW-Madison and member of the project team.

"We're looking into faster and cheaper methods to estimate soil microbial properties for the improvement of continental scale soil carbon modeling and prediction programs," she said. "What we've found is that MIR spectra, used in combination with machine learning and standardized data, shows promise as an alternative method to traditional laboratory techniques."

The team's research was published in Applied Soil Ecology.

Partnerships for New Results

MIR spectroscopy is an analytical technique that examines how molecules absorb light within the mid-infrared light spectrum. This absorption occurs due to the vibrations of chemical bonds within molecules, creating unique spectra that serve as fingerprints to identify and study molecular compositions, structures, and interactions.

The UW-Madison team used MIR spectroscopy to analyze air-dried soil samples from 67 different sites from EMSL's MONet project and correlated the spectra with soil microbial analyses to assess MIR's effectiveness.

Through MONet, researchers from across the U.S. collect soil samples at identified sites using standardized methods and send them to EMSL, where they are analyzed using standardized workflows. Soil samples are taken from two depths (0–10 cm and 20–30 cm) and analyzed within 48 hours by EMSL staff using traditional laboratory techniques. Both microbial and soil chemical property data are generated.

The UW-Madison team applied a partial least squares regression model (a computational machine learning method) to identify relationships between MIR wavenumbers and measured microbial properties. Ghimire said the method is particularly useful as it can handle large, complex datasets—especially those with highly correlated variables.
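For readers curious about how partial least squares regression handles many highly correlated predictors, the following is a minimal sketch of the classic single-response variant (often called PLS1), written with NumPy. It is not the team's code or data: the "spectra" and the "microbial property" are synthetic stand-ins, and the sample count, number of wavenumbers, noise levels, number of components, and the function name fit_pls1 are all assumptions made purely for illustration.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """Fit single-response PLS; return (x_mean, y_mean, coefficients)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    X_res, y_res = X - x_mean, y - y_mean      # centered copies to deflate
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = X_res.T @ y_res                    # direction of max covariance with y
        w /= np.linalg.norm(w)
        t = X_res @ w                          # latent scores for this component
        tt = t @ t
        p = X_res.T @ t / tt                   # loading of the predictors on t
        q = y_res @ t / tt                     # loading of the response on t
        X_res = X_res - np.outer(t, p)         # remove what this component explained
        y_res = y_res - q * t
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W = np.array(weights).T                    # (n_features, n_components)
    P = np.array(loadings).T                   # (n_features, n_components)
    q = np.array(y_loadings)                   # (n_components,)
    coefficients = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coefficients


# Synthetic stand-in data (assumption): 60 "samples" x 200 "wavenumbers",
# driven by 3 hidden factors so that neighboring columns are highly correlated.
rng = np.random.default_rng(0)
factors = rng.normal(size=(60, 3))
spectra = factors @ rng.normal(size=(3, 200)) + 0.05 * rng.normal(size=(60, 200))
microbial_property = factors @ np.array([1.0, -0.5, 0.25]) + 0.05 * rng.normal(size=60)

# Fit on the first 45 samples, then predict the 15 held-out samples.
x_mean, y_mean, coefficients = fit_pls1(spectra[:45], microbial_property[:45], 3)
predictions = (spectra[45:] - x_mean) @ coefficients + y_mean

# Coefficient of determination (R^2) on the held-out samples.
residual = microbial_property[45:] - predictions
total = microbial_property[45:] - microbial_property[45:].mean()
holdout_r2 = 1.0 - (residual @ residual) / (total @ total)
print(f"hold-out R^2: {holdout_r2:.3f}")
```

The sketch compresses 200 correlated columns into three latent components before regressing, which is why the method copes with more predictors than samples. In practice, an established implementation such as scikit-learn's PLSRegression, combined with cross-validation to choose the number of components, would be a more typical choice than hand-written code.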

The researchers found that MIR could predict essential soil and microbial properties, such as respiration, microbial biomass, and levels of soil organic carbon and total nitrogen, with moderate accuracy. These properties are linked to patterns of carbon-rich compounds identified in the MIR measurements.

Ghimire said the technique holds great potential for the rapid estimation of soil microbial properties across diverse spatial and temporal ranges at the continental scale.

"One of the challenges in improving the accuracy of biogeochemical models for capturing carbon dynamics is the lack of sufficient microbial-explicit data for parameterizing and validating these models," she said. "MIR spectroscopy helps address this challenge by rapidly generating huge amounts of data on organic functional groups present in soil. We're able to process this information toward the prediction of biogeochemically-relevant microbial properties with the help of machine learning tools."

Importance of Consistent and Standardized Data

Ghimire said standardized data produced from consistent methodologies is a crucial component for improving existing microbial-explicit biogeochemical models. She said access to EMSL's MONet database is what provided the consistency needed to successfully validate the team's results.

"One of the best things about data generated by MONet is they're able to run analysis on samples from different parts of the U.S. following consistent methodologies," she said. "It is very difficult to find extensive data sets that use consistent methodologies."

Most soil microbial studies, Ghimire said, focus on smaller local or regional scales. Studies from different regions typically don't follow the same methodologies, which makes it difficult to extrapolate meaningful results from the combined data, she said.

Combining MONet's consistent, standardized data with spectroscopy and machine learning tools enables more accurate predictions, Ghimire said.

"I'm deeply grateful to MONet," she said. "The long-term impact of this research is significant for strengthening our ability to model soil carbon at a continental scale, protect ecosystems, and support sustainable agriculture."

Access MONet Data and Participate in Sampling

Through its new Community Science Campaigns, EMSL holds open calls for soil sampling proposals throughout the year. Proposals are accepted on a first-come, first-served basis (after sampling requirements are verified). To view current and future open calls, visit the EMSL proposals overview page.

All MONet data are available through a free, public database. Access MONet data via EMSL's Science Central platform.