If Montana Smith had been asked a few years ago to predict where she would be today in her career, she would not have said data science.
The former Earth scientist was seven years into a career focused on soil microbial ecology at the Environmental Molecular Sciences Laboratory (EMSL) when the COVID-19 pandemic hit and sent most people home. During this period, Smith spent much of her time organizing and developing workflows for various data-heavy projects. This proved to be her calling: blending science and data organization.
“The whole time I’ve been at the Lab, mostly as a soil microbial ecologist, I did experiment design and carried out analysis,” she said. “However, with the various labs I’ve worked in, I’ve always kept good metadata. I’m also good at data management and file management.”
Smith officially made the career pivot from Earth scientist to data scientist at EMSL about a year ago. And this month, she is being honored with the Dawn Field Award for Outstanding Contributions to Genomic Standards by the Genomic Standards Consortium (GSC). She is the second person to earn the award.
Smith said it is a true honor to receive the award, as she is such a strong believer in the organization’s mission and objectives.
“Setting and streamlining standards is a key piece in advancing science and providing access to data for individuals who may not be able to access and observe that data,” she said. “Plus, it’s the first time that I’ve earned an award for my work. It feels pretty great.”
With the award, Smith is specifically being recognized for her contributions to metadata standards and the creation of a stable isotope probing (SIP) data checklist that she completed with collaborators at Purdue University, Northern Arizona University, and KBase, an open software and data platform that aims to enable researchers to predict and ultimately design biological function.
The SIP checklist is used for experiment setup, data tracking and processing, as well as data access. SIP, Smith said, has a lot of complex data that requires a streamlined organizational structure to make sure researchers get the most out of their data.
Smith will accept the Dawn Field Award at the GSC’s annual conference in Bangkok, Thailand, which takes place Aug. 7–11. She will also give a presentation about her ongoing work at the conference on Aug. 10.
Since initially diving into the world of data science, Smith has worked on a range of projects to help streamline experiment setup, data management, and data access. Beginning at EMSL, Smith has worked with a variety of EMSL users to determine their sample types, standardize the metadata they need for their samples, and identify how to reproduce their data in a standardized way. She now leads the metadata team at EMSL.
Lee Ann McCue, then EMSL chief data officer, brought Smith to the National Microbiome Data Collaborative, where the group was working out what standards they wanted to implement to streamline data management and access, how to implement those standards, as well as how to get researchers to adopt and abide by those standards.
“I started attending working groups and calls to talk about new packages and a checklist, which is where users get tripped up,” Smith said. “We did a lot of research on usability testing, where we created templates and had researchers tell us where things did and didn’t make sense.”
Smith said it proved to be a fun challenge, especially as she knows from her experience as a soil ecologist how difficult data setup and management can be for someone who is not a data scientist.
“I have a lot of first-hand research experience on sample collection and can tell you what doesn’t make sense from that end of things,” she said.
Smith said her background helps her ask the right questions in rooms full of researchers from other fields.
“I definitely feel like I found my niche that allows me to apply my experiences in the lab from the past,” she said. “My favorite thing about working at EMSL [and Pacific Northwest National Laboratory] is that if you find something interesting, you can generally find someone to work with you to explore that opportunity. That is exactly what happened with me. If you decide you want to reinvent yourself, you can. I did.”
The need for streamlined data organization
Looking back on her career so far, Smith said she is most proud of the work she has done to streamline data management and make processes easier for users and fellow researchers. In her work with the GSC, she has had the opportunity to help standardize and refine processes to make them easier for scientists across the world, not just those within the individual research groups she is a part of.
“If you look at the GSC, people do different types of sequencing. With that, there needs to be standards that streamline that work for everyone,” she said. “Some individuals may not be familiar with what the GSC already has. And some people run into problems with data validation and consistency.”
Smith said she has worked closely with Mark Miller of Lawrence Berkeley National Laboratory to link machine learning with data setup and organization to streamline data processing and access with the GSC. Her main contribution has been working with subject matter experts in SIP to streamline data management—the area in which she is being recognized.
Future of data science and management
Moving forward, Smith said data checklists, processes, and standards will be vital as the world embraces machine learning to autonomously carry out more processes that were formerly done by humans. She said such standards will help expedite scientific discovery.
“You can’t do any kind of machine learning without any kind of standard,” she said. “For example, if one person calls it soil and another person calls it dirt, the machine won’t know what you’re talking about. That is obviously a gross understatement of the data that we’re talking about, but it’s a good basic visual example of what we’re trying to accomplish.”
Smith said researchers would miss out on a huge area of research if interoperable standards did not exist. That’s why she’s elated to have a part in the larger effort to make data more FAIR: findable, accessible, interoperable, and reusable.
“I am really excited about not only the opportunities in developing and improving metadata standards and data interoperability, but I also look forward to the next steps,” she said. “It will be exciting to see the publications of not only users, but also the broader scientific community because of the steps we took to better data interoperability.”