
EMSL User Project Using AI to Advance Discoveries in Protein Folding

AI platform could potentially lead to revelations in biological research

Maegan Murray

Pernilla Wittung-Stafshede, a professor at Chalmers University of Technology in Sweden, is applying artificial intelligence capabilities at the Environmental Molecular Sciences Laboratory to identify how proteins fold. (Illustration provided by iStock)

The launch of ChatGPT in the last year has sent interest in artificial intelligence (AI) skyrocketing.

AI has revolutionized how customers interact with businesses by generating automated responses tailored to customer interests. Certain platforms can generate realistic images and video that are used to illustrate processes, products, and services. AI can also sort through massive datasets to identify trends and flag unusual data points, helping identify problems and potential solutions for larger issues.

And while AI has a long way to go before programs and platforms are reliably and holistically self-sufficient, it has already helped expedite a variety of processes and enabled programs that were not possible until now.

At the Environmental Molecular Sciences Laboratory (EMSL), computational researchers are exploring how AI can support scientific discovery by examining processes and components at the smallest of scales. Researchers are using AI to identify proteins with a new computational program called PeakDecoder, which was developed by EMSL Computational Scientist Aivett Bilbao. Through Model-Experiment (ModEx) integration, they are incorporating experimental measurements (e.g., of soils, the rhizosphere, and biologic and anthropogenic emissions) into computational and modeling frameworks, either directly for scale-appropriate models or through parameterizations. Researchers are also using AlphaFold to make protein structure predictions. These are just a few examples.

Through the EMSL User Program, scientists from around the world are also working with staff researchers and a range of EMSL computational technologies to expedite scientific discovery through AI. This year, one such user project, led by Pernilla Wittung-Stafshede, a professor at Chalmers University of Technology in Sweden, is using AI to identify how proteins fold, a problem that has stumped scientists for years.


Pernilla Wittung-Stafshede's research seeks to understand how proteins get to their folded structures. (Photo provided by Pernilla Wittung-Stafshede)

“When AlphaFold came out a few years ago, it allowed us to predict the folded structure of nearly every protein using AI,” Wittung-Stafshede said. “Many [people] said that solved the ‘protein folding problem.’ This is wrong. It solved the ‘protein structure problem,’ which indeed is a huge feat. But we don’t know how the proteins get to their folded structures.”

Knowing how proteins fold, Wittung-Stafshede said, serves as a base for understanding how normal cells function compared to those affected by disease and other factors. Without knowing how proteins fold, she said, researchers can’t fully understand how their functions are affected by the surrounding environment.

Listen to the EMSL podcast, Bonding Over Science, to hear about Wittung-Stafshede's project and EMSL's AI resources.

Support of EMSL computation to crack protein folding

Wittung-Stafshede had tried and failed to get funding in previous years to support an AI-based idea to uncover how proteins fold. The project required collaboration with computational experts and did not fall within Wittung-Stafshede’s ongoing projects.

Last year, Wittung-Stafshede submitted a Large-Scale Research proposal to access EMSL and its computational technologies and was awarded funding for her user project. She is now working with Margaret Cheung, an EMSL computational scientist, and the team in the Computing, Analytics, and Modeling science area to develop a program that aims to examine, detail, and visualize how proteins fold. Wittung-Stafshede’s idea is to merge AlphaFold’s structural prediction capacity with experimental folding kinetic data. With this work, the team aims to create models that can predict how proteins fold and interpret fundamental factors behind protein folding.

EMSL computational scientist Margaret Cheung is working with Wittung-Stafshede to help develop a program that aims to examine, detail, and visualize how proteins fold. (Photo by Andrea Starr | Pacific Northwest National Laboratory)

AlphaFold, Cheung said, is an excellent tool that translates a one-dimensional code into a three-dimensional structure. With Wittung-Stafshede’s project, however, the team strives to give those proteins functional meaning.

“For example, she wants to understand how a particular protein will respond to an extreme or pathological environment,” Cheung said. “Because of the sheer amount of parameters to vary, that feat would be nearly impossible with just human effort. It would take a long time.”

Specific interest in how metal affects protein folding

Wittung-Stafshede is particularly interested in how metals affect protein folding in cells. Metal ions, she said, can bind to unfolded proteins and steer the folding process. If this happens in cells as it does in test tubes, scientists need to take it into account to understand how proteins fold. Many metal ions, including copper, are essential for living organisms and act as cofactors in important enzymes.

But metal ions can also be dangerous to the cell, she said. AlphaFold is currently limited because it focuses only on the polypeptide. It cannot yet predict metal-binding sites in proteins.

Wittung-Stafshede’s project focuses on a copper-binding protein called azurin. The idea is to develop a folding program that takes metal ions, as well as chemical modifications, into account. The next protein to study is another copper-binding protein, lytic polysaccharide monooxygenase (LPMO), which is used to accelerate biomass degradation. The hope is to identify ways to improve the LPMO protein for the benefit of biofuels production, she said.

“If we can understand the folding rules, we can improve the protein by engineering,” Wittung-Stafshede said.

An AI platform that can tell the rules of protein folding could potentially lead to revelations in research on neurodegenerative diseases like Alzheimer’s and even cancer, she said. Nearly all diseases are connected to proteins not folding in the way they should.

Challenges with the future of AI

While the advent and increased use of AI and machine learning technology bring many positives, they also come with some downsides, Wittung-Stafshede said.

Platforms like ChatGPT, she said, still have a lot of room to learn and grow in regard to providing and verifying information. There need to be standards, limits, and checks in place to be sure that the AI is providing accurate information, she said.

“For example, I’ve asked ChatGPT about myself and it doesn’t know who I am really, it just makes up a fake CV,” she said. “We need to think critically about how to interpret the information, and we must verify it. Everyone needs to learn about possibilities and limitations with various AI platforms.”

Wittung-Stafshede said AI technology will help speed up many processes, including analyzing and distilling research papers so researchers can keep up to date on the many scientific advancements in various fields.

“But on the other hand, how do you know that all of that information is pulled from reliable sources and is entirely accurate?” she asked.

Cheung posed the same questions.

“I think the concern about ethics is real, especially in a time where misinformation is rampant,” Cheung said. “Knowing what information is real and what is not could be a challenge. We need to come up with ways to validate the information.”

As they craft their program to identify how proteins fold, they are constantly discussing parameters to make sure the information pulled from both existing and new protein datasets is accurate. That is just as important when they are using the software to generate models to show and predict trends in protein folding.

“AI is a powerful tool to solve pending challenges in the human society as a whole, which are driven by changes in climate and other factors,” Cheung said. “It is important to teach the algorithms how to gather and accurately interpret relevant data.”