On episode six of Bonding Over Science, host Dawn Stringer explores the role of artificial intelligence in understanding protein folding. Stringer speaks with EMSL user Pernilla Wittung-Stafshede about the positive and negative effects of using AI not only in scientific research, but in everyday life. What’s your take?
Dawn Stringer: Have you ever used artificial intelligence?
AI, or artificial intelligence, refers to the development of computer systems that can perform tasks that typically require human intelligence, such as perception, learning, reasoning, and decision-making. The goal of AI is to create machines that can function autonomously, adapt to new situations, and improve their performance over time.
While AI has the potential to revolutionize many aspects of our lives, there are also concerns about its impact on employment, privacy, and security. Therefore, it is important to approach the development and deployment of AI in a responsible and ethical manner.
I’m Dawn Stringer, and the description you just heard was generated by artificial intelligence. Let’s learn more about it and how it’s helping the science world learn about biofuel production and more... on this episode of Bonding Over Science.
Dawn Stringer: Artificial intelligence is capable of so many things.
And AI is proving to have a place in the science world. Today’s guest, an EMSL user, knows this firsthand. She sought the help of EMSL experts and capabilities through an open call for proposals to use AI to learn about protein folding pathways.
Pernilla Wittung-Stafshede: My name is Pernilla Wittung-Stafshede. I’m a professor in chemistry, or biochemistry, at Chalmers University of Technology in Gothenburg, Sweden, and I have been faculty at several universities around the world. I actually spent a lot of time in the U.S. before I moved back to Sweden, many years ago now. My research started out in protein folding, first from a basic point of view, the mechanics and kinetics, trying to understand how proteins fold, and that has kind of led me over to protein misfolding and aggregation and amyloids and neurodegeneration. I also went into metalloproteins: how metals bind to proteins and what that means for a cell.
And there we stumbled on some copper proteins that actually play a role in cancer. So today I don't do much protein folding, but it has all kind of evolved into understanding how proteins work, and how they don't work when diseases occur. I'm an experimentalist, I don't do any computations, but I collaborate with computationalists or theoreticians to, you know, match or combine my experiments with simulations of what goes on.
And I didn't know anything about EMSL in the beginning. But I know Margaret Cheung, who is my expert in charge of my project now at EMSL. We collaborated when she was faculty in Houston and I also was faculty in Houston many years ago. But then she told me, you know, I'm here now. They have these great projects you can apply for, you know, support to do a project with us.
And in principle, I could perhaps help you through a project that way, and we could work together again. And I had been thinking for some time that AI is coming more and more. AlphaFold came out a few years ago, and it was this big buzz, right? Now you can predict the structures of all the proteins using AI.
And a lot of people said, when they interviewed me as well, that now we have solved the protein folding problem, and to some degree that's true. We have solved the protein structure problem, so we know much more about what proteins look like. And I mean, it's amazing how many structures we have, but there are limitations. First, we don't know how the proteins get to that structure.
That's the folding pathway: when you make a polypeptide chain, it has to fold up into this final structure, and this pathway is important because proteins unfold and refold all the time in a cell, and sometimes they get stuck in a misfolded conformation. So we need to know about the dynamics in order to understand diseases or normal function in the cell. And AlphaFold cannot tell us that, because it gives you one structure. The other thing is that AlphaFold doesn't know anything about metals. Almost half of all the proteins in the human body, or in any organism, bind a small metal ion for function. It could be copper, it could be iron, it could be zinc, it could be something else.
And AlphaFold only cares about the polypeptide. I know people are probably working on this, trying to improve AlphaFold to incorporate interactions with things that are not protein in themselves. But many of these metal ions affect the structure, the folding, and the dynamics of the protein. So when I thought about this, I started to think about, you know, how do we learn about protein folding?
It was this big thing when I started my postdoc many years ago; everybody wanted to know how fast a protein would fold, which would also solve the protein folding problem. And most of those researchers, including myself, have now moved on to other areas. People study proteins in cells, or amyloid formation, or liquid-liquid phase separation, but nobody really solved the protein folding problem.
I had applied for some other funding in Europe, with some others, but we didn't get it. So that was kind of the stage when Margaret asked me, and I thought: we have some experimental data on folding mechanisms, the kinetics or the dynamics. Can we use that kind of experimental data to help the prediction models?
Then, instead of just giving us a structure, the models would learn something about the pathway. So my idea, in principle, was to take AI and ML (machine learning), all these computational programs that I don't know much about, and feed them experimental data that says something about the reaction mechanism. I proposed that in this proposal, and the idea is mostly to use my own data from many years ago; there are also published experimental data for other proteins that we can start with. If we come to a model in the computations that suggests something, I can go back to the lab and do more experiments, and we can feed in more experimental data. The idea is really to use the power of AI, but make it better by telling it something more than we do today.
Dawn Stringer: Focusing on your project, can you explain why it's important to study the folding of metal-binding proteins?
Pernilla Wittung-Stafshede: When I first started to look at protein folding, really at the start of my own career, I found that metal ions can bind to unfolded proteins, so the metal can bind before the protein folds. So then the idea came that metal binding could be the trigger of protein folding and could guide folding to the right structure.
And over the years we have seen that many metal ions, if they bind early during folding, affect the folding pathway, so they can steer the protein. You often reach the same final structure, but the way there is different. So that's one thing. These metal ions are also important for function in the final folded structure, but metal ions are dangerous to a cell, so it’s very important that cells handle these metals really carefully.
So there aren't a lot of free metals floating around in a cell; there are dedicated transport proteins that move the metal to the right place and the right protein and deliver it. So over the years, we also looked at a lot of these transport pathways: how, for example, copper, which is the metal ion that I've studied a lot, is moved.
You know, it gets into a cell, but then how do you move it to the right protein that needs it for some catalytic activity? Metals are really basic components of many enzymes for doing chemistry, but they can also do bad things. So you get this balance where too much or too little is bad.
And then if something goes wrong in the cell, for example in neurodegenerative disorders like Alzheimer’s and Parkinson’s, there are a lot of bad processes that happen when these proteins aggregate. Among them is oxidative stress, and that can release metal ions from these different proteins, and those ions can catalyze the formation of free radicals, reactive oxygen species.
And then it's kind of a bad cycle, and it just gets worse. So we need to control these metals. We can also think about how, by knowing more about how metals bind, you can find chelators that can pick up the bad metals. So metals are important.
They play a role in folding. And it turns out that copper transport proteins and copper-binding proteins are very important in cancer, because cancer cells are very active. They need a lot of activity, and therefore a lot more copper, because the enzymes that do this activity need copper. So cancer cells have to survive in an environment where there is a lot of copper.
And that could also be a bottleneck for blocking cancer, right? If we can remove copper from those cells, maybe we can stop cell migration. If you come to neurodegeneration, in Alzheimer's and Parkinson's, specific proteins aggregate in each disease. But these proteins can bind copper, and at least in the test tube, these copper ions speed up aggregation.
So there are ideas that maybe metals that get loose in a cell trigger the proteins to start to aggregate, and now you're into the disease because you have made the first seeds of aggregates. But there's a lot unknown, because then you have to think about these transport proteins in a biological setting, what comes first, and so on.
But I really think we need to study metals much more than we do. When I started to do protein folding, that was kind of my thing, why I got interested: everybody studied proteins, but they removed all these extra things like a metal or a cofactor, because you just wanted the polypeptide to keep it as simple as possible.
So I added back what I thought were these important factors, because they will play a role. I could say one more thing: the protein we focus on first in my project here is called azurin. It's a blue copper protein.
It binds a copper ion, so it is a metalloprotein. We use it as a model system to begin with because we have done a lot of experimental work on this protein. I've studied folding of this protein with the metal, without the metal, in cell-like conditions, with this and that added, so we know a lot of these experimental parameters, and it's also a copper-binding protein.
And thinking about EMSL’s mission in bioenergy and sustainability, there are several copper proteins that are important in the degradation of biomass, for example. One particular protein is called LPMO (lytic polysaccharide monooxygenase); it's a copper-binding enzyme that is actually used in industrial settings right now to catalyze the degradation of cellulose, for example.
So if we can understand that protein, which would be the next target for us in this proposal, maybe we can improve it, because it's already making a huge difference. If we can make it more stable, if we can understand what its limitations are, we can improve those. That's why I think this model system is really helpful, also for the long-term purpose of turning this type of research into something valuable, and also because many proteins bind metals, not just copper but other metals too.
We need to learn how to bridge the interface between inorganic chemistry, metal interactions with proteins, and all the AI and structures we have there; we need to merge these two parts.
Dawn Stringer: Now, can you talk about if you're generating this code from scratch or is AI helping with this generation?
Pernilla Wittung-Stafshede: We are using several existing programs or codes, and here we collaborate with a team member, a faculty member at ANL, which is another national lab, Argonne National Laboratory. He has some codes that use AlphaFold 2, which is an improved version of AlphaFold, and those have been transferred to EMSL. Then we try to merge that with another code that handles how you actually put a metal into a protein, because then you have different types of bonds, so you need another way to represent them.
And then we want to work on this so that, in principle, we can fold the protein in simulations within these programs. Then we want to put in these experimental values that say something like, during folding, this happens in the middle. We can also run more simulations to generate computed values that could represent that folding reaction, and those can also go into the code.
Okay. That was a long way of saying that I think what we are making will be new, but we use pieces that exist.
Dawn Stringer: Pernilla, because EMSL has a huge focus on environmental and biological science, can you talk about the application of this project to biofuels and to revolutionizing the biofuels industry?
Pernilla Wittung-Stafshede: In principle, when you degrade biomass, you have a cocktail of things that you add, and there are a lot of different enzymes that we know can degrade biomass. And there's a lot of research. Some of my colleagues in my department actually look at a different set of enzymes that catalyze the cleavage of certain bonds in a piece of wood, or cellulose, into smaller pieces. But it's really tricky, because it's a tough material, in a way, to take a tree and break it down into all these little pieces. Some years ago, it was found that this one protein, LPMO or lytic polysaccharide monooxygenase, made a huge difference. And it has a copper ion.
And some people focus on that specifically: what the reaction mechanisms are at the copper, which residues are involved in binding, and exactly what's going on. But the point is that this enzyme can catalyze cleavage of these cellulose bonds, the bonds between the sugar units in the polysaccharides of this material, and it makes a huge difference.
So several plants out there now add this protein as part of that cocktail. However, the enzyme needs to withstand harsh conditions, because you want harsh conditions to speed up the reaction. So if we know more about this enzyme, we can make it more robust and maybe improve its activity. We know the structure of the protein.
We know that it binds copper, but you need to be able to produce and purify huge amounts of it so you can add it. So here, I think it helps to know more about the folding of that protein: when the copper comes in, how it comes in, and how we can make sure it's bound so it's protected and doesn't fall off.
That's one of the problems with this protein: when you make it, you don't get 100% copper loading. So you want to, you know, improve the process of making it. Ultimately, maybe you can even mimic it with a smaller molecule that has the same properties and activity. If we understand exactly how that metal sits in its immediate environment, you can kind of cut out that little piece and use that as the molecule you add. Nobody has ever studied how these proteins fold. We don't know anything about the mechanism here, or about the fluctuations in the protein that might be important when it's just sitting there in a cell doing what it's supposed to do. So I think it's a perfect target, because it's one protein that is not too large, so you can actually do computations on it. You can run many all-atom simulations and then merge them with these AI methods to understand more about both the dynamics and structural properties of the protein itself, and also the interaction with the metal, because the metal is doing the whole thing.
I'm sure that if we have better AI tools for learning about protein dynamics, then even setting the metals aside, we can understand any protein better. And there might be many enzymes that are important in environmental processes that we can learn more about and improve. The other part that I didn't say much about is that, in addition to metals, a lot of the proteins in our bodies, and also the enzymes used for this kind of biomass degradation, which often come from fungi or different kinds of weird species, have post-translational modifications. They can be phosphorylated or amidated; there are different modifications to amino acid residues in the sequence. And that's also not taken into account in the current AlphaFold or any computational program. So that's something we thought about when adding in a metal.
If we learn that, we can add in other modifications, so in principle we can broaden this and come closer and closer to what proteins really look like when they work.
Dawn Stringer: Where are you in this project, and where do you hope to head in the near future?
Pernilla Wittung-Stafshede: In principle, we started in the fall last year, so in a few months we'll come to one year, and now we have the computer programs set up and transferred. What we're doing right now is putting the metal into the protein. So we have in the computer the full model of this model protein, azurin, as it’s called, with the metal there.
So we can start to unfold and refold it and do simulations. And during that stage, we can also feed in these experimental data. We have actually spent a lot of time setting up this whole system. But my idea, my belief, is that now, when we get going on the model system, we will have all the computational methods in place.
So then it's much easier to move on to the next protein. I think in the fall I will start to see results. I will see folding dynamics; I will see how this protein is actually folding, and it's a model system. Then we can see how that matches, and then we can move on to a relevant protein.
So I'm seeing results coming now. It's been a lot of waiting and getting everything in place, and I would say that if I hadn't had this proposal accepted, I would never have got anybody to do this much work for me. Some people in our computer science department have said, oh, we should collaborate, we know this, we know that, but nobody would want to do this much prep work to get to an actual result. So I think these types of programs are really unique.
Dawn Stringer: Absolutely. And it’s great that we have the ability to work with people from around the world.
Pernilla Wittung-Stafshede: You know, I think it’s really fun, and it gives me a new twist. I can see this helps me think about other things that come up around AI and machine learning, and I’m supervising some students who are partly doing something on this. And you know, I'm learning, and I'm getting, I wouldn't say brave, but you dare to go closer and think in a broader way about how I can exploit this even more.
Because AI is here, and even if I'm not going to be an expert on AI, we all need to be aware and understand enough, right? So we can't be fooled by it. So I think it's really essential, for both research and society in general, to know enough about these systems so we can deal with them.
Dawn Stringer: You touched on the next thing that I really wanted to dive into. AI, especially with the launch of ChatGPT, has taken the world totally by storm. Can you talk about the role that it plays and the role it will continue to play in the world of science as well as the importance?
Pernilla Wittung-Stafshede: I think it's really important. There's so much discussion now on all various levels, right, about ChatGPT. But with AI in general, most people don't really know what they mean when they use these words. ChatGPT has really changed things; I think it brought AI into everybody's house. And if you think about universities, there's cheating on tests, and using ChatGPT to write certain things, even to write your paper.
I think it definitely has limits. I've asked ChatGPT about myself, and it doesn't really know who I am, and it lies. Then it just argues; I say, you're wrong, ChatGPT, and it says, oh, I'm just an AI machine, I don't know enough. Yeah. So I think we all need to not just believe it.
We need to understand how these answers come about, right? So we can interpret them. Then we need to live with it. There's no way we can ban it for everybody. So I think it's important in society, and it will be interesting to see how it goes, because there are people out there who think we should ban it to a certain level, and others who say we need to set rules for how to use it.
I think in terms of research, we need to use it to our advantage. It could speed up a lot of things. Maybe you can use it so you only do the most important experiments; you don't need to do as many. You will always still need experiments, but you can do the crucial ones and be smart about it.
I'm thinking I want AI, or a ChatGPT, to be able to read all the papers, look at what has been published, look at the figures, interpret the data, and then draw conclusions from that. What is good data? Bad data? What's the answer to something? Because that's another thing, right?
There's so much literature out there. As a scientist, trying to keep up with everything that's published is really hard. And some things that are published are simply wrong. Of course, that's a worry too, because a chatbot or an AI doesn't know what's right or wrong. It just takes in everything it can see.
So somehow this almost has to become part of everybody's education in the future, just so we can handle it. And that's why I think, for us, the things I'm doing here, and things like AlphaFold, are just the beginning: AI used in a smart way, but on a confined set of data or questions.
But it will probably come into so many different aspects of life that it's hard to predict now.
Dawn Stringer: And to your point, there are probably also some downsides of AI in science, even looking into the future. Can you talk about any concerns you have, from a scientific perspective, about the advancement of artificial intelligence?
Pernilla Wittung-Stafshede: It's also a worry, I think, that with some of these AI methods, or just AlphaFold, in a way, you go to a website and you find the structure of a protein. And if you take that for granted, you don't need to understand how things come about. You can just go and pick things or ask things.
And sometimes those things are wrong. In principle, AlphaFold doesn't correctly predict the structures of the proteins involved in neurodegeneration, and for proteins that have many domains, you don't get the right structure. If you don't know that certain things don't work so well, you can easily be misled. And I think that's a big worry I have: it's so easy to get access to data, and you think you have it, but you don't really know.
Dawn Stringer: That's true because machines don't have critical thinking like humans do. And on that note, do you see any risks involved in using AI?
Pernilla Wittung-Stafshede: I do think that there are risks. Take everything we discover or develop, say CRISPR-Cas9: it's an amazing tool, right? And people also talk about how you can use it to do bad things, and the same goes here; it can be used in the wrong way.
And I think we can't hinder discovery or progress, because it will always happen somewhere. So I think we probably need to set rules, maybe have some kind of control over what we use it for and why, and be very open and share. I don't think AI will take over the world, in a way.
But I think you can definitely develop processes where AI selects in the wrong way or makes judgments in the wrong way. And eventually, if you couple that to pressing buttons here and there, bad things can happen. But I think the good things are so huge too, and we don't necessarily know what they are yet, but that's always the case with something new.
We don't know what to use them for fully until we start to explore.
Dawn Stringer: I love hearing your perspective on that and it also makes me wonder what you think about the future of AI in science and where that's heading.
Pernilla Wittung-Stafshede: I think there will be both good and bad outcomes here, but I think we will see a lot of progress. Maybe there will also be a lot of, kind of, too much: progress that is not necessarily true, because it looks too good, or you use AI and in the end you predict something, or you say, I'll cure this disease, because AI told me this is the way to do it, and it's not going to work.
So I think, as with many things, there will be this hype first, and then we'll find some kind of steady-state level where we reasonably understand what these tools can be used for. And I think that will come. Right now everybody is just so amazed, and it's so fun to play with in a way; that will mellow down to something more reasonable, I think.
The other thing, though, and it's a separate topic, is that there's so much data, right? How can we store all this data? To me, it feels like eventually the internet will kind of blow up with all this stuff. And that's also a concern, because storing data takes a lot of energy.
And if people start to mess with that, that's also a big problem. I see a bright future, but there are also a lot of things that might get messed up. So I don't know; it's kind of exciting.
Dawn Stringer: AI still has a long way to go and a lot of potential for advancing many areas in our lives, and EMSL is excited to see how it can advance science in the future, even potentially producing biofuels in a climate-conscious way.
Dawn Stringer: Thank you for listening to Bonding Over Science. I’m Dawn Stringer for the Environmental Molecular Sciences Laboratory.
We don’t have time to cover it all, so don’t forget to check out EMSL-DOT-PNNL-DOT-GOV for a full article on this topic featuring the guest I spoke with today. And don’t forget to follow us on all social media platforms for the latest and greatest news coming from EMSL!
Dawn Stringer: EMSL is a Department of Energy, Office of Science national user facility that accelerates scientific discovery and pioneers new capabilities to understand biological and environmental processes across temporal and spatial scales. EMSL leads the scientific community toward a predictive understanding of complex biological and environmental systems to enable sustainable solutions to the nation’s energy and environmental challenges. If you’re interested in working with EMSL, learn more at emsl.pnnl.gov, that’s E-M-S-L-DOT-P-N-N-L-DOT-G-O-V.