Computing, Analytics, and Modeling
Functional and Systems Biology
Interpreting 3D Protein Structure Data using a Graphical Approach
Using advanced computing, scientists developed a graph-based approach to identify the structure of proteins.
The Science
Cryo-electron tomography is an imaging method that reconstructs three-dimensional images of molecules, including proteins. The method holds great promise for understanding the structure of these molecules in their native environment, but the image data are sometimes too distorted or too sparse for scientists to fully determine the structure of a given protein. To solve this problem, scientists are developing a new approach that mines these data for patterns, and they are finding that mathematical graphs built from these patterns can identify protein structures with high accuracy.
The Impact
Cryo-electron tomography generally relies on a series of two-dimensional images of the same molecules, taken at different tilt angles, to generate a three-dimensional volume. But the quality of those two-dimensional images, which can suffer from low contrast, data contamination, data deformation, or data loss, can make it difficult for researchers to interpret the results. A new approach lays the foundation for overcoming these challenges by building a mathematical graph of the molecule from patterns in the data, much like a map. The graph is particularly useful for identifying the structure of molecules such as proteins. Better identification of protein structures can help scientists develop new biological materials for clean energy production and other bioproducts.
Summary
A multi-institutional team of scientists hypothesized that each protein's structure is distinct enough to be recognized even when the cryo-electron tomography images of that structure are distorted. To test this hypothesis, they used pattern mining to transform noise-free, simulated three-dimensional tomography images into mathematical graphs. They then systematically introduced distortions or defects into the simulated images to see whether those changes affected their ability to interpret the graphs. Using the Tahoma computing system at EMSL, the Environmental Molecular Sciences Laboratory, a Department of Energy (DOE) Office of Science user facility, team members calculated similarities between graphs from the simulated images and graphs from the transformed images. They found they could accurately identify 80 to 100 percent of the proteins from 10 distinct samples when background noise was not included. Current work focuses on adapting the approach to more realistic data, with background noise and additional artifacts, to accelerate the mining and interpretation of cryo-electron tomography data. This research serves as a proof of concept for the approach and could help improve tomogram processing for a wide range of disciplines.
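The workflow described above (turn a structure into a graph, distort the data, then compare graphs) can be illustrated with a small toy sketch. This is not the published GRIP-Tomo code. The two point sets, the distance cutoff, the four graph features, and the similarity score are all assumptions chosen for illustration, and the function names `build_graph`, `graph_features`, `similarity`, `distort`, and `identify` are hypothetical.

```python
import math
import random


def build_graph(points, cutoff):
    """Link every pair of points closer than `cutoff` (assumed rule)."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def graph_features(adj):
    """Summarize a graph as [nodes, edges, mean degree, mean clustering]."""
    n = len(adj)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    degrees = [len(nb) for nb in adj.values()]
    clustering = []
    for nb in adj.values():
        k = len(nb)
        # Count links among this node's neighbors (local clustering).
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        clustering.append(2.0 * links / (k * (k - 1)) if k > 1 else 0.0)
    return [float(n), sum(degrees) / 2.0, sum(degrees) / n, sum(clustering) / n]


def similarity(f1, f2):
    """Assumed score: 1 minus the mean relative feature difference."""
    diffs = [abs(a - b) / max(abs(a), abs(b), 1e-12) for a, b in zip(f1, f2)]
    return 1.0 - sum(diffs) / len(diffs)


def distort(points, drop_fraction, jitter, rng):
    """Mimic data loss (dropped points) and deformation (Gaussian jitter)."""
    kept = [p for p in points if rng.random() >= drop_fraction]
    return [tuple(c + rng.gauss(0.0, jitter) for c in p) for p in kept]


def identify(features, references):
    """Return the reference name with the highest similarity, and the score."""
    best = max(references, key=lambda name: similarity(features, references[name]))
    return best, similarity(features, references[best])


# Two toy "structures" (assumptions): a helix-like chain and a compact lattice.
helix = [(2.3 * math.cos(math.radians(100 * i)),
          2.3 * math.sin(math.radians(100 * i)),
          1.5 * i) for i in range(40)]
lattice = [(3.8 * x, 3.8 * y, 3.8 * z)
           for x in range(4) for y in range(4) for z in range(4)]

CUTOFF = 6.5  # assumed distance cutoff, in the same units as the coordinates
references = {"helix": graph_features(build_graph(helix, CUTOFF)),
              "lattice": graph_features(build_graph(lattice, CUTOFF))}

rng = random.Random(0)  # fixed seed so the run is repeatable
for name, points in (("helix", helix), ("lattice", lattice)):
    noisy = distort(points, drop_fraction=0.1, jitter=0.2, rng=rng)
    best, score = identify(graph_features(build_graph(noisy, CUTOFF)), references)
    print(f"true={name} identified={best} similarity={score:.2f}")
```

Raising `drop_fraction` or `jitter` in this sketch gives a rough feel for how much distortion a feature-based comparison tolerates before two structures become hard to tell apart. Real tomograms add background noise and other artifacts that this toy example does not model.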
Contacts
Margaret Cheung, EMSL, margaret.cheung@pnnl.gov
James Evans, EMSL, james.evans@pnnl.gov
Funding
The research was funded through Research Computing and the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory, as well as the DOE Office of Science, Office of Workforce Development for Teachers and Scientists Community College Internship Program. A portion of the research was conducted at EMSL, the Environmental Molecular Sciences Laboratory, a DOE Office of Science user facility.
Publication
A. George, et al., “Graph identification of proteins in tomograms (GRIP-Tomo).” Protein Science 32, e4538 (2023). [DOI: 10.1002/pro.4538]