Interpreting 3D Protein Structure Data using a Graphical Approach

Using advanced computing, scientists developed a graph-based approach to identify the structure of proteins.

A team of scientists is developing a new approach that mines data for patterns and uses those patterns to build mathematical graphs that identify protein structure with high accuracy. (Image by kanawatTH | Freepik)

The Science 

Cryo-electron tomography is an imaging method that reconstructs three-dimensional images of molecules, including proteins. The method holds a great deal of promise for understanding the structure of these molecules in their native environment, but sometimes the image data are too distorted or too sparse for scientists to fully determine the structure of a given protein. To solve this problem, scientists are developing a new approach to mine these data for patterns, and they are finding that the mathematical graphs based on these patterns can identify protein structures with high accuracy. 

4RLC Beta-barrel Domain
Scientists are pioneering a way to use pattern mining to develop a mathematical graph of proteins, which can identify their structure with high accuracy. (Image courtesy of Pacific Northwest National Laboratory)

The Impact 

Cryo-electron tomography generally relies on a series of two-dimensional images of the same molecules, taken at different tilt angles, to generate a three-dimensional volume. But the quality of those two-dimensional images, which can suffer from low contrast, data contamination, data deformation, or data loss, can make it difficult for researchers to interpret the results. A new approach lays the foundation for overcoming these challenges by building a mathematical graph of the molecule based on data patterns, much like a map. The graph is particularly useful for identifying the structure of molecules like proteins. Better identification of protein structures can help scientists develop new biological materials for clean energy production and other bioproducts.

Summary 

A multi-institutional team of scientists hypothesized that each protein's structure is distinctive enough to be recognized even when the cryo-electron tomography images of that structure are distorted. To test this hypothesis, they used pattern mining to transform noise-free, simulated three-dimensional tomography images into mathematical graphs. They then systematically introduced distortions or defects into the simulated images to see whether these changes affected the ability to interpret the graphs. Using the Tahoma computing system at EMSL, the Environmental Molecular Sciences Laboratory, a Department of Energy (DOE) Office of Science user facility, team members calculated similarities between graphs from the original simulated images and graphs from the distorted images. They found they could accurately identify 80 to 100 percent of the proteins from 10 distinctive samples when background noise was not included. Current work is focused on adapting the approach to more realistic data with background noise and additional artifacts to accelerate the mining and interpretation of cryo-electron tomography data. This research serves as a proof of concept for the approach and could help improve tomogram processing for a wide range of disciplines.
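The general idea in this summary (turn a three-dimensional shape into a graph, distort the shape, then compare graphs) can be illustrated with a toy sketch. This is not the GRIP-Tomo implementation: the shapes `rod` and `sheet`, the distance cutoff, the chosen graph features, the `drop_every` distortion, and the similarity score are all assumptions made for illustration, and the points stand in for high-density regions that a real pipeline would first have to extract from a tomogram.

```python
import math
from itertools import combinations

def build_graph(points, cutoff):
    """Connect every pair of points no farther apart than `cutoff`."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def graph_features(adj):
    """Summarize a graph as (nodes, edges, density, mean degree, clustering)."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    mean_degree = 2 * m / n if n else 0.0
    total = 0.0  # sum of local clustering coefficients
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (k * (k - 1))
    clustering = total / n if n else 0.0
    return (n, m, density, mean_degree, clustering)

def similarity(f1, f2):
    """Mean of (1 - relative difference) per feature; 1.0 means identical."""
    scores = []
    for a, b in zip(f1, f2):
        denom = max(abs(a), abs(b))
        scores.append(1.0 if denom == 0 else 1.0 - abs(a - b) / denom)
    return sum(scores) / len(scores)

def drop_every(points, k):
    """Hypothetical distortion: simulate data loss by removing every k-th point."""
    return [p for i, p in enumerate(points) if (i + 1) % k != 0]

def identify(points, references, cutoff):
    """Return the reference name whose graph features best match `points`."""
    feats = graph_features(build_graph(points, cutoff))
    return max(references, key=lambda name: similarity(feats, references[name]))

# Toy stand-ins for two distinct structures (assumed shapes, not real proteins).
CUTOFF = 1.5
rod = [(float(i), 0.0, 0.0) for i in range(12)]
sheet = [(float(i), float(j), 0.0) for i in range(3) for j in range(4)]
shapes = {"rod": rod, "sheet": sheet}

# Reference graphs come from the undistorted shapes.
REFERENCES = {name: graph_features(build_graph(pts, CUTOFF))
              for name, pts in shapes.items()}

for name, pts in shapes.items():
    damaged = drop_every(pts, 6)  # remove 2 of the 12 points
    print(name, "->", identify(damaged, REFERENCES, CUTOFF))
```

In this toy setup, each damaged shape still scores highest against its own reference, because losing a few points changes the graph features far less than switching shapes does. Real tomograms add background noise and other artifacts that this sketch does not model.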

Contacts 

Margaret Cheung, EMSL, margaret.cheung@pnnl.gov 

James Evans, EMSL, james.evans@pnnl.gov 
 

Funding 

The research was funded through Research Computing and the Laboratory Directed Research and Development Program at Pacific Northwest National Laboratory, as well as the DOE Office of Science, Office of Workforce Development for Teachers and Scientists Community College Internship Program. A portion of the research was conducted at EMSL, the Environmental Molecular Sciences Laboratory, a DOE Office of Science user facility.  

Publication

A. George, et al., “Graph identification of proteins in tomograms (GRIP-Tomo).” Protein Science 32, e4538 (2023). [DOI: 10.1002/pro.4538]