Proteins are the machinery of life and are involved in many processes, from photosynthesis in plants to cellular communication in the brain. The sequence and physical arrangement of a protein’s amino acids determine how it folds and interacts with its environment. While major advances in machine learning, such as AlphaFold, have improved our ability to predict protein structure from sequence, predicting protein function (e.g., a protein’s reactivity to other molecules) remains a major challenge. Since protein function is closely related to 3D structure, we will develop novel physics-guided machine learning models based on graph neural networks to learn representations of protein interfaces. Specifically, we will use coarse-grained representations of protein structure micro-environments as input to a graph neural network and, through transformations that respect the physical symmetries in the data, learn representations that reflect the biophysical properties of proteins and protein-protein interactions. Moreover, we will use the inferred molecular representations in a generative model to design molecules (e.g., small molecules and peptides) that target specified protein interfaces. Our approach opens a new path towards interpretable computational models of proteins that describe how biological properties and function emerge from protein subunits.
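To make the idea of symmetry-respecting transformations concrete, the following is a minimal illustrative sketch, not the model proposed here. It assumes a micro-environment coarse-grained to one 3D point per residue, an 8 Å neighbor cutoff, radial-basis distance features, and random (untrained) weights; all function names are hypothetical. Because the graph and its edge features depend only on inter-residue distances, the resulting node embeddings are unchanged when the structure is rotated or translated.

```python
import numpy as np

def build_graph(coords, cutoff=8.0):
    """Return directed edges (src, dst) and distances for pairs within `cutoff`."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)  # no self-loops
    src, dst = np.nonzero(mask)
    return np.stack([src, dst], axis=1), dist[src, dst]

def rbf(d, centers, width=1.0):
    """Expand scalar distances into Gaussian radial-basis features."""
    return np.exp(-((d[:, None] - centers[None, :]) / width) ** 2)

def message_pass(node_feats, edges, edge_feats, w_node, w_edge):
    """One round of sum-aggregated message passing over the graph."""
    src, dst = edges[:, 0], edges[:, 1]
    messages = np.tanh(node_feats[src] @ w_node + edge_feats @ w_edge)
    out = np.zeros((node_feats.shape[0], messages.shape[1]))
    np.add.at(out, dst, messages)  # accumulate messages at receiving nodes
    return out

def embed(coords, node_feats, w_node, w_edge, centers):
    """Embed each residue using only rotation-invariant geometric inputs."""
    edges, dist = build_graph(coords)
    return message_pass(node_feats, edges, rbf(dist, centers), w_node, w_edge)

def random_rotation(rng):
    """Draw a random proper rotation matrix (determinant +1) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

# Synthetic stand-in data (assumption): 12 residues, 20 amino-acid types.
rng = np.random.default_rng(0)
coords = rng.normal(scale=5.0, size=(12, 3))
node_feats = np.eye(20)[rng.integers(0, 20, size=12)]  # one-hot residue type
centers = np.linspace(0.0, 8.0, 8)
w_node = rng.normal(size=(20, 16))
w_edge = rng.normal(size=(8, 16))

h = embed(coords, node_feats, w_node, w_edge, centers)

# Rigidly move the structure and embed again.
R = random_rotation(rng)
t = rng.normal(size=3)
h_moved = embed(coords @ R.T + t, node_feats, w_node, w_edge, centers)

invariant = np.allclose(h, h_moved)
print("embeddings invariant to rotation and translation:", invariant)
```

Invariant scalar features such as distances are the simplest way to respect these symmetries; equivariant architectures that also carry directional information are a richer alternative that this sketch does not cover.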
Our computational work has broad implications for molecular modeling and design, including the development of antimicrobial agents for plants and the design of enzymes to improve carbon fixation in plants and algae. Beyond biological applications, our approach can also be used to model and reason about complex interactions in 3D, such as those relevant to aerosol chemistry or organic carbon stabilization.