
Plant proteome structure-function and pKa prediction augmented with environmental considerations, Deep Learning and integrated datasets


EMSL Project ID
60376

Abstract

One of the goals of computational biology is to model cellular behavior. At the crux of this endeavor is the need to understand proteins in their biologically relevant environmental context. In other words, one must be able to model protein behavior under the various stresses imposed, e.g., osmolarity, heat and cold shock, and ion homeostasis. With the advent of the genomic and proteomic revolutions, our ability to track evolutionary pressures has made significant strides both in phenotype and at the level of genome-wide characterization (genetic screens, gene expression, and engineered gene activity). Studies designed to understand the pathways activated by stresses (salt, cold, heat, acid, alkaline, osmolarity, hydration, and chemical) are now often able to identify the key players. However, understanding the mechanisms involved beyond the phenomenological level, at the molecular and atomic level, requires knowledge of the structure, energetics, and associated dynamics of the proteins and macromolecules. It is at the level of molecular detail that the engineering of biology will become transformative, e.g., engineering organism salt tolerance. To this end, structure-based algorithms have been used to model function. Despite over 40 years of computational work and the existence of many algorithms, prediction of protein function at the levels needed for system engineering is still out of reach. The difficulty lies in understanding the molecular determinants of electrostatic energies and pKa values, in particular for those functional groups where energy can be stored. Most computational methods are not sufficiently accurate to predict the pKa values of these functional groups, yet knowledge of these values is crucial for understanding many biochemical processes.

Major obstacles to the molecular modeling of proteins have been overcome recently. In particular, recent progress in protein folding by AlphaFold allows an atomic-level description of proteins and protein interactions. Coupled with the advancement of pKa prediction methods for ionizable residues, we are at a stage where a complete picture of a cellular proteome can be determined. It would seem that we could identify trigger points in disease and adaptation as well as reprogram desired characteristics. However, two hurdles remain: the low accuracy of the predictions for functional groups and the lack of understanding of how environmental influences affect pKa values. These two hurdles form the basis of the first two specific aims of this proposal. This set of studies aims to improve the prediction of pKa values through the integrated use of (1) additional structural datasets made available through the advent of plant genomic data and the recent structure predictor AlphaFold; (2) parameterization of environmental influences on pKa values and pH-dependent properties; (3) multiple protein databases and Deep Learning to train for identification of functional groups; and (4) an experimental evaluation of the robustness of our improvements using a candidate protein. These studies will contribute the insight needed to guide the development and application of computational algorithms at the levels needed to model biological systems.

Project Details

Project type
Large-Scale EMSL Research
Start Date
2022-10-01
End Date
N/A
Status
Active

Team

Principal Investigator

Carolyn Fitch
Institution
Johns Hopkins University

Team Members

Sean Fitch
Institution
Rensselaer Polytechnic Institute

Christopher Cooper
Institution
Universidad Tecnica Federico Santa Maria