Accurately predict the spatial arrangement of protein complexes from subtomogram images through physics-informed machine learning
EMSL Project ID
60121
Abstract
Deducing three-dimensional (3D) structures of molecular assemblies from cryo-electron tomography (cryoET) data provides an unprecedented level of detail about biological systems in their native, unperturbed state. The proposed work focuses on automating subtomogram averaging (SA), a reconstruction approach that extracts a 3D structure of the specimen from a series of 3D sub-volumes that individually show different and incomplete views of the same structure. SA is currently an arduous process that requires manual intervention to identify and box out all the sub-volumes of interest. One of the main challenges in fully automating SA is the large number of tomogram sub-volumes (>10,000) needed to achieve even 6 Å resolution. As a result, most examples of SA to date have examined complexes present at high copy number in cells and machines with low variation in composition and dynamics. However, interesting protein complexes that form dynamic clusters often appear as low-density particles in electron tomograms and are often filtered out as noise in a reconstructed voxel during standard ensemble averaging. The main technical objective of this work is to leverage the structural ensemble of protein conformations from molecular dynamics simulations as physical constraints for machine learning. This will be complemented by the development of a reproducible, well-defined, and automated pipeline for SA, which will significantly increase the throughput of SA data processing and reduce the number of required input datasets.
Project Details
Start Date
2021-11-16
End Date
2023-10-31
Status
Closed
Released Data Link
Team
Principal Investigator
Team Members