Structural Genomics of Model Eukaryotic Organisms
EMSL Project ID
7598
Abstract
The primary goal of our pilot project is to develop integrated technologies for high-throughput (htp) protein production and 3D structure determination. Our NESG Consortium has the particular focus of evaluating the role of protein NMR spectroscopy in future Structural Proteomics Centers to be funded by NIH. Our targets include members of large protein families for which no 3D structural information is available. We prioritize protein families with representatives in human and higher eukaryotic genomes because of the important role of these proteins in human health and model organism biology. During the first three-plus years of this project, the NESG Consortium has submitted > 90 new protein structures to the PDB, and this productivity compares very well with that of other NIH-funded structural genomics projects. On average, each of these structures allows for homology modeling of ~100 protein structures with previously unknown folds. To further increase our productivity in the remaining two years of this project, we will rely heavily on distributed NMR data collection facilities, particularly those at Pacific Northwest National Laboratory.
The primary genome targets for this methodology development are eukaryotic model organisms which are subjects of extensive functional genomics research, including S. cerevisiae, C. elegans, and D. melanogaster, as well as homologues from the human genome. Within these genomes, the pilot project focuses on proposed open reading frames (ORFs) encoding phylogenetically conserved polypeptide chains of < 340 amino acids with no predicted 3D structures. While these targets will be identified in genomes of these eukaryotic model organisms, in some cases it is more practical (or more interesting) to express and study the corresponding prokaryotic or human homologues. All of the expressed and purified targets are screened by NMR and other biophysical methods to determine feasibility for structure analysis, and put through initial htp crystallization trials. Criteria have been established for assigning particular targets for crystallographic or NMR structure determination. Proteins providing good HSQC spectra and high biological interest scores are prioritized for NMR data collection and structural analysis.
Over 2160 proteins have been cloned and expressed to date from the genomes of C. elegans, D. melanogaster, H. sapiens, and reagent genomes of microorganisms which have homologues of these metazoan proteins. For over 850 of the proteins, conditions have been identified that provide high-level expression and solubility. Several of these with high biological interest have been selected for NMR studies. Of the > 90 NESG protein structures in the Protein Data Bank, approximately half were elucidated by NMR spectroscopy. The EMSL High Field Nuclear Magnetic Resonance Facility has played a key role in this productivity. In 2003, NMR data collected at EMSL across the entire NESG directly resulted in seven new NMR solution structures submitted to the PDB: 1RZW (GR4, Powers/Montelione), 1R57 (ZR31, Cort/Kennedy), 1RQ6 (TT802, Wu/Arrowsmith), 1Q48 (IR24, Ramelot/Kennedy), 1NXI (OP3, Ramelot/Kennedy), 1NYN (YTYst425, Ramelot/Kennedy), and 1NY4 (JR19, Aramini/Montelione), with three others in refinement. The EMSL data also provided valuable high-field or 4D NOESY information used in two other structure determinations, 1PQX (ZR18, Baran/Montelione) and 1PUL (Tejero/Montelione), and led to six assignment and structure publications (Zheng et al., 2003; Ramelot et al., 2003a,b; Pineda-Lucena et al., 2003; Aramini et al., 2003; Wu et al., 2003). In addition to the NMR data required for a complete structure analysis (typically 4 to 6 weeks of multidimensional NMR experiments per protein), EMSL has been a valuable resource for collecting high-field NOESY data for structure refinement of several NESG targets, as well as relaxation data to probe the dynamics of proteins in solution and ligand binding assays. Dynamics and substrate binding studies have also been carried out on NESG proteins whose structures were recently elucidated in our consortium (SR10, WR41, and QR46).
High-field (600, 750, and/or 800 MHz) NMR instrument time over the next six-month period is requested to acquire the NMR data required for the 3D structure determinations of NESG targets which have already been expressed and demonstrated to provide good to excellent quality HSQC spectra. These include HR1757 (108 aa), WR73 (181 aa), and AR81 (139 aa). HR1757, from Homo sapiens, is a ubiquitin-like protein fragment of unknown function found in many eukaryotes. WR73, from Caenorhabditis elegans, is a homolog of a translationally controlled tumor protein (TCTP) found in other eukaryotes including humans. AR81, from Arabidopsis thaliana, is a protein of unknown function found in eukaryotes. Sequence data and HSQC spectra for these proteins, along with other promising targets recently screened, are included in the attached PowerPoint file. Protein samples will also be provided on behalf of the Northeast Structural Genomics Consortium from the laboratory of Cheryl Arrowsmith at the University of Toronto. Given that a standard data collection for a protein structure determination requires at least 4 weeks of NMR instrument time, determining the structures of the three targets discussed above would require a total of 10 weeks of instrument time (see below).
The 3D protein structures completed to date provide crucial clues that have allowed us to propose and test biochemical and biophysical functions of these proteins, which appear to be involved in fundamental aspects of translational regulation, intracellular signaling, and cell growth regulation. These 3D structures have also formed the basis for homology modeling across our large protein target families. The data collected at PNNL has been invaluable in demonstrating the role of NMR in the emerging area of structural proteomics.
Project Details
Project type
Capability Research
Start Date
2004-04-16
End Date
2004-09-30
Status
Closed
Released Data Link
Team
Principal Investigator
Team Members
Related Publications
Yin C, JM Aramini, LC Ma, JR Cort, GVT Swapna, RM Krug, and G Montelione. 2011. "Backbone and Ile-δ1, Leu, Val Methyl 1H, 13C and 15N NMR chemical shift assignments for human interferon-stimulated gene 15 protein." Biomolecular NMR Assignments 5(2):215-219. doi:10.1007/s12104-011-9303-8