From Activity to ORF
EMSL Project ID
23390
Abstract
THE PROBLEM: There are many known activities, the genes for which are unknown
Although vast databases of DNA sequences are now available for many organisms, there are still many known protein activities that are not associated with DNA sequences. We call these proteins "foundlings". Unlike an orphan, a "foundling activity" has a gene "mother" but does not know which gene it is. The severity of the problem of "foundling activities" was assessed in the report from a workshop, "An Experimental Approach to Genome Annotation", convened by the American Academy of Microbiology and supported by NSF (1). (See proposal section for references.) The workshop concluded that as many as 40% of all predicted genes in completed prokaryotic genomes have no functional annotation. Despite the substantial success of bioinformatics, uncharacterized or misannotated ORFs account for 13 to 60% of the ORFs of most completely sequenced genomes (2, 3, 4, 5). Siew and Fischer (5) suggested, on the basis of structural analysis, that many of the uncharacterized ORFs are likely to correspond to expressed, functional (and even essential) proteins.
We propose developing a sensitive, virtually universal, and easy-to-use method for associating known activities with unknown ORFs. We will partially purify the activity and correlate the relative quantity of each protein with the activity profile across the chromatographic fractions. The proteins in fractions with little or no activity will be compared with the proteins in fractions with peak activity. This provides an internal control and reduces the task from knowing the absolute amount of the candidate protein to knowing its relative amount in each fraction. Proteins in the fractions will be identified from the mass spectra, measured by PNNL, of their "tryptic peptides", using sequence information and the software developed to facilitate this identification. Those proteins whose relative abundance is reasonably correlated with the relative activity will be candidates for the protein of interest. The ORF of interest will be selected from among the candidates by over-expressing each candidate and assaying it for the activity of interest. Depending on the abundance of the protein of interest, this will most often associate the activity with its ORF after less than 100-fold purification, far less than is required for purification to homogeneity. The ability to choose among candidates by an independent method makes it unnecessary to achieve high degrees of purification. The constraint is that the protein of interest must be within the concentration dynamic range of the mass spectrometer, about 1 part in 1,000.
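The correlation step described above can be illustrated with a minimal sketch. It assumes that each protein's relative abundance and the measured activity are available as one number per chromatographic fraction, and it uses a Pearson correlation with an arbitrary cutoff. The abstract does not specify the statistic or the threshold, so `pearson`, `rank_candidates`, `min_r`, the ORF names, and all values below are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2 fractions)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no information about the activity peak
    return cov / sqrt(var_x * var_y)


def rank_candidates(activity, abundance_by_orf, min_r=0.8):
    """Return (orf, r) pairs whose abundance profile tracks the activity profile.

    min_r is an assumed, illustrative cutoff, not a value from the proposal.
    """
    scored = [(orf, pearson(activity, profile))
              for orf, profile in abundance_by_orf.items()]
    kept = [(orf, r) for orf, r in scored if r >= min_r]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: relative activity and relative abundance in six fractions.
activity = [0.0, 0.1, 0.6, 1.0, 0.5, 0.1]
abundance = {
    "orf_A": [0.0, 0.2, 0.5, 1.0, 0.6, 0.1],  # peaks with the activity
    "orf_B": [1.0, 0.8, 0.5, 0.3, 0.2, 0.1],  # elutes early, unrelated
    "orf_C": [0.5, 0.5, 0.5, 0.5, 0.5, 0.5],  # present everywhere
}

candidates = rank_candidates(activity, abundance)
for orf, r in candidates:
    print(f"{orf}: r = {r:.2f}")
```

Only profiles are compared, so no absolute quantitation is needed, which mirrors the internal-control argument in the abstract. The ORFs that pass the cutoff would then go on to over-expression and direct assay, which is the independent check that makes a loose cutoff acceptable.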
Project Details
Project type
Exploratory Research
Start Date
2007-03-22
End Date
2008-03-23
Status
Closed
Released Data Link
Team
Principal Investigator