(emsl2534) Magpie Annotation of Genome Sequences
EMSL Project ID
2534
Abstract
Several DOE programs currently focus on studies of microorganisms and require that the investigators they fund use genome sequence information to develop and test research hypotheses relating to DOE mission areas. We are currently part of several DOE projects that rely on up-to-date bioinformatics analyses of several microbial genome sequences. EMSL's proteome center, led by Dr. Richard Smith, requires this information to interpret data derived from global analysis of proteins identified in bacteria. In addition, the environmental microbiology group has several projects focused on the bacterium Shewanella oneidensis and relies on information from genome sequence analysis to design experiments and interpret the data derived from them. In another DOE-funded project, we will be required to characterize genome sequences derived directly from complex bacterial communities. The recently submitted Genomes to Life proposal involves investigators from the computational, proteomics, biology, and microbiology groups and will also require genome sequence analysis. We are fortunate to have acquired an extensive software package, called MagPie, which can be used to automate the collection of data for analysis of genome sequences. This software was initially developed with DOE funds at Argonne National Laboratory and is now being further developed at the University of Calgary under the direction of Dr. Christoph Sensen (http://niji.imb.nrc.ca/sensencw/research_interest.html). The software is installed on Dr. Romine's personal computer (SunBlade 100). Analyzing a typical genome requires an average of 100,000 jobs. These jobs compare DNA and protein sequences, deduced from the genome sequence, against several large, publicly available databases. These databases are updated daily. On a single CPU, running this many jobs takes several months to complete a single-pass analysis of the genome.
Therefore, by the end of the analysis, the data collected are already several months out of date. With multiple genomes to analyze, updates would be possible only very infrequently. MagPie is currently designed to submit jobs to up to 20 CPUs simultaneously. To realize the full potential of the software for DOE projects involving genome sequence interpretation, we are requesting access to a computer with parallel processing capability.
Project Details
Project type
Capability Research
Start Date
2002-05-22
End Date
2003-07-09
Status
Closed
Released Data Link
Team
Principal Investigator