Exploratory high-throughput sequence analysis on microbial genomes


EMSL Project ID
19809

Abstract

Microbial genomes are a fundamental source of sequence information vital to the DOE effort to find a viable strategy for remediating waste products accumulated by the atomic weapons program, and possibly from spent reactor fuel. The premier DOE institution associated with the GTL efforts in curating and analyzing microbial genomes is the Joint Genome Institute (JGI) at Lawrence Berkeley National Laboratory. Currently, the collection of microbial genomes contains 1.5 million proteins. A key scientific analysis to be performed by this group is comparing each of these proteins against all the proteins in the nonredundant protein (nr) database distributed by NCBI. This comparison will require approximately 30,000 CPU hours to complete, making it an intractable problem for conventional BLAST implementations. However, we have already demonstrated with ScalaBLAST that near-ideal scaling is possible beyond 1500 processors on MPP2 for problems requiring many queries per processor. We propose to perform the microbial genome vs. nr comparison using ScalaBLAST on MPP2 as a SIGHTS job and request a rapid allocation of 30,000 CPU hours to perform this search. These results will advance the mission of JGI for analyzing and disseminating high-quality curated microbial genomes for use in many GTL applications. The proposed SIGHTS run will also enable a high-impact journal publication with joint authorship of JGI and PNNL staff.
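
The figures quoted above can be turned into a rough back-of-the-envelope estimate. The sketch below is illustrative only: the query count and CPU-hour total come from the abstract, while the choice of exactly 1,500 processors and perfectly ideal scaling are assumptions made here to get a wall-clock number. The function names are hypothetical and are not part of ScalaBLAST.

```python
# Figures quoted in the abstract.
NUM_QUERIES = 1_500_000    # proteins in the microbial genome collection
TOTAL_CPU_HOURS = 30_000   # estimated cost of the full comparison against nr

# Assumption (not stated in the abstract): the job runs on exactly
# 1,500 processors and scales ideally across all of them.
ASSUMED_PROCESSORS = 1_500


def cpu_seconds_per_query(total_cpu_hours, num_queries):
    """Average CPU time, in seconds, spent on one query protein."""
    return total_cpu_hours * 3600 / num_queries


def ideal_wall_clock_hours(total_cpu_hours, processors):
    """Wall-clock hours if the work divides evenly with no overhead."""
    return total_cpu_hours / processors


def queries_per_processor(num_queries, processors):
    """Number of query proteins each processor handles under an even split."""
    return num_queries / processors


print("CPU seconds per query:", cpu_seconds_per_query(TOTAL_CPU_HOURS, NUM_QUERIES))
print("Ideal wall-clock hours:", ideal_wall_clock_hours(TOTAL_CPU_HOURS, ASSUMED_PROCESSORS))
print("Queries per processor:", queries_per_processor(NUM_QUERIES, ASSUMED_PROCESSORS))
```

Under these assumptions each query averages about 72 CPU seconds, each processor handles about 1,000 queries, and the whole search would take on the order of 20 wall-clock hours, compared with roughly 30,000 hours (about 3.4 years) on a single processor. Real runs would deviate from this to the extent that scaling is less than ideal.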

Project Details

Project type
Limited Scope
Start Date
2006-06-20
End Date
2006-09-21
Status
Closed

Team

Principal Investigator

Christopher Oehmen
Institution
Pacific Northwest National Laboratory

Team Members

Ernest Szeto
Institution
Lawrence Berkeley National Laboratory

Philip Hugenholtz
Institution
Joint Genome Institute