Exploratory high-throughput sequence analysis on microbial genomes
EMSL Project ID
19809
Abstract
Microbial genomes are a fundamental source of sequence information vital to the DOE effort to find a viable strategy for remediating waste products accumulated by the atomic weapons program, and possibly those from spent reactor fuel. The premier DOE institution associated with the GTL efforts in curating and analyzing microbial genomes is the Joint Genome Institute (JGI) at Lawrence Berkeley National Laboratory. The collection of microbial genomes currently contains 1.5 million proteins. A key scientific analysis to be performed by this group is comparing each of these proteins against all the proteins in the nonredundant protein (nr) database distributed by NCBI. This comparison will require approximately 30,000 CPU hours to complete, making it an intractable problem for conventional BLAST implementations. However, we have already demonstrated with ScalaBLAST that near-ideal scaling is possible beyond 1,500 processors on MPP2 for problems requiring many queries per processor. We propose to perform the microbial genome vs. nr comparison using ScalaBLAST on MPP2 as a SIGHTS job and request a rapid allocation of 30,000 CPU hours to perform this search. These results will advance the mission of JGI to analyze and disseminate high-quality curated microbial genomes for use in many GTL applications. The proposed SIGHTS run will also enable a high-impact journal publication with joint authorship of JGI and PNNL staff.
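To make the scale of the request concrete, the following minimal sketch works through the arithmetic implied by the figures in the abstract. It assumes ideal (linear) scaling and an illustrative processor count of 1,500; the abstract only states that near-ideal scaling was observed beyond that number, so the actual job size is an assumption here. The function names are hypothetical and are not part of ScalaBLAST.

```python
def ideal_wall_clock_hours(cpu_hours: float, processors: int) -> float:
    """Wall-clock time if the work divides perfectly across processors."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return cpu_hours / processors


def queries_per_processor(num_queries: int, processors: int) -> float:
    """Average number of query proteins handled by each processor."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return num_queries / processors


# Figures taken from the abstract.
CPU_HOURS = 30_000
NUM_PROTEINS = 1_500_000

# Assumed processor count, for illustration only.
PROCESSORS = 1_500

hours = ideal_wall_clock_hours(CPU_HOURS, PROCESSORS)
load = queries_per_processor(NUM_PROTEINS, PROCESSORS)
print(f"Ideal wall-clock time: {hours:.1f} hours")
print(f"Queries per processor: {load:.0f}")
```

Under these assumptions the search would finish in roughly 20 hours of wall-clock time, with about 1,000 query proteins per processor, which is consistent with the many-queries-per-processor regime in which near-ideal scaling was reported.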
Project Details
Project type
Limited Scope
Start Date
2006-06-20
End Date
2006-09-21
Status
Closed
Released Data Link
Team
Principal Investigator
Team Members