High performance sequence analysis for data-intensive bioinformatics
EMSL Project ID
15490
Abstract
The volume of sequence data available through public repositories is growing almost exponentially over time. Currently, the nonredundant protein database doubles in size approximately every 2 years. At the current rate of growth, the volume of sequence data will be too large to fit in memory on conventional computers. Because sequence analysis involves scoring a query against this large data source, the entire target database must pass through memory for each query. We are developing a prototype sequence analysis tool (ScalaBLAST) that takes advantage of large globally addressable memory spaces to mitigate the performance penalty normally incurred by memory mapping or file I/O on conventional computing systems. This next-generation architecture approach to data-intensive computing for bioinformatics requires the development of special functionality (using emerging features of the Global Arrays toolkit) to optimize performance on shared memory systems, such as MPP2. This project aims to incorporate the ability to prefetch large blocks of sequence data, along with other optimization methods, into ScalaBLAST to make high performance sequence analysis possible and efficient on shared memory systems.
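The prefetching idea in the abstract can be illustrated with a minimal sketch. It is not ScalaBLAST code and does not use the Global Arrays toolkit: the function names (load_block, score, search_with_prefetch), the block size, and the toy identity-count score are all hypothetical stand-ins chosen for illustration. The sketch only shows the general pattern of requesting the next block of target sequences while the current block is being scored, so that the fetch cost can overlap with computation.

```python
from concurrent.futures import ThreadPoolExecutor


def load_block(database, start, size):
    """Hypothetical stand-in for a remote-memory or disk fetch of one block."""
    return database[start:start + size]


def score(query, target):
    """Toy score (an assumption, not BLAST): count of matching positions."""
    return sum(1 for a, b in zip(query, target) if a == b)


def search_with_prefetch(query, database, block_size=2):
    """Score the query against every target, fetching block i+1 while scoring block i."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Request the first block before entering the loop.
        pending = pool.submit(load_block, database, 0, block_size)
        start = 0
        while start < len(database):
            block = pending.result()  # wait for the block requested earlier
            next_start = start + block_size
            if next_start < len(database):
                # Prefetch the next block; it loads while this one is scored.
                pending = pool.submit(load_block, database, next_start, block_size)
            for offset, target in enumerate(block):
                results.append((start + offset, score(query, target)))
            start = next_start
    return results


if __name__ == "__main__":
    targets = ["MKTAY", "MKTAA", "GGGGG", "MATAY", "MK"]
    for index, value in search_with_prefetch("MKTAY", targets):
        print(index, value)
```

In this sketch an in-memory list slice plays the role of the fetch, so there is no real latency to hide; the benefit would appear only when load_block is replaced by a slow operation such as file I/O or a remote memory transfer.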
Project Details
Project type
Grand Challenge
Start Date
2005-05-16
End Date
2008-11-13
Status
Closed
Released Data Link
Team
Principal Investigator
Team Members
Related Publications
McDermott JE, CS Oehmen, LA McCue, EA Hill, DM Choi, J Stockel, ML Liberton, HB Pakrasi, and LA Sherman. 2011. "A Model of Cyclic Transcriptomic Behavior in Cyanobacterium Cyanothece sp. ATCC 51142." Molecular Biosystems 7(8):2407-2418. doi:10.1039/C1MB05006K
Welsh EA, ML Liberton, J Stockel, T Loh, TR Elvitigala, C Wang, A Wollam, RS Fulton, SW Clifton, JM Jacobs, R Aurora, BK Ghosh, LA Sherman, RD Smith, RK Wilson, and HB Pakrasi. 2008. "The genome of Cyanothece 51142, a unicellular diazotrophic cyanobacterium important in the marine nitrogen cycle." Proceedings of the National Academy of Sciences of the United States of America 105(39):15094-15099. doi:10.1073/pnas.0805418105