Development of the Integrated MetaPROteomics Viewer (IMPROV) Software Toolkit
EMSL Project ID
42291
Abstract
Beyond the periphery of our senses exists an infinitely diverse microbial world. This microcosm extends into every imaginable habitat, thrives under powerful extremes, and helps create and sustain the conditions for life itself. Because the vast majority of microbes resist laboratory cultivation, our view of the extent and meaning of microbial diversity has long been limited. While selective isolation or enrichment schemes have successfully identified individual microbes mediating specific biogeochemical transformations, these microbes represent only a minor fraction of a community's overall metabolic capacity. Environmental, or 'meta', proteomics provides a model for the development of methods and tools to discover and validate the distributed networks linking microbial community metabolism with ecosystem-level function. Querying peptides against an incrementally clustered database is a data-intensive activity, and more complex communities require the collation of larger and more redundant gene models to provide sufficient database coverage for peptide matching. The identification of expressed proteins can provide quantitative validation of gene models used in predicting the metabolic potential of microbial communities and can resolve physiological differences between closely related strains along defined environmental gradients. When evaluated with appropriate environmental parameter information, such data can be useful in identifying biomarkers diagnostic for specific biogeochemical processes or dynamic response states. Given the power and the promise of the proteomics model, we propose to create a platform-independent, scalable software environment, called the Integrated MetaPROteomics Viewer (IMPROV), that will allow for easier interpretation of microbial community structure and function across multiple levels of biological information flow.
IMPROV will provide end users with the ability to import sequence information from public data repositories and to upload raw sequence files and experimentally derived results from high-throughput analyses in standardized formats. It will have scalable algorithms for clustering metagenomic sequences and provide multiple coordinated visualizations highlighting different aspects of the data. Visualizations will be made available as viewers in a plug-in-based architecture for exploring clustered sequences, biological pathways, comparative expression profiles, molecular taxonomy, and environmental parameter data.
Project Details
Project type
Scientific Partner
Start Date
2011-02-10
End Date
2014-12-31
Status
Closed
Released Data Link
Team
Principal Investigator
Team Members