
A Multi-Omics Data Exploration and Integration Web Application


EMSL Project ID
51146

Abstract

In the biological and environmental sciences, data are being generated at an unprecedented rate. Many scientific discoveries pertinent to DOE missions are waiting to emerge from high-throughput biological data. The critical first step in analyzing or modeling data is visualization and exploration to assess data quality and the potential relationships between variables. The size and complexity of the data are the main obstacles to effective exploration. Current solutions tend to be rigid: they either discard potentially vital information in order to simplify the data, or they require researchers to become experts in programming and data processing techniques, which slows the pace of discovery. Additionally, researchers commonly generate data for multiple ‘omics data types within the same study. Efficient integration of these complex and disparate datasets requires access to and understanding of databases of known biological pathways, an understanding of robust data preprocessing techniques and often statistical methods, and programming capability. A gap in any one of these capabilities can lead to results that are not reproducible or to longer analysis times. For example, the processing of Fourier transform mass spectrometry (FT-MS) data is often done manually. Several members of this proposal team developed a web application, FREDA (https://msc-viz.emsl.pnnl.gov/FREDA), that allows users to quickly and dynamically process and explore their data, including producing standard visualizations, in a reproducible manner. FREDA has received extremely positive feedback from EMSL users and has reduced hours-long processes to tasks that can be accomplished in minutes.
We plan to develop a web application for users to explore, visualize, and perform quality control assessments on mass spectrometry (MS) data. This includes normalization (when necessary), mapping to databases and extracting additional biomolecule properties (where possible), and integrating multiple ‘omics datasets with statistical and biological methods. The framework will be developed and released as a web application where users can upload NMR, GC/MS and LC/MS metabolomics, lipidomics and proteomics data for exploration; the workflow starts from datasets consisting of MS-quantified biomolecule relative peak intensities or spectral counts. The capabilities in the application will be implemented as modules. The available and appropriate metrics, normalizations, and methods will be determined on the backend of the web application based on data characteristics, so that the user does not need to be an expert in biostatistics. Finally, a report of analysis steps, figures, data, and code for reproduction will be available for download.
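
As an illustration of how a backend might select a normalization from data characteristics, the following is a minimal Python sketch. It is not the project's implementation: the function names, the whole-number heuristic for recognizing spectral counts, and the two normalizations shown (total-count scaling, and log2 transformation followed by median centering) are all assumptions made for this example.

```python
import math
from statistics import median


def looks_like_counts(sample):
    """Hypothetical heuristic: spectral counts are non-negative whole numbers."""
    observed = [v for v in sample if v is not None]
    return bool(observed) and all(
        v >= 0 and float(v).is_integer() for v in observed
    )


def total_count_scale(sample):
    """Divide each count by the sample total; missing values stay missing."""
    total = sum(v for v in sample if v is not None)
    if total == 0:
        return list(sample)  # nothing to scale
    return [v / total if v is not None else None for v in sample]


def median_center_log2(sample):
    """Log2-transform positive intensities, then subtract the sample median."""
    logged = [
        math.log2(v) if v is not None and v > 0 else None for v in sample
    ]
    present = [v for v in logged if v is not None]
    if not present:
        return logged  # no usable intensities
    center = median(present)
    return [v - center if v is not None else None for v in logged]


def normalize(sample):
    """Pick a normalization from the data itself (assumed rule, see above)."""
    if looks_like_counts(sample):
        return "total_count", total_count_scale(sample)
    return "median_center_log2", median_center_log2(sample)


if __name__ == "__main__":
    # One sample of relative peak intensities, with one missing value (None).
    print(normalize([1.5, 3.0, None, 6.0]))
    # One sample of spectral counts.
    print(normalize([2, 3, 5, None]))
```

The whole-number heuristic would misclassify intensities that happen to be integers, so a real application would more likely rely on metadata supplied at upload time than on the values alone.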

Project Details

Start Date
2020-02-21
End Date
2020-10-31
Status
Closed

Team

Principal Investigator

Lisa Bramer
Institution
Pacific Northwest National Laboratory

Team Members

Albert Rivas-Ubach
Institution
Spanish National Research Council - CSIC

Daniel Claborne
Institution
Pacific Northwest National Laboratory

Aivett Bilbao Pena
Institution
Environmental Molecular Sciences Laboratory

Allison Thompson
Institution
Environmental Molecular Sciences Laboratory

Jennifer Kyle
Institution
Pacific Northwest National Laboratory

Hugh Mitchell
Institution
Pacific Northwest National Laboratory