Computing, Analytics, and Modeling

Data Transformations

Many scientific breakthroughs begin with the transformation of thousands of data points into usable sets of information that showcase a process, structure, or system. Accessing and processing those data points, however, can be challenging and complex. Through the Environmental Molecular Sciences Laboratory’s (EMSL’s) Data Transformations Integrated Research Platform (IRP), researchers work with leading experts in the biological and environmental sciences to streamline experimental setup in the computational space, identify suitable computational workflows, and transform the data into usable models and visualization tools. The Data Transformations IRP collaborates closely with other IRPs, building a network that supports scientific discovery. Biological and environmental scientists, both internal and external to the IRP, work alongside computational scientists to advance research in diverse fields for EMSL users. This collaboration makes scientific investigations more efficient and effective.

Computational expertise within the Data Transformations IRP can include:

  • Computational biology
  • Statistics
  • Workflow development
  • Visualization
  • Development and implementation of new tools.

The IRP team also specializes in automating computational workflows and ensuring meticulous data curation. The end goal is to enable the creation of sophisticated models and visualization tools that communicate scientific advances clearly, so researchers can present their findings in a meaningful and relatable way.

The science

The Data Transformations IRP performs three key functions:

  1. Streamline and standardize data analysis workflows, including identification, exploratory data analysis, statistical analysis, and more. This is accomplished through:
    • Provenance: tracing the origin of data from generation to storage
    • Reproducibility: the extent to which consistent results are obtained when an experiment is repeated
    • Automation: using technology to take over repetitive steps so that subject matter experts can spend their time applying their expertise.
  2. Increase user collaboration and awareness of EMSL capabilities. This is accomplished through:
    • Identifying areas that overlap in terms of processes/workflow steps to leverage existing capabilities into new areas
    • Gaining efficiencies through improved workflows and relationships that allow for the reduction of time and costs
    • Setting priorities for the development of new methods or tools for specific processing steps based on holistic views of existing workflows.
  3. Develop and improve the accessibility of tools for working with data produced at EMSL and generating data products. This is accomplished through:
    • Providing open-source software that anyone can view, modify, and distribute
    • Hosting user interfaces where coding is not required
    • Producing static and interactive visualizations.
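
As an illustration of how provenance and reproducibility can work together in an automated workflow step, the following is a minimal Python sketch. It is not taken from any EMSL tool. The function names (`run_step`, `normalize`), the record fields, and the sample values are hypothetical and chosen only for this example. The step records checksums of its input and output, and rerunning it on the same input should produce the same output checksum.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def normalize(values: list[float]) -> list[float]:
    """Hypothetical processing step: scale values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def run_step(values: list[float], step_name: str) -> dict:
    """Run one step and return its result with a provenance record."""
    raw = json.dumps(values).encode("utf-8")
    result = normalize(values)
    out = json.dumps(result).encode("utf-8")
    record = {
        "step": step_name,                 # which transformation ran
        "input_sha256": sha256_of(raw),    # identifies the exact input
        "output_sha256": sha256_of(out),   # identifies the exact output
        "run_at": datetime.now(timezone.utc).isoformat(),
    }
    return {"result": result, "provenance": record}


# Running the same step twice on the same (made-up) input.
first = run_step([2.0, 3.0, 5.0], "normalize")
second = run_step([2.0, 3.0, 5.0], "normalize")

# Reproducibility check: identical input should give identical output checksums.
reproducible = (
    first["provenance"]["output_sha256"] == second["provenance"]["output_sha256"]
)
print(reproducible)
```

In a sketch like this, the checksums tie each result to the exact data that produced it, while the timestamp is the only field expected to differ between runs.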

How we do the science

EMSL users have access to a range of existing and emerging computational capabilities, supported by world-class expertise, to address research questions in the biological and environmental sciences.

Users work directly with EMSL staff members to set up experiments, establish best practices and workflows for their project, and retrieve the data they need to showcase experiment results. The Data Transformations IRP works closely with the Systems Modeling IRP to coordinate the development of novel platforms and modeling approaches.

Some of the tools used by the Data Transformations IRP include:

  • Open OnDemand: a portal that provides access to the Tahoma scientific computer so users can perform a range of computational tasks and workflows
  • Multiomics Analysis Portal (MAP): a one-stop shop of applications to meet a variety of multiomics research needs
  • CoreMS: a comprehensive mass spectrometry framework for software development and data analysis of small molecules
  • PeakDecoder: a machine learning-based algorithm that can identify a vast number of metabolites in each sample with a high level of confidence. It can be combined with modern instrumentation, including mass spectrometry, ion mobility spectrometry, and liquid chromatography.
  • NMR analysis: semi-automated capability for identifying and quantifying metabolites.

The Data Transformations IRP supports the Digital Phenome (DigiPhen) and Molecular Observation Network (MONet) strategic science objectives, including the adoption of new data transformation methods from the larger research community.