EMSL Tools Aim to Improve Proteomics Data Analysis
Samples can be processed faster and at a larger scale
Other than bacteria, every living organism from fungi to plants has proteins known as histones.
These proteins are vital to the function of cells and regulate gene expression. Their functions are fine-tuned by chemical modifications known as “histone codes.” One histone can be simultaneously modified by multiple chemical groups, resulting in unique patterns that can be recognized by other proteins in the cell.
For scientists conducting proteomics research involving histones, interpreting the histone codes is not always easy, especially if the goal is to precisely define the combination of modifications on each molecule. Top-down proteomics is an ideal method for studying intact histone molecules to get this information. However, the analysis is laborious and often requires the use of multiple software tools.
EMSL, the Environmental Molecular Sciences Laboratory, a Department of Energy user facility, offers open-source tools to automate and improve data analysis for top-down proteomics.
IsoForma, a customized software tool for analyzing modified proteins such as histones, and PSpecteR, a proteomics-focused visualization application and R package, are used in conjunction with EMSL’s mass spectrometry capabilities. Both are available to users who are successfully funded through one of EMSL’s open call opportunities.
EMSL is holding a webinar at noon on Wednesday, March 16, to walk potential users through the process of using IsoForma and PSpecteR for proteomics research.
IsoForma
EMSL developed IsoForma as a software tool for automating the quantitation of mixtures of proteoforms that cannot be easily analyzed using standard analysis pipelines. The software is used to quantify histone modifications and provide relative percentages of each proteoform. This information can be used in conjunction with genomic data to understand gene regulation patterns. For example, histones can be modified by the same chemical groups but at different combinations of locations, each potentially related to a different function. All these histone proteoforms can have the same total mass and can only be distinguished by analyzing their fragmentation data. Such analysis has been performed previously by EMSL users, including Sarah Rommelfanger, James Umen, and James Pesavento, but the process was tedious and involved many manual steps.
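The point about proteoforms sharing a total mass can be illustrated with a small Python sketch. This is not IsoForma code. The toy peptide KGGKA, the choice of acetylation as the modification, and the function names are assumptions made for illustration only. The sketch shows two forms of the same peptide, each carrying one acetyl group on a different lysine: their total masses match, but the masses of their N-terminal fragments (b ions) do not.

```python
# Monoisotopic residue masses in daltons for the amino acids used below.
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "K": 128.09496}
ACETYL = 42.01056  # mass added by one acetyl group
WATER = 18.01056   # added once for the intact peptide
PROTON = 1.00728   # added for a singly charged ion


def total_mass(sequence, acetylated):
    """Neutral mass of the intact peptide with acetyl groups at the
    given 0-based residue positions."""
    residues = sum(RESIDUE_MASS[aa] for aa in sequence)
    return round(residues + ACETYL * len(acetylated) + WATER, 4)


def b_ions(sequence, acetylated):
    """Singly charged b-ion m/z values (b1 up to the second-to-last residue)."""
    ions = []
    running = 0.0
    for i, aa in enumerate(sequence[:-1]):
        running += RESIDUE_MASS[aa]
        if i in acetylated:
            running += ACETYL
        ions.append(round(running + PROTON, 4))
    return ions


SEQ = "KGGKA"   # hypothetical toy peptide, not a real histone sequence
form_a = {0}    # acetyl group on the first lysine
form_b = {3}    # acetyl group on the second lysine

print("total mass A:", total_mass(SEQ, form_a))
print("total mass B:", total_mass(SEQ, form_b))
print("b ions A:", b_ions(SEQ, form_a))
print("b ions B:", b_ions(SEQ, form_b))
```

In this toy case the b1, b2, and b3 ions differ between the two forms by the mass of the acetyl group, while b4 is identical because both fragments then contain the modified lysine. Fragments like b1 through b3 are the kind of evidence that lets software tell such forms apart and estimate how much of each is present.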
Unlike the manual process, IsoForma combines four different tools into one. This greatly improves the workflow, says Aivett Bilbao, a computational scientist who led the team that created IsoForma.
“IsoForma is more robust and automated,” Bilbao notes. “It removes manual steps that generally take up to an hour down to a total of four minutes.”
With automated and faster processing, IsoForma can handle large-scale studies with hundreds of samples. Because prior analysis has generally been conducted manually, this automation opens up more possibilities for research.
“Manual analysis is time consuming,” says Mowei Zhou, an EMSL chemist who develops mass spectrometry techniques to understand proteins and genes. “This is a time saver for everyone. In addition, we have people who understand the data and we have people who understand the software development.”
Zhou will be working with users to run sample experiments using EMSL’s mass spectrometry instrumentation and will then run the IsoForma software to analyze the data.
PSpecteR
Like IsoForma, the PSpecteR application is built to help scientists understand protein fragmentation patterns, with a greater focus on assessing the quality of database search tools.
Although other proteomics quality control applications do exist, PSpecteR is built with the capacity to adapt to the ever-changing needs of the top-down proteomics community, says David Degnan, a PNNL data scientist who created the application in 2019.
PSpecteR was designed as a cross-platform tool with special features, including the ability to run top-down search algorithms, visualize feature maps interactively, and plot where identified peptides map to their parent proteins.
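The idea of mapping identified peptides to a parent protein can be sketched in a few lines of Python. This is not PSpecteR code, which is written in R. The protein sequence, the peptides, and the function names below are hypothetical and serve only to show the underlying idea: locate each peptide in the protein and report how much of the sequence is covered.

```python
def map_peptides(protein, peptides):
    """Return (peptide, start, end) tuples with 1-based inclusive
    positions for every place a peptide occurs in the protein."""
    hits = []
    for pep in peptides:
        start = protein.find(pep)
        while start != -1:
            hits.append((pep, start + 1, start + len(pep)))
            start = protein.find(pep, start + 1)
    return hits


def coverage(protein, hits):
    """Fraction of protein residues covered by at least one mapped peptide."""
    covered = [False] * len(protein)
    for _, start, end in hits:
        for i in range(start - 1, end):
            covered[i] = True
    return sum(covered) / len(protein)


# Hypothetical sequence and peptides, chosen only for illustration.
PROTEIN = "MKTAYIAKQRQISFVKSHFSRQ"
PEPTIDES = ["TAYIAK", "ISFVK", "XXXX"]  # the last one does not occur

hits = map_peptides(PROTEIN, PEPTIDES)
for pep, start, end in hits:
    print(f"{pep}: residues {start}-{end}")
print(f"coverage: {coverage(PROTEIN, hits):.0%}")
```

A visualization tool would typically draw these start and end positions as bars along the protein sequence, which makes gaps in coverage easy to spot.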
“Other applications have existed in different flavors and people make their own, but the issue has been an agnostic design, where capabilities are provided in both an application and a package for all types of users,” Degnan explains.
The open-source Shiny web application supports data processing for top-down and bottom-up proteomics and allows for visualization of mass spectrometry data and peptide mapping. Its source code is available on GitHub.
The EMSL LEARN Webinar Series presentation, Open Source Software for Top-Down Proteomics, will be held at noon, Wednesday, March 16, and will feature Degnan, Bilbao, and Zhou as speakers.