
Development of an Environmental-Focused DIA Pipeline for HTP Proteomics


EMSL Project ID
60905

Abstract

Metaproteomics workflows are currently hindered by sample complexity, which necessitates fractionation, so a single sample can require 12 to 24 MS analyses. Data-dependent acquisition (DDA) workflows select one peptide at a time for MS/MS fragmentation, resulting in high levels of missing data and poor reproducibility. Data-independent acquisition (DIA) methods overcome these limitations by simultaneously fragmenting all peptides eluting from the LC, thus collecting significantly more data in the same analysis time. However, DIA proteomics approaches rely on a well-annotated genome, which allows peptide fragmentation patterns to be predicted a priori for peptide identification. These predictions are hindered by the presence of isoforms in the genome (as in polyploid plant species) or by the redundancy of species-delineated protein and peptide sequences (as in metagenomes). Additionally, the shallow sequencing coverage of many metagenomes creates ambiguity in the peptide sequence predictions.
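The sequence-redundancy problem described above can be illustrated with a minimal sketch. It assumes a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) and two hypothetical homologous protein sequences invented for illustration. The function names `tryptic_digest` and `map_peptides_to_proteins` are also hypothetical and are not part of this project's pipeline.

```python
import re
from collections import defaultdict


def tryptic_digest(sequence, min_length=6):
    """In-silico digest: cleave after K or R unless the next residue is P.

    Simplified rule with no missed cleavages; peptides shorter than
    min_length are discarded.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return {p for p in pieces if len(p) >= min_length}


def map_peptides_to_proteins(proteins, min_length=6):
    """Return a dict mapping each peptide to the set of proteins containing it."""
    owners = defaultdict(set)
    for name, seq in proteins.items():
        for peptide in tryptic_digest(seq, min_length):
            owners[peptide].add(name)
    return dict(owners)


# Hypothetical homologs from two species (illustrative sequences only).
proteins = {
    "speciesA_protX": "MSTLLAGKAVDEFGHIKLLNPQWERTYK",
    "speciesB_protX": "MSTLIAGKAVDEFGHIKLLNPQWDRTYK",
}

peptide_owners = map_peptides_to_proteins(proteins)
shared = sorted(p for p, o in peptide_owners.items() if len(o) > 1)
unique = sorted(p for p, o in peptide_owners.items() if len(o) == 1)

print("shared:", shared)  # peptides that cannot distinguish the two species
print("unique:", unique)  # peptides that map to exactly one protein
```

In this toy case, the peptide `AVDEFGHIK` occurs in both sequences, so detecting it does not indicate which species' protein is present. Only the four unique peptides are discriminating.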

We will develop a DIA proteomics pipeline for environmental proteomics research, focusing on workflows that enable whole-proteome coverage for plants and microbial communities. We will test current methods, which focus on human and bacterial species, to determine how much genome ambiguity hinders them, and we will research workflows that overcome these limitations. Several DIA benchmark studies have reported significant variation in the peptides and protein groups identified from the same DIA data by different analysis workflows and software, with a significant impact on the overlap of protein groups identified by all the software packages used in each study. This observation means that a prospective DIA user must examine every component of the DIA pipeline in order to develop a workflow that delivers proteomics data at the highest confidence level. The objective of this proposed comprehensive benchmark study is to directly compare DDA proteomics with the major schemes available for DIA analysis, using a sample set of varying biological complexity, and to quantify the possible gains that DIA can deliver for environmental and botanical proteomics research.
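As a rough illustration of how the overlap between analysis workflows might be quantified, the sketch below computes the protein groups common to all workflows and a pairwise Jaccard index. The workflow names and protein-group identifiers are placeholders rather than results, and the Jaccard index is one possible overlap measure chosen here as an assumption, not necessarily the metric this project will use.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two sets: |intersection| / |union| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Placeholder identifications per workflow (hypothetical, not real results).
identifications = {
    "workflow_1": {"PG1", "PG2", "PG3", "PG4"},
    "workflow_2": {"PG2", "PG3", "PG4", "PG5"},
    "workflow_3": {"PG3", "PG4", "PG6"},
}

# Protein groups reported by every workflow.
core = set.intersection(*identifications.values())

# Pairwise overlap between workflows.
pairwise = {
    (x, y): jaccard(identifications[x], identifications[y])
    for x, y in combinations(sorted(identifications), 2)
}

print("identified by all workflows:", sorted(core))
for pair, score in pairwise.items():
    print(pair, round(score, 2))
```

A small core set alongside low pairwise scores would indicate the kind of disagreement between software packages that the benchmark studies report.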

Project Details

Start Date
2023-10-01
End Date
N/A
Status
Active

Team

Principal Investigator

Isaac Attah
Institution
Pacific Northwest National Laboratory

Co-Investigator(s)

Geremy CD Clair
Institution
Pacific Northwest National Laboratory

Paul Piehowski
Institution
Environmental Molecular Sciences Laboratory

Mary Lipton
Institution
Environmental Molecular Sciences Laboratory

Team Members

Reta Birhanu Kitata
Institution
Pacific Northwest National Laboratory

Aivett Bilbao Pena
Institution
Environmental Molecular Sciences Laboratory