|
SCFIA: Statistical Corresponding Features Identification Algorithm for LC/MS

Jian Cui 1, Xuepo Ma 1, Long Chen 1, Ashoka Polpitiya 2 and Jianqiu Zhang 1*

1 Department of Electrical and Computer Engineering, the University of Texas at San Antonio, One UTSA Circle, San Antonio, TX 78249
2 Center for Proteomics, Translational Genomics Research Institute, 445 N. 5th St. 4th flr, Phoenix, AZ 85004

Email addresses:
Jian Cui: cuijian1001@gmail.com
Xuepo Ma: maxuepo@gmail.com
Long Chen: becloned@gmail.com

ABSTRACT:
Identifying corresponding
features (LC peaks registered by the same peptide) in multiple Liquid
Chromatography/Mass Spectrometry (LC/MS) datasets plays a crucial role in the
analysis of complex peptide or protein mixtures. Warping functions are
commonly used to correct elution time shifts between two different LC/MS
datasets to identify corresponding features. Although a warping function can
correct the mean difference of elution time shifts, it alone cannot resolve
the ambiguity completely because elution time shifts are random. Instead, we
propose a Statistical Corresponding Features Identification Algorithm (SCFIA)
based on both time shift and the similarity of LC peak shapes between
corresponding feature pairs. SCFIA first trains statistical models of
corresponding features, and then, all candidate corresponding features are
scored by these statistical models to find the maximum likelihood match of
corresponding features. We test our algorithm on publicly available datasets
and compare its performance with that of warping-function-based methods.
The accuracy and the number of aligned features are improved significantly
with our method.

Contact: cuijian1001@gmail.com, michelle.zhang@utsa.edu

Conclusions

In this paper, we proposed a new method called Statistical
Corresponding Features Identification Algorithm (SCFIA) to identify the
corresponding features in different datasets. We verified the algorithm on two
Super-SILAC datasets, where it performed better than both the warping function
method and OpenMS. SCFIA also proved stable across different choices of
prophet score. We then applied SCFIA to two data groups of three datasets
each: the first group is fraction data and the second group is replicate
data. The results show that we can identify many more peptides across the
three datasets than are found in their intersection; the aim of our algorithm
is to determine the intervals of the peptides in their union. In the future,
we plan to focus on peptide identification in multiple LC/MS datasets without
LC-MS/MS information.

Data, figures, results and source code

· Data is available at https://proteomecommons.org/dataset.jsp?i=74476.
· The first group is:
  20090608_Orbi6_TaGe_SA_TUMOR_5mix1_01.raw (dataset Q1)
  20090608_Orbi6_TaGe_SA_TUMOR_5mix1_02.raw (dataset Q2)
  20090608_Orbi6_TaGe_SA_TUMOR_5mix1_03.raw (dataset Q3)
· The second group is:
  200090815_Velos5_TaGe_SA_Silacmix_TOP15_01.raw (dataset Q1)
  200090815_Velos5_TaGe_SA_Silacmix_TOP15_01.raw (dataset Q2)
  200090815_Velos5_TaGe_SA_Silacmix_TOP15_01.raw (dataset Q3)
· The demo code files:
  · Group1 Data X!tandem verification demo
  · Group1 Data MaxQuant verification demo
  · Group2 Data X!tandem verification demo
  · Group2 Data MaxQuant verification demo
|
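
As a rough illustration of the scoring idea described in the abstract (train statistical models of corresponding features, then pick the maximum likelihood match among candidates), the following Python sketch may help. It is not the authors' implementation. The Gaussian models, the use of Pearson correlation as the peak shape similarity, the assumption that the time shift and shape terms are independent, and all names and toy values are assumptions made only for this example.

```python
import math
from statistics import mean, stdev


def fit_gaussian(values):
    """Return (mean, std) of a sample; a stand-in for a trained model (assumption)."""
    return mean(values), stdev(values)


def log_gaussian(x, mu, sigma):
    """Log-density of the normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def pearson(a, b):
    """Pearson correlation between two equally sampled LC peak profiles."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def best_match(feature, candidates, shift_model, shape_model):
    """Return the candidate with the highest joint log-likelihood.

    Each feature is a dict with 'rt' (elution time) and 'profile' (intensities).
    The two log-likelihood terms are summed, i.e. treated as independent,
    which is an assumption of this sketch.
    """
    def score(c):
        shift = c["rt"] - feature["rt"]
        similarity = pearson(feature["profile"], c["profile"])
        return log_gaussian(shift, *shift_model) + log_gaussian(similarity, *shape_model)

    return max(candidates, key=score)


# Hypothetical training values: time shifts (minutes) and shape correlations
# observed on feature pairs already known to correspond.
shift_model = fit_gaussian([0.10, -0.20, 0.05, 0.15, -0.10])
shape_model = fit_gaussian([0.95, 0.97, 0.92, 0.99, 0.96])

feature = {"rt": 30.0, "profile": [1, 4, 9, 4, 1]}
candidates = [
    {"id": "a", "rt": 30.10, "profile": [1, 4, 9, 4, 1.5]},  # similar peak shape
    {"id": "b", "rt": 30.05, "profile": [9, 4, 1, 1, 1]},    # closer in time, different shape
]
match = best_match(feature, candidates, shift_model, shape_model)
```

In this toy case candidate "b" is closer in elution time, but its peak shape differs strongly from the query feature, so the combined score favours candidate "a". This is the kind of ambiguity that a time shift criterion alone cannot resolve.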