Multi-profile Bayesian Alignment Model for LC-MS Data Analysis with Integration of Internal Standards
Tsung-Heng Tsai1,2, Mahlet G. Tadesse3, Cristina Di Poto1, Lewis K. Pannell4, Yehia Mechref5, Yue Wang2, and Habtom W. Ressom1
1Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC.
2Department of Electrical and Computer Engineering, Virginia Tech, Arlington, VA.
3Department of Mathematics and Statistics, Georgetown University, Washington, DC.
4Mitchell Cancer Institute, University of South Alabama, Mobile, AL.
5Department of Chemistry and Biochemistry, Texas Tech University, Lubbock, TX.
Liquid chromatography-mass spectrometry (LC-MS) has
been widely used for profiling expression levels of biomolecules
in various "-omic" studies including proteomics, metabolomics and
glycomics. Appropriate LC-MS data preprocessing steps are needed
to detect true differences between biological groups. Retention time
alignment, one of the most important yet challenging preprocessing
steps, ensures that ion intensity measurements among
multiple LC-MS runs are comparable. Current alignment approaches
estimate retention time variability using either single chromatograms
or detected peaks, whereas complementary information embedded
in the LC-MS data is often overlooked.
We propose a Bayesian alignment model (BAM) for
LC-MS data analysis. The alignment model provides estimates
of the retention time variability along with uncertainty measures.
The model enables integration of multiple sources of information
including internal standards and clustered chromatograms. We apply
the model to LC-MS metabolomic, proteomic and glycomic data.
The performance of the model is evaluated based on ground-truth
data, by measuring correlation of variation, retention time difference
across runs, and peak matching performance. We demonstrate
that BAM significantly improves retention time alignment
performance through integration of relevant information such as
internal standards and clustered chromatograms in a mathematically
This webpage provides the data sets, the MATLAB code, and the supplementary information for the main paper.