Bayesian alignment model for LC-MS data analysis

Notes on the MATLAB codes

=================================================================================

We provide implementations of single-profile and multi-profile alignment
methods, with and without a Gaussian process prior. Applications of the
methods to the proteomic and glycomic data sets are demonstrated in the
following MATLAB scripts:

-- Single-profile alignment (without Gaussian process prior):

http://omics.georgetown.edu/alignLCMS/bioinfo_proteo_sp.m

http://omics.georgetown.edu/alignLCMS/bioinfo_glyco_sp.m


-- Single-profile alignment (with Gaussian process prior):

http://omics.georgetown.edu/alignLCMS/bioinfo_proteo_gpsp.m

http://omics.georgetown.edu/alignLCMS/bioinfo_glyco_gpsp.m


-- Multi-profile alignment (without Gaussian process prior):

http://omics.georgetown.edu/alignLCMS/bioinfo_proteo_mp4.m

http://omics.georgetown.edu/alignLCMS/bioinfo_glyco_mp4.m


-- Multi-profile alignment (with Gaussian process prior):

http://omics.georgetown.edu/alignLCMS/bioinfo_proteo_gpmp4.m

http://omics.georgetown.edu/alignLCMS/bioinfo_glyco_gpmp4.m


Please download the associated MATLAB data matrices:

-- Base peak chromatograms for single-profile alignment:

http://omics.georgetown.edu/alignLCMS/bpc_proteomics.mat

http://omics.georgetown.edu/alignLCMS/bpc_glycomics.mat


-- Binned chromatograms for multi-profile alignment:

http://omics.georgetown.edu/alignLCMS/data_matrix_proteomics.mat

http://omics.georgetown.edu/alignLCMS/data_matrix_glycomics.mat
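
Once the four data files above have been downloaded, it can help to check what
each one contains before running the alignment scripts. The following is only a
minimal sketch: it assumes the files were saved under their original names in
the current MATLAB folder (or somewhere on the path), and it makes no
assumption about the variable names stored inside them.

```matlab
% Sketch: list the contents of the downloaded .mat files.
% Assumption: the files keep their original names and are in the
% current folder or on the MATLAB path.
files = {'bpc_proteomics.mat', 'bpc_glycomics.mat', ...
         'data_matrix_proteomics.mat', 'data_matrix_glycomics.mat'};

for k = 1:numel(files)
    if exist(files{k}, 'file') == 2
        fprintf('%s:\n', files{k});
        whos('-file', files{k});   % variable names, sizes, and classes
    else
        fprintf('%s was not found in the current folder or on the path.\n', files{k});
    end
end
```

The variable names printed by whos are the ones the alignment scripts expect to
find after loading the corresponding file.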


We use the SIMA model to perform the peak matching step:

http://hci.iwr.uni-heidelberg.de/MIP/Software/sima.php


Please download the peak lists used as input to the SIMA model:

http://omics.georgetown.edu/alignLCMS/proteo_sima.zip

http://omics.georgetown.edu/alignLCMS/glyco_sima.zip


For performance evaluation of the peak matching results:

-- Evaluation scripts:

http://omics.georgetown.edu/alignLCMS/eval_proteo.m

http://omics.georgetown.edu/alignLCMS/eval_glyco.m


-- Ground-truth data:

http://omics.georgetown.edu/alignLCMS/ground_truth_proteomics.mat

http://omics.georgetown.edu/alignLCMS/ground_truth_glycomics.mat


=================================================================================

To run the MATLAB code, you need the following utilities:

------------------------- Attached functions ------------------------------------

coda() to compute mass chromatographic quality (MCQ) values

priorRatioLn() to compute the prior ratio when not using a Gaussian process prior

maxind() to find local maxima in a sequence


The three functions are zipped and available at

http://omics.georgetown.edu/alignLCMS/utility.zip


-------------------- Functions from elsewhere ----------------------------------


inv_posdef(), randnorm(), scale_rows(), ndsum() from Tom Minka's Lightspeed toolbox,

available at http://research.microsoft.com/en-us/um/people/minka/software/lightspeed/


randraw() from File Exchange at MATLAB Central,

available at http://www.mathworks.com/matlabcentral/fileexchange/7309


bsplinebasis() from Scott Gaffney's CCToolbox,

available at http://www.ics.uci.edu/~sgaffney/software/CCT/


apcluster() by Frey Lab,

available at http://www.psi.toronto.edu/index.php?q=affinity%20propagation


GPML toolbox (v3.1) by Carl Edward Rasmussen and Hannes Nickisch,

available at http://www.gaussianprocess.org/gpml/code/matlab/doc/index.html
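
Before running any of the alignment scripts, a quick check that the functions
listed above are visible on the MATLAB path can save some debugging. The
snippet below is a sketch under stated assumptions: it only tests whether each
name resolves via which(), and it uses gp (the main inference function in the
GPML toolbox) as a stand-in for "GPML is installed". The folder names in the
commented addpath lines are hypothetical and depend on where you unpacked each
package.

```matlab
% Sketch: report which required functions are not on the MATLAB path.
% Hypothetical install locations; adjust to your own setup.
% addpath('utility');
% addpath(genpath('lightspeed'));

required = {'coda', 'priorRatioLn', 'maxind', ...                 % attached functions
            'inv_posdef', 'randnorm', 'scale_rows', 'ndsum', ...  % Lightspeed
            'randraw', 'bsplinebasis', 'apcluster', ...           % other packages
            'gp'};                                                % GPML (assumed check)

isMissing = cellfun(@(f) isempty(which(f)), required);
missing = required(isMissing);

if isempty(missing)
    disp('All required functions were found on the path.');
else
    fprintf('Not found on the path: %s\n', strjoin(missing, ', '));
end
```

Any name reported as missing points to the package in the list above that
still needs to be downloaded or added to the path.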