MetaboSearch
Introduction
Mass-based search is an important step in metabolite identification for mass-spectrometry-based metabolomic analysis. The mass-to-charge ratio (m/z) of a molecular ion of interest is searched against one or more metabolite databases. Metabolites whose molecular weights fall within a specified tolerance of the query m/z value are retrieved from the databases as putative identifications, which serve as a foundation for further metabolite verification. In addition to searching with m/z values alone, ion annotation information can be used to aid the mass-based search. Ion annotation groups ions originating from the same metabolite and annotates them as adducts, isotopes, or in-source fragments. The R package CAMERA (Collection of Algorithms for MEtabolite pRofile Annotation) was previously developed for ion annotation by Kuhl et al. (Carsten Kuhl et al. CAMERA: Collection of annotation related methods for mass spectrometry data. R package version 1.10.0.). Using the ion annotation information, the appropriate mass values of the ions can be calculated and then searched against the databases. This approach is expected to improve the accuracy of metabolite identification.
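The tolerance-based lookup described above can be illustrated with a minimal Python sketch. It is not MetaboSearch's actual implementation: the in-memory TOY_DATABASE, the function names, and the choice to express the error relative to the database mass are all assumptions made for illustration, and only singly protonated or deprotonated ions are handled.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (hydrogen atom minus one electron)

# Hypothetical mini-database: (name, monoisotopic neutral mass in Da).
# MetaboSearch itself queries online databases; this list is only a stand-in.
TOY_DATABASE = [
    ("Glucose", 180.06339),
    ("Caffeine", 194.08038),
    ("Tryptophan", 204.08988),
]


def neutral_mass(mz, mode):
    """Convert a singly charged [M+H]+ or [M-H]- m/z value to a neutral mass."""
    if mode == "positive":
        return mz - PROTON_MASS
    if mode == "negative":
        return mz + PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def search(mz, mode, tol_ppm, database=TOY_DATABASE):
    """Return (name, mass, ppm error) for entries within tol_ppm of the query."""
    query = neutral_mass(mz, mode)
    hits = []
    for name, mass in database:
        # Assumption: the ppm error is taken relative to the database mass.
        ppm = (query - mass) / mass * 1e6
        if abs(ppm) <= tol_ppm:
            hits.append((name, mass, round(ppm, 2)))
    return hits


print(search(195.0877, "positive", 10))
```

With a 10 ppm tolerance in positive mode, the query 195.0877 retrieves only the caffeine entry from this toy list; tightening the tolerance far enough returns no hits.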
We developed MetaboSearch to perform mass-based metabolite search simultaneously against four major metabolite databases: Human Metabolome DataBase (HMDB), Madison Metabolomics Consortium Database (MMCD), Metlin, and LipidMaps. The search results from these databases are integrated into a uniform, non-redundant format based on the IUPAC International Chemical Identifier (InChI) key. The software provides chemical identifier information so that metabolites can be referenced effectively. When a particular identifier type is missing from a database, it is cross-referenced from the other databases. The comprehensive list of chemical identifiers includes PubChem Compound ID (CID), PubChem Substance ID (SID), HMDB ID, KEGG ID, InChI string, and InChI key. MetaboSearch performs mass-based search using a given list of m/z values. In addition, it can utilize ion annotation information for improved metabolite identification, as long as the ion annotation information is provided in the CAMERA output format.
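As a rough illustration of how hits from several databases might be merged into one non-redundant list keyed on the InChI key, consider the Python sketch below. The record layout, the field names, the merge_hits function, and the example records (including their made-up keys and identifiers) are hypothetical and are not taken from the MetaboSearch source.

```python
def merge_hits(hits):
    """Merge per-database hits so that each InChI key appears only once."""
    merged = {}
    for hit in hits:
        key = hit["inchikey"]
        entry = merged.setdefault(key, {
            "inchikey": key,
            "name": hit["name"],
            "databases": [],
            "kegg": None,
            "hmdb": None,
        })
        # Record every database that returned this metabolite.
        if hit["database"] not in entry["databases"]:
            entry["databases"].append(hit["database"])
        # Fill an identifier that is still missing from whichever hit has it.
        for field in ("kegg", "hmdb"):
            if entry[field] is None and hit.get(field):
                entry[field] = hit[field]
    return list(merged.values())


# Made-up records for illustration only; keys and IDs are placeholders.
hits = [
    {"database": "HMDB", "inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
     "name": "metabolite A", "hmdb": "HMDB-EXAMPLE-1"},
    {"database": "Metlin", "inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N",
     "name": "metabolite A", "kegg": "KEGG-EXAMPLE-1"},
    {"database": "LipidMaps", "inchikey": "BBBBBBBBBBBBBB-UHFFFAOYSA-N",
     "name": "metabolite B"},
]

for row in merge_hits(hits):
    print(row["name"], row["databases"], row["kegg"], row["hmdb"])
```

In this sketch the two records for "metabolite A" collapse into one row that lists both source databases and carries both identifiers, while "metabolite B" keeps its empty identifier fields.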
Citation
Zhou B, Wang J, Ressom HW (2012). MetaboSearch: Tool for mass-based metabolite identification using multiple databases. PLoS One 7(6): e40096. doi:10.1371/journal.pone.0040096. PMID: 22768229
Two datasets described in this manuscript can be downloaded here (64bit compressed package, or 32bit compressed package).
User Guide
Input file format
MetaboSearch accepts two types of inputs: a list of m/z values or ion annotation information along with m/z values, as acquired from CAMERA.
1. A list of m/z values:
Figure presents the input data format for a list of m/z values. The first row contains the header for each column. The "mz" column holds the m/z values of the peaks and is compulsory. Optionally, retention times can be placed in an "rt" column; they are preserved in the software output but are not used for identification. The input m/z values can be submitted either by copying and pasting them directly into the input area of MetaboSearch or by uploading the data file through the "Browse" button.
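A minimal Python sketch of reading such a list is given here. The parse_mz_list function is hypothetical, and the accepted delimiters (tab, comma, or spaces) are an assumption, since the exact file layout is shown only in the figure.

```python
import re


def parse_mz_list(text):
    """Parse a table with a compulsory 'mz' column and an optional 'rt' column."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("input is empty")
    # Assumption: columns are separated by tabs, commas, or spaces.
    header = re.split(r"[,\s]+", lines[0])
    if "mz" not in header:
        raise ValueError("the 'mz' column is compulsory")
    mz_idx = header.index("mz")
    rt_idx = header.index("rt") if "rt" in header else None
    peaks = []
    for query_id, line in enumerate(lines[1:], start=1):
        fields = re.split(r"[,\s]+", line)
        rt = float(fields[rt_idx]) if rt_idx is not None else None
        peaks.append({"query_id": query_id, "mz": float(fields[mz_idx]), "rt": rt})
    return peaks


example = "mz\trt\n195.0877\t312.4\n181.0707\t95.1\n"
for peak in parse_mz_list(example):
    print(peak)
```

The query_id assigned here follows the order of appearance in the input, mirroring the Query_ID column of the output file.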
2. Ion annotation information
Figure presents an output file from CAMERA; this file can be loaded directly into the software through the "Browse" button. The columns "mz" and "rt" contain the observed m/z value and the retention time of each peak. The last three columns, "isotopes", "adduct" and "pcgroup", contain the ion annotation information produced by CAMERA.
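For readers who want to inspect the annotation columns programmatically, the following Python sketch parses the two annotation strings. It assumes the cell layouts commonly seen in CAMERA peak lists, namely isotope cells such as "[5][M+1]+" and adduct cells such as "[M+H]+ 194.0804"; these layouts, the regular expressions, and the function names are assumptions and should be checked against your own CAMERA output.

```python
import re

# Assumed isotope cell layout: "[group][M+k]charge-sign", e.g. "[5][M+1]+" or "[5][M]2+".
ISOTOPE_RE = re.compile(r"\[(\d+)\]\[M(?:\+(\d+))?\](\d*)([+-])")
# Assumed adduct cell layout: "rule mass", possibly repeated, e.g. "[M+H]+ 194.0804".
ADDUCT_RE = re.compile(r"(\[[^\]]+\]\d*[+-])\s+([0-9.]+)")


def parse_isotope(cell):
    """Return isotope group, isotope offset, charge and sign, or None if empty."""
    match = ISOTOPE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    group, offset, charge, sign = match.groups()
    return {
        "group": int(group),
        "offset": int(offset or 0),   # 0 means the monoisotopic peak [M]
        "charge": int(charge or 1),   # a missing number means charge 1
        "sign": sign,
    }


def parse_adducts(cell):
    """Return a list of (rule, proposed neutral mass) pairs from an adduct cell."""
    return [(rule, float(mass)) for rule, mass in ADDUCT_RE.findall(cell)]


print(parse_isotope("[5][M+1]+"))
print(parse_adducts("[M+H]+ 194.0804 [M+Na]+ 172.0985"))
```

An empty annotation cell yields None from parse_isotope and an empty list from parse_adducts, so unannotated peaks can be passed through unchanged.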
MetaboSearch interface
Figure demonstrates the interface of MetaboSearch. The top panel shows the four databases used for the search (HMDB, Metlin, MMCD, LipidMaps). The user can include or exclude any particular database from the search. The middle left panel is the input area for uploading data and setting search parameters. The middle right panel displays the intermediate results. The bottom panel displays the running status and the progress.
Steps to run MetaboSearch
1. Set the ionization mode of the data by checking either 'positive' or 'negative'. Set the m/z tolerance in ppm.
2. Paste the m/z values into the input area or upload them from a local file through the 'Browse' button.
3. Press the 'Submit' button.
4. Once the database search is finished, a dialog box pops up to remind the user to export the result by clicking the 'Export to Local' button. The integrated results from multiple databases are exported in Excel (.xls) format. After export, another pop-up dialog box asks the user if he/she wants to open the exported Excel file immediately. The user can choose "Yes" to open the file or "No" to review the file later.
Output file format:
The output file consists of the following 16 columns:
1. Query_ID is the order number of the input m/z value.
2. Query_MZ is the input m/z value.
3. Input_RT is the input retention time if the retention times are provided.
4. Name is the common name of the metabolite.
5. Formula is the elemental formula of the metabolite.
6. Exact_Mass is the monoisotopic neutral mass of the metabolite.
7. KEGG is the compound ID in KEGG database.
8. PubChem SID is the substance accession identifier (SID) in PubChem database.
9. PubChem CID is the compound accession identifier (CID) in PubChem database.
10. HMDB ID is the accession number in HMDB database.
11. Database is the name(s) of database(s) from which the metabolite is retrieved.
12. dppm is the difference between the query mass and the metabolite mass, in ppm.
13. Delta is the difference between the query mass and the metabolite mass, in Dalton.
14. Number of possible stereoisomers is the possible number of stereoisomers as determined from InChI keys.
15. InChI String is the standard InChI string of the metabolite.
16. InChI Key is the standard InChI Key of the metabolite.
When the ion annotation information is used for identification, it is presented in the output file in these additional columns:
1. Peak No is the order number of the input peak. Each peak will have a number according to its appearance order in the input file.
2. m/z is the observed m/z value for the peak.
3. RT is the retention time for the peak.
4. Isotopes is the annotated isotope information from CAMERA.
5. Adducts is the annotated adduct and in-source fragment information from CAMERA.
6. Monoisotopic m/z is the monoisotopic, protonated (or deprotonated) m/z value for the peak, which is calculated based on its m/z value and ion annotation information. The monoisotopic m/z is used in database search for putative identifications.
7. Metabolite_group_ID is the order number of the metabolite. Peaks with the same Metabolite_group_ID are considered as originating from the same metabolite.
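The calculation behind the "Monoisotopic m/z" column can be sketched in Python as follows. This is an illustration under stated assumptions rather than MetaboSearch's own code: the rule table covers only a subset of the supported adduct rules, isotope peaks are assumed to be spaced by the 13C-12C mass difference, and the constant and function names are hypothetical.

```python
PROTON = 1.007276     # Da, proton
NA_ION = 22.989218    # Da, Na+ cation
K_ION = 38.963158     # Da, K+ cation
C13_SHIFT = 1.003355  # Da, 13C - 12C mass difference (assumed isotope spacing)

# rule -> (mass added to the neutral metabolite M, absolute charge, polarity)
ADDUCT_RULES = {
    "[M+H]+":   (PROTON, 1, +1),
    "[M+2H]2+": (2 * PROTON, 2, +1),
    "[M+Na]+":  (NA_ION, 1, +1),
    "[M+K]+":   (K_ION, 1, +1),
    "[M-H]-":   (-PROTON, 1, -1),
    "[M-2H]2-": (-2 * PROTON, 2, -1),
}


def monoisotopic_mz(mz, rule, isotope_offset=0):
    """Convert an annotated peak to its monoisotopic [M+H]+ or [M-H]- m/z."""
    shift, charge, polarity = ADDUCT_RULES[rule]
    # Step 1: move an [M+k] isotope peak back to the monoisotopic peak.
    mono = mz - isotope_offset * C13_SHIFT / charge
    # Step 2: recover the neutral mass of the metabolite.
    neutral = mono * charge - shift
    # Step 3: express it as a protonated (positive) or deprotonated (negative) ion.
    return neutral + polarity * PROTON


# A sodium adduct observed at m/z 217.069594 maps back to about 195.087652.
print(round(monoisotopic_mz(217.069594, "[M+Na]+"), 6))
```

Converting every annotated peak to the same ion form means that a single protonated or deprotonated m/z value per peak can be passed to the database search, whatever adduct or isotope was actually observed.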
Questions & Answers
-Q1: MetaboSearch cannot start (the buttons do not work).
A: This could be addressed by one of the following methods:
1. Installing the appropriate JRE for your operating system.
2. If more than one Java version is installed (e.g. Java 6 and Java 7 at the same time), uninstall the unused version and keep only one.
3. If you have opened MetaboSearch directly from the web browser, you can try to download MetaboSearch to your local drive and then run the program from the local machine.
-Q2: How can I check the JRE version on Mac OS X?
A: Open 'Applications'->'Utilities'->'Terminal' and type 'java -version'.
-Q3: How do I install the Sun JRE on Mac OS?
A: There are two ways to install the Sun JRE:
1. Run 'Software Update' from the Apple logo menu. For more information, go to Apple's Updating your software page. http://support.apple.com/kb/HT1338?viewlocale=en_US
2. Download the appropriate version of Java from Apple's Download page (http://support.apple.com/kb/HT1338?viewlocale=en_US)
After installing the plug-in or upgrading, close all browser windows. Reboot the computer.
-Q4: Can I use the right mouse button to paste the m/z values on Mac OS?
A: No. Please use 'Command + V' to paste the m/z values into the input area.
-Q5: The submit button remains grey after pasting the m/z values.
A: Hot-keys such as "ctrl+v" are not supported in MetaboSearch at this moment. Please use the right mouse button for copying and pasting.
-Q6: What can I do if any of the queried databases is down or unavailable?
A: Please de-select the unavailable database and re-submit the data.
-Q7: MetaboSearch freezes during data uploading or database searching for an extended time.
A: The uploading and searching time of MetaboSearch depends mainly on the number of m/z values and the Internet connection speed. Please allow extra time if you have a long list of m/z values or a slow connection. For the demo dataset, which has five to ten m/z values, the search takes about five minutes. For datasets with hundreds of m/z values, uploading may take several minutes and searching the databases may take a few hours.
-Q8: Can I open the final result (.xls format) in Linux?
A: Yes. We use gedit to open the exported Excel (.xls) output file.
-Q9: How do you handle changes in the four databases that MetaboSearch relies on?
A: In the next version, we will use a plug-in technique to deal with such changes.
-Q10: Why does it not work when I use the MS Office Excel csv or xls format?
A: Save your Excel file in xlsx format and try again. A parsing error sometimes occurs when reading tables saved or created with different versions of word/table processing tools.
-Q11: I updated my Java and it now blocks MetaboSearch. How can I run it with the newer versions?
A: You need to add an exception for MetaboSearch in your Java security settings. To do so, open the "Java Control Panel" (Windows: Start -> Java -> Configure Java) and, on the "Security" tab, select "Edit Site List". Then click "Add" to include "http://omics.georgetown.edu" in the list. Click OK and run MetaboSearch.
-Q12: Does MetaboSearch support all the adduct annotation rules generated by CAMERA?
A: The current version of MetaboSearch accepts only the following adduct annotation rules, in addition to isotopes, multiply charged ions, etc.: "[M+H]+", "[M+2H]2+", "[M+Na]+", "[M+K]+", "[M+H-H20]+", "[M-H]-", "[M-2H]2-", "[M-2H+Na]-", "[M-2H+K]-", "[M-H-H20]-". We are working to update the set of annotation rules. Meanwhile, you can download and use our updated CAMERA parser developed in MATLAB.
Your comments and feedback are highly appreciated. Please submit your comments/feedback here. If you have a technical question about the software, please contact hwr@georgetown.edu.