Mass-based search is an important step for metabolite identification in mass-spectrometry-based metabolomic analysis. The mass-to-charge ratio (m/z) value of a molecular ion of interest is searched against one or more metabolite databases. Metabolites whose molecular weights fall within a specified tolerance of the query m/z value are retrieved from the databases as putative identifications, which serve as a foundation for further metabolite verification. In addition to searching with m/z values only, ion annotation information can be used to aid the mass-based search. Ion annotation groups the ions originating from the same metabolite and annotates them as adducts, isotopes, or in-source fragments. The R package CAMERA (Collection of Algorithms for MEtabolite pRofile Annotation) was previously developed for ion annotation by Kuhl et al. (Carsten Kuhl et al. CAMERA: Collection of annotation related methods for mass spectrometry data. R package version 1.10.0.). Using the ion annotation information, the appropriate mass values of the ions can be calculated and then searched against the databases. This approach is expected to improve the accuracy of metabolite identification.
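As a rough illustration of the tolerance-based lookup described above, the following Python sketch searches a query m/z value against a tiny in-memory list of metabolites. It is not MetaboSearch's implementation: the function names, the toy database, the proton-mass conversion for singly charged ions, and the choice of the database mass as the ppm denominator are all assumptions made for this example.

```python
PROTON_MASS = 1.007276  # Da, approximate value (assumption of this sketch)

# Toy stand-in for a metabolite database: (name, monoisotopic neutral mass in Da).
TOY_DB = [
    ("D-Glucose", 180.063388),
    ("Caffeine", 194.080376),
    ("L-Alanine", 89.047678),
]


def neutral_mass(mz, mode):
    """Convert an m/z value to a neutral mass, assuming a singly
    protonated (positive) or deprotonated (negative) ion."""
    if mode == "positive":
        return mz - PROTON_MASS
    if mode == "negative":
        return mz + PROTON_MASS
    raise ValueError("mode must be 'positive' or 'negative'")


def search(mz, mode, tol_ppm, database):
    """Return (name, delta_da, dppm) for every entry within tol_ppm."""
    query = neutral_mass(mz, mode)
    hits = []
    for name, mass in database:
        delta = query - mass          # difference in Dalton
        dppm = delta / mass * 1e6     # same difference in ppm
        if abs(dppm) <= tol_ppm:
            hits.append((name, delta, dppm))
    return hits


hits = search(181.070664, "positive", 10.0, TOY_DB)
for name, delta, dppm in hits:
    print(name, round(delta, 6), round(dppm, 3))
```

A real search would query the online databases rather than a local list, but the tolerance test on each candidate mass would look much the same.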
We developed MetaboSearch to perform mass-based metabolite search simultaneously against four major metabolite databases: Human Metabolome DataBase (HMDB), Madison Metabolomics Consortium Database (MMCD), Metlin, and LipidMaps. The search results from these databases are integrated into a uniform and non-redundant format based on the IUPAC International Chemical Identifier (InChI) key. The software provides chemical identifier information for effective reference to metabolites, and it cross-references multiple databases when a particular identifier type is missing from one of them. The comprehensive list of chemical identifiers includes PubChem Compound ID (CID), PubChem Substance ID (SID), HMDB ID, KEGG ID, InChI string, and InChI key. MetaboSearch performs mass-based search using a given list of m/z values. In addition, it can use ion annotation information for improved metabolite identification, as long as that information is provided in the CAMERA output format.
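The integration step can be pictured with a minimal Python sketch that merges per-database hits sharing an InChI key and fills in identifiers missing from one record with values found in another. The record layout, field names, and placeholder identifiers below are hypothetical and are not taken from MetaboSearch itself.

```python
def merge_hits(hits):
    """Merge records that share an InChI key into one non-redundant entry.

    Each hit is a dict with 'inchikey' and 'database' plus optional
    identifier fields; the field names are assumptions of this sketch.
    """
    merged = {}
    for hit in hits:
        key = hit["inchikey"]
        entry = merged.setdefault(key, {"inchikey": key, "databases": []})
        if hit["database"] not in entry["databases"]:
            entry["databases"].append(hit["database"])
        # Cross-fill: keep the first non-empty value seen for each identifier.
        for field in ("name", "kegg", "hmdb", "pubchem_cid"):
            value = hit.get(field)
            if value and not entry.get(field):
                entry[field] = value
    return list(merged.values())


# Placeholder records; the key and IDs are made up for illustration.
raw_hits = [
    {"inchikey": "AAAAAAAAAAAAAA-BBBBBBBBSA-N", "database": "HMDB",
     "name": "Example metabolite", "hmdb": "HMDB-PLACEHOLDER"},
    {"inchikey": "AAAAAAAAAAAAAA-BBBBBBBBSA-N", "database": "Metlin",
     "kegg": "KEGG-PLACEHOLDER"},
]

merged = merge_hits(raw_hits)
print(merged)
```

Keying on the InChI key means the same compound reported by two databases collapses to one row, while the list of source databases is preserved.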
Zhou B, Wang J, Ressom HW (2012). MetaboSearch: Tool for mass-based metabolite identification using multiple databases. PLoS One 7(6): e40096. doi:10.1371/journal.pone.0040096. PMID: 22768229
Two datasets described in this manuscript can be downloaded here (64-bit compressed package or 32-bit compressed package).
User Guide
MetaboSearch accepts two types of inputs: a list of m/z values or ion annotation information along with m/z values, as acquired from CAMERA.
1. A list of m/z values:
The figure presents the input data format for a list of m/z values. The first row contains the header for each column. The "mz" column holds the m/z values of the peaks and is compulsory. Optionally, retention times can be placed under the column "rt"; they are preserved in the software output, although they are not used for identification. The input m/z values can be submitted by copying and pasting them directly into the input area of MetaboSearch or by uploading the data file through the "Browse" button.
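For readers preparing such a file programmatically, here is a small Python sketch that reads a table with a compulsory "mz" column and an optional "rt" column. The whitespace/tab delimiter is an assumption, since the exact file layout is shown only in the figure, and the function name is hypothetical.

```python
def parse_mz_table(text):
    """Parse a header row followed by data rows into a list of dicts.

    Assumes whitespace- or tab-separated columns (an assumption of this
    sketch); 'mz' is required, 'rt' is optional.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if "mz" not in header:
        raise ValueError("the 'mz' column is compulsory")
    mz_idx = header.index("mz")
    rt_idx = header.index("rt") if "rt" in header else None
    rows = []
    for ln in lines[1:]:
        cells = ln.split()
        rt = float(cells[rt_idx]) if rt_idx is not None else None
        rows.append({"mz": float(cells[mz_idx]), "rt": rt})
    return rows


rows = parse_mz_table("mz\trt\n181.0707\t35.2\n195.0877\t40.1\n")
print(rows)
```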
2. Ion annotation information
The figure presents an output file from CAMERA; this file can be loaded directly into the software through the "Browse" button. The columns "mz" and "rt" contain the observed m/z value and the retention time of each peak. The last three columns, "isotopes", "adduct" and "pcgroup", contain the ion annotation information acquired through CAMERA.
The figure shows the interface of MetaboSearch. The top panel lists the four databases used for the search (HMDB, Metlin, MMCD, LipidMaps); the user can include or exclude any particular database. The middle left panel is the input area for data uploading and search parameters. The middle right panel displays the intermediate results. The bottom panel displays the running status and the progress.
1. Set the ionization mode of the data by checking either 'positive' or 'negative'. Set the m/z tolerance in ppm.
2. Paste m/z values in the input area or upload them from a local file through the 'Browse' button.
3. Press the 'Submit' button.
4. Once the database search is finished, a dialog box pops up to remind the user to export the result by clicking the 'Export to Local' button. The integrated results from multiple databases are exported in Excel (.xls) format. After export, another pop-up dialog box asks the user if he/she wants to open the exported Excel file immediately. The user can choose "Yes" to open the file or "No" to review the file later.
The output file consists of the following 16 columns:
1. Query_ID is the order number of the input m/z value.
2. Query_MZ is the input m/z value.
3. Input_RT is the input retention time if the retention times are provided.
4. Name is the common name of the metabolite.
5. Formula is the elemental formula of the metabolite.
6. Exact_Mass is the monoisotopic neutral mass of the metabolite.
7. KEGG is the compound ID in KEGG database.
8. PubChem SID is the substance accession identifier (SID) in PubChem database.
9. PubChem CID is the compound accession identifier (CID) in PubChem database.
10. HMDB ID is the accession number in HMDB database.
11. Database is the name(s) of database(s) from which the metabolite is retrieved.
12. dppm is the difference between the query mass and the metabolite mass, in ppm.
13. Delta is the difference between the query mass and the metabolite mass, in Dalton.
14. Number of possible stereoisomers is the possible number of stereoisomers as determined from InChI keys.
15. InChI String is the standard InChI string of the metabolite.
16. InChI Key is the standard InChI Key of the metabolite.
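One plausible way to derive a stereoisomer count from InChI keys, sketched below in Python, relies on the layout of the standard InChIKey: the first 14-character block hashes the molecular skeleton (connectivity), so keys that share it but differ in the second block differ in layers such as stereochemistry. Whether MetaboSearch counts exactly this way is an assumption; the keys in the example are placeholders, and keys that differ only in isotopic labelling would also be counted by this simple grouping.

```python
from collections import defaultdict


def count_stereoisomers(inchikeys):
    """Map each InChI key to the number of distinct keys that share its
    first (connectivity) block. A sketch, not MetaboSearch's own logic."""
    groups = defaultdict(set)
    for key in inchikeys:
        groups[key.split("-")[0]].add(key)
    return {key: len(groups[key.split("-")[0]]) for key in inchikeys}


# Placeholder keys, made up for illustration only.
keys = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "AAAAAAAAAAAAAA-CCCCCCCCSA-N",
    "DDDDDDDDDDDDDD-UHFFFAOYSA-N",
]
counts = count_stereoisomers(keys)
print(counts)
```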
When the ion annotation information is used for identification, it is presented in the output file in these additional columns:
1. Peak No is the order number of the input peak. Each peak will have a number according to its appearance order in the input file.
2. m/z is the observed m/z value for the peak.
3. RT is the retention time for the peak.
4. Isotopes is the annotated isotope information from CAMERA.
5. Adducts is the annotated adduct and in-source fragment information from CAMERA.
6. Monoisotopic m/z is the monoisotopic, protonated (or deprotonated) m/z value for the peak, which is calculated based on its m/z value and ion annotation information. The monoisotopic m/z is used in database search for putative identifications.
7. Metabolite_group_ID is the order number of the metabolite. Peaks with the same Metabolite_group_ID are considered as originating from the same metabolite.
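The calculation behind the "Monoisotopic m/z" column can be sketched as follows for adduct annotations. This Python example first recovers the neutral mass from the observed m/z and the adduct label, then re-expresses it as a protonated (positive mode) or deprotonated (negative mode) m/z. The mass constants are approximate, the rule table and function name are assumptions of this sketch, and isotope peaks such as [M+1] are not handled.

```python
PROTON = 1.007276          # Da, approximate
SODIUM_ION = 22.989218     # Na+ (atom mass minus one electron), approximate
POTASSIUM_ION = 38.963158  # K+, approximate
WATER = 18.010565          # H2O, approximate

# adduct label -> (charge, total mass added to the neutral molecule M)
ADDUCT_RULES = {
    "[M+H]+": (1, PROTON),
    "[M+2H]2+": (2, 2 * PROTON),
    "[M+Na]+": (1, SODIUM_ION),
    "[M+K]+": (1, POTASSIUM_ION),
    "[M+H-H2O]+": (1, PROTON - WATER),
    "[M-H]-": (-1, -PROTON),
    "[M-2H]2-": (-2, -2 * PROTON),
    "[M-2H+Na]-": (-1, SODIUM_ION - 2 * PROTON),
    "[M-2H+K]-": (-1, POTASSIUM_ION - 2 * PROTON),
    "[M-H-H2O]-": (-1, -PROTON - WATER),
}


def monoisotopic_mz(observed_mz, adduct):
    """Return the singly protonated or deprotonated m/z for a peak."""
    charge, shift = ADDUCT_RULES[adduct]
    # Undo the charge scaling and the adduct mass to get the neutral mass.
    neutral = observed_mz * abs(charge) - shift
    return neutral + PROTON if charge > 0 else neutral - PROTON


# A sodium adduct observed at m/z 203.052606 maps back to [M+H]+.
print(round(monoisotopic_mz(203.052606, "[M+Na]+"), 6))
```

Keeping the rules in a lookup table makes it easy to see which annotations are supported and to extend the set later.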
Questions & Answers
A: This could be addressed by one of the following methods:
1. Installing the appropriate JRE for your operating system.
2. If more than one Java version is installed (e.g. Java 6 and Java 7 at the same time), uninstall the unused ones and keep only one version.
3. If you have opened MetaboSearch directly from the web browser, you can try to download MetaboSearch to your local drive and then run the program from the local machine.
A: Open 'Application'->'terminal' and type 'java -version'.
A: There are two ways to install the Sun JRE:
1. Run 'Software Update' from the Apple logo menu. For more information, go to Apple's Updating your software page. http://support.apple.com/kb/HT1338?viewlocale=en_US
2. Download the appropriate version of Java from Apple's Download page (http://support.apple.com/kb/HT1338?viewlocale=en_US)
After installing the plug-in or upgrading, close all browser windows. Reboot the computer.
A: No, you can't. Please use 'Command + v' to paste the m/z value into input area.
A: Hot-keys such as "ctrl+v" are not supported in MetaboSearch at this moment. Please use the right-button of the mouse for copying and pasting.
A: Please de-select the unavailable database and re-submit the data.
A: The uploading and searching time of MetaboSearch depends mainly on the number of m/z values and the Internet connection speed. Please allow extra time if you have a long list of m/z values or a slow connection. For the demo dataset, which has five to ten m/z values, searching takes about five minutes. For datasets involving hundreds of m/z values, it may take several minutes to upload and a few hours to finish searching the databases.
A: We use gedit to open the Excel file containing the final output.
A: In the next version, we will use a plug-in technique to deal with their changes.
A: Save your Excel file in xlsx format and try again. Sometimes a parsing error occurs when reading tables saved or created with different versions of word/table processing tools.
A: You need to add an exception for MetaboSearch in your Java security settings. To do so, open the "Java Control Panel" (Windows: Start -> Java -> Configure Java) and, on the "Security" tab, select "Edit Site List". Then click "Add" to include "http://omics.georgetown.edu" in the list. Click OK and run MetaboSearch.
A: The current version of MetaboSearch accepts only the following adduct annotation rules, in addition to isotopes, multiply charged ions, etc.: "[M+H]+", "[M+2H]2+", "[M+Na]+", "[M+K]+", "[M+H-H2O]+", "[M-H]-", "[M-2H]2-", "[M-2H+Na]-", "[M-2H+K]-", "[M-H-H2O]-". We are working to update the set of annotation rules. Meanwhile, you can download and use our updated CAMERA parser developed in MATLAB.
Your comments and feedback are highly appreciated. Please submit them here. If you have a technical question about the software, please contact hwr@georgetown.edu.