Method Article
This article describes AMOS, the web-based Analytical Methods and Open Spectra database, a cheminformatics application designed to provide researchers with easy access to analytical methods and spectral data.
Analytical methods can range from detailed regulatory documents to simpler summaries. Regulatory methods may include information on amenable analytes, supported matrices, required reagents, statistical performance, interlaboratory validation, and other specifics. Summaries typically provide a general overview of reagents, instrumentation, and often a short list of analytes. Analytical methods from U.S. government bodies, including the U.S. Environmental Protection Agency (USEPA), U.S. Geological Survey (USGS), U.S. Department of Agriculture (USDA), the Food and Drug Administration (FDA), and others, offer detailed procedural information. Instrument vendors such as Agilent, Shimadzu, Thermo Fisher Scientific, Sciex, and others also provide access to hundreds of application notes, which may be considered summary methods. This study has developed a cheminformatics-enabled database of methods in which chemicals are extracted from method documents, with identifiers (names and/or Chemical Abstracts Service registry numbers (CASRN)) mapped to chemical structures. The resulting database, containing approximately 7,000 methods, is searchable by identifier, chemical structure, and structural similarity, and is supplemented by approximately one million public domain spectra (LC/MS, GC/MS, NMR, and IR). The application supports searching of analytical methods and filtering based on analytes, functional usage, method sources, and other related metadata.
Web-based delivery of chemistry data to the community is exemplified by applications such as PubChem1, ChemSpider2, and the CompTox Chemicals Dashboard (CCD)3. Efforts have been made to circulate analytical method details published in journal articles, released by instrument vendors as technical application notes, provided by government agencies as standard operating procedures or regulatory methods, and issued by standards organizations such as the International Organization for Standardization (ISO). Tens of thousands of chemicals have been studied by these sources under a wide range of conditions and analytical techniques. This extensive body of sources covers diverse substances and includes scenarios ranging from the quantification of a single chemical in a specific matrix (e.g., blood), to mixtures of pesticides and their residues in specific crops, to hundreds of chemicals identified in drinking water. While many analytical methods can be discovered via public search engines, not all are freely available or open-access.
Locating specific information of interest can be challenging. General-purpose search engines are not optimized for chemistry data, and their ranking algorithms may obscure high-quality content intended for narrow audiences. Searches across journal websites can yield more targeted results, but access is often restricted, with only abstracts publicly available, making it difficult to assess a method's usefulness. Furthermore, critical parameters, such as sample matrices and limits of detection and quantitation, are often not stored in a structured format. Another significant challenge lies in the variation and inconsistency of chemical identifiers, names, and synonyms associated with a single chemical. The lack of structured methods data limits the development of software tools that could leverage decades of accumulated analytical chemistry knowledge and related publications.
As a result of these challenges and limitations, there is a need for a curated, chemistry-oriented application for harmonizing and searching analytical methods; no such application was identified elsewhere. To address this gap, the U.S. Environmental Protection Agency developed AMOS, the Analytical Methods and Open Spectra database and web-based application. AMOS currently collects and organizes three types of data records: analytical methods, various analytical spectra, and a broad category of supplemental documents collectively referred to as fact sheets. Each record is linked to the method's target chemical analytes and reagents. The data are searchable in multiple ways, including by text queries, chemical structure, and structural or spectral similarity.
The AMOS application primarily focuses on delivering open access and open data records. Where possible, records in the database are hyperlinked to their original sources. Records not under open licensing and therefore not stored directly in the database can still be integrated and accessed via URL, provided they are otherwise available. This applies to two types of records: analytical methods that are behind paywalls, typically from journals or standards organizations to which the EPA has access, and spectra that are available but require login access.
Data sources vary in how records are structured, necessitating substantial effort in extraction and curation to assemble and harmonize the content. Most records provide substance identifiers (e.g., CASRN, DTXSID, InChIKey, common names), and in many cases, extraction is straightforward. However, matching these identifiers to chemical structures and substance details can be complex. Some identifiers can be directly matched to entries in the EPA's Distributed Structure-Searchable Toxicity (DSSTox) database4; when matches are not found, identifiers are linked to existing substances, or new substances are registered. The AMOS initiative has consequently led to the expansion of the DSSTox database, improving the foundational data supporting other EPA databases and applications, such as the CompTox Chemicals Dashboard3.
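As an illustration of the first step in identifier handling, the following minimal Python sketch guesses the type of an incoming identifier string before any lookup against DSSTox. It is not AMOS code: the regular expressions reflect the commonly documented shapes of DTXSIDs, InChIKeys, and CASRNs, the function names are hypothetical, and the only validation shown is the standard CASRN check-digit rule.

```python
import re

# Assumed identifier shapes (not taken from the AMOS source code).
DTXSID_RE = re.compile(r"^DTXSID\d+$")
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
CASRN_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def casrn_checksum_ok(casrn: str) -> bool:
    """Validate a CASRN using its check digit.

    The check digit equals the weighted sum of the other digits modulo 10,
    where the rightmost digit has weight 1, the next weight 2, and so on.
    """
    match = CASRN_RE.match(casrn)
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == check_digit


def classify_identifier(text: str) -> str:
    """Return a coarse identifier type; anything unrecognized is a name."""
    token = text.strip()
    if DTXSID_RE.match(token):
        return "dtxsid"
    if INCHIKEY_RE.match(token):
        return "inchikey"
    if CASRN_RE.match(token):
        return "casrn" if casrn_checksum_ok(token) else "invalid_casrn"
    return "name"
```

For example, `classify_identifier("58-08-2")` returns `"casrn"`, while a string with the same shape but a wrong check digit, such as `"58-08-3"`, is flagged as `"invalid_casrn"` and could be routed to manual curation.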
Manual curation is required for certain valuable additional information. For analytical methods, experimental parameters such as limits of detection and quantitation, sample matrix, and analytical methodology are not organized in a standardized way, and automated tools cannot identify this information due to its inconsistent storage.
Two elements of record information, the media associated with the sample and the functional use of the analyte, are highly relevant to ongoing efforts to monitor hazard and exposure concerns from contaminants. As such, considerable attention was given to structuring these attributes within the record data. An ontology of functional use classifications was developed for this project. This ontology organizes the functional uses of substances into a hierarchical structure, ranging from more general 'parent' uses to more specific 'child' uses. The ontology facilitates the exploration of substances from an application perspective, supporting research initiatives that emphasize functional uses as a means to assess exposure and hazard5,6. Additionally, methods were labeled according to their samples' harmonized media category, as specified in the EPA's multimedia monitoring database (MMDB)7. This categorization enables the search for chemicals based on their occurrence in specific media, streamlining the development of solutions focused on detecting chemicals in specific environmental or biological samples. These annotations enhance the integration of AMOS into exposure- and hazard-oriented workflows under development within the EPA.
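The parent and child structure of such an ontology can be sketched in a few lines of Python. The example below is only an illustration: the class names other than "industrial chemicals" and "respiratory drugs" are hypothetical, the parent assignments are invented for the example, and the actual AMOS ontology and its storage format are not reproduced here.

```python
from collections import defaultdict

# Hypothetical (child, parent) pairs; a parent of None marks a top-level class.
EDGES = [
    ("pharmaceuticals", None),
    ("respiratory drugs", "pharmaceuticals"),
    ("bronchodilators", "respiratory drugs"),
    ("industrial chemicals", None),
    ("solvents", "industrial chemicals"),
]


def build_children(edges):
    """Map each parent class to the list of its direct child classes."""
    children = defaultdict(list)
    for child, parent in edges:
        if parent is not None:
            children[parent].append(child)
    return children


def descendants(children, node):
    """Collect every class below `node`, at any depth, with an explicit stack."""
    found = []
    stack = list(children.get(node, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children.get(current, []))
    return found
```

Searching on a general class can then include everything beneath it: with the sample edges above, `descendants(build_children(EDGES), "pharmaceuticals")` yields both "respiratory drugs" and "bronchodilators".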
Assembling the spectra requires processing various file formats, some of which are only nominally standardized, and parsing the accompanying metadata; both tasks often call for custom handling. In cases where spectral collections are linked to a publication, details documented within the publication may need to be manually extracted for data loading. This effort has resulted in a database that integrates and structures these disparate spectra, allowing researchers to avoid the need for laborious curation in future endeavors.
As of March 2025, the database contains approximately 935,000 spectra, with nearly 99% being mass spectra and smaller collections of NMR (~2,000) and IR (~400). Additionally, there are approximately 770,000 externally linked spectra (connected to the SpectraBase database8), ~36,000 fact sheets, and ~7,400 analytical methods. The substances integrated into the application are a subset of those from the DSSTox database, which is incorporated into the CompTox Chemicals Dashboard (CCD) and contains over 1.2 million substances.
The majority of AMOS's functionality can be broken down into three categories: searching for records for given substances, searching for certain collections of substances, or searching among categories of records. The individual pages for these functionalities can all be accessed from the navigation bar at the top of every page. The application is currently deployed at https://hcd.rtpnc.epa.gov/#/ via the AMOS module. The software tools utilized in this study are listed in the Table of Materials.
1. Searching for records for specific substances
Figure 1: Search results for records containing cholesterol. A general search for "cholesterol" displays a list of matching records in the table (left). A selected record's mass spectrum is shown on the right.
Figure 2: Batch search interface. The search field contains two substances identified by their DTXSIDs. Default search options are selected for the query.
Figure 3: Structure search results for 1P-LSD. The table lists methods containing structurally similar substances. A selected method is displayed on the right. The absence of bolded entries in the table indicates that 1P-LSD itself does not appear in any listed method.
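Structural similarity searching of this kind is often based on comparing fingerprints with the Tanimoto coefficient. The text does not state which fingerprint or metric AMOS uses, so the Python sketch below is an assumption-laden illustration only: fingerprints are represented as plain sets of "on" bit positions, and the function names and the 0.5 threshold are hypothetical.

```python
def tanimoto(bits_a: set, bits_b: set) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either."""
    if not bits_a and not bits_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def rank_by_similarity(query_bits, library, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first.

    `library` maps a substance name to its set of fingerprint bits.
    """
    scored = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```

A query substance that is absent from every method can still retrieve methods for its nearest neighbors this way, which matches the situation shown for 1P-LSD.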
2. Searching for substances
Figure 4: ClassyFire classification search results. Results include substance-level information and the number of records per classification group.
Figure 5: Partial identifier search results for "trazine." The search retrieves substances with preferred names or synonyms containing the substring "trazine." Two of the three results include "trazine" only in their synonyms, not their preferred names.
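The behavior described for the partial identifier search, matching a substring against both preferred names and synonyms, can be sketched as follows. This is a minimal illustration rather than the AMOS implementation: the `Substance` class, the `partial_search` function, and the placeholder record are hypothetical, and a production system would more likely run the match inside the database.

```python
from dataclasses import dataclass, field


@dataclass
class Substance:
    preferred_name: str
    synonyms: list = field(default_factory=list)


def partial_search(substances, fragment):
    """Return (preferred_name, matched_in_name, matched_synonyms) per hit.

    Matching is case-insensitive and checks the preferred name and every synonym.
    """
    needle = fragment.casefold()
    results = []
    for substance in substances:
        in_name = needle in substance.preferred_name.casefold()
        matched_synonyms = [
            synonym for synonym in substance.synonyms
            if needle in synonym.casefold()
        ]
        if in_name or matched_synonyms:
            results.append((substance.preferred_name, in_name, matched_synonyms))
    return results


# Sample data; the second record is a made-up placeholder, not a real substance.
SUBSTANCES = [
    Substance("Atrazine"),
    Substance("Placeholder substance B", ["Placeholder-trazine synonym"]),
    Substance("Caffeine", ["1,3,7-Trimethylxanthine"]),
]

results = partial_search(SUBSTANCES, "trazine")
```

Returning a flag for where the match occurred makes it possible to explain to the user why a substance whose preferred name lacks the substring still appears in the results.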
3. Searching through records
Figure 6: Filtered list of analytical methods. The table is filtered by analyte and matrix, displaying only methods related to PFAS (per- and polyfluoroalkyl substances) in water. The corresponding list of fact sheets closely resembles this layout.
Figure 7: Spectrum similarity search results. A caffeine spectrum from the AMOS database is used as the input. Similar spectra are grouped by substance, with a maximum similarity score of 1.0. The mirrored plot shows the input spectrum (top) and a selected database spectrum (bottom). Light blue peaks are unique to the input, orange peaks to the database match, and dark blue peaks are shared.
Figure 8: Functional use classification visualization. The hierarchical structure is shown with the cursor hovering over the "industrial chemicals" node (outlined in yellow). Its child classes are outlined in green.
Figure 9: Soil ternary plot visualization. The plot displays compositional data for soil samples. A tooltip in the top-right shows the precise composition of the region currently under the cursor.
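A similarity score bounded at 1.0 and a mirrored plot with shared and unique peaks can both be derived from binned spectra. The Python sketch below uses cosine similarity on unit-width m/z bins as one common approach; the scoring method AMOS actually uses is not stated in the text, so the metric, the bin width, and all function names here are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (mz, intensity) peaks that fall in the same m/z bin."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spec_a, spec_b):
    """Cosine of the angle between two binned spectra; 1.0 means identical shape."""
    dot = sum(spec_a[k] * spec_b[k] for k in spec_a.keys() & spec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in spec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in spec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_peaks(spec_a, spec_b):
    """Split bins into input-only, shared, and match-only groups for plotting."""
    return {
        "input_only": sorted(spec_a.keys() - spec_b.keys()),
        "shared": sorted(spec_a.keys() & spec_b.keys()),
        "match_only": sorted(spec_b.keys() - spec_a.keys()),
    }
```

The three groups returned by `classify_peaks` correspond to the three peak colors in a mirrored plot, and because intensities are non-negative the cosine score stays between 0.0 and 1.0.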
The screenshots of AMOS above illustrate typical outcomes of the individual searches in the application, covering both searches for substances of interest and searches among spectra, fact sheets, and methods. The variety of ways of interrogating the database is intended to cover the most likely and most useful kinds of searches in ways that allow for deeper investigation into the data and the substances that they relate to.
To assist searching, much of the functionality is interconnected to support deeper examination of the available data. As an example workflow, the functional use classification visualization links to views of the methods and fact sheets related to a given functional class. From those views, lists of substances can be extracted and fed into the batch search, individual documents can be examined, and individual substances in those documents can be investigated further. Since many substances in methods also have experimental mass spectra in the database, a researcher can quickly go from a category of substances to a set of methods and spectra that can test for the presence of a specific substance (see Figure 10).
Since the results depend heavily on what is being searched for and which searches are run, representative results for the entire application are difficult to define. It may be more accurate to describe "success" in terms of user experience. In that case, it is hoped that two things generally hold true: that the methods of searching and filtering (and the ability to move between different searches and filters) are effective at identifying the subsets of information a user wants, and that the results the user finds are accurate and useful. Figure 10 depicts an example workflow demonstrating AMOS functionalities.
Figure 10: Example workflow demonstrating AMOS functionalities. The workflow begins with a functional use classification (respiratory drugs), filters methods related to respiratory drugs in blood, examines one specific method, and identifies spectra for a substance included in that method.
While many projects and applications focus on collecting and standardizing information from a single type of record, such as methods, fact sheets, or a specific kind of spectra, AMOS is the first tool identified that compiles and integrates large volumes of information across multiple record types. The unification, harmonization, and structuring of data from these diverse sources result in a database that can be more readily incorporated into workflows requiring access to analytical chemistry methodologies. The ability to search the database in several complementary ways enables efficient retrieval of information that might otherwise require extensive manual effort across multiple websites or tools.
Before public release, the utility of AMOS was demonstrated through its use by EPA staff to support a wide range of projects. The EPA has a sustained interest in the application of mass spectrometry for non-targeted analysis10,11, and multiple initiatives have leveraged the experimental mass spectra in AMOS to enhance searches against a large in silico spectral library generated from DSSTox chemicals12,13. Other projects have used structural similarity searching to identify starting points for developing new methods, examined existing methods to assess detection and quantitation limits, and analyzed collections of chemicals linked to methods to evaluate the extent of chemical space coverage.
AMOS's aggregation of potential training data further supports the development of quantitative models of amenability for analytical methods14, a core need for advancing non-targeted analysis (NTA) workflows. The curation efforts within AMOS also facilitate initiatives to model, explore, and visualize chemical spaces associated with methodological coverage14.
While the core functionality of AMOS is mature, ongoing development is guided by user feedback. Current tasks include the incorporation of additional data, curation of further metadata for enhanced filtering, and expansion of search capabilities. In collaboration with EPA stakeholders, application programming interfaces (APIs) are under development to enable programmatic access, addressing use cases where the graphical user interface (GUI) may be inefficient. A release notes page has been integrated into the application to track and communicate code updates over time.
New data records and chemicals are currently added on a weekly basis; however, a slower release schedule is anticipated following the public launch. While significant effort is made to ensure the accuracy of records and associated metadata, much of the data originates from public databases. As such, complete verification of every record is not feasible, and users should be aware that absolute data accuracy cannot be guaranteed.
This paper does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
The authors thank the curation team for all their work curating chemicals for the database, and Joshua Powell, Asif Rashid, and Freddie Valone for technical support in the construction and deployment of AMOS. We also thank Charles Lowe for his review of the manuscript.
Name | Company | Catalog Number | Comments |
Git | N/A | https://git-scm.com/ | Open-source version control system. |
JavaScript | N/A | https://ecma-international.org/publications-and-standards/standards/ecma-262/ | Programming language. Defined by ECMA International standards. |
PostgreSQL | PostgreSQL Global Development Group | https://postgresql.org/about/licence | Open-source database management system. |
Python | Python Software Foundation | https://www.python.org/ | Open-source programming language. |
Copyright © 2025 MyJoVE Corporation. All rights reserved