Enhancing Evidence Interpretation and Database Integration Via Semantic Matching
As part of implementing systematic review, the US Environmental Protection Agency's (EPA) Integrated Risk Information System (IRIS) program extracts data from ~150 studies per year across 15-20 chemical assessments that are in the development phase. These data are stored in the Health Assessment Workspace Collaborative (HAWC, https://hawcprd.epa.gov/about/), a free, open-source, web-based application. Extracting author-reported health findings has introduced a data consistency and semantic challenge because authors use inconsistent terms for related findings (e.g., cytotoxicity, cell death, programmed cell death, and cell viability). Inconsistent language may lead to duplication or misinterpretation of study findings, makes it difficult to retrieve information from HAWC efficiently, and poses a significant barrier to data exchange across the different databases used to store toxicity findings.
To address these data inconsistencies, the author-reported terms managed within EPA HAWC were matched to ontologies and ontology classes within BioPortal (https://bioportal.bioontology.org/), a comprehensive repository of medical ontologies, to create a controlled vocabulary and an ontology useful for expressing relationships between terms. Each match between an input author term and a BioPortal ontology class was scored as follows: 1 = perfect match, 0.5 = synonym, and other values (0–1) for partial matches. The matching process also returns other parameters (e.g., ontology, preferred name, synonym, class definition, class parent, and parent definitions), which were used along with the numerical score to annotate author terms into a HAWC controlled vocabulary. The controlled vocabulary is critically important for unifying study data managed by the HAWC database, whereas the ontologies are used to query the database for relationships between those terms. The result is increased transparency and consistency in identifying and retrieving pertinent evidence during evidence synthesis. The EPA HAWC vocabulary and ontology are interoperable with other databases, such as the Adverse Outcome Pathway (AOP) knowledge base, and through class matching and ontology mapping they can be integrated and used for advanced querying of potential relationships between exposure and outcome. The views expressed in this abstract are those of the authors and do not necessarily reflect the views or policies of the U.S. EPA.
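The scoring scheme described above (1 for a perfect match, 0.5 for a synonym, other values between 0 and 1 for partial matches) can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not call BioPortal. The `OntologyClass` record, the `score_match` and `best_match` helpers, the `EX:` identifiers, and the toy classes are all hypothetical, and the use of `difflib.SequenceMatcher` for partial matches is an assumption, since the abstract does not say how partial scores were computed.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class OntologyClass:
    """Hypothetical stand-in for the parameters returned with a match."""
    ontology: str
    class_id: str
    preferred_name: str
    synonyms: tuple = ()
    parent: str = ""


def normalize(term: str) -> str:
    """Lower-case the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def score_match(author_term: str, cls: OntologyClass) -> float:
    """Score an author term against one ontology class.

    1.0 = matches the preferred name, 0.5 = matches a listed synonym,
    otherwise a partial score between 0 and 1.
    """
    term = normalize(author_term)
    if term == normalize(cls.preferred_name):
        return 1.0
    if term in {normalize(s) for s in cls.synonyms}:
        return 0.5
    # Assumption: partial matches use plain string similarity. Such a
    # score can coincide with 0.5, so a real pipeline would also record
    # the match type alongside the number.
    return SequenceMatcher(None, term, normalize(cls.preferred_name)).ratio()


def best_match(author_term: str, classes: list) -> tuple:
    """Return (score, class) for the highest-scoring candidate class."""
    scored = [(score_match(author_term, c), c) for c in classes]
    return max(scored, key=lambda pair: pair[0])


# Toy classes invented for illustration; not actual BioPortal content.
CLASSES = [
    OntologyClass("EXAMPLE", "EX:0001", "cell death",
                  synonyms=("cytotoxicity",), parent="cellular process"),
    OntologyClass("EXAMPLE", "EX:0002", "cell viability",
                  parent="cellular process"),
]

for author_term in ("Cell death", "cytotoxicity", "programmed cell death"):
    score, cls = best_match(author_term, CLASSES)
    print(f"{author_term!r} -> {cls.preferred_name!r} "
          f"({cls.class_id}), score={score:.2f}")
```

In a sketch like this, the score together with the returned class fields (ontology, preferred name, synonyms, parent) would be the information a curator reviews before accepting an author term into the controlled vocabulary.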