An Environmental Health Vocabulary and its Semi-Automated Curation Workflow

The environmental health vocabulary (EHV) is a set of manually curated terminologies developed by the US Environmental Protection Agency's (EPA) Chemical Pollutant Assessment Division (CPAD) to standardize the reporting of health effect information. Because manual data curation is a resource bottleneck, a semi-automated curation workflow was developed. The objectives of this work are to describe the manual creation of the EHV and to improve the efficiency of data curation by implementing a semi-automated workflow that minimizes manual review through a sequence of computational text analysis and quality assurance/quality control (QA/QC) steps with a high level of accuracy.

The workflow combines computational and manual steps: (1) a series of computational text processing steps that normalize extracted terms and match them to the EHV; (2) a QA step that reviews the computationally identified matches; (3) a manual review of unmatched terms; and (4) curation of the EHV, including completion of missing hierarchical data and related metadata.

The EHV was manually created to promote data aggregation, integration, accessibility, and transparent data exchange across EPA partners by normalizing the data extracted into the EPA Health Assessment Workspace Collaborative (HAWC). The workflow described here removes the manual curation bottleneck by turning data curation into a streamlined, semi-automated process driven by computational text processing. This semi-automated method offers several advantages to the environmental health community, including (but not limited to) efficiency, by automating repetitive data management tasks; scalability to large volumes of terms and terminology resources; and better integration with other data sets and with artificial intelligence (AI) and machine learning (ML) models.
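The normalization, matching, and triage in steps (1) to (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the workflow as implemented in HAWC: the vocabulary excerpt, the `EHV:` identifiers, the abbreviation table, the use of `difflib` string similarity for approximate matching, and the 0.9 cutoff are all hypothetical choices made for the example.

```python
import difflib
import re
import unicodedata

# Hypothetical EHV excerpt: normalized term -> concept identifier.
# Terms and IDs are illustrative only, not taken from the real EHV.
EHV_TERMS = {
    "liver weight": "EHV:0001",
    "body weight": "EHV:0002",
    "serum alanine aminotransferase": "EHV:0003",
}

# Hypothetical abbreviation table used during normalization.
SYNONYMS = {"alt": "alanine aminotransferase", "wt": "weight"}


def normalize(term: str) -> str:
    """Lowercase, strip accents and punctuation, expand abbreviations."""
    text = unicodedata.normalize("NFKD", term)
    # Drop combining marks left over from NFKD decomposition (e.g. accents).
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse whitespace via split().
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(SYNONYMS.get(tok, tok) for tok in text.split())


def match_term(term: str, cutoff: float = 0.9):
    """Return (status, ehv_id, matched_term); status is exact, fuzzy, or unmatched."""
    norm = normalize(term)
    if norm in EHV_TERMS:
        return "exact", EHV_TERMS[norm], norm
    # Approximate match on string similarity (an assumed technique).
    close = difflib.get_close_matches(norm, EHV_TERMS.keys(), n=1, cutoff=cutoff)
    if close:
        return "fuzzy", EHV_TERMS[close[0]], close[0]
    return "unmatched", None, None


def triage(extracted_terms):
    """Route computational matches to QA and unmatched terms to manual review."""
    queues = {"qa": [], "manual": []}
    for term in extracted_terms:
        status, ehv_id, matched = match_term(term)
        if status == "unmatched":
            queues["manual"].append(term)
        else:
            queues["qa"].append((term, ehv_id, matched, status))
    return queues


queues = triage(["Liver Weight", "Body wt.", "Serum ALT", "liver weights", "tumor incidence"])
for item in queues["qa"]:
    print("QA:", item)
for term in queues["manual"]:
    print("Manual review:", term)
```

Every computational match, exact or approximate, lands in the QA queue, mirroring step (2); only terms with no candidate go to manual review, as in step (3). Lowering the hypothetical `cutoff` would send more terms to QA and fewer to manual review, trading reviewer effort between the two queues.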

Impact/Purpose

This sub-product documents the curation workflow for the EHV as implemented in HAWC.

Citation

Angrish, M., S. Burns, J. Cleland, C. Foster, S. Kovach, K. Markey, B. Schultz, A. Shapiro, M. Taylor, G. Woodall, AND S. Watford. An Environmental Health Vocabulary and its Semi-Automated Curation Workflow. Taylor & Francis Group, London, UK, 3(1):2485111, (2025). [DOI: 10.1080/2833373X.2025.2485111]

Download(s)

DOI: An Environmental Health Vocabulary and its Semi-Automated Curation Workflow
Last updated on April 07, 2026
United States Environmental Protection Agency
