Novel text analytics approach to identify relevant literature for human health risk assessments: A pilot study with health effects of in utero exposures
Systematic review methods improve the transparency and objectivity of literature-based evaluation in human health risk assessments. To adequately address human health effects for multiple organ systems across a broad dose range and various routes of exposure, the approach to developing these assessments must evolve to accommodate the expansive literature that must be assessed to fully characterize risk. This is particularly true when evaluating cumulative health risks from multiple agents and stressors. Literature searches for human health risk assessments of cumulative risk typically identify a large number of studies, often tens of thousands. Reviewing such large bodies of literature under resource constraints requires innovative approaches for identifying relevant literature for review and data extraction. We demonstrate that text analytics techniques, including supervised clustering (a type of semi-supervised learning) and machine learning, can eliminate the need to manually screen most of the results of a comprehensive search to identify human health impacts resulting from in utero exposure to environmental chemicals. Supervised machine learning methods require training data, which can be time-consuming to gather. We use a novel approach for our initial training corpus that repurposes a readily available, expert-curated set of studies from the US EPA's Integrated Risk Information System (IRIS) program. In a case study, the machine learning techniques we used performed comparably to expert review of the literature.
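To make the idea of screening with a small expert-curated seed set concrete, the following is a minimal sketch, not the pipeline used in this study. It assumes a simplified stand-in technique: candidate abstracts are ranked by TF-IDF cosine similarity to the centroid of the seed studies, so that the most seed-like records are screened first. The function names and the example texts are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse vectors term by term."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def rank_by_seed_similarity(seeds, candidates):
    """Rank candidates by similarity to the seed centroid, highest first."""
    vectors = tfidf_vectors(seeds + candidates)
    seed_centroid = centroid(vectors[: len(seeds)])
    scored = [
        (cosine(seed_centroid, vec), text)
        for vec, text in zip(vectors[len(seeds):], candidates)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical seed studies (known relevant) and unscreened search results.
seeds = [
    "Prenatal exposure to lead and birth weight in infants",
    "In utero exposure to phthalates and fetal growth outcomes",
]
candidates = [
    "Maternal exposure to arsenic during pregnancy and fetal growth",
    "Corrosion rates of steel pipes in industrial cooling systems",
]

ranked = rank_by_seed_similarity(seeds, candidates)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

In practice, a ranking of this kind would only prioritize records for manual screening; any cutoff below which records are left unscreened would need to be validated against expert judgments.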