Proof‑of‑concept for using machine learning to facilitate data extraction for human health chemical assessments: a study protocol
Background
Systematic review (SR) methods are relied upon to develop transparent, unbiased, and standardized human health chemical assessments. The expectation is that these assessments will have discovered and evaluated all of the available information in the trackable, transparent, and reproducible manner inherent to SR principles. The challenge is that chemical assessment development relies mostly on literature-based data gathered using manual approaches that are not scalable. Various SR tools have increased the efficiency of assessment development by implementing semi-automated (human-in-the-loop) approaches for data discovery (literature search and screening) and enhanced data repositories with standardized data collection and curation frameworks. Yet filling these repositories with extracted data has remained a manual process, and connecting the various tools together in one interoperable workflow remains challenging.
Objectives
The objective of this protocol is to explore incorporating a semi-automated data extraction tool (Dextr) into a chemical assessment workflow and to determine whether the new tool improves the overall user experience.
Methods
The workflow will use template systematic evidence map (SEM) methods developed by the Environmental Protection Agency for the identification of included studies. The methods described here focus on the data extraction component of the workflow, which will use either a fully manual or a semi-automated (human-in-the-loop) data extraction approach. Both the manual and semi-automated data extractions will occur in Dextr. The new data extraction tool will be evaluated for user experience and for whether the data extracted using the semi-automated approach meet or exceed the metrics (precision, recall, and F1 score) of a fully manual data extraction.
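The three metrics named above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each extraction is represented as a set of (field, value) pairs and scored by exact match against a reference extraction; the function name, the field names, and the example values are hypothetical and are not taken from Dextr or from the protocol.

```python
def extraction_metrics(extracted, reference):
    """Return (precision, recall, F1) for one extraction against a reference.

    Both arguments are iterables of hashable items, here assumed to be
    (field, value) pairs compared by exact match.
    """
    extracted, reference = set(extracted), set(reference)
    true_pos = len(extracted & reference)   # items found in both sets
    false_pos = len(extracted - reference)  # extracted but not in the reference
    false_neg = len(reference - extracted)  # in the reference but missed

    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example data: a reference extraction and a candidate extraction.
reference = {
    ("species", "rat"),
    ("sex", "male"),
    ("dose", "10 mg/kg"),
    ("endpoint", "liver weight"),
}
candidate = {
    ("species", "rat"),
    ("dose", "10 mg/kg"),
    ("endpoint", "kidney weight"),
}

precision, recall, f1 = extraction_metrics(candidate, reference)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Exact matching is the simplest possible scoring rule; an actual evaluation may need a more tolerant rule, for example one that gives credit for partially overlapping text spans.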
Discussion
Artificial intelligence (AI) and machine learning (ML) methods have advanced rapidly and show promise in achieving operational efficiencies in chemical assessment workflows by supporting automated or semi-automated SR methods, possibly improving the user experience. Yet incorporating these advances into sustainable workflows has remained a challenge. Whether using a tool like Dextr improves operational efficiencies and the user experience remains to be determined.