Informing environmental health policy on complex mixtures - what we need vs. what we currently get from machine learning methods: development of semi-automated data extraction models for human health assessments.
EPA is working to reduce the human effort required to identify and extract salient data from environmental epidemiology studies and to summarize information on study design and results. It is doing so by adopting automated and semi-automated data extraction processes, such as Named Entity Recognition (NER) and other machine learning techniques. This presentation describes an approach to developing extraction algorithms. It covers the use of extraction software tools to annotate studies, which creates the datasets used to train the algorithms to identify epidemiologic data of interest; the issues encountered in that process; and the next steps for developing the data extraction model.
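As an illustration of the annotation-to-training-data step described above, the following minimal sketch converts character-span annotations into token-level BIO tags, a common input format for training NER models. It is not EPA's implementation: the example sentence, the label names (`STUDY_DESIGN`, `SAMPLE_SIZE`, `EXPOSURE`), and the helper functions `annotate` and `spans_to_bio` are all hypothetical and chosen only for illustration.

```python
import re

def annotate(text, phrase, label):
    """Hypothetical helper: build a (start, end, label) character span
    for the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)
    return (start, start + len(phrase), label)

def spans_to_bio(text, annotations):
    """Convert non-overlapping character spans into token-level BIO tags.

    Tokens are words or single punctuation marks. A token fully inside a
    span gets B-<label> if it starts the span, otherwise I-<label>.
    Tokens outside every span get "O".
    """
    matches = list(re.finditer(r"\w+|[^\w\s]", text))
    tokens, tags = [], []
    for m in matches:
        tag = "O"
        for start, end, label in annotations:
            if m.start() >= start and m.end() <= end:
                tag = ("B-" if m.start() == start else "I-") + label
                break
        tokens.append(m.group())
        tags.append(tag)
    return tokens, tags

# Illustrative sentence and labels (assumptions, not from an actual study).
text = "A cohort study of 1200 adults exposed to arsenic."
annotations = [
    annotate(text, "cohort study", "STUDY_DESIGN"),
    annotate(text, "1200", "SAMPLE_SIZE"),
    annotate(text, "arsenic", "EXPOSURE"),
]

tokens, tags = spans_to_bio(text, annotations)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Each printed line pairs a token with its tag, so a human annotator's highlighted spans become per-token labels that a sequence-labeling model can be trained on. Real annotation tools export spans in their own formats, so the input handling here would need to be adapted.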