Data Standardization for Improved Data Discovery
Scientific information provides the foundational elements needed to support scientific research and decision frameworks. Yet sorting through the vast amount of available information to find relevant, high-quality data remains a challenge. With the internet at our fingertips and artificial intelligence (AI) tools in hand, we are no doubt immersed in an era of information overload. From large-scale epidemiological experiments to multi-omics new approach methodologies, the potential to create new data seems limitless, and its application in rapidly developing AI interfaces appears to have few boundaries. Why, then, is finding the needed data still difficult for many of our key decision-making processes? Because language matters in data discovery (and documentation): when language is not standardized, information can be difficult to locate and access, which impedes application and learning by both machines and humans. This presentation therefore covers language-based and computationally intelligent solutions to data discovery (and reporting) currently used in human health assessment workflows. Illustrative case studies cover AI and machine learning (ML) techniques for 1) data search strategies and 2) labeling and categorization of search results, as well as 3) the importance of data standardization. Throughout, participants will learn about the importance of community, and of community-based adoption of best practices, in developing and implementing data standards that adhere to the broader FAIR (Findable, Accessible, Interoperable, and Reusable) principles of data management and stewardship. The presentation will close with hints at future AI/ML applications in data discovery workflows implemented by chemical assessment teams, while encouraging participants to consider opportunities for engagement within the broader environmental health science language community.
The views expressed are those of the authors and do not necessarily represent the views or policies of the US Environmental Protection Agency.