Nationwide lake classification to predict harmful algal blooms (HABs) via machine learning models

Overview

Detecting and managing HABs is of high importance, with efforts moving forward for both acute management through recreational advisories and long-term management to protect designated uses. While remote sensing technology has enabled quantification of HABs at frequencies and spatial scales that were previously infeasible (e.g., EPA’s CyAN Network), there is also a need to predict HABs in smaller lakes that are neither resolvable with satellite imagery nor sampled frequently. The overarching goal of this work was to forecast HABs using information about lake catchment characteristics, morphometry, climatic drivers, and nutrient sources. This project represented the first phase of this work, with objectives to (1) compile a database of publicly available data relevant to HAB abundance and related parameters, and (2) classify lakes into common conditions to inform future modeling efforts. The second phase of work will develop a forecasting model that predicts HAB abundance from a suite of watershed and lake variables.

Nationwide data were compiled from the National Lakes Assessment, the LakeCat database, and the national nutrient inventories, encompassing in-lake water quality, topography, morphometry, and watershed metrics including land use, climate, and nutrient sources. The classification effort divided lakes across the U.S. into similar groups with respect to HAB abundance and nutrient responses via tree-based machine learning methods (CART and TREED regression). When supplied with 30 potential classification variables, the machine learning models identified the variables and their threshold values that accounted for the most variability in responses. In each model, the algorithms identified up to 8 lake classes.

Across models, common classification variables included climate (temperature and precipitation), topography (elevation and watershed slope), hydrology (lake maximum depth and groundwater influence), and agricultural nutrient inputs. The selected variables sometimes differed among models but were often correlated with one another. Further, the spatial patterns in classification results revealed common geographic category membership across models despite selection of different predictor variables. This machine learning approach provides a data-driven method to classify HAB abundance and nutrient responses in lakes across a wide geographic distribution and forms a foundation for predictive modeling that provides nuance and flexibility beyond a one-size-fits-all approach.
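
To make the tree-based classification idea concrete, the sketch below grows a small CART-style regression tree in pure Python and treats each leaf as a lake class. It is only an illustration of the general technique, not the project's actual models: the lake records, the variable names (`temp_c`, `max_depth_m`, `hab_index`), and the stopping rules (`max_depth=3`, `min_leaf=2`) are all assumptions made for this example. Capping the depth at 3 is simply one way to obtain at most 8 classes; the source does not say how its class count was limited. TREED regression is not shown here.

```python
# Illustrative CART-style regression tree in pure Python.
# All data and variable names here are hypothetical, not from the study.

def sse(values):
    """Sum of squared errors around the mean (regression node impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, target, features, min_leaf):
    """Find the (feature, threshold) pair that most reduces total SSE."""
    best, best_cost = None, sse([r[target] for r in rows])
    for f in features:
        levels = sorted({r[f] for r in rows})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2.0  # midpoint between adjacent observed values
            left = [r[target] for r in rows if r[f] <= thr]
            right = [r[target] for r in rows if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if cost < best_cost - 1e-12:  # require a strict improvement
                best, best_cost = (f, thr), cost
    return best

def grow(rows, target, features, depth=0, max_depth=3, min_leaf=2):
    """Recursively partition rows; depth 3 allows at most 2**3 = 8 leaves."""
    split = None
    if depth < max_depth:
        split = best_split(rows, target, features, min_leaf)
    if split is None:
        ys = [r[target] for r in rows]
        return {"leaf": True, "n": len(ys), "mean": sum(ys) / len(ys)}
    f, thr = split
    left = [r for r in rows if r[f] <= thr]
    right = [r for r in rows if r[f] > thr]
    return {
        "leaf": False, "feature": f, "threshold": thr,
        "left": grow(left, target, features, depth + 1, max_depth, min_leaf),
        "right": grow(right, target, features, depth + 1, max_depth, min_leaf),
    }

def lake_class(tree, row):
    """Label a lake by the path (L/R turns) leading to its leaf."""
    path = ""
    while not tree["leaf"]:
        go_left = row[tree["feature"]] <= tree["threshold"]
        path += "L" if go_left else "R"
        tree = tree["left"] if go_left else tree["right"]
    return "class-" + path

def count_leaves(tree):
    """Number of terminal nodes, i.e. the number of lake classes."""
    if tree["leaf"]:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])

# Hypothetical lakes: air temperature (C), maximum depth (m), HAB response index.
lakes = [
    {"temp_c": 8, "max_depth_m": 30, "hab_index": 1.0},
    {"temp_c": 9, "max_depth_m": 25, "hab_index": 1.2},
    {"temp_c": 10, "max_depth_m": 4, "hab_index": 2.0},
    {"temp_c": 11, "max_depth_m": 3, "hab_index": 2.2},
    {"temp_c": 20, "max_depth_m": 28, "hab_index": 3.0},
    {"temp_c": 21, "max_depth_m": 26, "hab_index": 3.1},
    {"temp_c": 22, "max_depth_m": 3, "hab_index": 5.0},
    {"temp_c": 23, "max_depth_m": 2, "hab_index": 5.2},
]
features = ["temp_c", "max_depth_m"]
tree = grow(lakes, "hab_index", features)
labels = [lake_class(tree, lake) for lake in lakes]
print(count_leaves(tree), labels)
```

Each split picks the variable and threshold that leave the least within-group variability in the response, which mirrors how the models described above select classification variables and their values. Because correlated predictors can reduce variability almost equally well, small changes in the data may swap one variable for another at a split while the resulting lake groups stay similar.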

Impact/Purpose

Detecting and managing HABs is of high importance, with efforts moving forward for both acute management through recreational advisories and long-term management to protect designated uses. While remote sensing technology has enabled quantification of HABs at frequencies and spatial scales that were previously infeasible (e.g., EPA’s CyAN Network), there is also a need to predict HABs in smaller lakes that are neither resolvable with satellite imagery nor sampled frequently. Using information about lake catchment characteristics, morphometry, climatic drivers, and nutrient sources, this project compiled a database of publicly available data relevant to HAB abundance and related parameters, and classified lakes into common conditions to inform future modeling efforts. The classification effort divided lakes across the U.S. into similar groups with respect to HAB abundance and nutrient responses via tree-based machine learning methods (CART and TREED regression).

Citation

Salk, K., M. Fernandez, B. Pickard, S. Lee, J. Carleton, M. Pennino, and R. Sabo. Nationwide lake classification to predict harmful algal blooms (HABs) via machine learning models. National Monitoring Conference, Virginia Beach, VA, April 24-28, 2023.
Last updated on September 26, 2023
United States Environmental Protection Agency