Nationwide lake classification to predict harmful algal blooms (HABs) via machine learning models
Detecting and managing HABs is a high priority, with efforts under way for both acute management through recreational advisories and long-term management to protect designated uses. While remote sensing technology has enabled quantification of HABs at frequencies and spatial scales that were previously infeasible (e.g., EPA’s CyAN Network), there is also a need to predict HABs in smaller lakes that are neither resolvable with satellite imagery nor sampled frequently. The overarching goal of this work was to forecast HABs using information about lake catchment characteristics, morphometry, climatic drivers, and nutrient sources. This project represented the first phase of this work, with objectives to (1) compile a database of publicly available data relevant to HAB abundance and related parameters, and (2) classify lakes into groups with common conditions to inform future modeling efforts. The second phase of work will develop a forecasting model that predicts HAB abundance from a suite of watershed and lake variables. Nationwide data were compiled from the National Lakes Assessment, the LakeCat database, and the national nutrient inventories, encompassing in-lake water quality, topography, morphometry, and watershed metrics including land use, climate, and nutrient sources. The classification effort divided lakes across the U.S. into groups that were similar with respect to HAB abundance and nutrient responses, using tree-based machine learning methods (CART and TREED regression). When supplied with 30 potential classification variables, the machine learning models identified the variables, and the split values of those variables, that accounted for the most variability in the responses. Each model identified up to 8 lake classes. Across models, common classification variables included climate (temperature and precipitation), topography (elevation and watershed slope), hydrology (lake maximum depth and groundwater influence), and agricultural nutrient inputs.
The selected variables sometimes differed among models but were often correlated with one another. Further, the spatial patterns in the classification results revealed common geographic category membership across models despite the selection of different predictor variables. This machine learning approach provides a data-driven method to classify HAB abundance and nutrient responses in lakes across a wide geographic distribution, and it forms a foundation for predictive modeling that offers nuance and flexibility beyond a one-size-fits-all approach.
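The tree-based classification described above can be illustrated with a minimal sketch of a CART-style regression tree. It is not the project's implementation: the data are synthetic, the variable names (`mean_temp_c`, `max_depth_m`, `ag_nitrogen`, `hab_index`) and the `min_size` setting are hypothetical, and only three candidate variables stand in for the 30 used in the study. The sketch shows the general idea of choosing the variable and split value that most reduce within-group variability, and of stopping at a maximum of 8 classes. TREED regression, the other method named above, is not shown.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, target, features, min_size):
    """Return (gain, feature, threshold) for the split that most reduces SSE."""
    parent = sse([r[target] for r in rows])
    best = (0.0, None, None)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds: midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = parent - sse(left) - sse(right)
            if gain > best[0]:
                best = (gain, f, t)
    return best

def grow_classes(rows, target, features, max_classes=8, min_size=10):
    """Greedy best-first growth: always split the class with the largest gain."""
    leaves = [([], rows)]  # each leaf is (list of rule strings, member rows)
    while len(leaves) < max_classes:
        scored = [(best_split(members, target, features, min_size), i)
                  for i, (_, members) in enumerate(leaves)]
        (gain, f, t), i = max(scored, key=lambda s: s[0][0])
        if f is None:  # no admissible split improves the fit
            break
        rules, members = leaves.pop(i)
        leaves.append((rules + [f"{f} <= {t:.2f}"],
                       [r for r in members if r[f] <= t]))
        leaves.append((rules + [f"{f} > {t:.2f}"],
                       [r for r in members if r[f] > t]))
    return leaves

# Synthetic lakes (hypothetical variables and response, for illustration only).
rng = random.Random(0)
lakes = []
for _ in range(200):
    temp = rng.uniform(2, 25)
    depth = rng.uniform(1, 60)
    ag_n = rng.uniform(0, 100)
    hab = 0.1 * temp + 0.02 * ag_n - 0.03 * depth + rng.gauss(0, 0.3)
    if temp > 15 and depth < 10:
        hab += 2.0  # warm, shallow lakes respond more strongly in this toy data
    lakes.append({"mean_temp_c": temp, "max_depth_m": depth,
                  "ag_nitrogen": ag_n, "hab_index": hab})

features = ["mean_temp_c", "max_depth_m", "ag_nitrogen"]
classes = grow_classes(lakes, "hab_index", features)

for rules, members in classes:
    mean_hab = sum(r["hab_index"] for r in members) / len(members)
    print(f"n={len(members):3d}  mean hab_index={mean_hab:5.2f}  " + " and ".join(rules))
```

Each printed line is one lake class: the number of lakes in it, their mean response, and the chain of split rules that defines membership. Because candidate variables are often correlated, a different but related variable can produce nearly the same grouping, which is consistent with the cross-model pattern noted above.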