TAXONOMIC HARMONIZATION MAY REVEAL A STRONGER ASSOCIATION BETWEEN DIATOM ASSEMBLAGES AND TOTAL PHOSPHORUS IN LARGE DATASETS
Diatoms have been collected in large-scale biological assessments, such as the U.S. Environmental Protection Agency’s National Rivers and Streams Assessment (NRSA). The effectiveness of diatoms as indicators of biological condition may suffer if inconsistent taxon identifications among analysts obscure the relationships between assemblage composition and environmental variables. To minimize the explanatory power of analyst identity, we used random forest to detect taxa with high analyst signals and used QA/QC data to justify combining taxa into slash groups. We then reran random forest to detect remaining problematic taxa and applied coarser adjustments (e.g., elevating to genus or omitting the taxon). Relative to the original dataset, the revised dataset had less variation in assemblage composition explained by analyst and more than double the variation in assemblage composition explained by total phosphorus, a high-priority environmental variable for managing nutrient pollution. Examining the variation in assemblage data explained by analyst and harmonizing taxonomy may be necessary steps for improving the quality of large datasets. Open-access R tools and documentation of the taxonomic revisions are available to assist other researchers working with inconsistent datasets.
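The three kinds of adjustment described above (slash groups, elevation to genus, omission) can be illustrated with a minimal sketch. This is not the authors' R tooling: the taxon names, the rule tables, and the function names `harmonize_name` and `harmonize_sample` are all hypothetical, and renormalizing relative abundances after omitting a taxon is an assumption made here for illustration.

```python
from collections import defaultdict

# Hypothetical harmonization rules; placeholder names, not the authors' revisions.
SLASH_GROUPS = {
    "Genusa alpha": "Genusa alpha/beta",
    "Genusa beta": "Genusa alpha/beta",
}
ELEVATE_TO_GENUS = {"Genusb gamma", "Genusb delta"}
OMIT = {"Genusc epsilon"}


def harmonize_name(taxon):
    """Return the harmonized name for a taxon, or None if it is omitted."""
    if taxon in OMIT:
        return None
    if taxon in SLASH_GROUPS:
        return SLASH_GROUPS[taxon]
    if taxon in ELEVATE_TO_GENUS:
        # The genus is taken to be the first word of the binomial name.
        return taxon.split()[0]
    return taxon


def harmonize_sample(counts):
    """Merge counts under harmonized names and return relative abundances.

    Assumption: abundances are renormalized over the retained taxa.
    """
    merged = defaultdict(int)
    for taxon, n in counts.items():
        name = harmonize_name(taxon)
        if name is not None:
            merged[name] += n
    total = sum(merged.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in merged.items()}


# Hypothetical valve counts for one sample.
sample = {
    "Genusa alpha": 30,
    "Genusa beta": 20,
    "Genusb gamma": 25,
    "Genusb delta": 5,
    "Genusc epsilon": 20,
    "Genusd zeta": 20,
}
harmonized = harmonize_sample(sample)
```

In this sketch the two `Genusa` species collapse into one slash group, the two `Genusb` species are counted at genus level, and `Genusc epsilon` is dropped before relative abundances are computed. Which taxa receive which treatment would, in the approach described above, be decided from the random forest analyst signals and the QA/QC data rather than fixed by hand.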