A harmonized dataset of sediment diatoms from hundreds of lakes in the northeastern United States
Sediment diatoms are widely used to track the environmental histories of lakes and their watersheds, but merging datasets generated by different researchers for large-scale studies is challenging because of taxonomic discrepancies caused by rapidly evolving diatom nomenclature and taxonomic concepts. Here we collated five datasets of lake sediment diatoms from the northeastern United States using a harmonization process that included updating synonyms, tracking the identity of inconsistently identified taxa, and grouping those that could not be resolved taxonomically. Each harmonization step increased the amount of variation in diatom count data explained by environmental variables, with a parallel reduction in the variation attributable to taxonomic inconsistency. To maximize future use of the data and underlying specimens for lake and diatom studies, we provide the original and harmonized counts for 1327 core samples from 607 lakes, name translation schemes, sample metadata, specimen museum locations, and the Northeast Lakes Voucher Flora, a set of light microscope diatom images grouped into 1154 morphological operational taxonomic units. The harmonized dataset includes over 900 samples from EMAP, NLA 2007, and NLA 2017.