Integrating state and NARS data to improve spatial interpolations of water quality
The US EPA's National Aquatic Resource Surveys (NARS) collect thousands of samples from streams and lakes on a five-year cycle. Previous work paired these data with geospatial watershed information on natural and anthropogenic landscape factors to model and interpolate the probable condition of 1.1 million streams and 290,000 lakes across the conterminous US (CONUS). We built these models with random forests. Random forests offer several advantages, including relative robustness to multicollinearity among predictor variables and the ability to handle non-linear relationships and interactions. We will give a primer on random forest models as part of this presentation. We also illustrated how these model interpolations can be combined with other landscape factors and queried to identify candidate streams for conservation or restoration. Although the models are based on more than 1,000 sites across CONUS, any one state contributes only about 25 sample sites to a given model, on average. At this sampling density, it is unknown how representative the interpolated values are for all streams within a state, which raises several questions: (1) Can state and national data be combined into a fuller data set to improve the models? (2) How accurate are these predictions when validated against state data? (3) What site density is needed to have confidence in the interpolations produced by this approach? In this presentation, we propose research, to be conducted with a state collaborator, that will address these questions. The purpose of this presentation is to identify potential state collaborators whose data can be combined with NARS data to address these research questions.