Spatial Analysis and Statistical Modeling with R and spmodel
Statistical models often assume that the data are independent. Incorrectly assuming independence can harm models, resulting in incorrect slope estimates, misleading p-values, and poor predictions. The independence assumption is often inappropriate for spatial data, because spatial observations close together tend to be more similar than spatial observations far apart (Tobler's Law). Statistical models for spatial data that incorporate spatial dependence tend to notably outperform similar models that rely on independence. Unfortunately, building spatial dependence directly into statistical models is challenging from both theoretical and computational perspectives, which has limited the use of these models in ecological settings. However, recent advances in R software, which we will discuss throughout the workshop, make acquiring spatial data and building spatial models much more accessible.

In this workshop, we will first focus on R tools for accessing and handling the spatial data required to build models, highlighting R data packages such as EPA's StreamCatTools, FedData, and prism, along with other data web services. Then we will use these data to build spatial statistical models with the R package spmodel (https://usepa.github.io/spmodel/). With spmodel, ecologists can seamlessly incorporate spatial dependence into their statistical models. spmodel implements user-friendly syntax that builds on the lm() and glm() functions familiar to base-R users, which significantly eases the transition from fitting independence models to fitting spatial models. We will practice using spmodel to fit these spatial statistical models, interpret the model fit, inspect model diagnostics, perform model selection, and make predictions at unobserved locations.
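To illustrate how closely spmodel's syntax mirrors lm(), here is a minimal sketch (not taken from the workshop materials). It assumes the spmodel package is installed and uses its bundled `moss` example dataset, an sf object with a log zinc concentration (`log_Zn`) and a log distance to road (`log_dist2road`). The same formula is fit once with lm() and once with splm(), where the only addition is a spatial covariance type.

```r
# Minimal sketch: independence model vs. spatial model with spmodel.
# Assumes spmodel is installed; `moss` is an example sf dataset shipped with it.
library(spmodel)

# Familiar base-R independence model
lmod <- lm(log_Zn ~ log_dist2road, data = moss)

# Same formula with an exponential spatial covariance;
# splm() takes the coordinates from the sf geometry column
spmod <- splm(log_Zn ~ log_dist2road, data = moss, spcov_type = "exponential")

# Independence model fit through splm() so both fits use the same
# estimation method (REML by default) and the same fixed effects
nonspmod <- splm(log_Zn ~ log_dist2road, data = moss, spcov_type = "none")

# Fixed-effect estimates and fitted spatial covariance parameters
summary(spmod)

# Side-by-side fit statistics (e.g., AIC, AICc) for model selection
glances(nonspmod, spmod)
```

Because both splm() fits share the same fixed effects and estimation method, their information criteria can be compared directly; comparing against the lm() fit in the same way would not be a like-for-like comparison.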
We will also discuss some advanced spmodel tools, including extensions for modeling binary, count, and skewed data, implementing random forests, and incorporating dependence via non-Euclidean distance measures such as neighborhood distance or stream distance.