County-to-county migration modeling in the United States: the effects of data source and model selection
Internal migration plays a critical role in shaping demographic and economic landscapes, yet modeling migration flows accurately remains a methodological challenge. This study evaluates the performance of different migration models applied to three key U.S. data sources: Internal Revenue Service (IRS) migration data, the American Community Survey (ACS), and Census long-form data. While these datasets provide valuable insights into county-to-county migration, they differ in temporal coverage, flow suppression thresholds, and demographic granularity, and each difference introduces its own challenges for migration modeling. Using a comparative framework, this study assesses the impact of data source selection on the accuracy and bias of widely used migration models, including the gravity model, Poisson regression, and the radiation model. Our findings highlight the trade-offs inherent in each dataset. IRS data yield lower prediction errors in aggregate flow estimates but lack demographic specificity. ACS and Census data offer richer demographic detail and capture a larger number of distinct migration streams, though they may introduce noise through small-flow estimates and confidentiality-driven suppression thresholds. The results underscore the importance of aligning data selection with research objectives and contribute to broader discussions on best practices for migration modeling.
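For readers unfamiliar with the models named above, the following minimal sketch shows the commonly cited textbook forms of the gravity and radiation models for a single origin-destination pair. It is an illustration only, not the specification used in this study: the function names, parameter names, and default exponents are assumptions, and in practice the gravity parameters would be estimated from observed flows (for example, by Poisson regression) rather than fixed by hand.

```python
def gravity_flow(m_i, n_j, d_ij, k=1.0, alpha=1.0, beta=1.0, gamma=2.0):
    """Gravity model: flow grows with origin and destination population
    and decays with distance. k, alpha, beta, gamma are hypothetical
    defaults chosen for illustration, not fitted values."""
    if d_ij <= 0:
        raise ValueError("distance must be positive")
    return k * (m_i ** alpha) * (n_j ** beta) / (d_ij ** gamma)


def radiation_flow(t_i, m_i, n_j, s_ij):
    """Radiation model (standard parameter-free form).

    t_i  : total number of out-migrants leaving origin i
    m_i  : population of origin i
    n_j  : population of destination j
    s_ij : population within distance d_ij of i, excluding i and j
    """
    denom = (m_i + s_ij) * (m_i + n_j + s_ij)
    if denom <= 0:
        raise ValueError("populations must be positive")
    return t_i * m_i * n_j / denom


if __name__ == "__main__":
    # Made-up inputs for one county pair; these are not data from the study.
    print(gravity_flow(m_i=1000, n_j=2000, d_ij=10.0))
    print(radiation_flow(t_i=100, m_i=1000, n_j=2000, s_ij=500))
```

The gravity form requires calibrated parameters, whereas the radiation form depends only on populations and total out-migration. This is one reason the two models can respond differently to the suppression thresholds and small-flow noise that distinguish the data sources.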