Surface variable‐based machine learning for scalable arsenic prediction in undersampled areas
Abstract
In the United States, private wells are not federally regulated, and many households do not test for arsenic (As). Chronic exposure is linked with multiple health outcomes, and risk can change sharply over short distances and with well depth. Coarse maps or sparse sampling often miss exceedances. Most existing models operate at ∼1 km resolution and use groundwater chemistry or detailed geologic logs, which limits their use in undersampled areas where improved guidance is most needed. We overcome these limitations by developing a machine learning model for Minnesota, USA, that predicts As exposure risk using only surficial variables from remote sensing and global data sets. Variables related to surface water hydrology and geomorphology are selected based on mechanistic links that control redox conditions and As mobilization. Local training was essential, and surficial geology variables that are more sensitive to local conditions were needed to maximize model accuracy. The resulting complete model was sufficiently sensitive to generate accurate and detailed risk maps and depth profiles of As concentrations above the 10 μg/L maximum contaminant level. Accuracy depended on local training data density. We identified a training data density of 0.07 wells/km² as a practical target for stable county-level performance. Maps of exceedance probabilities highlight priority areas for testing that are particularly important in rural communities that have received less sampling. These results support public health action by guiding where to install wells and where to test them, how much new sampling is needed, and where treatment outreach is most urgent.
Suggested Citation
Azad, S., Stahl, M.O., Erickson, M., DeYoung, B.A., Connolly, C.T., Chillrud, L., Schilling, K., Navas-Acien, A., Basu, A., Mailloux, B., Bostick, B.C., Chillrud, S.N., 2026, Surface variable‐based machine learning for scalable arsenic prediction in undersampled areas: GeoHealth, v. 10, no. 1, e2025GH001666, 18 p., https://doi.org/10.1029/2025GH001666.
| Publication type | Article |
|---|---|
| Publication Subtype | Journal Article |
| Title | Surface variable‐based machine learning for scalable arsenic prediction in undersampled areas |
| Series title | GeoHealth |
| DOI | 10.1029/2025GH001666 |
| Volume | 10 |
| Issue | 1 |
| Publication Date | January 23, 2026 |
| Year Published | 2026 |
| Language | English |
| Publisher | American Geophysical Union |
| Contributing office(s) | Upper Midwest Water Science Center |
| Description | e2025GH001666, 18 p. |
| Country | United States |
| State | Minnesota |