Preliminary Machine Learning Models of Manganese and 1,4-Dioxane in Groundwater on Long Island, New York
Links
- Document: Report (5.93 MB pdf), HTML, XML
- Data Release: USGS data release - Data and model archive for preliminary machine learning models of manganese and 1,4-dioxane in groundwater on Long Island, New York
- Download citation as: RIS | Dublin Core
Abstract
Manganese and 1,4-dioxane in groundwater underlying Long Island, New York, were modeled with machine learning methods to demonstrate the use of these methods for mapping contaminants in groundwater in the Long Island aquifer system. XGBoost, a gradient boosted, ensemble tree method, was applied to data from 910 wells for manganese and 553 wells for 1,4-dioxane. Explanatory variables included soil properties, groundwater flow, land use, and other features that describe the hydrogeology and geochemistry of the aquifer system. Four models were developed to predict the probability of manganese concentrations greater than a detection level of 10 micrograms per liter (μg/L) and greater than three threshold concentrations (50, 150, and 300 μg/L) relevant to drinking-water quality. One model was developed to predict the probability of 1,4-dioxane concentrations greater than a detection level of 0.07 μg/L. The 1,4-dioxane model was limited geographically to Suffolk County because of data availability. Predictions were made for two layers in the upper glacial aquifer and three layers in the Magothy aquifer, which are the upper two of the three major aquifers of the Long Island aquifer system.
The objective of the study described in this report was to demonstrate the application of the methods rather than to develop precise estimates of manganese or 1,4-dioxane concentrations at any given location. The predictive models developed in the study are considered preliminary in the sense that they are an initial effort at developing these kinds of models specifically for Long Island. The models could be improved by the inclusion of additional data, by the use of methods to improve the modeling of infrequent high concentrations of manganese and 1,4-dioxane (above threshold concentrations), and by including more explanatory variables that specifically describe conditions and contaminant sources on Long Island. Nonetheless, the distribution of model predictions and the influence of explanatory variables in the models were consistent with the expected relations between contaminant concentrations and groundwater-flow-system characteristics and the distribution of manmade sources.
Mapped predictions indicated that manganese detections were more probable in the upper glacial aquifer and along the southern shore of Long Island, consistent with the distribution of anoxic conditions in groundwater in the Long Island aquifer system. Manganese was infrequently predicted at concentrations greater than thresholds of concern for drinking-water quality in any of the aquifer layers. Detections of 1,4-dioxane were predicted in the western, more highly developed parts of Suffolk County, in the upper glacial aquifer and the top and middle layers of the Magothy aquifer, and in northwestern Suffolk County in the bottom layer of the Magothy aquifer. Although preliminary in nature and based on limited data, these mapped predictions can be used to generally identify areas where manganese and 1,4-dioxane may be present at concentrations of concern to prioritize areas for future monitoring and to guide future modeling and mapping efforts.
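The abstract frames each manganese model as a binary question: is the concentration greater than a given threshold? A minimal sketch of that labeling step follows, assuming Python and a plain list of concentrations. The threshold values come from the abstract; the sample concentrations, the function names, and the strict greater-than treatment of values at the threshold are illustrative assumptions, not the report's actual preprocessing. Fitting the XGBoost classifiers themselves is not shown.

```python
# Thresholds in micrograms per liter, as listed in the abstract:
# a detection level (10) and three drinking-water thresholds (50, 150, 300).
MANGANESE_THRESHOLDS_UG_L = (10, 50, 150, 300)


def exceedance_labels(concentrations, threshold):
    """Return 1 where a concentration is strictly greater than the threshold, else 0."""
    return [1 if c > threshold else 0 for c in concentrations]


def exceedance_fraction(concentrations, threshold):
    """Fraction of samples whose concentration exceeds the threshold."""
    labels = exceedance_labels(concentrations, threshold)
    return sum(labels) / len(labels)


# Hypothetical manganese concentrations (ug/L) for eight wells; not real data.
sample_mn = [2.0, 12.5, 48.0, 75.0, 160.0, 310.0, 5.0, 10.0]

# One binary target vector per threshold, mirroring the four-model setup.
labels_by_threshold = {
    t: exceedance_labels(sample_mn, t) for t in MANGANESE_THRESHOLDS_UG_L
}

for t in MANGANESE_THRESHOLDS_UG_L:
    print(t, labels_by_threshold[t], exceedance_fraction(sample_mn, t))
```

Each label vector would serve as the target for a separate classifier. In this made-up sample the positive class becomes rarer as the threshold rises, which is the kind of imbalance the abstract refers to when it mentions infrequent high concentrations.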
Suggested Citation
DeSimone, L.A., 2023, Preliminary machine learning models of manganese and 1,4-dioxane in groundwater on Long Island, New York: U.S. Geological Survey Scientific Investigations Report 2022–5120, 34 p., https://doi.org/10.3133/sir20225120.
ISSN: 2328-0328 (online)
Table of Contents
- Abstract
- Introduction
- Data Compilation
- Machine Learning Modeling Methods
- Manganese and 1,4-Dioxane Concentrations in Groundwater From Wells
- Predictive Models of Manganese and 1,4-Dioxane
- Summary
- References Cited
- Appendix 1. Explanatory Variables and Ranking in the Machine Learning Models
Publication type | Report |
---|---|
Publication Subtype | USGS Numbered Series |
Title | Preliminary machine learning models of manganese and 1,4-dioxane in groundwater on Long Island, New York |
Series title | Scientific Investigations Report |
Series number | 2022-5120 |
DOI | 10.3133/sir20225120 |
Year Published | 2023 |
Language | English |
Publisher | U.S. Geological Survey |
Publisher location | Reston, VA |
Contributing office(s) | New England Water Science Center, Advanced Research Computing (ARC) |
Description | Report: vii, 34 p.; Data Release |
Country | United States |
State | New York |
Other Geospatial | Long Island |
Online Only (Y/N) | Y |
Additional Online Files (Y/N) | N |