Beware of spatial autocorrelation when applying machine learning algorithms to borehole geophysical logs

Groundwater

Abstract

Although many of the algorithms now considered to be machine learning algorithms (MLAs) have existed for nearly a century (e.g., Rosenblatt 1958), interest in using MLAs to solve data-driven problems has recently increased exponentially across a variety of fields. This growth reflects the expanded availability of large, complex datasets that may be difficult to interrogate using other methods, increases in computing power, and a growing library of easily implemented machine learning tools. Although MLAs are often similar to statistical methods, the two differ in how they approach problem solving: statistical methods are more concerned with generating informative models from “long” data (i.e., many more observations than explanatory variables), whereas MLAs are typically concerned with generating accurate predictions from “wide” data (i.e., a large number of variables with relatively few observations; Bzdok et al. 2018). In hydrogeologic studies, such wide datasets may be available from boreholes, where various types of geophysical, geochemical, and lithological information may exist. Borehole datasets are therefore a tempting target for MLAs, both to reveal hidden relations among gathered data and parameters of interest (e.g., contaminant concentration) and as a method of parameter reduction (e.g., reducing costs by collecting fewer datasets).

Publication type: Article
Publication subtype: Journal Article
Title: Beware of spatial autocorrelation when applying machine learning algorithms to borehole geophysical logs
Series title: Groundwater
DOI: 10.1111/gwat.13081
Volume: 59
Issue: 3
Year published: 2021
Language: English
Publisher: Wiley
Contributing office(s): New York Water Science Center, WMA - Earth System Processes Division
Description: 5 p.
First page: 315
Last page: 319