USGS - science for a changing world

Scientific Investigations Report 2007–5179

U.S. GEOLOGICAL SURVEY

Methods

General Study Design

Data used in this study (fig. 2) were obtained from 19 NAWQA land-use studies in the Western United States (appendixes A-D). These studies were designed to examine the influence of agricultural and urban land-use activities on the quality of shallow ground water by using data collected with consistent sampling protocols. The national study design was intended to provide data useful for understanding water-quality conditions in areas of intensive agricultural or urban land use at both national and regional scales (Gilliom and others, 2006). In most areas, approximately 20 to 30 relatively shallow wells within the same aquifer were sampled (Gilliom and others, 1995; Koterba and others, 1995). Ground-water sampling networks were established in areas where shallow ground water would reflect recent land-use activity and the attenuation properties of the aquifer (Squillace and others, 2004).

Nutrient enrichment has been documented in agricultural areas. In the Eastern United States, periods of drought were shown to increase nitrogen concentrations in shallow ground water beneath agricultural areas near Chesapeake Bay (Stevenson and others, 1986). Nutrients were sampled during the NAWQA land-use investigations because of the increasing concentrations in shallow ground water in areas influenced by agriculture and other human-related activities as noted in previous studies (Hirsch and others, 1988).

The 86 pesticides targeted for investigation were selected on the basis of their agricultural and nonagricultural use, potential environmental significance, and the ability of the USGS National Water Quality Laboratory (NWQL) to quantify them (Larson and others, 1999; U.S. Geological Survey, 1999). The 55 VOCs targeted for analysis were selected on the basis of available information on their occurrence, human and ecological health concerns, ozone depletion potential, use as a fuel additive, and analytical capabilities of the USGS NWQL (Bender and others, 1999).

All samples were analyzed by the USGS NWQL in Denver, Colorado. In circumstances where concentrations of an analyte are sufficiently low to be considered below a detectable quantity, the NWQL reports these concentrations as less than the Laboratory Reporting Limit (LRL). When using the LRL, the risk of reporting a non-detectable analyte concentration when the analyte is actually present (false negative) is less than 1 percent (Childress and others, 1999). The LRL typically is twice the long-term method detection level (LT-MDL). The NWQL LT-MDL is derived by using the standard deviation of at least 24 measurements of spiked matrices containing the analyte(s) of interest. The LT-MDL is the minimum analyte concentration that can be measured with 99-percent confidence that the concentration is, in fact, greater than zero. The NWQL reports estimated analyte concentrations when concentrations fall between the LT-MDL and LRL (Childress and others, 1999). Improvements in analytical methods over the years have resulted in changes in LT-MDLs. In circumstances where the LT-MDL for an analyte varies during the period of data collection, concentration data for that constituent are assessed on the basis of the greatest LT-MDL during that period. All constituent concentrations below this LT-MDL are considered to be below detection. This procedure is called recensoring. Recensoring is done so that older data, typically with higher LT-MDLs, can be compared with more recent data collected using lower limits. All nutrient concentrations were recensored to the greatest LRL within the data set (table 3) for each respective nutrient species because the LT-MDL did not cover the entire sampling period (July 1993 to September 2004).
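The recensoring step described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to process the NAWQA data; the `Result` type, the function names, and all concentrations are hypothetical and serve only to show how a common threshold changes the number of detections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    value: float    # concentration, or the censoring level when censored is True
    censored: bool  # True for a "less than" (nondetect) result


def recensor(results, threshold):
    """Recensor results to one common threshold, such as the greatest LT-MDL
    (or the greatest LRL) in effect during the sampling period."""
    recensored = []
    for r in results:
        if r.censored and r.value > threshold:
            # A nondetect reported at a higher level cannot be restated at a lower one.
            raise ValueError("threshold must be at least the greatest censoring level")
        if r.censored or r.value < threshold:
            recensored.append(Result(threshold, True))
        else:
            recensored.append(r)
    return recensored


def detection_frequency(results):
    """Fraction of results that are detections (not censored)."""
    return sum(not r.censored for r in results) / len(results)


# Hypothetical concentrations, in micrograms per liter, from a period in which
# the detection level dropped from 0.05 to 0.02.
samples = [
    Result(0.05, True),   # older nondetect, reported as <0.05
    Result(0.08, False),
    Result(0.03, False),  # detected only because of the newer, lower level
    Result(0.02, True),   # newer nondetect, reported as <0.02
    Result(0.30, False),
]
recensored = recensor(samples, threshold=max(0.05, 0.02))
print(detection_frequency(samples), detection_frequency(recensored))
```

In this made-up example the 0.03 result stops counting as a detection once every result is judged against the greatest level, which is the reduction in detections that recensoring is expected to produce.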

For the purposes of this report, estimated pesticide concentrations reported by the NWQL were considered detectable concentrations. Because of improvements in analytical sensitivity over the years, VOC data were recensored to a value of 0.2 µg/L (Zogorski and others, 2006). Although recensoring VOC data to 0.2 µg/L reduced the number of detections of some VOCs, this procedure did not substantially change relations among data when compared to recensoring data using maximum LT-MDLs.

Creation of Databases

Chemical data used in this report are from samples collected between 1993 and 2004 for the NAWQA land-use studies (appendixes A-D). Samples from the 273 agricultural and 181 urban wells were collected following NAWQA protocols (Koterba and others, 1995). Most data are stored in a central database (NAWQA Data Warehouse) and for this investigation were compiled into Excel spreadsheets. Data from an agricultural area east of Reno, Nevada, collected as part of a 2001 investigation of a childhood leukemia cluster (Seiler and others, 2005) were included in data from the NVBR study unit. Data for the additional agricultural wells for the NVBR study unit were obtained from the NWIS database. Land use near the wells was classified by using a three-tiered approach that incorporated watershed characteristics using the National Land Coverage Dataset (NLCD) that was enhanced for the purposes of NAWQA (NLCDE) (David Mueller, U.S. Geological Survey, written commun., 2006).

Ancillary data used in this analysis included well-specific data such as the depth of the screened interval, and geospatial data in GIS databases related to land use and soil/aquifer characteristics (Hitt, 1994; Solley and others, 1998b; Nakagaki and Wolock, 2005). Land use was characterized within a 500-meter buffer around each well. Geospatial data included agricultural pesticide use, irrigation practices, general crop types, fertilizer use, percent urbanization, population density, soil characteristics, general lithology, and average annual precipitation (tables 1, 4). Soil-hydrologic groups and soil and permeability characteristics were obtained from modified STATSGO data (U.S. Department of Agriculture, 1994). Agricultural factors, such as irrigation and crop type, were obtained from Enhanced National Land-Coverage Datasets (NLCDE) by using 30-meter land-cover grids, published in 2005.

Regionally, pesticide-use data are available on a county-scale basis and were extrapolated into the 500-meter buffer zones around each of the agricultural wells (Thelin, 2005a, b). The extrapolation was based on the average annual-application rate of pesticides state-wide, the percent acreage of a crop treated with a pesticide, and the percentage of agricultural-land use within the buffer area (Naomi Nakagaki, U.S. Geological Survey, written commun., 2005). Unfortunately, this level of pesticide-use information was not sensitive enough for the purposes of the data analyses contained in this report. Selected California pesticide-use data are available at a township-and-range scale. Pesticide-use data available at township-and-range scale were summed and extrapolated over each township-and-range area. The effectiveness of using township-and-range-scale data with respect to county-scale pesticide-use data was evaluated by using pesticide data after extrapolation into 500-meter buffer areas around each of the wells. An assumption was made that the township-and-range data, being at a finer scale, were more accurate than the available county-level data.
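The arithmetic behind this kind of extrapolation can be sketched as follows. The report names the three inputs but does not give the exact formula, so the simple product used here is an assumption, as are the function names, the units (kilograms per hectare), and the example numbers.

```python
import math


def buffer_area_ha(radius_m=500.0):
    """Area of a circular buffer around a well, in hectares."""
    return math.pi * radius_m ** 2 / 10_000.0


def estimated_use_kg(rate_kg_per_ha, fraction_crop_treated,
                     fraction_ag_in_buffer, radius_m=500.0):
    """Assumed form: application rate x share of crop treated x agricultural area."""
    ag_area_ha = buffer_area_ha(radius_m) * fraction_ag_in_buffer
    return rate_kg_per_ha * fraction_crop_treated * ag_area_ha


# Hypothetical well: 1.0 kg/ha average rate, half of the crop treated,
# 40 percent of the 500-meter buffer in agricultural land.
print(round(buffer_area_ha(), 1))                   # about 78.5 hectares
print(round(estimated_use_kg(1.0, 0.5, 0.4), 2))    # about 15.71 kilograms
```

Because the rate and treated share come from state or county averages, every buffer with the same agricultural fraction receives the same estimate, which is one way to see why this level of information may not be sensitive enough for well-scale analysis.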

Ancillary data for each well were compiled into an Excel spreadsheet (appendix D). An Access relational database was created from the Excel spreadsheets that linked chemical and ancillary data so that relations between contaminants and land use could be evaluated.

Statistical Analyses

Recensored data were used for the frequency of detection analyses. Univariate correlation analyses between concentration and (or) detection frequencies were made by using the Spearman Rank method. Where concentration and detection-frequency comparisons were made among land uses and (or) study units, the nonparametric Kruskal-Wallis statistical method was used (SYSTAT Software Inc., 2004).
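For readers unfamiliar with the two rank-based methods named above, the following standard-library Python sketch shows how each statistic is computed. The study itself used SYSTAT; this code is only an illustration, the function names and data are hypothetical, and the Kruskal-Wallis statistic is shown without the correction for ties or a p-value.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic on pooled ranks (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical concentrations for two land uses.
agricultural = [1.0, 2.0, 3.0]
urban = [4.0, 5.0, 6.0]
print(spearman_rho([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
print(kruskal_wallis_h(agricultural, urban))
```

Because both methods use only ranks, recensored nondetects can be entered at a single common value below all detections and handled as ties, which is one reason rank-based methods suit censored water-quality data.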

Statistical summaries of chemical quality for the NAWQA study units can be biased if water from some wells is sampled more frequently than that from other wells. Where water from wells was sampled multiple times, bias from multiple measurements at the same site was removed by using the most recent analysis. Selecting the most recent analysis is a simple process that uses the best available analytical methods and is unbiased toward high or low values when there are seasonal or annual trends in the data. An alternative method for removing bias, calculating median and average values for each well, was rejected because robust estimation methods are required to calculate summary statistics when censored data are present, and many of the wells had only two or three analyses, which is too few for those methods.
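The selection of one analysis per well can be sketched in a few lines of Python. The well identifiers, dates, and concentrations below are hypothetical, and the tuple layout is an assumption made for illustration.

```python
from datetime import date


def most_recent_per_well(analyses):
    """Keep only the latest analysis for each well.

    analyses: iterable of (well_id, sample_date, concentration) tuples.
    Returns a dict mapping well_id to (sample_date, concentration).
    """
    latest = {}
    for well_id, sample_date, concentration in analyses:
        if well_id not in latest or sample_date > latest[well_id][0]:
            latest[well_id] = (sample_date, concentration)
    return latest


# Hypothetical analyses: well W1 was sampled three times, well W2 once.
analyses = [
    ("W1", date(1995, 8, 14), 3.1),
    ("W1", date(2003, 6, 1), 4.2),
    ("W1", date(1999, 7, 20), 2.7),
    ("W2", date(1996, 9, 3), 0.8),
]
latest = most_recent_per_well(analyses)
print(latest["W1"], latest["W2"])
```

Each well then contributes exactly one value to a summary statistic, regardless of how often it was sampled.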

Multivariate-logistic regression was used (Steinberg and Colla, 2004) to evaluate the relation of a dependent variable (response variable) to one or more independent variables (predictor variables). Logistic regression builds a predictive model by evaluating the relation between observed and predicted values and is a valuable tool for evaluating data sets that contain censored values (Rupert, 2003). Model building is an iterative process in which potential explanatory variables are added to and removed from the model in order to achieve the model with the highest predictive power. The best model is one that has the greatest overall predictive power using the fewest variables (Hosmer and Lemeshow, 1989). Generally, the “likelihood” numerically indicates the probability of obtaining the observed results in a given dataset given a set of parameters, and “maximum likelihood” estimation selects the parameters that maximize it. The “log-likelihood” function is essentially the likelihood on a logarithmic scale (Hosmer and Lemeshow, 1989). Two models containing different explanatory variables can be compared by evaluating their log-likelihoods (Hosmer and Lemeshow, 1989). McFadden’s rho-squared is analogous to the coefficient of determination, R², in linear regression. The higher the McFadden’s rho-squared value, the stronger the relation is between the independent and dependent variable(s) (Steinberg and Colla, 2004). A logistic model is considered sufficiently strong when the log-likelihood ratio is maximized (Hosmer and Lemeshow, 1989; Rupert, 2003) and the overall McFadden’s rho-squared lies between 0.2 and 0.4 (Steinberg and Colla, 2004). In this study, logistic regression was used to examine potentially significant explanatory variables for the detection of pesticides and VOCs. A predictive model for the exceedance of the USEPA 10-mg/L MCL for nitrate also was created.
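The relation among the log-likelihood, the intercept-only (null) model, and McFadden's rho-squared can be made concrete with a small Python sketch. It fits a single-predictor model by Newton-Raphson using only the standard library; it is not the SYSTAT procedure used in the study, and the predictor (percent urban land in the buffer), the detection data, and the function names are hypothetical.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p              # gradient of the log-likelihood
            g1 += (yi - p) * xi
            h00 += w                  # information matrix (negative Hessian)
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_likelihood(b0, b1, x, y):
    """Log-likelihood of the observed detections under the fitted model."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = b0 + b1 * xi
        total += -math.log1p(math.exp(-z)) if yi else -math.log1p(math.exp(z))
    return total


def null_log_likelihood(y):
    """Log-likelihood of the intercept-only model (constant detection rate)."""
    n, k = len(y), sum(y)
    p = k / n
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


# Hypothetical data: percent urban land in the buffer and detection (1) or not (0).
urban_pct = [5, 10, 20, 30, 40, 50, 60, 70, 80, 90]
detected = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(urban_pct, detected)
ll_model = log_likelihood(b0, b1, urban_pct, detected)
ll_null = null_log_likelihood(detected)
rho_squared = 1.0 - ll_model / ll_null
print(b0, b1, rho_squared)
```

Adding a useful explanatory variable raises the model log-likelihood toward zero and therefore raises rho-squared. Comparing `ll_model` for two candidate models is the comparison of log-likelihoods described above.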

Data Limitations

Given the regional scope of this investigation, possible complications in data analysis could make it difficult to draw definitive conclusions. These complications include, but are not limited to, the quality and availability of necessary GIS datasets, previous undetected human activities that may have influenced shallow ground-water quality, and unknown upgradient contaminant sources (Squillace and others, 2004).

Wells were added to and dropped from established NAWQA ground-water networks from 1993 to 2004, resulting in changing sampling sites and irregular sampling intervals. The number of wells sampled during the different NAWQA land-use studies differed, and simple statistical summaries of the data cannot provide reliable descriptions of contaminant concentrations in the regional-study area. For example, water from only 9 agricultural wells was sampled in the CAZB study unit, whereas 110 ground-water samples were collected in the SANJ study unit. The national land-use study design targeted agricultural and urban areas to evaluate contaminant occurrence and distribution resulting from these types of land-use activities (Gilliom and others, 2006). Only three study units, NVBR, RIOG, and SACR, had both agricultural and urban areas represented, whereas the other study units were representative of either urban or agricultural land use, but not both.

Generally, ground-water redox potential (pE) and common redox-couple concentrations (arsenite and arsenate; sulfide and sulfate; and ferrous- and ferric-iron species) were not measured during NAWQA land-use studies. Given that redox condition is an important environmental factor influencing the occurrence of many contaminants, the lack of this information in conjunction with the other aforementioned limitations makes it difficult to identify cause-and-effect relations between contaminant occurrence and available ancillary data. Although the redox data were largely unavailable for data analyses, a generalized ground-water redox condition was estimated by using dissolved oxygen, nitrite and nitrate, and iron concentrations in ground water.
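A generalized classification of this kind could be sketched as shown below. The report does not state the criteria that were applied, so the category names and every threshold in this Python example are hypothetical placeholders, chosen only to show the order in which the three constituents might be checked.

```python
def classify_redox(dissolved_oxygen_mg_l, nitrite_plus_nitrate_mg_l, iron_mg_l,
                   oxygen_min=0.5, nitrate_min=0.5, iron_min=0.1):
    """Assign a generalized redox category from three measured constituents.

    All three threshold defaults are illustrative assumptions, not values
    taken from the report.
    """
    if dissolved_oxygen_mg_l >= oxygen_min:
        return "oxic"
    if nitrite_plus_nitrate_mg_l >= nitrate_min:
        return "nitrate-reducing"
    if iron_mg_l >= iron_min:
        return "iron-reducing"
    return "indeterminate"


# Hypothetical samples: (dissolved oxygen, nitrite plus nitrate, iron), in mg/L.
print(classify_redox(4.0, 2.0, 0.01))
print(classify_redox(0.2, 2.0, 0.01))
print(classify_redox(0.2, 0.1, 0.50))
```

The checks follow the usual sequence in which electron acceptors are consumed (oxygen, then nitrate, then ferric iron), so a sample is assigned to the first category whose indicator constituent is present above its threshold.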

Regionally, pesticide-use data were available on a county-scale basis. The extrapolation of these data into 500-meter buffer areas surrounding each well inherently included a margin of error. When localized pesticide-use data are extrapolated throughout a larger area, such as an entire county, areas where a pesticide was not used may be identified as areas of pesticide use. Alternatively, areas where there is more-intensive pesticide use may be identified as areas with less intensive use. As a result, the ability to show cause-and-effect relations is greatly reduced. Differences in pesticide use patterns, crop types, and availability of refined GIS datasets made it difficult to identify relations among various land-use characteristics and pesticide occurrence.


U.S. Department of the Interior | U.S. Geological Survey
URL: https://pubs.usgs.gov/sir/2007/5179
Page Last Modified: Thursday, 01-Dec-2016 19:50:09 EST