U.S. Department of the Interior
National Irrigation Water Quality Program
U.S. Geological Survey

Using Geochemical and Statistical Tools to Identify Irrigated Areas That Might Contain High Selenium Concentrations in Surface Water

Irrigated agriculture has a long history in the Western United States, beginning with Native Americans. After passage of the Reclamation Act of 1902, the United States Government began building and subsidizing irrigation projects to foster settlement and development of the arid and semi-arid areas of the Western United States (National Research Council, 1989). Precipitation in the mountainous areas of the West (fig. 1) is stored in reservoirs and used to irrigate farmland. The development of irrigated agriculture, however, has been accompanied by unforeseen environmental problems.


Figure 1. U.S. Geological Survey hydrologist and able assistant surveying Upper Fremont Glacier in the Wind River Range of Wyoming. This glacier is a source of irrigation water to the Riverton Reclamation Project in central Wyoming.

For example, concern is increasing about the quality of surface and subsurface water draining from irrigated land and about the potential effects of irrigation drainage on human health, fish, and wildlife. In 1983, the U.S. Fish and Wildlife Service discovered incidences of mortality, congenital defects, and reproductive failures in waterfowl at Kesterson National Wildlife Refuge in the western San Joaquin Valley, California. Because of concern about possible adverse effects of irrigation drainage elsewhere in the United States, the U.S. Department of the Interior (DOI) implemented the National Irrigation Water Quality Program (NIWQP) in 1985 and, as of 1996, has evaluated 26 DOI irrigation projects in the Western United States (fig. 2).


Figure 2. Location of National Irrigation Water Quality Program study areas, 1986-96.

Initial evaluation of the data from the NIWQP study areas indicates that selenium is the constituent most commonly found at elevated concentrations in water, bottom sediment, and biota (Feltz and others, 1991; Presser and others, 1994; Presser, 1994). Because selenium concentrations are consistently elevated, the factors controlling the occurrence of selenium in water need to be better defined. Water-quality data collected from the NIWQP study areas throughout the Western United States provide an opportunity to identify the important and regionally significant geochemical sources and processes that control selenium concentrations in irrigation drainwater. This information can then be used to predict where other "selenium problem areas" might occur in the Western United States and possibly in irrigated areas elsewhere in the world.

Geochemical and Statistical Tools Used to Identify Surface Water with Elevated Concentrations of Selenium

The NIWQP summary data base contains the results of thousands of water-quality analyses, and each analysis includes many physical and chemical properties, such as water temperature and selenium concentration. Geochemical and statistical techniques can be used to extract information from this large multivariate data set. For example, the concentrations of major ions might be used to predict whether a water sample from an irrigated area contains an elevated concentration of selenium.

The geochemical computer program SNORM (Bodine and Jones, 1986) was the "geochemical tool" used to calculate concentrations of simple salts from the chemical analyses of surface-water samples from NIWQP study sites. The 12 simple-salt concentrations calculated by SNORM for each water sample represent the minerals ("salts") that would remain if the sample were completely evaporated. Evaporating the thousands of water samples collected at all the NIWQP study areas and determining the amount and type of salt remaining would be labor intensive; the SNORM program instead simulates this evaporation almost instantly. The abundance and type of simple salts calculated by SNORM provide useful geochemical information about the kinds of minerals (selenium or non-selenium bearing) the water might have contacted and partially dissolved. This is analogous to determining where a person lives on the basis of his or her accent.
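The SNORM algorithm itself is not described in this fact sheet and is not reproduced here. As a minimal sketch of a routine data-quality check that normally precedes salt-norm calculations, the Python below converts major-ion concentrations from milligrams per liter to milliequivalents per liter and computes the cation/anion charge-balance error of an analysis. The ion list and the names `to_meq_per_liter` and `charge_balance_error` are hypothetical illustrations, not part of SNORM.

```python
# Hypothetical helpers; NOT the SNORM program.
# Equivalent weight (grams per equivalent) = molar mass / |ionic charge|.
EQUIV_WEIGHT = {
    "Ca": 40.078 / 2,
    "Mg": 24.305 / 2,
    "Na": 22.990 / 1,
    "K": 39.098 / 1,
    "HCO3": 61.017 / 1,
    "SO4": 96.06 / 2,
    "Cl": 35.45 / 1,
}
CATIONS = ("Ca", "Mg", "Na", "K")
ANIONS = ("HCO3", "SO4", "Cl")


def to_meq_per_liter(mg_per_liter: dict[str, float]) -> dict[str, float]:
    """Convert each ion concentration from mg/L to meq/L."""
    return {ion: conc / EQUIV_WEIGHT[ion] for ion, conc in mg_per_liter.items()}


def charge_balance_error(meq: dict[str, float]) -> float:
    """Percent error: 100 * (sum cations - sum anions) / (sum cations + sum anions)."""
    cations = sum(meq.get(ion, 0.0) for ion in CATIONS)
    anions = sum(meq.get(ion, 0.0) for ion in ANIONS)
    return 100.0 * (cations - anions) / (cations + anions)
```

A large charge-balance error flags an incomplete or unreliable analysis, which would make any calculated salt assemblage questionable.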

After the concentrations of simple salts were calculated for the thousands of NIWQP water samples, the next step was to determine whether groupings of simple salts could characterize water samples with high or low selenium concentrations. A "statistical tool" called pattern-recognition modeling (Meglen and Sistko, 1985) was used to evaluate the large simple-salt data base generated with SNORM. Pattern-recognition techniques aid the interpretation of large multivariate data bases such as the one used in the NIWQP data synthesis study. They use statistical and graphical techniques to "chemically fingerprint" the groups of simple salts that represent different types of minerals (selenium or non-selenium bearing) that water sampled during the NIWQP might have contacted and partially dissolved. Because the selenium concentration of every NIWQP sample was already known, the association between each simple-salt group and selenium concentrations in water could be calculated directly. This simple-salt group/selenium association could then be applied to other areas in the Western United States where selenium concentrations in water were not measured, to predict whether selenium toxicity problems might exist.
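Pattern-recognition studies commonly autoscale each variable before modeling so that the most abundant salts do not dominate the result. The fact sheet does not state which preprocessing was used, so treat this as an assumption: the sketch below standardizes each simple-salt column to zero mean and unit variance. `autoscale` is a hypothetical helper.

```python
from statistics import mean, pstdev


def autoscale(rows: list[list[float]]) -> list[list[float]]:
    """Scale each column (one simple salt) to zero mean and unit variance.

    rows: one list per water sample, one value per simple salt.
    Columns with zero spread are set to 0.0 instead of dividing by zero.
    """
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
        for row in rows
    ]
```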

Pattern-Recognition Modeling of Simple-Salt Data Identifies Samples with Elevated Selenium Concentrations

Figure 3A is a three-dimensional plot of the results of pattern-recognition modeling of the simple-salt data generated from the NIWQP data base. The three axes are linear combinations of the original simple-salt variables and can be thought of as a new set of plotting axes. Instead of representing the concentration of a single simple salt, each axis represents a combination of simple salts and thereby a property of interest that is affected by the type and amount of many simple salts. In this application, the property of interest is the types and amounts of simple salts that most likely identify water samples with high or low selenium concentrations.
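One standard way to obtain linear-combination axes like those in figure 3A is principal component analysis (PCA). The fact sheet does not name the exact projection used, so PCA here is an assumption. The pure-Python sketch below finds the leading eigenvectors of the covariance matrix by power iteration with deflation and then projects each sample onto them; the function names are hypothetical. In practice a library routine such as `numpy.linalg.eigh` or `numpy.linalg.svd` would be preferable.

```python
import math
import random


def _centered(rows):
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def _covariance(X):
    n, p = len(X), len(X[0])
    return [[sum(X[k][i] * X[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def _matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def principal_axes(rows, k=3, iters=500, seed=0):
    """Top-k covariance eigenvectors via power iteration with deflation."""
    C = _covariance(_centered(rows))
    p = len(C)
    rng = random.Random(seed)
    axes = []
    for _ in range(min(k, p)):
        v = [rng.uniform(-1, 1) for _ in range(p)]
        for _ in range(iters):
            w = _matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, _matvec(C, v)))  # eigenvalue estimate
        axes.append(v)
        # Deflate: subtract this component so the next pass finds the next axis.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return axes


def scores(rows, axes):
    """Coordinates of each (centered) sample on the new axes."""
    return [[sum(a * b for a, b in zip(row, axis)) for axis in axes]
            for row in _centered(rows)]
```

The sign of each axis is arbitrary, so plotted clusters may appear mirrored between runs or programs without any change in meaning.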


Figure 3. Results of pattern-recognition modeling of the National Irrigation Water Quality Program data base.

The individual points in figure 3A show where the simple salts for each NIWQP water sample plot on these newly defined axes. Viewed in three dimensions, the points fall into three distinct clusters, classified as groups 1, 2, and 3. Water samples in group 1 have the greatest percentage of selenium concentrations higher than 5 micrograms per liter (µg/L) (fig. 3B). Group 1 samples are distinguished from group 2 and 3 samples by elevated concentrations of the sulfate simple salts and the sodium chloride simple salt combined with very low concentrations of the calcium carbonate simple salt. This simple-salt "chemical fingerprint" indicates that selenium in group 1 samples is mobilized during irrigation by the oxidation of sulfide minerals or by the re-solution of sulfate salts in which selenium has substituted for sulfate in the mineral structure.
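Once each sample carries a group label, the comparison summarized in figure 3B reduces to counting. The sketch below computes, for each group, the percentage of samples whose selenium concentration exceeds 5 µg/L. The input format (pairs of group and selenium value) and the name `percent_above` are hypothetical.

```python
from collections import defaultdict


def percent_above(samples, threshold=5.0):
    """samples: iterable of (group, selenium_ug_per_L) pairs.

    Returns {group: percent of that group's samples with selenium > threshold}.
    """
    totals = defaultdict(int)
    above = defaultdict(int)
    for group, selenium in samples:
        totals[group] += 1
        if selenium > threshold:
            above[group] += 1
    return {g: 100.0 * above[g] / totals[g] for g in sorted(totals)}
```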


Figure 4. Proportion of samples from each National Irrigation Water Quality Program study area falling within the group 1 scores cluster and groups 2 and 3 scores cluster shown in figure 3A.

Simple-salt groupings that identify water samples with elevated selenium concentrations also are evident at individual NIWQP study areas. Most study areas with at least 25 percent of water samples in the group 1 simple-salt association consistently had the greatest percentage of samples with selenium concentrations higher than 2.0 µg/L (fig. 4). With the exception of the Riverton Reclamation Project, Tulare Lake Bed Area, and Salton Sea Area, all of these study areas have irrigated soils developed on upper Cretaceous marine sediments. These sediments contain the sulfide minerals required for sulfide weathering reactions, which produce the simple-salt "chemical fingerprint" of group 1 sites. Most study areas with at least 75 percent of water samples classified in groups 2 or 3 consistently had the greatest percentage of samples with selenium concentrations less than 2.0 µg/L (fig. 4) and do not have irrigated soils developed on upper Cretaceous marine sediments. On the basis of these pattern-recognition modeling results, the simple-salt "chemical fingerprint" of water samples can help differentiate samples with elevated selenium concentrations from samples with low selenium concentrations.
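The study-area comparison in figure 4 can be sketched the same way: compute the share of each area's samples that fall in group 1, then flag areas at or above the 25-percent level described above. The area names, the input format, and the helper names below are hypothetical.

```python
from collections import Counter


def group1_share(assignments):
    """assignments: iterable of (study_area, group) pairs.

    Returns {study_area: percent of that area's samples in group 1}.
    """
    totals = Counter()
    group1 = Counter()
    for area, group in assignments:
        totals[area] += 1
        if group == 1:
            group1[area] += 1
    return {area: 100.0 * group1[area] / totals[area] for area in totals}


def flag_areas(shares, cutoff=25.0):
    """Study areas whose group 1 share is at least `cutoff` percent."""
    return sorted(area for area, share in shares.items() if share >= cutoff)
```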


Classification Model Successfully Identifies Surface Water with Elevated Concentrations of Selenium in Utah and Wyoming

Pattern-recognition modeling of the simple-salt data has indicated that selenium- and non-selenium-producing areas can be identified using only simple-salt data calculated with the SNORM geochemical program. On the basis of these results, a classification program called Soft Independent Modeling by Class Analogy (Wold and Sjostrom, 1977), or SIMCA for short, was applied to the NIWQP simple-salt data set to construct a classification model. Once constructed, the classification model can be used to determine the selenium-producing potential of areas with unknown or poorly known selenium concentrations in water samples.

The first step in building a SIMCA classification model is to "train" the model to recognize, from their simple-salt concentrations, the sample groups to be classified. The data used for this step are referred to as a training data set; they consist of the NIWQP data set used in the pattern-recognition modeling described previously, with each sample classified into group 1, 2, or 3 (fig. 3A). After the classification model is constructed and optimized with the training data set, it is ready to classify unknown samples in what is termed a "test" data set.
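SIMCA builds a separate principal-component model for each class and judges a new sample by how well each class model reconstructs it. The sketch below is a simplified, hypothetical illustration, not the implementation used in the study: each class gets a single principal component, and a sample is assigned to the class with the smallest ratio of its residual distance to that class's residual standard deviation. Full SIMCA chooses the number of components per class and uses statistical class limits, so a sample can fall in one class, several, or none.

```python
import math


def _mean(rows):
    n = len(rows)
    return [sum(c) / n for c in zip(*rows)]


def _first_axis(X, iters=200):
    """Leading eigenvector of X^T X by power iteration (X already centered)."""
    p = len(X[0])
    v = [1.0 + 0.1 * j for j in range(p)]  # arbitrary non-uniform start
    for _ in range(iters):
        t = [sum(a * b for a, b in zip(row, v)) for row in X]  # sample scores
        w = [sum(t[k] * X[k][j] for k in range(len(X))) for j in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def _residual(x, mean, axis):
    """Distance from x to the line through `mean` along unit vector `axis`."""
    d = [a - m for a, m in zip(x, mean)]
    t = sum(a * b for a, b in zip(d, axis))
    return math.sqrt(sum((a - t * b) ** 2 for a, b in zip(d, axis)))


class SimcaSketch:
    """Hypothetical one-component-per-class SIMCA-style classifier."""

    def fit(self, rows, labels):
        self.models = {}
        for label in sorted(set(labels)):
            cls = [r for r, lab in zip(rows, labels) if lab == label]
            mean = _mean(cls)
            axis = _first_axis([[a - m for a, m in zip(r, mean)] for r in cls])
            res = [_residual(r, mean, axis) for r in cls]
            s0 = max(math.sqrt(sum(e * e for e in res) / len(res)), 1e-12)
            self.models[label] = (mean, axis, s0)
        return self

    def predict(self, x):
        """Class whose model gives the smallest scaled residual for x."""
        def scaled(label):
            mean, axis, s0 = self.models[label]
            return _residual(x, mean, axis) / s0
        return min(self.models, key=scaled)
```

Training corresponds to `fit` on the labeled NIWQP simple-salt data; classifying a test sample corresponds to `predict`.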

The test data set used to evaluate the performance of the SIMCA model consisted of data from more than 2,000 samples of surface water from Wyoming and Utah compiled from the National Water Data Storage and Retrieval (WATSTORE) System of the U.S. Geological Survey. Data from Wyoming and Utah were selected because of the diverse geologic environments represented in these areas and the documented presence of selenium in surface-water samples. The model classified more than 75 percent of the samples with elevated selenium concentrations (higher than 2.0 µg/L) correctly into group 1. More than 80 percent of the samples with low selenium concentrations were correctly classified into groups 2 or 3.
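The two performance figures reported for the Utah and Wyoming test data are classification rates. The sketch below computes them from pairs of measured selenium and predicted group, assuming that "low" means at or below 2.0 µg/L; the function name and test values are hypothetical.

```python
def classification_rates(results, threshold=2.0):
    """results: iterable of (selenium_ug_per_L, predicted_group) pairs.

    Returns (percent of high-selenium samples predicted as group 1,
             percent of low-selenium samples predicted as group 2 or 3).
    """
    results = list(results)
    high = [g for se, g in results if se > threshold]
    low = [g for se, g in results if se <= threshold]  # assumed definition of "low"
    high_ok = 100.0 * sum(g == 1 for g in high) / len(high) if high else float("nan")
    low_ok = 100.0 * sum(g in (2, 3) for g in low) / len(low) if low else float("nan")
    return high_ok, low_ok
```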

Classification Model Can Be Used to Assess Other Irrigated Areas

The demonstrated use of the classification model in differentiating selenium-producing areas from non-selenium-producing areas indicates that the model can be applied successfully to areas where selenium concentrations have not been determined. Numerous circumstances contribute to the lack of selenium data for water samples. Examples of the applicability of the model include (1) areas within the United States or other countries that might be expanding agriculture and where there is a limited selenium data base; (2) situations in which the analytical reliability of selenium data from an area might be in question but the major-ion data used to calculate the simple-salt data are of good quality; and (3) situations in which the concentrations of selenium and other chemical constituents are determined simultaneously and the detection limits for selenium are so high that only the very high concentrations of selenium are measured.


Electroshocking fish at Stewart Lake Wildlife Management Area near Vernal, Utah. Stewart Lake is in the Middle Green River Basin, Utah, study area.


Gill-netting operations during reconnaissance-phase investigations at a wetland site in the Middle Green River Basin, Utah, study area.

Sources of Additional Information

The following publications contain additional information on the occurrence of selenium in irrigation drainwater and the geochemical and statistical tools used to identify and predict the occurrence of selenium-producing landscapes. Details on pattern-recognition and classification modeling of the NIWQP data sets can be found in Naftz (in press).

Bodine, M.W., and Jones, B.F., 1986, The salt norm: A quantitative chemical-mineralogical characterization of natural waters: U.S. Geological Survey Water-Resources Investigations Report 86-4086, 130 p.

Feltz, H.R., Sylvester, M.A., and Engberg, R.A., 1991, Reconnaissance investigations of the effects of irrigation drainage on water quality, bottom sediment, and biota in the Western United States, in Mallard, G.E., and Aronson, D.A., eds., Proceedings of the U.S. Geological Survey Toxic Substance Hydrology Program, Monterey, Calif., March 11-15, 1991: U.S. Geological Survey Water-Resources Investigations Report 91-4034, p. 319-323.

Meglen, R.R., and Sistko, R.J., 1985, Evaluating data quality in large data bases using pattern-recognition techniques, in Breen, J.J., and Robinson, P.E., eds., Environmental Application of Chemometrics, American Chemical Society Symposium Series 292, Wash., D.C., American Chemical Society, p. 18-33.

Naftz, D.L., in press, Pattern-recognition analysis and classification modeling of selenium-producing areas: Journal of Chemometrics.

National Research Council, 1989, Irrigation-induced water quality problems, Wash., D.C., National Academy Press, 157 p.

Presser, T.S., 1994, The Kesterson effect: Environmental Management, v. 18, no. 3, p. 437-455.

Presser, T.S., Sylvester, M.A., and Low, W.H., 1994, Bioaccumulation of selenium from natural geologic sources in Western States and its potential consequences: Environmental Management, v. 18, no. 3, p. 423-436.

Wold, Svante, and Sjostrom, Michael, 1977, SIMCA: A method for analyzing chemical data in terms of similarity and analogy, in Kowalski, B.R., ed., Chemometrics Theory and Application, American Chemical Society Symposium Series 52, Wash., D.C., American Chemical Society, p. 243-282.

Information on technical reports and hydrologic data related to the National Irrigation Water Quality Program can be obtained from:

Richard A. Engberg, Manager
National Irrigation Water Quality Program
Department of the Interior (6640-MIB)
1849 C Street, NW
Washington, DC 20240


--David L. Naftz

from U.S. Department of the Interior, U.S. Geological Survey, Fact Sheet FS-077-96

Last modified: 06/9/97 ghc