BSP reference samples represent a limited number of the characteristics of the stream water-quality samples collected at national network stations. As such, the error estimates obtained from the reference samples apply only to certain chemical constituents, stream environments, and stages of the measurement process.
Although BSP samples are generally representative of the range of concentrations found in natural waters for selected chemicals, the samples represent a narrower range of physical and chemical properties than those found in national network stream samples. BSP samples are mixes or dilutions of filtered, natural water matrices. The source of these natural waters is predominantly surface water in Colorado. Consequently, the chemical and physical matrix of BSP reference samples is not representative of the range of matrices found in the stream water samples analyzed at national network stations. Moreover, BSP measurements most appropriately describe the measurement accuracy associated with dissolved chemical forms because the reference samples are filtered. Organic constituents, suspended sediment, and whole water (i.e., unfiltered) samples are not analyzed in the program because stable and homogeneous matrices cannot be maintained for these constituents. Because the matrix of a sample can affect analytical results, differences between the matrices of national network stream samples and those of BSP samples could potentially lead to inaccuracies in using BSP data to estimate the measurement error associated with national network stream water-quality data. Comparisons of national network and BSP data are most appropriate using national network measurements for filtered (i.e., dissolved) stream samples.
BSP reference samples provide error estimates only for the latter stages of the measurement process beginning with the transport of samples to the laboratory. These error estimates include all sources of systematic and random errors that may originate in these stages with the exception of matrix-related errors that are not detectable in the less complex BSP sample matrix. BSP samples are not exposed to field sampling and processing methods, and thus, the effects of field methods on measurement error are not contained in BSP data. Moreover, the BSP program does not provide estimates of accuracy of field determinations including pH, dissolved oxygen, specific conductance, alkalinity, temperature, and fecal bacteria. It is important to recognize that field methods could potentially represent a significant portion of the total measurement error in environmental data, but quality-control data are not available for the national networks to quantify this source.
In selecting BSP data to estimate the measurement bias and variability of national network stream water-quality data, particular attention must be given to the laboratory and analytical methods used to analyze the reference samples, the range of chemical concentrations of the reference samples, and the time period of analysis.
The laboratory and the analytical methods used to measure BSP and environmental samples must be identical before BSP data can be used to evaluate the accuracy of the stream water-quality measurements. We have provided BSP data analyzed at the NWQL (laboratory analysis agency code 80020) during water years 1985-95. The NWQL performed all national network stream water-quality analyses after April 1986. BSP data from October 1, 1984 to April 1986 apply only to national network stream water-quality data collected at stations west of the Mississippi River; at eastern stations, BSP samples were analyzed at the Atlanta laboratory (laboratory analysis agency code 80010). From October 1, 1984 to September 30, 1995, when laboratory analysis agency codes and method codes are reported, other laboratories and analytical methods may have been used for some of the national network stream water-quality data published on the CD-ROM. These cases reflect local data collection activities that have been conducted at national network stations. To ensure proper use of the BSP data, the user should verify that the laboratory analysis agency codes and the method codes associated with the stream water-quality data agree with those of the BSP data listed in Table 9.
The proper use of BSP data requires the selection of data for a range of concentrations similar to those observed in the environmental data. This is important because the magnitude of measurement error frequently depends on the concentration of the water-quality sample. Concentration-dependent errors may be caused by, for example, interferences from the physical/chemical matrix of the sample or the adsorption of analyte to the walls of sample containers and analytical equipment. Although concentration-independent sources of error related to contaminants in laboratory solutions, instruments, and airborne particles may be detected by BSP reference samples, concentration-dependent errors are common in measurements of BSP samples (Alexander and others, 1993). Moreover, the variance of a measurement process expressed in absolute units frequently depends on the concentration of the sample (Thompson, 1988; Garden and others, 1980). Thus, users of BSP data should verify that the ranges of concentrations covered by the BSP and environmental data are similar to ensure that the estimates of laboratory measurement error accurately describe method performance for the environmental samples.
Finally, users should select BSP and stream water-quality data for the same years and months to ensure that the estimates of measurement error reflect method performance during the period of analysis of the environmental data. More precise pairing of BSP and environmental measurements requires assumptions about the travel time of environmental samples from the field site to the laboratory. The recorded date of the environmental samples is the date of sample collection from the stream, whereas the recorded date of the BSP samples is the date of entry (login date) at the laboratory. The travel time of environmental samples to the laboratory likely ranges from one to five days.
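The month-level pairing described above can be sketched as follows. This is a minimal illustration only, not the procedure used to prepare the CD-ROM data sets. The function name `pair_by_month`, the record layout, and the default travel time of three days (chosen from within the one-to-five-day range) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date, timedelta

def pair_by_month(env_samples, bsp_samples, travel_days=3):
    """Pair environmental samples with BSP errors from the same login month.

    env_samples: list of (collection_date, concentration)
    bsp_samples: list of (login_date, error_estimate)
    travel_days: assumed field-to-laboratory travel time (hypothetical default)
    """
    bsp_by_month = defaultdict(list)
    for login_date, error in bsp_samples:
        bsp_by_month[(login_date.year, login_date.month)].append(error)

    pairs = []
    for collected, conc in env_samples:
        # Shift the collection date to an approximate laboratory login date.
        est_login = collected + timedelta(days=travel_days)
        key = (est_login.year, est_login.month)
        if key in bsp_by_month:
            pairs.append((collected, conc, bsp_by_month[key]))
    return pairs

# Hypothetical records: one sample collected late in January, one in March.
env = [(date(1990, 1, 30), 12.0), (date(1990, 3, 10), 15.0)]
bsp = [(date(1990, 2, 5), 0.4), (date(1990, 2, 20), -0.1)]
pairs = pair_by_month(env, bsp)
```

In this example the January 30 sample is matched to the February BSP results because its estimated login date falls in February; the March sample has no BSP results for its month and is left unpaired.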
BSP data can be used in two ways to evaluate the accuracy of national network environmental data. First, reference sample data can be used to evaluate the laboratory measurement process. This function typically involves the use of statistical standards (e.g., multiples of the standard deviation of the method) to determine whether measurement errors in laboratory methods are sufficiently "in control" to yield precise, unbiased environmental measurements. Evidence of "out of control" behavior may lead one to suspect the validity of an analytical method during certain time periods, which may suggest the need for editing of the environmental data or at least caution in the interpretation of the environmental data.
A second function of the BSP data is to assist in the interpretation of the environmental data by providing estimates of the magnitude of variations in the environmental data caused by laboratory measurement error. Although statistical standards provide important information about the stability of the measurement process, these standards are not necessarily relevant to the magnitude of variations observed in the environmental data or the intended uses of the environmental data. A direct comparison of environmental data and BSP measurement error data can guide interpretations of the environmental observations, providing additional protection against Type I and Type II statistical errors. [In an example where the null hypothesis (or presumed real condition) is that there is no method-related contamination in a water-quality sample, the Type I error can be stated as the probability of falsely rejecting the null hypothesis and incorrectly concluding that the sample contains the contaminant. For this same example, the Type II error can be stated as the probability of falsely accepting the null hypothesis and incorrectly concluding that the sample does not contain the contaminant.]
The following two subsections discuss these two approaches to using BSP data to evaluate the accuracy of national network environmental data.
Evaluation of laboratory methods and network water-quality data
The evaluation of laboratory methods involves determining whether analytical methods provide consistent or "in control" measurements of water quality. The consistency of method performance is frequently evaluated by testing whether quality-control measurements fall within generally accepted statistical limits of control. Evaluation can be conducted using graphical methods or a variety of statistical tests.
One conventional approach is the use of a control chart. Control charts provide a graphical method for monitoring the mean and variability of a measurement process over time. Charts rely on the statistical distribution of historical measurements of quality-control samples to define an acceptable range of variability in measurement error--that is, a range that reflects variations in methods due to background noise or random variations in the analytical results. Once established, this acceptable variance can be applied graphically to determine when measurements are "in control" or "out of control." These statistical procedures effectively amount to a graphical application of a hypothesis test such as the t-test. A variety of control charts are available for identifying errors in the various components of a measurement process (see Montgomery, 1991). Many of these charts are used in combination with visual and quantitative techniques, designed to identify temporal patterns in errors that may signal the existence of bias in the measurement process or to detect changes in bias.
The statistic, NSD (number of standard deviations), reported for the BSP data gives the number of standard deviations that the measured concentration differs from the MPV, and can be used directly to assess whether the method exhibits statistical control (see Fig. 8 for an example of a control chart application of this statistic). A statistical limit of two standard deviations has been used by the BSP to specify the limit of control (see, for example, Maloney and others, 1994). The NWQL has used a limit of 1.5 standard deviations as part of its internal quality-assurance program (Pritt and Raese, 1992). Values beyond these limits may indicate time periods when laboratory methods were performing abnormally because of random or systematic errors. In the example for sulfate (Fig. 8), there is evidence of greater variance (i.e., less precision) in the analytical measurements of BSP samples from early 1990 to late 1991 related to a change in the method of laboratory analysis, although most NSD values lie within the +/- 2 control limit. In examining a control chart, any persistent systematic behavior in the reference sample measurements may indicate a time period when bias was present in the measurement process. In the sulfate example (Fig. 8), a positive bias exists from early 1990 to the end of 1991, as indicated by the greater frequency and magnitude of NSD values above zero as compared to those below zero.
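The use of NSD with a control limit can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the standard deviation is passed in as a known value, and the reference results are invented for the example rather than taken from the BSP data.

```python
def nsd(measured, mpv, sd):
    """Number of standard deviations by which a measurement differs from the MPV."""
    return (measured - mpv) / sd

def flag_out_of_control(nsd_values, limit=2.0):
    """Return the positions of NSD values whose magnitude exceeds the limit."""
    return [i for i, v in enumerate(nsd_values) if abs(v) > limit]

# Hypothetical reference-sample results: (measured, MPV, sd), all in mg/L.
results = [(10.5, 10.0, 0.5), (11.2, 10.0, 0.5), (9.1, 10.0, 0.5), (10.9, 10.0, 0.5)]
values = [nsd(m, mpv, sd) for m, mpv, sd in results]

bsp_flags = flag_out_of_control(values, limit=2.0)   # two-standard-deviation limit
nwql_flags = flag_out_of_control(values, limit=1.5)  # 1.5-standard-deviation limit
```

With these invented values, only the second result falls outside the two-standard-deviation limit, whereas three of the four results fall outside the tighter 1.5 limit.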
A variety of formal hypothesis tests are also available to the user to determine whether a laboratory analytical method contains statistically significant bias over a specified time period. Three of these tests are classified as matched-pairs tests, and are suitable for evaluating whether the laboratory measurement bias is significantly different from zero. The first is the paired t-test, a parametric test (Ott, 1993). Two nonparametric tests for paired data include the sign test and the signed-rank test (Helsel and Hirsch, 1992). Although not classified as a matched-pairs test, another nonparametric test, the binomial probability distribution method (Friedman and others, 1983), can be used to test for the presence of statistically significant bias. This statistical method compares the number of analytical determinations exceeding the control limits to the number expected for a specified alpha level according to the binomial distribution. Larger-than-expected numbers of exceedances may be indicative of statistically significant bias in the analytical method. In using these parametric and nonparametric tests with BSP reference samples, a two-sided test is recommended because prior knowledge as to whether the errors are exclusively positive or negative is unlikely to exist. It should be noted that the power of these tests to detect the presence of bias will increase with the number of error measurements.
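Two of the nonparametric tests named above can be sketched with the binomial distribution alone. The code below is a minimal illustration, not the implementation used in the cited references. The function names and the error values are hypothetical, and the exceedance calculation assumes that each determination independently exceeds the control limits with probability alpha when the method is in control.

```python
from math import comb

def sign_test_two_sided(errors):
    """Exact two-sided sign test of the hypothesis that the median error is zero.

    Zero errors (ties) are dropped, as is conventional for the sign test.
    Returns (number of positive errors, number of nonzero errors, p-value).
    """
    nonzero = [e for e in errors if e != 0]
    n = len(nonzero)
    n_pos = sum(1 for e in nonzero if e > 0)
    k = min(n_pos, n - n_pos)
    # Probability of a split at least this lopsided under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n, min(1.0, 2 * tail)

def exceedance_probability(n_exceed, n_total, alpha):
    """P(X >= n_exceed) when X follows Binomial(n_total, alpha)."""
    return sum(
        comb(n_total, i) * alpha ** i * (1 - alpha) ** (n_total - i)
        for i in range(n_exceed, n_total + 1)
    )

# Hypothetical error estimates (measured value minus MPV).
errors = [0.3, 0.1, 0.4, -0.2, 0.5, 0.2, 0.6, 0.1, 0.3, 0.2]
n_pos, n, p_sign = sign_test_two_sided(errors)

# Chance of 4 or more exceedances in 20 determinations when alpha = 0.05.
p_exceed = exceedance_probability(n_exceed=4, n_total=20, alpha=0.05)
```

A small `p_sign` suggests that positive and negative errors are not equally likely, and a small `p_exceed` suggests that more determinations fall outside the control limits than chance alone would produce.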
In general, the use of parametric procedures to test for the presence of measurement bias can be problematic unless proper transformations of the data are made. Both concentration-dependent and concentration-independent measurement errors can produce distributions of measurement error that are non-normal and heteroscedastic (i.e., display nonconstant variance). These distributional characteristics violate the required assumptions of the t-test. In the case of concentration-dependent errors, considerable asymmetry can appear in the distribution of measurement errors due to changes in the magnitude of errors with different concentrations of the reference samples. In the case of concentration-independent errors (e.g., contamination), an error distribution with a lower bound of zero is not appropriately modeled as normal. Failure to satisfy the assumption of normality will lower the power of the t-test, thereby reducing the probability of detecting significant deviations of measurement bias from zero in cases where the average measurement process is actually biased. Two alternatives exist in these situations. First, logarithmic or other transformations of the estimates of measurement error can frequently yield an approximately normal distribution. Error estimates expressed as percent of the MPV may also yield a more normal distribution. Second, nonparametric tests as referenced above are not as sensitive to non-normally distributed data as parametric tests (but can be sensitive to heteroscedastic data). In cases of non-normal data, nonparametric tests have greater power than the paired t-test, especially for small sample sizes (i.e., fewer than 30).
Tests for temporal changes in measurement bias and variability can also be important in evaluating the laboratory measurement process. In the study of long-term trends in stream water quality, it is especially important that changes in laboratory methods over time do not produce false trends or mask true trends in ambient water-quality measurements. There are examples in the literature where interpretations have been compromised by such method changes (see for example, Shapiro and Swain, 1983; Bruland, 1983; Flegal and Coale, 1989; Schertz and others, 1994). In estimating trends in measurement bias, a variety of parametric and nonparametric procedures are available to test for the presence of abrupt (i.e., step) changes (Hirsch and others, 1992) or linear and monotonic changes with time (Hirsch and others, 1992; Alexander and others, 1993).
Available tests for step changes in measurement variability include the parametric F test (Ott, 1993) for evaluating whether analytical method variances are similar for two different time periods, and the nonparametric test, the Wilcoxon-Mann-Whitney Rank-Sum test (Helsel and Hirsch, 1992), for determining whether the distribution of measurement variability has changed between two time periods. The Siegel-Tukey test (Siegel and Tukey, 1960) provides an alternative to the Rank-Sum test that is more sensitive to differences in the spread of two distributions.
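The statistic for the parametric F test of a step change in variance can be sketched as follows. This is an illustrative fragment only. The function name and the error values are hypothetical, and the result must still be compared with a tabulated critical value of the F distribution for the reported degrees of freedom.

```python
from statistics import variance

def variance_ratio(errors_a, errors_b):
    """Return (F, df_numerator, df_denominator) with the larger variance on top."""
    var_a, var_b = variance(errors_a), variance(errors_b)
    if var_a >= var_b:
        return var_a / var_b, len(errors_a) - 1, len(errors_b) - 1
    return var_b / var_a, len(errors_b) - 1, len(errors_a) - 1

# Hypothetical measurement errors for two time periods.
early = [0.1, -0.1, 0.2, -0.2, 0.0]
late = [0.3, -0.3, 0.6, -0.6, 0.0]
f_stat, df_num, df_den = variance_ratio(early, late)
```

Placing the larger sample variance in the numerator keeps the ratio at or above one, which suits the upper tail of a two-sided comparison.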
Evaluation of laboratory analytical methods on the basis of statistical criteria provides important confirmation that analytical methods are producing consistent environmental measurements. The type of consistency required by users may include assurance that measurement errors are unbiased and unchanging with time. The exact nature of these requirements will ultimately depend on the intended uses of the environmental data (e.g., percent of observations detected, estimation of a mean concentration or trend in concentration). Detectable inconsistencies in laboratory methods based on statistical evaluations of BSP data may prompt a user to edit environmental data for particular time periods, disregard an entire environmental record, or simply exercise caution in interpretations of environmental data. These decisions on the part of data users are inherently subjective.
It is important to recognize that the above methods of statistical evaluation do not directly indicate the practical significance of laboratory measurement error for environmental observations. It may be of interest to know whether measurement errors (either "in-control" or "out-of-control") are sufficiently large to influence interpretations of the environmental data. Even a consistently "controlled" measurement process can exhibit subtle variations in bias that are detectable over long time periods and may have relevance to interpretations of environmental data (see, for example, Alexander and others, 1993). The effects of statistically significant measurement errors on the interpretation of environmental observations can only be evaluated through direct comparisons of estimates of measurement error and environmental data. The next section presents several ways of accomplishing this task.
Correction of water-quality data for measurement error
BSP data can assist the interpretation of national network water-quality data by providing information for evaluating and correcting the stream water-quality data for estimates of laboratory measurement error. The procedures discussed here use information on laboratory measurement bias and variability to provide additional protection against Type I and Type II statistical errors in the interpretation of the stream water-quality data. The accuracy of these correction methods will, of course, depend on how well the BSP reference samples replicate the characteristics of the stream water-quality data. The intrinsic physical and chemical characteristics of the reference samples (and their limitations) as previously discussed should be considered when using the following statistical methods.
The first two methods help to identify and resolve cases where a bias-induced artifact in the stream water-quality data may be falsely interpreted as a real environmental signal (Type I error), and cases where measurement bias may prevent the detection of an environmental signal (Type II error). The first method provides a general approach to correct for the effects of measurement bias on categorical analyses of water-quality data. A second method describes a technique for computing an unbiased estimator of mean water-quality concentration or trend in concentration and for computing the variance of the estimator adjusted for measurement error. A final method evaluates laboratory measurement variability in the stream water-quality data. This method describes how to determine whether measurement variability contributes significantly to the total variability observed in the water-quality data and how to obtain an estimate of the natural variability (i.e., total variability corrected for measurement variability).
Correction for the effects of measurement bias on categorical analyses
The first method provides a general approach to correcting individual water-quality observations for measurement bias for the purpose of classifying stream concentrations as being above or below an environmentally- or statistically-relevant threshold. Some examples of the environmental questions for which this approach may be useful for evaluating and correcting the effects of measurement bias include: (1) What fraction of the time are stream concentrations above a particular water-quality standard or criterion? (2) Is a particular water-quality contaminant present in a stream (i.e., how frequently is the concentration above the method reporting limit)? (3) Have water-quality concentrations increased or decreased over time? The bias-correction method described here is satisfactory for obtaining a general impression of the effects of measurement bias on the water-quality concentration data. Any hypothesis test conducted with the bias-corrected data would not be accurate, however, because errors in the bias estimates are not systematically incorporated in the test (see the subsequent method for addressing this issue).
In addressing the first two questions, the effects of positive bias on interpretations of the environmental data, for example, may be evaluated by lowering stream water-quality concentrations by an amount equal to the estimated bias. These adjusted concentrations may, in turn, be compared to the water-quality standard or method reporting limit for the purpose of computing exceedance statistics. Error in the estimate of measurement bias may be partially accounted for by incorporating the probability that the expected value of bias based on sample measurements differs from the true bias. To ensure that there is a very small probability that measurement bias caused a water-quality concentration to exceed the standard or reporting limit, water-quality concentrations could be adjusted downward by an amount equivalent to the expected value of positive bias plus some multiple of the expected value's standard error (e.g., the upper 95-percent confidence limit). By accounting for error in bias in this manner, the approach is aimed at reducing the probability of a Type I error.
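The downward adjustment for positive bias can be sketched as follows. This is a minimal illustration with invented numbers. The function names are hypothetical, and the multiplier of 1.96 is an assumption based on a normal approximation; a multiplier from the t distribution would be larger for a sample as small as the one shown.

```python
from math import sqrt
from statistics import mean, stdev

def conservative_bias(errors, multiplier=1.96):
    """Mean error plus a multiple of its standard error (an upper confidence limit)."""
    return mean(errors) + multiplier * stdev(errors) / sqrt(len(errors))

def exceedance_fraction(concentrations, threshold, bias=0.0):
    """Fraction of concentrations above the threshold after subtracting the bias."""
    adjusted = [c - bias for c in concentrations]
    return sum(1 for c in adjusted if c > threshold) / len(adjusted)

# Hypothetical BSP error estimates and stream concentrations, in mg/L.
bsp_errors = [0.4, 0.6, 0.5, 0.3, 0.7]
stream = [9.8, 10.2, 10.5, 10.9, 11.4]
standard = 10.0

bias_upper = conservative_bias(bsp_errors)
raw_fraction = exceedance_fraction(stream, standard)
adjusted_fraction = exceedance_fraction(stream, standard, bias=bias_upper)
```

In this invented case, four of the five unadjusted concentrations exceed the standard, but only two do so after the conservative adjustment for positive bias.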
Investigations of the question of whether concentrations have increased or decreased over time could potentially include an evaluation and adjustment for the effects of estimated trends in measurement bias on the detection of trends in stream water-quality concentrations. The magnitude of trend declared to be statistically or environmentally significant could be increased or decreased to account for estimated trend in measurement bias. This approach is equivalent to correcting each observation for the estimated magnitude of trend in measurement bias, although the method ignores error in the estimate of trend in bias. A study of trends in dissolved solids (ROE--residue on evaporation) at USGS long-term stream monitoring sites from 1980-88 (Smith and others, 1993) adjusted the threshold for declaring significant trends to account for the effect of an estimated change in measurement bias on stream concentration trend slopes. In view of evidence of a measurement bias trend of about 2 to 3 percent at a USGS laboratory from 1981-89, statistically significant stream-concentration trend slopes less than about 3 percent were assigned to a "no trend" class. This technique provided additional protection against Type I errors by reducing the probability that a small, statistically significant trend resulting from long-term changes in laboratory measurement error would be incorrectly identified as a real trend in the dissolved solids concentration of a river. The objective of this approach is not to estimate an unbiased trend slope in water-quality concentrations (see the next section), but to reduce the probability that a trend artifact related to method changes could result in a water-quality trend being misclassified as statistically significant.
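The threshold adjustment used in the dissolved-solids example can be sketched as a simple classification rule. The function name, the significance level of 0.05, and the slope values are assumptions made for illustration; only the 3-percent threshold comes from the example above.

```python
def classify_trend(slope_pct, p_value, bias_trend_pct=3.0, alpha=0.05):
    """Classify a trend slope, treating small significant slopes as 'no trend'.

    slope_pct: estimated trend in stream concentration, in percent
    p_value: attained significance level of the trend test
    bias_trend_pct: estimated magnitude of the trend in measurement bias, in percent
    alpha: significance threshold (hypothetical default)
    """
    if p_value >= alpha or abs(slope_pct) < bias_trend_pct:
        return "no trend"
    return "increase" if slope_pct > 0 else "decrease"

# Hypothetical (slope in percent, p-value) pairs for four stations.
stations = [(2.1, 0.01), (5.4, 0.02), (-4.0, 0.03), (6.0, 0.20)]
labels = [classify_trend(slope, p) for slope, p in stations]
```

The first station shows a statistically significant slope, but because the slope is smaller than the estimated trend in measurement bias it is assigned to the "no trend" class.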
Correction of an environmental estimator for measurement bias
A second method for correcting stream water-quality data for estimates of laboratory measurement bias can be used to compute an unbiased estimator of water quality (e.g., mean, trend slope) and to adjust the variance of the estimator for measurement error. This approach addresses both the Type I and Type II errors associated with statistical testing of the environmental data. The approach recognizes and incorporates the effects of random measurement errors on the quantification of bias and the correction of environmental data for bias. However, if random measurement errors associated with the estimation of bias are large, it may be difficult to use BSP data to clearly identify a measurement bias signal in the environmental data.
The approach (see Alexander and others, 1993 for a detailed explanation) decomposes the laboratory measurement bias in an environmental record into two components. The first component is a persistent, long-term (multi-year) error affecting all the water-quality measurements, and can be described by a mean bias or a bias trend slope. The second bias component is sporadic, but of sufficiently long duration to affect all water-quality samples analyzed over a shorter time period, possibly spanning days or months. This component reflects random variations in measurement bias of long enough duration that a systematic error appears in all water-quality samples analyzed at the laboratory during this interval of time. In correcting the mean environmental concentration or trend slope, it is necessary to adjust for the first error component. Accounting for the second error component can lower the variability of estimates of the bias-corrected mean or bias-corrected trend slope for the environmental data. This systematic error component can be estimated by the covariance between the measurement error and environmental data. If this bias actually affects stream water-quality measurements, then a portion of the random variability in the water-quality record will be positively correlated with the estimates of measurement error.
Accordingly, an estimate of a bias-corrected mean water-quality concentration (bcmean) can be computed as
bcmean = wqmean - biasmean (5)
where wqmean is the estimate of mean water-quality concentration and biasmean is the estimate of mean measurement bias. The standard error of the bias-corrected mean water-quality concentration (SEbc) is computed as
SEbc = sqrt( (1/n) sigma2w + sigma2b - 2 cov(w,b) ) (6)
where sigma2w is the estimated variance of the water-quality concentrations, sigma2b is the estimated variance in measurement bias, and the covariance can be estimated as
$$\mathrm{Cov}(w,b) = \frac{1}{n} \sum_{i=1}^{n} (w_i - \mathrm{wqmean})(e_i - \mathrm{biasmean}) \qquad (7)$$
where wi is the water-quality concentration for the ith time period, ei is the estimate of measurement error (equation 1) for the ith time period, and n is the number of concentration and measurement error pairs.
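The bias-corrected mean (equation 5) and the covariance term (equation 7) can be computed directly from paired values. The sketch below is illustrative only. The function name and the data are hypothetical, and the code assumes that each concentration has already been paired with one measurement error estimate for the same time period.

```python
def bias_corrected_mean(w, e):
    """Return (bcmean, cov) for paired concentrations w and error estimates e."""
    if len(w) != len(e) or not w:
        raise ValueError("w and e must be non-empty and of equal length")
    n = len(w)
    wqmean = sum(w) / n      # mean water-quality concentration
    biasmean = sum(e) / n    # mean measurement bias
    bcmean = wqmean - biasmean                                               # equation 5
    cov = sum((wi - wqmean) * (ei - biasmean) for wi, ei in zip(w, e)) / n   # equation 7
    return bcmean, cov

# Hypothetical paired values, in mg/L.
conc = [10.0, 12.0, 11.0, 13.0]
err = [0.5, 1.0, 0.5, 1.0]
bcmean, cov = bias_corrected_mean(conc, err)
```

A positive covariance, as in this invented example, is the pattern expected when sporadic bias at the laboratory is carried into the water-quality record.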
If random error in the measurement of bias (sigma2b) is large or the covariance between the environmental and measurement error data is small, the standard error of the bias-corrected mean environmental concentration may be larger than that of the biased (i.e., uncorrected) mean environmental concentration. This situation is not uncommon for national network data, and may occur for several possible reasons including difficulties in properly pairing the environmental and measurement error data (see Alexander and others, 1993). Thus, bias correction may improve the estimate of mean water quality, but it may also increase the variability associated with the bias-corrected mean estimate. This may lower the power of the statistical test to detect a particular signal in the environmental data. In these cases, the user must decide which type of estimate of mean concentration best satisfies their requirements: a biased, precise estimate or an unbiased, imprecise estimate.
This method can be extended to correct estimates of trend in environmental data for trend in measurement bias. The method of "seemingly-unrelated" regression (SUR) is a parametric statistical method well suited for this task. The reader is referred to Alexander and others (1993) for a detailed description of how this technique can be used with BSP data to correct water-quality trends in NASQAN data for the estimated effects of measurement bias.
Assessment of laboratory measurement variability
BSP data may also be used to assess whether variability in laboratory analytical determinations (i.e., random variations) may be large enough to have prevented the detection of an environmental signal. This information may be particularly important to the interpretation of water-quality data for streams with very low concentrations in proximity to the method reporting limit. Whether environmental observations actually measure the effects of real environmental processes is influenced by the size of the measurement variability relative to the total variability of the data. If measurement variability constitutes a sizeable fraction of the total variability observed in the environmental data, the probability of a Type II error, a failure to detect a real environmental signal, may increase. Evidence of significant measurement variability at the beginning of an investigation would normally indicate the need for improvements in methods or increased numbers of environmental samples. A retrospective statistical evaluation of variability involves comparing the total variability (as measured by stream concentrations) to the laboratory measurement variability in an F test. The presence of a significantly lower laboratory measurement variance relative to the total variance would indicate that laboratory measurement error is not likely to have accounted for the failure to detect an environmental signal in the data. Method variability related to the collection and processing of the environmental sample could, however, be an important source of variability that is not included in BSP estimates of measurement error.
The correction of estimates of total variability in environmental data for laboratory measurement variance could be of interest in cases where a reliable estimate of the natural variability of water quality at a stream site is required. This might include compliance monitoring situations where knowledge of the natural variability is necessary to assess the effect of incoming pollutants on stream water quality and to develop discharge limits. If measurement variability were to represent a sizeable fraction of the total variability, then a correction might be necessary. The estimated natural variability (sigma2n) can be computed as
sigma2n = sigma2t - sigma2l (8)
where sigma2t is the estimated total variance as measured by the environmental samples and sigma2l is the estimated laboratory method variance as measured by BSP samples.
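Equation 8 can be applied as shown in the sketch below. The function name and the data are hypothetical, and the laboratory variance is estimated here as the sample variance of BSP error values, which is an assumption made for the example. Because both terms are estimates, the difference can be negative when laboratory variability is large relative to total variability, so the fraction of total variance attributable to the laboratory is returned alongside it.

```python
from statistics import variance

def natural_variance(env_concentrations, bsp_errors):
    """Return (sigma2n, laboratory fraction of total variance) following equation 8."""
    sigma2t = variance(env_concentrations)  # total variance from environmental samples
    sigma2l = variance(bsp_errors)          # laboratory variance from BSP samples
    return sigma2t - sigma2l, sigma2l / sigma2t

# Hypothetical stream concentrations and BSP error estimates, in mg/L.
stream = [8.0, 10.0, 12.0, 14.0, 16.0]
bsp_errors = [-1.0, 0.0, 1.0, 0.0, 0.0]
sigma2n, lab_fraction = natural_variance(stream, bsp_errors)
```

In this invented case the laboratory accounts for about 5 percent of the total variance, so the correction changes the estimate of natural variability only slightly.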
Statistical analysis of measurement error estimates
As indicated in the section "Estimates of measurement accuracy" (p. 41), BSP error estimates computed according to equation 1 are potentially subject to uncertainty because of the effect of laboratory rounding rules on the numerical precision of reported water-quality measurements. In reporting instrument readings, the laboratory rounds water-quality measurements to the nearest whole number or decimal fraction according to the sensitivity of analytical methods as specified by the method reporting limit. As a result, the accuracy of individual estimates of error would be expected to decline as the magnitude of the error becomes smaller (i.e., as the magnitude of the error decreases relative to the size of the reporting limit). The highest levels of inaccuracy occur for error estimates that are less than the rounding threshold, or reporting limit, because the uncertainty in water-quality measurements exceeds the magnitude of the error in this range. The sign of error values smaller than the reporting limit in size cannot be accurately determined, but is estimated according to equation 1 in reporting values of NSD and percent bias in the CD-ROM data sets.
Error estimates below the method reporting limit (denoted as "less than" the reporting limit in the CD-ROM data base) vary considerably for the chemical constituents analyzed in the BSP. The percentage of less-than nutrient error values ranges from a low of 17 percent for ammonia to a high of 75 percent for nitrate plus nitrite; approximately 20 percent of phosphorus error values fall below the reporting limit. Percentages of less-than error values for major dissolved ions range from 4 to 50 percent with most constituents displaying percentages below 20 percent. Virtually all of the percentages of less-than trace-element error values exceed 35 percent with most above 50 percent.
As the proportion of error values significantly affected by rounding (i.e., small errors and errors below the reporting limit) increases, summary statistics such as the bias (i.e., mean error), variance in bias, and measurement variance (equation 4) are likely to become increasingly less accurate. Unfortunately, the effect of rounding on the accuracy of these statistics cannot be quantified from available information. As a general guide to addressing potential inaccuracies, we advise users to exercise caution in the interpretation of error summary statistics for those constituents and time periods where a majority of the error values are less than the reporting limit. Control charts may be helpful in these cases to graphically examine variations in the measurement process, especially the infrequent occurrence of errors exceeding the reporting limit.
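The screening suggested above can be sketched as a simple count. The function names and the example values are hypothetical, and the code assumes that error values and the reporting limit are expressed in the same units.

```python
def fraction_below_reporting_limit(errors, reporting_limit):
    """Fraction of error values whose magnitude is less than the reporting limit."""
    below = sum(1 for e in errors if abs(e) < reporting_limit)
    return below / len(errors)

def needs_caution(errors, reporting_limit):
    """True when a majority of the error values are less than the reporting limit."""
    return fraction_below_reporting_limit(errors, reporting_limit) > 0.5

# Hypothetical error estimates and reporting limit, in mg/L.
errors = [0.02, -0.05, 0.15, 0.01, -0.2, 0.03]
limit = 0.1
share = fraction_below_reporting_limit(errors, limit)
caution = needs_caution(errors, limit)
```

Here four of the six error values are smaller in magnitude than the reporting limit, so summary statistics for this invented record would warrant cautious interpretation.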