Most of the nutrient and sediment water-quality data used in this assessment are from monitoring networks that were not designed for estimating transport. This section examines the validity of load estimates from these data sets by evaluating sampling error caused by sparse data, sampling bias, the suitability of the data sets to the assumptions of Cohn's Estimator model, and calibration error.
Successful calibration of a regression model requires a minimum number of observations for each regression variable. A general rule of thumb for data sufficiency is 10 observations for each variable (or regression coefficient), with at least 20 percent of the observations above the minimum reporting level (MRL) (T.A. Cohn, U.S. Geological Survey, written commun., 1995). For the seven-parameter regression in Cohn's Estimator model, this criterion translates to a minimum of 70 observations. The requirement was relaxed for this application because many of the data sets had fewer than the prescribed 70 observations (in some cases, fewer than half as many); the load estimates for these data sets are expected to be less accurate. The estimates of dissolved orthophosphorus at site 11 (Flint Creek near Falkville) are subject to substantial error because fewer than 20 percent of the observations were above the MRL.
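The data-sufficiency rule of thumb can be expressed as a simple check. The following Python sketch is illustrative only: the function name `check_sufficiency`, the convention of passing censored (less-than-MRL) results as `None`, and the treatment of a value equal to the MRL as quantified are assumptions, not part of Cohn's Estimator.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class SufficiencyCheck:
    n_obs: int              # number of observations in the calibration data set
    n_required: int         # observations required by the rule of thumb
    frac_above_mrl: float   # fraction of observations at or above the MRL
    enough_obs: bool
    enough_uncensored: bool


def check_sufficiency(values: Sequence[Optional[float]], mrl: float,
                      n_params: int = 7, obs_per_param: int = 10,
                      min_frac_above: float = 0.20) -> SufficiencyCheck:
    """Apply the 10-observations-per-coefficient rule of thumb (hypothetical helper).

    Censored results ("less than MRL") are passed as None; any numeric
    value below the MRL is also treated as censored (an assumption).
    """
    n = len(values)
    n_above = sum(1 for v in values if v is not None and v >= mrl)
    frac = n_above / n if n else 0.0
    n_required = n_params * obs_per_param  # 7 coefficients -> 70 observations
    return SufficiencyCheck(
        n_obs=n,
        n_required=n_required,
        frac_above_mrl=frac,
        enough_obs=n >= n_required,
        enough_uncensored=frac >= min_frac_above,
    )


# Hypothetical data set: 35 quarterly samples, only 5 quantified above the MRL.
result = check_sufficiency([None] * 30 + [0.05] * 5, mrl=0.01)
print(result.enough_obs, result.enough_uncensored)  # False False
```

A data set like the hypothetical one above fails both parts of the criterion, which is the situation described for site 11.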
The data sets are relatively unbiased with respect to season of sampling (fig. B1) because samples were collected quarterly. Quarterly sampling, however, leaves the representation of high-flow events to chance, and for many nutrient species a large percentage of transport at riverine sites occurs during high flows. Examination of the distribution of nutrient samples within the streamflow distribution (fig. B2) shows that the data sets are biased against higher streamflows (the 0-10 deciles of streamflow). The under-representation of higher streamflows, especially noticeable at site 6 (Duck River above Hurricane Mills) among the riverine sites, reduces the accuracy of model calibration for high streamflows and thus substantially reduces the accuracy of load estimates.
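One way to produce the kind of tabulation summarized in figure B2 is to locate each sample's streamflow within the flow-duration distribution of the daily record and count samples per decile. The sketch below is an assumption about method, not the procedure used for this report; the function names are hypothetical, and decile 0 is taken to be the 0-10 percent exceedance range (the highest flows). If sampling were unbiased with respect to streamflow, each decile would hold roughly 10 percent of the samples.

```python
from bisect import bisect_right
from typing import Iterable, List, Sequence


def exceedance_percent(q: float, sorted_daily: Sequence[float]) -> float:
    """Percentage of days in the daily record with flow strictly greater than q."""
    n = len(sorted_daily)
    n_greater = n - bisect_right(sorted_daily, q)
    return 100.0 * n_greater / n


def sample_counts_by_flow_decile(daily_flows: Iterable[float],
                                 sample_flows: Iterable[float]) -> List[int]:
    """Count samples in each exceedance decile (hypothetical helper).

    Index 0 is the 0-10 percent exceedance range (highest flows);
    index 9 is the 90-100 percent range (lowest flows).
    """
    sorted_daily = sorted(daily_flows)
    counts = [0] * 10
    for q in sample_flows:
        p = exceedance_percent(q, sorted_daily)
        counts[min(int(p // 10), 9)] += 1  # a flow below every daily value maps to decile 9
    return counts
```

Dividing each count by the total number of samples gives the fraction per decile; a value well below 0.10 in index 0 indicates the bias against high streamflows described above.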
Regression results were examined for compliance with two standard assumptions of least-squares theory (Draper and Smith, 1981): normality of residuals and constancy of error variance throughout the range of the regression variables (homoscedasticity). These assumptions were satisfied in only about half of the data sets examined, which raises doubt about whether the log-linear regressions adequately model constituent transport for the remaining data sets. Data sparseness and sampling bias may contribute to this problem.
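The report does not identify the tests used to check these assumptions. As one illustration, the Python sketch below applies two common tests to calibration residuals: the Jarque-Bera test for normality and Koenker's studentized Breusch-Pagan test for heteroscedasticity, using the fitted values as the only auxiliary regressor. The sketch assumes uncensored residuals, and both p-values are large-sample approximations, which is a real limitation for data sets with fewer than 70 observations. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def jarque_bera(resid: Sequence[float]) -> Tuple[float, float]:
    """Jarque-Bera normality statistic and its asymptotic p-value."""
    n = len(resid)
    mean = sum(resid) / n
    d = [r - mean for r in resid]
    m2 = sum(x * x for x in d) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skew = (sum(x ** 3 for x in d) / n) / m2 ** 1.5
    kurt = (sum(x ** 4 for x in d) / n) / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of chi-square with 2 degrees of freedom is exp(-x/2).
    return jb, math.exp(-jb / 2.0)


def breusch_pagan_fitted(resid: Sequence[float],
                         fitted: Sequence[float]) -> Tuple[float, float]:
    """Koenker's studentized Breusch-Pagan test with the fitted values as the
    single auxiliary regressor: LM = n * R^2 from regressing e^2 on fitted."""
    n = len(resid)
    e2 = [r * r for r in resid]
    xbar = sum(fitted) / n
    ybar = sum(e2) / n
    sxx = sum((x - xbar) ** 2 for x in fitted)
    syy = sum((y - ybar) ** 2 for y in e2)
    if sxx == 0:
        raise ValueError("fitted values are constant")
    if syy == 0:
        return 0.0, 1.0  # constant squared residuals: no evidence of heteroscedasticity
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(fitted, e2))
    lm = n * sxy * sxy / (sxx * syy)  # n times R^2 of the simple regression
    # Survival function of chi-square with 1 degree of freedom.
    return lm, math.erfc(math.sqrt(lm / 2.0))
```

A small p-value (for example, below 0.05) from either function is evidence against the corresponding assumption for that data set.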
The calibration error statistics of Cohn's Estimator regression models can be used as a partial indication of model accuracy and precision, although these statistics cannot account for the errors introduced by sampling bias, data sparseness, and data characteristics that do not match model assumptions. The coefficient of determination, r², represents the proportion of the variance in the concentration data that is explained by the regression variables; therefore, the value of r² is a measure of the fit of the regression model to the data. A high value of r² indicates that the regression equation can estimate daily concentration, and thus daily and annual load, with a high degree of accuracy. The standard error, s, is the estimated standard deviation of the residuals about the regression. The smaller the value of s, the more precise the estimates of daily concentration and load. The upper and lower bounds of the 95-percent confidence interval for each load estimate are calculated from s and likewise measure the precision of the estimate, given the values of the independent variables. The values of r² ranged from 0.09 to 0.78, and the values of s (log units) ranged from 0.20 to 1.55. Although load estimates are reported for all data sets, regardless of the values of r² and s, the estimates from data sets with small r² and large s are probably less accurate.
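The relation between the calibration statistics and the regression residuals can be sketched as follows: r² is computed as 1 - SSE/SST, and s as the square root of SSE/(n - p), where p is the number of regression coefficients. The band factor assumes that the log units are natural logarithms and ignores parameter uncertainty, censoring, and the bias correction applied when estimates are retransformed, so it only roughly indicates the width of an interval for a single daily concentration, not the annual-load confidence intervals reported by the model. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def calibration_stats(observed_log: Sequence[float],
                      fitted_log: Sequence[float],
                      n_params: int) -> Tuple[float, float]:
    """Return (r2, s) for a regression fitted in log space.

    r2 = 1 - SSE/SST; s = sqrt(SSE / (n - p)) in the same log units.
    Ignores censoring, which Cohn's Estimator handles explicitly.
    """
    n = len(observed_log)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    mean = sum(observed_log) / n
    sse = sum((o - f) ** 2 for o, f in zip(observed_log, fitted_log))
    sst = sum((o - mean) ** 2 for o in observed_log)
    return 1.0 - sse / sst, math.sqrt(sse / (n - n_params))


def approx_band_factor(s: float, z: float = 1.96) -> float:
    """Multiplicative factor exp(z*s) for a +/- z*s band in natural-log space,
    ignoring parameter uncertainty and retransformation bias correction."""
    return math.exp(z * s)


for s in (0.20, 1.55):  # range of s reported for the calibrations
    print(s, round(approx_band_factor(s), 2))
```

Under the natural-log assumption, the reported range of s corresponds to factors of roughly 1.5 and 21 above and below a daily estimate, which shows how much less precise the estimates with large s are.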
Despite these limitations, the estimates of instream load presented in this report are considered the best possible given the available data. The model-calculated errors in individual estimates generally are smaller than the differences among sites in a single year and among years with different hydrologic conditions (wet, dry). Therefore, these data are considered adequate for interpreting broad spatial patterns of instream load and for comparing instream load with input.
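The report does not describe how model errors were compared with differences among sites and years. One simple check, sketched below under stated assumptions, converts each reported 95-percent confidence interval into a log-space standard error (assuming the interval is symmetric in natural-log space) and asks whether the difference between two estimates exceeds the combined uncertainty, treating the estimates as independent. The function names are hypothetical, and the numbers in the usage line are illustrative, not results from this assessment.

```python
import math
from typing import Tuple


def log_se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Back out a log-space standard error from a 95-percent interval,
    assuming the interval is symmetric in natural-log space."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def difference_exceeds_error(est_a: float, ci_a: Tuple[float, float],
                             est_b: float, ci_b: Tuple[float, float],
                             z: float = 1.96) -> bool:
    """True if two load estimates differ by more than z combined standard
    errors in log space, treating the two estimates as independent."""
    se_a = log_se_from_ci(*ci_a, z=z)
    se_b = log_se_from_ci(*ci_b, z=z)
    diff = abs(math.log(est_a) - math.log(est_b))
    return diff > z * math.sqrt(se_a ** 2 + se_b ** 2)


# Illustrative (hypothetical) annual loads with 95-percent intervals.
print(difference_exceeds_error(100.0, (80.0, 125.0), 400.0, (320.0, 500.0)))  # True
```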