USGS - science for a changing world

Scientific Investigations Report 2010–5008

Use of Continuous Monitors and Autosamplers to Predict Unmeasured Water-Quality Constituents in Tributaries of the Tualatin River, Oregon

Relations Between Continuous Monitor Data and Selected Water-Quality Constituents

Autosampler Deployment Dates and Conditions

Streams were sampled with autosamplers during storms from spring 2002 through autumn 2003 (table 9). Because of differences in hydrologic characteristics and responses among sites, inconsistent spatial extent of storms, and resource availability, only a few streams were sampled during any individual storm. To obtain the desired number of samples (approximately 48–50) covering reasonably broad ranges of field parameters and to develop robust regression models, some streams were sampled during more storms than others. For example, Rock, Chicken, and Fanno Creeks were sampled during three storms each, whereas Gales Creek was only sampled during one storm.

In some cases, the storms sampled by autosamplers at individual sites represented different seasonal conditions. For example, Beaverton and Rock Creeks were both sampled in summer (June) and autumn (December). This difference in season helped increase the range of field and laboratory constituent values obtained, which ordinarily would be useful for deriving strong correlations. For some field values at some sites, however, the same temporal differences also resulted in bimodal distributions that likely reduced the quality of the autosampler-derived regression models. For example, specific conductance at some sites was less variable during individual storms than between seasons, which abnormally skewed regressions that relied on specific conductance at those sites. For this reason, where bimodal distributions were observed in the initial graphical analysis, those parameters were removed from consideration for regressions at the respective locations. Bimodal distributions occurred primarily at the non-target sites. In contrast, the autosampler deployments at Fanno Creek were primarily during late spring in 2002 and 2003, and the deployments at Dairy Creek occurred in autumn 2003 (table 9), possibly serving to narrow the resulting range of constituent values.

In the following sections, figures are provided to show the percentage of time during the study period that the field values measured during the autosampler deployments were exceeded. Monitors were installed primarily during the late spring, summer, and early autumn because of reduced access when winter flows were high. Therefore, the data used to determine the percentage of time a given value was exceeded do not include the typically higher discharges in winter, when it could be expected that, on average, specific conductance would be lower but also highly variable, and turbidities would be higher than during the months of monitor deployment.

The Fanno and Dairy Creek sites were sampled during relatively moderate storms during the study period (fig. 3), resulting in a smaller but sometimes bimodal range for specific conductance and turbidity. Overall, the range of physical conditions encountered while sampling was somewhat narrow. Discharges increased moderately during storms but mostly did not represent the highest peaks that commonly occur during some years; likewise, turbidity and other field parameters showed only moderate ranges during the sampled storms. Caution must be exercised when using regression equations from this analysis if conditions are outside the range documented during this study (tables 6 and 7). Extrapolation of regression equations beyond the bounds of the data used to formulate them is considered a potentially large source of error and is not recommended (Helsel and Hirsch, 1992). To a certain extent, the validation datasets in this study allow evaluation of the error introduced when the regression models are applied to conditions beyond the range of the input datasets. However, the validation datasets are limited in the range of conditions encompassed and therefore do not provide much additional information about the adequacy of the regression models to address many of the higher flow conditions.

Fanno Creek at Durham Road

Autosampler Data

Storms were sampled by autosamplers at Fanno Creek in late spring or early summer. The sampled peak discharges (94–134 ft³/s, table 9) covered a narrow range of potential discharges for this site (fig. 3); although storms of this size are fairly representative for May–June in most years, peak discharges exceeded these amounts at least 15 times on other dates during the study period (http://waterdata.usgs.gov/or/nwis/, accessed November 17, 2005). Fanno Creek is in a highly urbanized basin and responds quickly to rainfall, making it a challenge to anticipate and react to storms to collect high-flow samples. This situation indicates the need for automatic sampling and increases the likelihood that extreme events will not be adequately sampled.

Several critical constituents exhibited less variability during individual autosampler storm events than between sampling events. This resulted in bimodal distributions of the autosampler data, as illustrated for specific conductance in figure 4. Data points tended to be clustered into small groups, and generally represented average conditions rather than the rarely occurring extremes usually indicative of storm conditions.

Specific conductance measured during sample collection, for example, was exceeded from about 25 to more than 95 percent of the time during the study period. The duration curves for specific conductance differ from those for turbidity because the sources and mechanisms affecting the two are different. During base flow, specific conductance at Fanno Creek is relatively high (>200 µS/cm) but rainfall is dilute (about 0–20 µS/cm), so dilution by rain can cause a large range in responses. Turbidity caused by suspended particles may come from upland sources transporting particles to the stream, or from within the stream from erosion or resuspension. Turbidity values measured during autosampler deployment were more indicative of the higher and more continuous range during the study than were specific conductances, and were exceeded only about 2 to 34 percent of the time during the entire study period. Model results for specific conductances and turbidities beyond those actually measured during the samplings in the Scenario 1–3 datasets (table 6) cannot be verified.
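The "percentage of time exceeded" statistic used above can be illustrated with a minimal sketch. It assumes an evenly spaced continuous-monitor record, so that the fraction of readings above a value approximates the fraction of time; the function name and the readings are hypothetical and are not study data.

```python
import numpy as np

def percent_time_exceeded(record, value):
    """Percentage of readings in `record` that are strictly greater than `value`.

    Assumes evenly spaced readings; missing readings (NaN) are dropped.
    """
    record = np.asarray(record, dtype=float)
    record = record[~np.isnan(record)]
    return 100.0 * np.mean(record > value)

# Hypothetical specific-conductance readings (µS/cm), for illustration only
sc_record = [120, 150, 180, 200, 210, 220, 230, 240, 250, 260]

# A sampled value of 205 µS/cm is exceeded by 6 of the 10 readings
print(percent_time_exceeded(sc_record, 205))
```

A sampled value that is exceeded a large percentage of the time lies near the low end of the monitored range, which is why specific conductance values exceeded 25 to more than 95 percent of the time represent average to low conditions rather than extremes.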

The relations between Fanno Creek field parameters and laboratory sample results from the autosampler deployments and the Scenario 3 dataset show some useful patterns (fig. 5). In figure 5A, the symbols for a given storm and constituent combination sometimes show different patterns indicating the differences between storms. Possible linear relations for the autosampler dataset are indicated for combinations of turbidity or discharge with TSS, TP, and E. coli bacteria. In the larger dataset represented by Scenario 3, linear relations are indicated between turbidity and TSS, TP, and E. coli bacteria. Several other potential relations are indicated, particularly for discharge and TSS or TP; however, considerable scatter is also apparent.

The occasional bimodal data distributions among storms and the narrow range of peak flows sampled (figs. 3, 4, and 5) indicate that the autosampler-derived data may be inadequate to develop robust regression models between field parameters and laboratory sample results for Fanno Creek near Durham. Patterns in the autosampler-derived data, however, also support the possibility that such models might be constructed with a more comprehensive dataset. The main limitations of the autosampler data, beyond any serial correlation issues, are that they do not represent all seasons or the high flow conditions, and that some constituent data are bimodal. The incorporation of USGS-NWIS and Clean Water Services ambient monitoring datasets into Scenario 2 and 3 datasets for Fanno Creek, in addition to the autosampler data, was an attempt to overcome these limitations (table 6). Outliers observed in the scatter plots were removed to prevent unacceptable leverage on the regression computations.

Total Suspended Solids

Several regression models for TSS at Fanno Creek near Durham are listed and characterized in table 10. The preferred models produced with each scenario included turbidity and specific conductance as explanatory variables. Discharge (or stage, as a surrogate for discharge) was included as an explanatory variable for Scenarios 1 and 2, but added little information in Model 5, as compared to Model 4, in Scenario 3. The values of the model coefficients for turbidity (0.01 and 0.009) and specific conductance (-0.003 and -0.003) did not change much in Scenario 3, whether or not discharge (Q) was included, nor was there any substantive change in the BCF or adjusted-R2. Sine and cosine terms were not significant (p > 0.05) for the models, indicating that seasonality was either unimportant or was already captured by the continuously monitored variables; these terms are therefore not shown in table 10.
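As an illustration of how sine and cosine terms can represent seasonality in a regression, the sketch below builds the two terms from decimal-year time and appends them to a design matrix. The form sin(2πT) and cos(2πT) with T in decimal years is a common convention and is assumed here; the function name and input values are hypothetical.

```python
import numpy as np

def seasonal_terms(decimal_year):
    """Return sin(2*pi*T) and cos(2*pi*T) for time T in decimal years (assumed form)."""
    t = np.asarray(decimal_year, dtype=float)
    return np.sin(2.0 * np.pi * t), np.cos(2.0 * np.pi * t)

# Hypothetical sample times and field values, for illustration only
t = np.array([2002.0, 2002.25, 2002.5, 2002.75])
turb = np.array([12.0, 40.0, 8.0, 25.0])       # turbidity
sc = np.array([210.0, 150.0, 230.0, 180.0])    # specific conductance

sin_t, cos_t = seasonal_terms(t)

# Design matrix: intercept, turbidity, specific conductance, seasonal terms
X = np.column_stack([np.ones(len(t)), turb, sc, sin_t, cos_t])
print(X.shape)
```

If the coefficients fitted to the last two columns are not statistically significant, the seasonal terms can be dropped, as was done for the models in table 10.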

Log transformation of the dependent variable was especially helpful for producing estimated TSS concentrations using continuous monitor data, despite requiring the use of a BCF when transforming the estimated values into normal, non-logarithmic space. In Scenario 1, the coefficients of determination for log-transformed (Model 1, adjusted R2  =  0.936) and non-transformed (Model 2, adjusted R2   =  0.956) TSS are good. However, many non-transformed values of TSS predicted from continuous monitor data using Model 2 (not shown) were negative, particularly outside of the specific calibration period of the autosampler storms; negative predicted TSS values are an unacceptable outcome and render Model 2 unusable. In subsequent regression calculations, the log-transformed values of TSS were always used for the dependent variable.
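The role of the BCF can be sketched as follows. This is a minimal example under stated assumptions: base-10 logarithms, ordinary least squares, and a smearing-type BCF computed as the mean of the back-transformed residuals. The function names and the synthetic data are hypothetical; the coefficients used to generate the data only loosely resemble those discussed above.

```python
import numpy as np

def fit_log_model(tss, turb, sc):
    """Fit log10(TSS) = a + b*Turb + c*SC by least squares; return (coef, bcf)."""
    X = np.column_stack([np.ones(len(turb)), turb, sc])
    y = np.log10(tss)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    # Smearing-type bias correction factor (assumed form): mean of 10**residual
    bcf = np.mean(10.0 ** resid)
    return coef, bcf

def predict_tss(coef, bcf, turb, sc):
    """Back-transform predictions to concentration units, applying the BCF."""
    X = np.column_stack([np.ones(len(turb)), turb, sc])
    return bcf * 10.0 ** (X @ coef)

# Synthetic calibration data, for illustration only
rng = np.random.default_rng(0)
turb = rng.uniform(5.0, 150.0, 50)
sc = rng.uniform(80.0, 250.0, 50)
tss = 10.0 ** (0.9 + 0.01 * turb - 0.003 * sc + rng.normal(0.0, 0.15, 50))

coef, bcf = fit_log_model(tss, turb, sc)
pred = predict_tss(coef, bcf, turb, sc)
print(coef, bcf, pred.min())
```

Because the prediction is a power of 10 multiplied by a positive BCF, it cannot be negative, which is the practical advantage of the log-transformed form over the non-transformed Model 2.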

The use of different data scenarios for developing regression models met with mixed success but illustrates the need for more comprehensive input data. As expected, Scenario 1, using only the autosampler data, produced regressions with high adjusted-R2 (> 0.90), most likely a result of serial correlation in the autosampler-only data and a small range of environmental conditions sampled; however, the Scenario 1 regressions also had relatively large mean error and validation RMSE values, and non‑randomly signed residuals from the sign test, when compared with the broader validation dataset. Model 3 (Scenario 2), combining autosampler data with USGS-NWIS historical data, had similar calibration statistics to Model 6 (Scenario 3) but still may have been affected by serial correlation in the autosampler data. Nonetheless, from the validation process for Model 3, the mean error was intermediate (although indicating a high bias rather than a low bias) and the z-statistic from the sign test was considerably lower than for the models from Scenario 1; also, the coefficient of determination (0.83) was among the highest of all models. Despite the more randomly signed residuals, the Nash-Sutcliffe coefficient for Scenario 2 indicates that the predictive power of Model 3 may be worse than using the mean of the laboratory data. For Scenario 3 (which uses high‑flow data from USGS and Clean Water Services, with monthly Clean Water Services ambient monitoring data and the peak discharge samples from the autosampler deployment), the regression coefficients, correlation statistics, and validation statistics were similar with or without discharge (Models 4 and 5). Model 6 was evaluated to test the importance of log transformation of the explanatory variables in Scenario 3, but this transformation increased the mean error and RMSE for the validation statistics.
All Scenario 3 models had poor coefficients of determination (<0.1) and Nash-Sutcliffe coefficients (<0.1), suggesting that they did not reproduce the validation data well, and that the means of the validation data would provide estimates that were as good or better than the model estimates. However, because the validation data are heavily weighted towards base‑flow conditions, and the objective of the modeling exercise is primarily to predict the high constituent concentrations during stormflows, these coefficients probably do not adequately reflect the utility of the model.
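The validation statistics discussed above can be sketched in a few lines. The example assumes that error is defined as predicted minus observed (so a negative mean error indicates low bias) and that the sign-test z-statistic uses the normal approximation without a continuity correction; both are assumptions, and the function name and data are hypothetical.

```python
import numpy as np

def validation_stats(observed, predicted):
    """Mean error, RMSE, Nash-Sutcliffe coefficient, and sign-test z-statistic."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - obs  # assumed convention: negative values mean underprediction
    mean_error = err.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    # Nash-Sutcliffe: 1 is a perfect fit; 0 is no better than the observed mean
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    # Sign test on residuals, ignoring exact ties (normal approximation)
    n_pos = int(np.sum(err > 0))
    n = n_pos + int(np.sum(err < 0))
    z = (n_pos - n / 2.0) / (0.5 * np.sqrt(n)) if n > 0 else float("nan")
    return {"mean_error": mean_error, "rmse": rmse, "nse": nse, "sign_z": z}

# Hypothetical validation samples dominated by base flow, with one storm sample
obs = np.array([10.0, 12.0, 9.0, 11.0, 50.0])

# Predicting the observed mean everywhere gives a Nash-Sutcliffe coefficient of 0
baseline = validation_stats(obs, np.full(obs.shape, obs.mean()))
print(baseline)
```

A Nash-Sutcliffe coefficient at or below zero therefore means that the mean of the validation data performs as well as or better than the model, which is the comparison drawn for the Scenario 2 and Scenario 3 models.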

Model 5 produced diagnostic statistics equivalent to Model 4 but used an extra variable, discharge (Q), indicating that Model 5 probably is overfitted and therefore less robust (Helsel and Hirsch, 1992), and that Model 4 may be the most appropriate functional form given the available datasets. Conversely, specific conductance has little physical relevance to TSS other than as a surrogate for discharge, yet it was an important variable in all models. VIF values were less than 5 for all independent variables in the models shown; however, the largest VIF for Model 3 (for logQ) exceeded the VIFcrit. LogQ was highly significant in the model (p <0.0001, not shown); whereas logSC was only slightly significant (p = 0.07). The inclusion of discharge as an independent variable may be needed to represent mid-winter, high-flow conditions. Specific conductance was a significant (p <0.05) model coefficient in all other models. The relatively low VIFcrit for Model 3 may be a reflection of the Scenario 2 dataset and the model’s relatively low adjusted-R2.

Plotting a time series of the measured TSS concentrations against those predicted using the regression models allows the overall results of the model to be evaluated qualitatively. Individual data from the Scenario 3 calibration and validation datasets and the results from spring 2002 until summer 2003 including the 95 percent prediction interval of Model 4, are shown in figure 6. This period includes the two storms when the autosampler was deployed (table 9), and encompasses January 2003 when the station was moved from Durham City Park to Durham Road (fig. 6). Upon a cursory inspection, the model seems to predict slightly lower baseline TSS concentrations prior to moving the station. However, this period also predominantly encompasses spring–autumn, 2002, with naturally lower discharges; whereas the period after January 2003 is predominantly winter, characterized by higher discharges, so it is reasonable to expect higher baseline TSS concentrations in the winter. Calibration and verification data, which (except for the autosampler data) were collected at the Durham Road site, also show this shift, indicating that moving the station had a negligible effect on model calibration and predictions.

Although Scenario 3 demonstrates the type of dataset that may be most appropriate for developing robust regression models (that is, data that are independent, year-round, and include high-flow samples), validation of results from Models 4, 5, and 6 is hampered because few high-flow samples are included in either the calibration or validation datasets. Many high TSS concentrations are predicted but few calibration or validation data points are present during the high TSS events for comparison (fig. 6). Model 4 appears to be the most robust model for TSS at Fanno Creek, on the basis of up-front assumptions about the value of the different potential calibration datasets, and on the results in table 10 (the relatively low coefficient of determination [0.04] from the validation dataset notwithstanding). Visually, figure 6 shows that the storm-related predictions are relatively accurate for the moderate-sized storms represented in the available datasets. The log-scale used on the y-axis in figure 6 can cause a misperception in the magnitude of errors: during base-flow conditions, the model appears to slightly overpredict TSS concentrations; however, the actual errors are small compared to those at higher concentrations. Validation of these models at higher concentrations cannot be accomplished with the available data.

Comparing the measured and predicted values directly provides additional perspective into the uncertainties and limitations of the available datasets and models (fig. 7). That comparison, using results from Model 4, shows that the indicated prediction interval spans a range of almost an order of magnitude (~0.75 log units) for a given measured value. Available measured-TSS data are relatively well represented up to about 10^2 (or 100) mg/L, with a few additional samples at slightly higher concentrations up to about 10^2.5 (or 316) mg/L.
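To put the width of the prediction interval in concentration units, the short calculation below converts a span in log10 units to a ratio between the upper and lower bounds. The 0.75 log-unit span is taken from the text; the variable names are illustrative only.

```python
# Span of the prediction interval in log10 units (from the text)
log_span = 0.75

# Ratio of the upper bound to the lower bound in concentration units
ratio = 10.0 ** log_span
print(round(ratio, 2))  # about 5.62

# For comparison, a full order of magnitude is a ratio of 10
print(10.0 ** 1.0)
```

A span of 0.75 log units thus corresponds to an upper bound roughly 5.6 times the lower bound for a given measured value.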

Total Phosphorus

Example regression models for TP at the Fanno Creek site are shown in table 11. As for TSS, initial results for TP produced many negative predictions when TP was not log transformed (Scenario 1, Model 3), despite the adjusted-R2 being relatively high (0.905). Bias correction factors for all models with log-transformation of TP (that is, all except Model 3) were similar, ranging from 1.01 to 1.03. Log transformation of explanatory variables was evaluated in Model 2 and does not seem to provide any benefit. The resulting adjusted-R2 is slightly lower than for non-transformation of the same variables (Model 1), and the regression RMSE is slightly larger. Validation statistics for Model 2 also are poorer than for Model 1. Sine and cosine terms were not significant, indicating that no seasonal cycles were present that were not already expressed by the other independent variables.

Adjusted-R2 values for Scenario 1 (calibration using autosampler data only) generally were higher than for Scenarios 2 and 3, which may be an artifact of serial correlation in the autosampler data and of fewer data representing the range of variability from different seasons or more high-flow events. Although turbidity seems to be the most directly linear predictor of TP when examined graphically from the autosampler data alone (fig. 5), specific conductance also was repeatedly an important independent variable in the regression process (table 11). Discharge, which often is significantly correlated with turbidity and TSS as well as TP, was evaluated for Fanno Creek in the scenario using high-flow data (Scenario 3, Model 6). Model results using discharge were not noticeably better than those in Model 5 without it, and its inclusion in models would likely result in overfitting. Although Model 5 is presumed, like Model 4 for TSS, to be the most robust because the input dataset is the most comprehensive and the least affected by serial correlation, the model’s correlation and validation statistics were relatively poor. In particular, the coefficients of determination for the initial model calibration (adjusted-R2 = 0.575) and for the validation (<0.01) indicate substantial room for improvement, although the validation dataset may not have a large enough range in TP to be useful for an R2 determination. Turbidity and specific conductance in Model 5 have VIF values exceeding the VIFcrit, but both variables are highly significant in the model (p <0.0001), indicating that the VIFcrit, which is low because of the low adjusted-R2, does not accurately reflect the severity of multicollinearity. In Model 6, discharge is not a significant variable (p = 0.89, not shown) and seems to contribute to multicollinearity problems.
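The dependence of VIFcrit on the model fit can be sketched as follows. The example assumes the common rule of thumb VIFcrit = 1/(1 - R2), where R2 is the coefficient of determination of the overall model; that rule is consistent with the statement above that a low adjusted-R2 produces a low VIFcrit, but it is an assumption here, as are the function names and the synthetic data.

```python
import numpy as np

def r_squared(X, y):
    """R2 of an ordinary least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def vifs(X):
    """Variance inflation factor for each column of X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # Regress predictor j on the remaining predictors
        out.append(1.0 / (1.0 - r_squared(others, X[:, j])))
    return np.array(out)

def vif_crit(model_r2):
    """Critical VIF under the assumed rule of thumb 1 / (1 - R2)."""
    return 1.0 / (1.0 - model_r2)

# Synthetic, negatively correlated turbidity and specific conductance
rng = np.random.default_rng(1)
turb = rng.uniform(5.0, 150.0, 60)
sc = 250.0 - 0.8 * turb + rng.normal(0.0, 25.0, 60)

v = vifs(np.column_stack([turb, sc]))
print(v, vif_crit(0.575))
```

Under this rule, an adjusted-R2 of 0.575 gives a VIFcrit of about 2.35, so even modest correlation between turbidity and specific conductance can exceed the threshold while both variables remain highly significant.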

The predicted results from Model 5 were biased low compared to the validation dataset in Scenario 3, which could be a result of the large amount of calibration data representing baseline rather than high-flow conditions. These goodness‑of‑fit statistics, like those for TSS at the Fanno Creek site, are a measure primarily of the base flow rather than high-flow conditions, due to a relative lack of high-flow data for comparison. The model, therefore, is not necessarily as poor as the goodness-of-fit statistics might indicate, and the model predictions during storms probably still have some value.

Models 1, 4, and 5 are of the form logTP = f(Turb, SC), using datasets from Scenarios 1, 2, and 3, and their respective model coefficients and BCFs are relatively similar (table 11). The same is true of their model validation statistics; although the coefficient of determination and the Nash-Sutcliffe coefficient are the best for Model 4, no model’s validation statistics are particularly good. Although the VIFs indicate possible multicollinearity in Model 5, this is a result of the model’s adjusted-R2 being relatively low. It is, therefore, reasonable to conclude that this model form is appropriate for TP at Fanno Creek near Durham, and that gathering larger, more applicable datasets with high-flow samples will result in more refined model coefficients (the values for a, c, d, and BCF in table 11) rather than changing the functional form of the model.

Despite the relatively poor correlation and validation statistics mentioned above, the predicted results from Model 5 appear to capture the overall pattern of TP concentrations in Fanno Creek reasonably well (fig. 8), especially at low flow conditions. Most observed values are well within the 95 percent prediction interval for the model, despite often being separated from the model’s prediction line. A few high spikes in concentration are predicted during winter events (fig. 8A) when concentrations may reach 10 mg/L or higher; however, the accuracy of these spikes cannot be evaluated because no samples representing those events were available. The model also achieved a reasonable representation of TP during storms when the autosamplers were in use, June 2002 and May 2003.

Baseline conditions in the data are well represented in the model, with relatively constant average concentrations about 0.1–0.15 mg/L. Observed concentrations during summer 2002 were slightly higher than the predicted values, whereas in Winter 2002–03 they were slightly lower than those predicted, but the differences are within the uncertainty range indicated by the prediction interval. These differences are unrelated to the relocation of the continuous monitor from Durham City Park (site 1a in fig. 2) to Durham Road (site 1b in fig. 2), because that relocation would only affect the predicted concentrations rather than the observed. Instead, with generally higher discharges during winter samplings, the TP concentrations may be diluted during certain conditions such as the falling limb of storm hydrographs as reported by Anderson and Rounds (2003). With the exception of the autosampler data, most data in the Scenario 3 calibration and validation datasets were collected without regard to storm conditions and many were collected immediately after storm discharge peaks. Discharge is not included as an independent variable in Model 5, and generally was not significant in the candidate models (table 11). Future formulations of a model for TP at Fanno Creek may be enhanced by evaluating a model calibrated specifically for winter periods.

A comparison of measured and predicted TP concentrations illustrates that most of the available data occupy a relatively narrow range of TP concentrations, from about 10^-1.2 to 10^-0.8 (or about 0.06 to 0.16) mg/L, with only a few higher concentrations represented (fig. 9).

Escherichia coli Bacteria

Model results for E. coli bacteria were moderately successful (table 12), with several model forms having similar coefficients and reasonably strong adjusted-R2 values (0.586–0.713). No data on E. coli bacteria were available from the USGS databases, so the data for calibration were available for Scenarios 1 and 3 only, modified by the lack of USGS data. Models 1–5 used log-transformation of E. coli bacteria counts, and each model used turbidity as an explanatory variable. In Models 1, 3, 4, and 5, in which turbidity is not transformed, the regression coefficients for turbidity (0.014–0.016) and the intercepts (2.17–2.53) vary only slightly. The addition of discharge as an explanatory variable, whether or not it was transformed, was not particularly useful, as evidenced by the lack of changes in the coefficients for turbidity, minor increases in the models’ adjusted-R2, and substantial increases in validation RMSE when discharge was added. Bias correction factors were all relatively large, particularly for the Scenario 3 datasets (about 1.4–1.5), indicating substantial negative bias in the uncorrected values. Once again, sine and cosine terms were not significant, so no models using them are shown. Goodness-of-fit statistics indicate relatively large uncertainty in the predicted values compared to the validation data, with a large (negative) mean error and RMSE values that mostly are about 1,000 or more for the Scenario 3 dataset. Model 6 had the highest adjusted-R2 for calibration in Scenario 3, and the coefficient of determination and Nash‑Sutcliffe coefficient were high (0.69 and 0.69, respectively). However, the explanatory and dependent variables were untransformed, and many predicted values during the modeled 2002–03 time period were negative, rendering Model 6 unusable for general purposes.
Regression coefficients for the independent variables and intercept in Model 6 were 2–3 orders of magnitude higher than those in the other models, which is an artifact of the lack of log transformation.

All models with more than one independent variable were possibly affected by multicollinearity, despite maximum VIF values that were less than 3, again reflecting the relatively low adjusted-R2 values for the models. In each case, the statistical significance of the discharge term was poor (p = 0.003, 0.438, 0.233, and 0.0025 for Models 2, 4, 5, and 6, respectively). Thus, the addition of discharge increased multicollinearity without an offsetting gain in model confidence.

Given the results from table 12, with previous assumptions that Scenario 3 represents the most appropriate input data available, Model 3 then represents the presumed best available model for E. coli bacteria. Predicted data from Model 3, together with the 95 percent prediction interval and the calibration and validation datasets are shown in figure 10. The prediction interval is substantially larger for models predicting E. coli bacteria than for TSS or TP, spanning almost 2 orders of magnitude. The model predicts baseline E. coli bacteria counts of about 300 colonies/100 mL during summer 2002, which is close to the single‑sample water quality standard of 406 colonies/100 mL. It also predicts closer to 400–500 colonies/100 mL during Winter 2003, overpredicting most calibration and validation samples and indicating that baseline predictions may have little utility. The model predicts numerous peaks of 10,000–100,000 colonies/100 mL, and mirrors the pattern of turbidity and discharge in Fanno Creek. Because most Clean Water Services monitoring data are from relatively low-flow conditions, few data are available to confirm these high counts, although the model accounts reasonably well for the variability observed over the hydrographs during storms sampled by the autosamplers in June 2002 and May 2003 (fig. 10B and C). Quantitation of E. coli bacteria at concentrations greater than about 1,000 colonies/100 mL is not routinely done by the Clean Water Services laboratory owing to the difficulty of differentiating tightly packed colonies grown on agar, or the nonconservative nature of large dilutions for bacterial growth (J. Miller, Clean Water Services, oral commun., June 2008). For that reason, obtaining reliable data from storms to calibrate or validate these or subsequent E. coli bacteria models may be difficult.

Prediction intervals for E. coli bacteria from Model 3 span almost 2 orders of magnitude (fig. 11). Measured data (that is, samples) are predominantly at low E. coli bacteria counts, with only a few counts greater than 10^3 (1,000) colonies/100 mL. The model captures some trends in the measured bacteria concentrations, showing that the explanatory variable (turbidity) has some predictive information, but the uncertainty is large enough that this particular model has limited application until a better dataset becomes available. On the other hand, if use of the model were limited to predicting periods when bacterial counts exceed a threshold value such as 1,000 colonies/100 mL, rather than quantifying the actual peak values, Model 3 might be adequate.

Dairy Creek at Highway 8

Autosampler Data

The response of Dairy Creek to storm runoff is different from that of many other streams included in this study. The Dairy Creek basin is predominantly agricultural and is relatively insensitive to runoff from small- to medium-sized storms unless antecedent rainfall is high, a characteristic that is likely related to the small amount of impervious land upstream of the sampling site and the relatively low surrounding topographic relief. Prolonged dry conditions during summer cause streamflow to recede, and several consecutive days to weeks of rain are usually required for streamflow to increase.

Once the soils are well saturated in autumn, streamflow in Dairy Creek tends to increase rapidly to relatively high levels and remains sustained for long periods during the winter until conditions begin to dry in late spring. Also potentially contributing to the hydrological and chemical response of the drainage basin, about 36 percent of the agricultural land in the Dairy Creek basin uses subsurface drainage or tile drains (U.S. Department of Agriculture, 1995). Tile drains are used where soil drainage is poor, allowing cultivation on lands that might otherwise preclude agricultural activities. However, tile drains also can provide an effective route for preferential flow of water and solutes to streams, speeding hydrologic response and reducing chemical transformations such as uptake, adhesion, or degradation (Stone and Wilson, 2006). The actual effect of tile drains was not evaluated directly in this study.

During the study period, the continuous monitor at Dairy Creek was deployed seasonally. The monitor was installed in spring when stage receded to allow wading, and removed in the autumn when stage was expected to become high. Backwater from the Tualatin River was sometimes the cause of high stages that limited access to the creek. As a result of the winter high stages, neither continuous monitor nor discharge data are available at Dairy Creek during the winter and early spring months (about November–May), limiting the ability to make predictions during those periods. Stream stage, which was the only continuously recorded parameter during the winter months in the study period, can be a useful surrogate for discharge.

Individual storms were sampled by autosamplers at Dairy Creek during October 2003, with relatively little antecedent rainfall, and during November 2003, with slightly wetter antecedent conditions. The pattern of events sampled resulted in streamflow and water-quality conditions that were different between storms (although the stage during storms remained less than 10 ft). Therefore, data for discharge, stage, turbidity, and specific conductance had bimodal distributions that were dominated by storm-to-storm differences when all samples were included (fig. 12A). Specific conductance, in particular, showed little variability during the October 2003 storm, and varied only by about 5 percent during the November 2003 storm. In contrast to the autosampler data, the Clean Water Services ambient monitoring dataset includes numerous samples with stages greater than 10 ft (maximum 22.42 ft) during 2002–04. Linear relationships between turbidity and TSS were evident in both the autosampler-only and larger combined (Scenario 2) datasets; however, few other constituent pairs have apparent linear relations at the Dairy Creek site.

The sampling during storm 1 was triggered by an individual turbidity value (32 FNU) from the continuous monitor that was greater than the autosampler threshold value used to initiate sampling (25 FNU); however, turbidity values were less than 10 FNU in most subsequent samples. A slight increase in stream stage accompanied this storm and the samples were retained for analysis, despite the relatively modest overall storm response. Consequently, the sampler had been removed for cleaning before a larger storm several days later, which may have produced a broader range of values for field parameters and laboratory constituents. Soils in the drainage basin were apparently well saturated by the November 2003 storm, and stream stage increased to levels that were too high to collect samples.

Specific conductance during autosampler deployments was representative of mostly average conditions, ranging from 117 to 140 μS/cm, values that were exceeded about 40–70 percent of the time during the study period (fig. 13A). The sampled turbidity data represented a broader range of conditions at Dairy Creek, ranging from about 6 to 30 FNU, values that were exceeded between 1 and almost 90 percent of the time during the study period (fig. 13B). Recall, however, that the dataset used to determine these exceedances was derived from monitoring data that did not include winter high‑flow conditions.
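The exceedance percentages quoted above can be illustrated with a short Python sketch. It is only an illustration of the arithmetic: the function name `percent_exceeded` and the ten-value record are hypothetical and are not study data.

```python
def percent_exceeded(series, value):
    """Percentage of records in `series` that are strictly greater than `value`."""
    if not series:
        raise ValueError("series is empty")
    above = sum(1 for x in series if x > value)
    return 100.0 * above / len(series)


# Invented specific conductance record (microsiemens per centimeter), not study data.
sc_record = [95, 102, 110, 118, 121, 126, 133, 138, 145, 160]

low = percent_exceeded(sc_record, 117)   # 7 of the 10 values exceed 117
high = percent_exceeded(sc_record, 140)  # 2 of the 10 values exceed 140
print(low, high)
```

As with the report's own figures, a percentage computed this way describes only the period covered by the record, so a seasonal deployment cannot characterize winter high-flow conditions.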

Total Suspended Solids

Models for the autosampler-only data and the Clean Water Services ambient monitoring data were selected for high flow and the first routine samples from each month (the presumed best calibration data available) (table 13). Adjusted-R² values were similar in models from Scenarios 1 and 2 (0.695–0.758). However, coefficients of determination for the goodness-of-fit validations were poor, less than 0.2 for all models. Bias correction factors were 1.02–1.03 for all models. Seasonal factors evaluated by inclusion of sine and cosine terms were insignificant and models using them are not shown.
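A minimal Python sketch of how a log-linear model and its bias correction factor fit together is given here. It assumes a base-10 log transformation and a smearing-type correction (the mean of the back-transformed residuals); the text above does not state which correction method was used, so that choice is an assumption, and the turbidity and TSS values are invented.

```python
import math


def fit_log10_linear(x, y):
    """Ordinary least squares fit of log10(y) = b0 + b1 * x."""
    n = len(x)
    logy = [math.log10(v) for v in y]
    mean_x = sum(x) / n
    mean_logy = sum(logy) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (li - mean_logy) for xi, li in zip(x, logy))
    b1 = sxy / sxx
    b0 = mean_logy - b1 * mean_x
    return b0, b1


def smearing_bcf(x, y, b0, b1):
    """Assumed correction: mean of 10**residual over the calibration data."""
    residuals = [math.log10(yi) - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(10 ** r for r in residuals) / len(residuals)


def predict(x_new, b0, b1, bcf):
    """Back-transform the log-space prediction and apply the correction."""
    return bcf * 10 ** (b0 + b1 * x_new)


# Invented calibration data: turbidity (FNU) and TSS (mg/L).
turbidity = [6, 8, 10, 14, 18, 22, 26, 30]
tss = [5, 7, 6, 10, 12, 18, 17, 27]

b0, b1 = fit_log10_linear(turbidity, tss)
bcf = smearing_bcf(turbidity, tss, b0, b1)
print(round(b1, 3), round(bcf, 3), round(predict(15, b0, b1, bcf), 1))
```

A correction factor close to 1, such as the 1.02–1.03 reported for the TSS models, means that back-transformation adds only a few percent to the uncorrected prediction.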

Turbidity was an important explanatory variable for all models in both scenarios. In Scenario 1, the addition of discharge as an independent variable caused small increases in the adjusted R² in Models 2 and 4 over Models 1 and 3. However, the addition of discharge generally increased the error when predicted values were compared to the validation dataset. The addition of discharge also increased the level of multicollinearity, with VIFs for both turbidity and discharge exceeding the calculated VIFcrit in Model 4. In contrast, the addition of discharge to Scenario 2 models had little effect on the calibration of the models, provided only a minor benefit to the models’ validation, and incurred possible multicollinearity in Model 6. High stages were not experienced during the autosampler deployments, so discharge data were available and meaningful for all the Scenario 1 samples. However, stages greater than 10 ft were recorded for several samples in the Scenario 2 dataset. No discharge data were available for these samples from high stages, which explains the lack of benefit of discharge as an explanatory variable for TSS in Models 6 and 8.
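The multicollinearity check can be sketched in Python for the two-predictor case, where the variance inflation factor (VIF) depends only on the correlation between the two explanatory variables. The threshold function is an assumption: it uses one common rule, 1/(1 − R²) of the fitted model, which may not be exactly how VIFcrit was calculated for table 13. The function names are hypothetical and the data are invented.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r**2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def vif_threshold(model_r2):
    """Assumed critical value: 1 / (1 - R**2) of the regression model itself."""
    return 1.0 / (1.0 - model_r2)


# Invented storm samples: turbidity (FNU) and discharge (cubic feet per second).
turbidity = [6, 9, 12, 15, 20, 24, 28, 30]
discharge = [40, 55, 50, 80, 95, 90, 130, 150]

vif = vif_two_predictors(turbidity, discharge)
print(round(vif, 2), vif_threshold(0.75))
```

Under this assumed rule, a model with a low R² has a low threshold, so even modest correlation between predictors can be flagged as possible multicollinearity.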

The inclusion of specific conductance data was not statistically significant for any Scenario 1 or 2 models, and reduced the fit in almost all cases. The model coefficients for turbidity were the same in Models 1 and 3 (a = 0.025), and in Models 2 and 4 (a = 0.019), regardless of the addition of specific conductance, with similar effects in Scenario 2.

Scenario 2 may represent the most robust input datasets available for Dairy Creek, and Models 5 or 6, therefore, may represent reasonable initial models for TSS, although their respective goodness-of-fit statistics were poor. Models 5 and 6 provide similar results (hence, Model 6 is not shown in figure 14), and capture the baseline conditions (about 10 mg/L) moderately well for some periods in each summer during 2002–04. Model 1, derived from autosampler-only data and, therefore, limited by the range of conditions observed and by serial-correlation issues, overestimates the baseline conditions more than Models 5 or 6, especially when the actual TSS values drop to less than about 8 mg/L. Model 1 also has much greater variability and higher peak values than Models 5 or 6.

Results from the Scenario 1 and Scenario 2 models indicate that the most robust model form for TSS at Dairy Creek will probably be logTSS = f(Turbidity), although the addition of discharge (or stage) may be beneficial, especially at high discharges. Backwater issues will make discharge a difficult variable to use in the winter. Although stage data remain accurate during backwater conditions, the presence of these conditions still may require development of separate models for free-flowing and backwater conditions. Assuming that Scenario 2 uses more representative and thorough datasets than Scenario 1, the presumed best model for TSS at Dairy Creek, given the available data, is currently Model 5. Model 5 may appear to underestimate TSS during base flow (fig. 14), but this is an artifact of the superposition of the Model 1 results onto the graph, where the Model 5 line is obscured by the Model 1 line. The base-flow calibration and validation data are relatively well represented by the Model 5 results, and furthermore they are well within the 95 percent prediction interval for Model 5. Given this, Model 5 may perform adequately during summer.
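If separate models were developed for free-flowing and backwater conditions, the calibration samples would first need to be partitioned. The Python sketch below assumes, purely for illustration, that a stage of 10 ft separates the two regimes; the function name, the field names, and most of the sample values are hypothetical.

```python
def split_by_stage(samples, threshold_ft=10.0):
    """Partition samples into (free_flowing, possible_backwater) lists by stage."""
    free_flowing, possible_backwater = [], []
    for sample in samples:
        if sample["stage_ft"] > threshold_ft:
            possible_backwater.append(sample)
        else:
            free_flowing.append(sample)
    return free_flowing, possible_backwater


# Invented samples; 22.42 ft is the maximum stage mentioned for the ambient dataset.
samples = [
    {"stage_ft": 3.1, "turbidity_fnu": 7.0},
    {"stage_ft": 8.7, "turbidity_fnu": 21.0},
    {"stage_ft": 12.4, "turbidity_fnu": 35.0},
    {"stage_ft": 22.42, "turbidity_fnu": 48.0},
]

free_flowing, possible_backwater = split_by_stage(samples)
print(len(free_flowing), len(possible_backwater))
```

Each subset could then be used to calibrate its own regression, with stage rather than discharge as a candidate explanatory variable for the high-stage subset.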

A comparison of predicted and measured TSS concentrations from Model 5 (fig. 15) further illustrates the predominance in the Scenario 2 dataset of samples at relatively low TSS concentrations. Most measured concentrations were in the range of 10^0.6 (or about 4.0) to 10^1.5 (or about 32) mg/L. Importantly, this comparison also reveals the relatively large uncertainty of the model, with prediction intervals that encompass about a full order of magnitude. The clustering of predicted values at about 10^1.0 (or about 10) mg/L despite a moderate range in measured values likely is an indication that a model based on turbidity alone is insensitive to some of the factors contributing to raised TSS, and that inclusion of other independent variables such as stage or discharge or separation of models based on a stage threshold such as 10 ft will be beneficial.
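The back-transformed values in this paragraph follow from base-10 exponents of 0.6, 1.0, and 1.5, which can be checked with a few lines of Python. The half-width of 0.5 log units used for the prediction interval is a hypothetical value chosen only to show what "about a full order of magnitude" means; it is not taken from the model statistics.

```python
# Back-transform the base-10 logarithms quoted in the text to mg/L.
low_mg_l = 10 ** 0.6      # about 4.0 mg/L
center_mg_l = 10 ** 1.0   # 10 mg/L
high_mg_l = 10 ** 1.5     # about 32 mg/L

# Hypothetical prediction interval of +/- 0.5 log10 units around the center.
half_width_log = 0.5
lower = 10 ** (1.0 - half_width_log)
upper = 10 ** (1.0 + half_width_log)
ratio = upper / lower     # a total width of 1.0 log unit is a factor of 10

print(round(low_mg_l, 1), round(high_mg_l, 1), round(ratio, 1))
```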

Total Phosphorus

Models for TP at Dairy Creek from Scenario 1 were primarily dependent on turbidity, with discharge and specific conductance playing a lesser role. Using the Scenario 2 data, however, each model includes specific conductance, which exerts a stronger role than either turbidity or discharge (table 14); this may be a result of the more expansive range of specific conductances encompassed by the Scenario 2 dataset. Sine and cosine terms again were insignificant, suggesting that seasonal considerations were unimportant or already incorporated with the other independent variables. Coefficients of determination (adjusted-R²) for calibration of Scenario 1, Models 1 and 2, were substantially better than those from all other models examined, regardless of input datasets, potentially owing to serial correlation. Coefficients for turbidity in Scenario 1 ranged from 0.012 to 0.021, and coefficients for discharge and specific conductance were an order of magnitude less, varying little between models where they were used. Bias correction factors were relatively low, ranging from 1.01 to 1.04 among all models and both scenarios.
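Seasonal sine and cosine terms of the kind tested here are commonly built from the fraction of the year elapsed at each sample date. The Python sketch below assumes that form, sin(2πT) and cos(2πT) with T in decimal years; the exact definition used in the regressions is not given in this section, and the function name is hypothetical.

```python
import math
from datetime import date


def seasonal_terms(sample_date):
    """Return (sin(2*pi*T), cos(2*pi*T)), where T is the elapsed fraction of the year."""
    year_start = date(sample_date.year, 1, 1)
    days_in_year = (date(sample_date.year + 1, 1, 1) - year_start).days
    fraction = (sample_date - year_start).days / days_in_year
    angle = 2.0 * math.pi * fraction
    return math.sin(angle), math.cos(angle)


# Dates near the two autosampler storms of autumn 2003 (illustrative days only).
for d in (date(2003, 10, 15), date(2003, 11, 15)):
    sin_t, cos_t = seasonal_terms(d)
    print(d.isoformat(), round(sin_t, 3), round(cos_t, 3))
```

Both terms would be added as candidate explanatory variables together; if neither is statistically significant, as reported here, they are dropped from the model.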

In Scenario 2, coefficients for specific conductance were essentially unchanged (0.003–0.004) between models, and were slightly less than one-half the value of the respective Scenario 1 models (0.009). The number of observations (n) in the Scenario 2 models was, for the most part, less than that used in the Fanno Creek models (Scenarios 2 and 3, table 11). The number of observations was less because the Scenario 2 dataset includes many samples from mid-winter during 2002–04, when stage in Dairy Creek was greater than 10 ft (discharge unavailable), and (or) the continuous monitor at the Dairy Creek site had been removed for the winter. The use of stage as an explanatory variable in Scenario 2, in place of discharge, allowed the inclusion of eight additional samples in the calibration dataset but resulted in a lower adjusted-R² for the model.

On the basis of calibration and validation statistics, no model from either scenario is strong enough for predictive purposes. Mean errors are relatively small, especially for Scenario 2 models. Possible multicollinearity was indicated for Models 1, 3, and 8, although the maximum VIFs were relatively low. Coefficients of determination for the model validation exercise were highest (0.55) for Model 1, and were otherwise poor (<0.1–0.38) for all other models. Many of the Nash-Sutcliffe coefficients were near zero or negative (max = 0.26), indicating that the means of the laboratory data may be as good a predictor as the models derived from it. The goodness‑of‑fit statistics are more reflective of base flow than high-flow conditions because of the paucity of high-flow data, especially for the validation dataset; therefore, model performance could not be properly evaluated. Additional data would be needed to refine and evaluate the models.
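The interpretation of the Nash-Sutcliffe coefficient can be made concrete with a short Python sketch using the standard definition: one minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the measurements from their mean. The TP values below are invented for illustration.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe coefficient: 1 - sum((o - p)**2) / sum((o - mean(o))**2)."""
    mean_obs = sum(observed) / len(observed)
    error_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error_ss / total_ss


# Invented TP concentrations (mg/L).
observed = [0.08, 0.10, 0.12, 0.14, 0.16]
mean_only = [sum(observed) / len(observed)] * len(observed)
poor_model = [0.14, 0.09, 0.16, 0.10, 0.11]

nse_perfect = nash_sutcliffe(observed, observed)   # equals 1 for a perfect model
nse_mean = nash_sutcliffe(observed, mean_only)     # equals 0 when predicting the mean
nse_poor = nash_sutcliffe(observed, poor_model)    # negative: worse than the mean
print(nse_perfect, nse_mean, round(nse_poor, 2))
```

A coefficient at or below zero therefore means that the mean of the measured data predicts at least as well as the regression.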

The predicted results of Scenario 2 models for TP at Dairy Creek captured the general seasonal pattern of summer baseline concentrations relatively well, but did not appear to capture the shorter term variability associated with events or other factors (fig. 16). Although Scenario 2 is assumed to represent a more robust calibration scheme than Scenario 1, Model 1 results tracked better with the laboratory data from the Storm 2 hydrograph than either Model 5 or Model 7, and may better represent the range of variability experienced under normal conditions. Model 1 also predicted high storm peaks of TP, sometimes exceeding 1 mg/L, but the accuracy of these predictions could not be evaluated. Results from Model 8, which incorporates stage rather than discharge, are not shown in figure 16 because they were almost identical to those of Model 7. Likewise, Model 6 (not shown in figure 16), the highest ranked model from Scenario 2 that used turbidity, did not capture the Storm 2 increases in TP and produced only minor variations compared to Models 5–8.

Comparisons of measured and predicted TP values for Model 1 have less variability and stay within the prediction intervals better than the values for Model 5 (fig. 17). Model 6 results were similar to those from Model 5. However, Model 1 used input data from Scenario 1 and Model 5 used input data from Scenario 2, so the two models are not directly comparable.

Generally, Scenario 1 models, particularly Model 1, were slightly more useful than those from Scenario 2 for estimating TP concentrations at Dairy Creek, but no model provided acceptably accurate predictions using the available datasets. Model 1 may overestimate variability in stream TP concentrations, but could be useful for understanding the overall pattern of TP resulting from changes in streamflow, turbidity, or specific conductance. However, it must be stressed that maximum TP concentrations encompassed by the Scenario 1 input data were less than 0.25 mg/L, so the model cannot be relied upon for predictions of concentrations greater than 0.25 mg/L. Furthermore, the reliance on discharge will be a limitation during high stages unless backwater conditions are comprehensively understood at the Dairy Creek site at Highway 8. The inclusion of specific conductance in almost every model implies that much of the TP in Dairy Creek may come from dissolved sources or may be associated with the movement of solutes in the basin, which is consistent with known groundwater inputs of dissolved phosphorus to Tualatin River basin streams during summer. Alternatively, because specific conductance is sometimes correlated with discharge, its presence in the models might also reflect erosion and solute sources, including phosphorus, at higher flows.

Escherichia coli Bacteria

Models for E. coli bacteria at Dairy Creek were primarily functions of specific conductance; only Model 4 in Scenario 1, using the autosampler data, did not include specific conductance (table 15). Furthermore, seasonal aspects were unimportant, with sine and cosine terms again being insignificant. The coefficients for specific conductance varied little between the individual models within a specified scenario, ranging from -0.025 to -0.029 for Scenario 1 models, and from 0.011 to 0.013 for Scenario 2 models. Coefficients for specific conductance were negative in Scenario 1 and positive in Scenario 2, suggesting that the response of E. coli bacteria during the storms sampled by autosamplers in autumn 2003 was different from that in the long-term Scenario 2 dataset. Multicollinearity, which can result in coefficients with signs different than expected, may have contributed to the results of Models 7b or 8, despite the low maximum VIFs of 1.4 and 1.2, respectively. The low VIFcrit values for these models reflect the poor adjusted-R² values for the Scenario 2 dataset; all VIF values are well below the general rule-of-thumb values sometimes used by other investigators (Helsel and Hirsch, 1992). Likewise, the Condition Index (not shown), an alternate measure of multicollinearity (Draper and Smith, 1998), was about 50 for Models 1 and 3 but less than 20 for Models 6 and 7, in the range previously described as acceptable (Draper and Smith, 1998).

Although bacteria often are associated with particles in streams, E. coli bacteria regression models resulting from this study were only a function of turbidity in a few cases, primarily from Scenario 1. Log transformation introduced substantial bias when predicting E. coli bacteria, resulting in BCF values for Scenario 2 models from 1.29 to 1.45, indicating corrections of about 29 to 45 percent. Scenario 1 BCFs were lower (1.05–1.12) than those in Scenario 2, but remain mostly higher than for TP (table 14) and TSS (table 13). Comparatively high BCFs were also determined for E. coli bacteria models for Fanno Creek (table 12).

Adjusted-R² values for calibration of Scenario 1 models for E. coli bacteria were substantially greater than for Scenario 2 models, which was also the case for models for TSS and TP. In contrast, model validation statistics, particularly the coefficients of determination and the Nash-Sutcliffe coefficients, were all poor. None of the models’ validation statistics were within the optimal ranges for errors generated by the prediction of E. coli bacteria counts. The highest Nash-Sutcliffe coefficient was only 0.21 for the Scenario 2 model using stage instead of discharge (Model 7b) as an independent variable, and all the coefficients of determination were less than or equal to 0.1.

The negative coefficients for specific conductance from the Scenario 1 models were opposite in direction to those from Scenario 2. As a consequence, the patterns predicted by models using specific conductance in Scenario 1 also were opposite in direction to those from Scenario 2. All models overestimated bacteria counts in storm 1 (fig. 18), for which data collection preceded the largest stream response by about 1 day, although Model 4, which was a function of turbidity only, predicted bacteria counts that were closest to the measured values. Model 1 (not shown), with a negative coefficient for specific conductance, predicted summer E. coli counts that were below the baseline and were not realistic. Model 4 also performed better during storm 2 than either Scenario 2 model shown (fig. 18C), mimicking the temporary increases in bacteria counts during the storm. Models 5 and 7b were selected for plotting because they represented the best Scenario 2 calibration model according to the Mallows’ Cp selection scheme and the best validation according to the Nash-Sutcliffe coefficients, respectively. Both models performed almost identically; the line from either model obscures the other in figure 18, reflecting the influence of specific conductance as an independent variable.

Comparisons of measured and predicted values for E. coli bacteria models show considerable variation and generally poor predictions, especially by Model 5 (fig. 19B). Uncertainty around Model 5 was greater than for Model 4, particularly at low and high bacteria counts. Neither model demonstrated acceptable abilities to reproduce the measured values; all models should be considered preliminary. It is possible that the sources and dynamics of E. coli bacteria in Dairy Creek cannot be well characterized by variables such as flow, stage, specific conductance, and turbidity. If so, then this approach of using continuous monitors to estimate E. coli bacteria concentrations in Dairy Creek may fail. More data are needed to reach such a conclusion.

Given the model calibration and validation statistics, and the performance of the models at predicting time series data and reproducing the original measured values used in the correlations, reliable predictions for E. coli bacteria at Dairy Creek at Highway 8 are a possibility, but several challenges remain. Although the continuous monitor at that site was converted to a permanent installation in 2004, the potential for backwater at stages greater than 10 ft during winter causes several concerns. Stage instead of discharge may be required as an explanatory variable, and separate models may be needed for winter and for summer. E. coli bacteria (and other constituents) may respond differently to backwater conditions than to unimpeded flow conditions, including potential issues of particle settling or upstream sources. One potential problem with backwater, the introduction of water from the downstream receiving waters at the sampling location, does not occur at the Highway 8 site (C. Beaman, Oregon Water Resources Department, written commun., April 22, 2009).

Non-Target Sites

Preliminary model forms were identified for TSS, TP, and E. coli bacteria at the non-target sites using autosampler‑only data (table 16) and minimization of Mallows’ Cp. Other than removal of outliers that could not be resolved, and log-transformation of dependent variables, no attempt was made to verify homoscedasticity, use additional data, compensate for autocorrelation, or otherwise optimize the models. Model coefficients and adjusted-R² values are not shown because the objective of this exercise was to evaluate the likelihood that models that are more robust could be developed if data representative of the range of environmental conditions at these sites become available.
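Selection by minimization of Mallows' Cp can be sketched in pure Python as follows. The sketch uses the standard definition Cp = SSE_p / s² − n + 2p, where s² is the mean squared error of the model containing all candidate variables and p counts the fitted parameters including the intercept. The helper names and the eight invented samples are hypothetical, and the small least-squares solver stands in for whatever statistical software was actually used.

```python
from itertools import combinations


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def sse(columns, y):
    """Residual sum of squares for least squares of y on an intercept plus columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(design, y)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def mallows_cp(predictors, y):
    """Return {subset of names: Cp} for every non-empty subset of the predictors."""
    names = list(predictors)
    n = len(y)
    s2 = sse([predictors[k] for k in names], y) / (n - len(names) - 1)
    result = {}
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            p = size + 1  # slopes plus intercept
            result[subset] = sse([predictors[k] for k in subset], y) / s2 - n + 2 * p
    return result


# Invented autosampler-style data; the response is log10 of a constituent.
predictors = {
    "turbidity": [6, 9, 12, 15, 20, 24, 28, 30],
    "sc": [140, 135, 138, 128, 125, 130, 119, 117],
    "discharge": [40, 55, 50, 80, 95, 90, 130, 150],
}
log_tss = [0.70, 0.85, 0.95, 1.02, 1.15, 1.28, 1.35, 1.43]

cp_by_subset = mallows_cp(predictors, log_tss)
best_subset = min(cp_by_subset, key=cp_by_subset.get)
print(best_subset, round(cp_by_subset[best_subset], 2))
```

By construction, Cp for the model with all candidate variables equals its number of parameters, so the statistic is informative mainly for comparing the smaller subsets against that full model.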

Results in table 16 indicate a high probability that robust regression models can be developed for TSS and TP at all non-target sites, but that E. coli bacteria may be difficult to predict at sites other than Gales Creek at Old Highway 47. In almost all other cases, models of the form logX = f(Turb, SC, Q), where X is the dependent variable, may be constructed and would provide acceptable predictions. For the Rock Creek site, a regression model with specific conductance and discharge may be sufficient to predict logTP. Where E. coli bacteria model results are listed as not applicable in table 16, the adjusted-R² values of the functional models were much lower than 0.5; most were less than 0.1, and the available data were insufficient for predicting E. coli bacteria. The adjusted-R² values for all other indicated models were greater than 0.7, which in some cases indicate even stronger correlations than the models for Fanno and Dairy Creeks. However, some of the limitations of the available data and the stream responses at the non-target sites should be considered:

  1. Discharge (and stage) was not continuously measured at either the Rock Creek or Beaverton Creek sites, so instantaneous values at those sites were reconstructed by simple routing of upstream discharges at existing stream gages. Considerable error likely was inherent in the timing and magnitude of the resulting hourly estimates of the storm hydrographs, especially considering the dynamic and variable nature of stream responses to different storms.
  2. No large storms were sampled at Chicken Creek, where the relatively undeveloped drainage basin muted the stream response to storms. Indeed, the stream response may represent an increase in groundwater input after storms rather than direct runoff, as indicated by minimal increases in turbidity and increases (rather than the expected decreases) in specific conductance. Nonetheless, Chicken Creek does, at times, respond to large storms, and warrants future study.
  3. Storm responses at the Gales Creek site were more muted than at several of the other sites and had lower flows than originally intended, and the range of values of the potential explanatory variables (turbidity, discharge, and specific conductance) was small.
  4. The Rock Creek at Woll Pond Way site was just downstream of an anomalous sediment source, where the streambank was observed to episodically calve into the river during high flow and cause short-term pulses of high turbidity that may have been poorly mixed. It is likely that the turbidity and sediment response at this site were not necessarily reflective of larger drainage basin processes. Subsequent to this study, the monitor at the Rock Creek site was moved downstream to a bridge crossing and reinstalled with a more permanent (all-season) design, so any attempt to predict water quality in Rock Creek would benefit from collection of data at the new site.
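The first limitation in the list above concerns discharge reconstructed by simple routing of upstream gage records. The routing procedure itself is not described in this section, so the Python sketch below is only one plausible reading: each upstream hourly hydrograph is shifted by a fixed travel time and the shifted series are summed. The gage names, travel times, and discharges are all hypothetical.

```python
def route_lag_and_sum(upstream, lags_hours):
    """Shift each hourly upstream series by its travel time (hours) and sum them.

    Returns None for hours at which any lagged upstream value is not yet available.
    """
    length = min(len(series) for series in upstream.values())
    routed = []
    for t in range(length):
        total = 0.0
        complete = True
        for name, series in upstream.items():
            source_hour = t - lags_hours[name]
            if source_hour < 0:
                complete = False
                break
            total += series[source_hour]
        routed.append(total if complete else None)
    return routed


# Invented hourly discharges (cubic feet per second) at two upstream gages.
upstream = {
    "gage_a": [10, 12, 20, 30, 25, 18],
    "gage_b": [5, 5, 8, 12, 10, 7],
}
lags_hours = {"gage_a": 1, "gage_b": 2}  # assumed fixed travel times

routed = route_lag_and_sum(upstream, lags_hours)
print(routed)
```

A fixed lag ignores attenuation, local inflow, and the dependence of travel time on flow, which is consistent with the caution that considerable error was likely in both the timing and magnitude of the reconstructed storm hydrographs.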

Given these limitations, additional efforts to further refine the model results at the non-target sites are not warranted without additional data collection, specifically from high-flow conditions and during several seasonal periods. Nonetheless, preliminary results in this study indicate that reasonably robust models for some constituents can be developed if appropriate data become available.

First posted June 18, 2010

For additional information contact:
Director, Oregon Water Science Center
U.S. Geological Survey
2130 SW 5th Ave.
Portland, Oregon 97201
http://or.water.usgs.gov

URL: http://pubsdata.usgs.gov/pubs/sir/2010/5008/section4.html
Page Last Modified: Thursday, 10-Jan-2013 19:11:33 EST