Scientific Investigations Report 2012–5200
Appendix C. Regression Model Evaluation

Seven linear regression models were evaluated for the Gresham and Milwaukie stations (table 2). Linear regression analysis requires normality of the resulting error distribution (Ott and Longnecker, 2001). Initially, the raw data sets used in the regression models were heavily skewed (a violation of normality). To minimize skew and to produce an error distribution approaching normality, the values of SSC, streamflow, and turbidity were transformed to base-10 logarithmic values. Other models were run with square root transformation or no transformation for comparative purposes. When appropriate, it is beneficial to use the same transformation and parameters for models at multiple stations, which tends to result in more congruent SSL computations between stations. The same basic model structure was to be used at both stations unless diagnostic results warranted otherwise. Models were evaluated on the basis of two criteria: diagnostic linear regression statistics and evaluation of linear regression model residuals.
Linear regression diagnostic comparisons are meaningful only if the dependent variable is the same for all models. This condition is not met when some model variables have been log- or square-root transformed. Therefore, when required, diagnostic values were transformed back into linear space before being evaluated.

Diagnostic Linear Regression Statistics

The coefficient of determination (R²) estimates the proportion of variability explained by the regression model. Similarly, the adjusted coefficient of determination (Adj R²) estimates the proportion of variability explained by the regression model while accounting for the number of explanatory variables. The root mean square error (RMSE) quantifies the difference between the values predicted by the model and the observed values, giving greater weight to large errors. The mean absolute error (MAE) measures the average magnitude by which predicted values deviate from observed values. The mean absolute percentage error (MAPE) expresses that error as a percentage of the observed values. As regression models approach Adj R² values of 1.0, the models approach perfect correlation. Similarly, as regression models approach RMSE, MAE, and MAPE values of zero, the models approach perfect estimation (residual values of zero).

For the Gresham station, models using streamflow and turbidity as independent variables had higher Adj R² values and lower RMSE, MAE, and MAPE values than models using a single independent variable (table 2). For the Milwaukie station, models using only turbidity as an independent variable performed as well as or better than models using both streamflow and turbidity as independent variables. However, the improvement in regression diagnostics gained by using a model with only turbidity as an independent variable at Milwaukie was much smaller than the overall advantage gained by using both independent variables at Gresham. Consequently, if the same basic model structure were to be maintained at both stations, the model using both independent variables would provide better overall results.
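The diagnostic statistics described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the report. It assumes that model predictions are in base-10 log space, that they are back-transformed with a simple power of 10 (no bias correction), and that RMSE is computed by dividing by the number of observations. The function name and argument names are hypothetical.

```python
import math

def back_transformed_diagnostics(observed, predicted_log10, n_explanatory):
    """Compute R2, Adj R2, RMSE, MAE, and MAPE in linear space.

    observed         -- measured values in original units (nonzero, for MAPE)
    predicted_log10  -- model predictions in base-10 log space
    n_explanatory    -- number of explanatory variables in the model
    """
    n = len(observed)
    # Assumption: simple back-transformation to original units, no bias correction.
    predicted = [10.0 ** p for p in predicted_log10]
    residuals = [o - p for o, p in zip(observed, predicted)]

    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares

    r2 = 1.0 - ss_res / ss_tot
    # Adj R2 penalizes R2 for the number of explanatory variables.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_explanatory - 1)
    rmse = math.sqrt(ss_res / n)  # assumption: divide by n, not by degrees of freedom
    mae = sum(abs(r) for r in residuals) / n
    mape = 100.0 * sum(abs(r / o) for r, o in zip(residuals, observed)) / n
    return {"R2": r2, "AdjR2": adj_r2, "RMSE": rmse, "MAE": mae, "MAPE": mape}
```

Because every statistic is computed after back-transformation, the results for log-transformed, square-root-transformed, and untransformed models are expressed in the same units and can be compared directly, provided each model's predictions are back-transformed with the inverse of its own transformation.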
Evaluation of Linear Regression Model Residuals

One of the assumptions of linear regression is that the residual errors are normally distributed. Violations of this assumption compromise the estimation of coefficients and the calculation of prediction intervals. The Jarque-Bera (JB) test for normality (Jarque and Bera, 1980) was used on the residuals of each model. The JB test is a goodness-of-fit test that examines the skewness and kurtosis of a distribution and compares them to those of a matching normal distribution. Under the null hypothesis of normality, the JB test statistic asymptotically follows a chi-squared distribution with two degrees of freedom. The P-values associated with the computed JB test statistic are shown in table 2. Using a significance level of 0.05, P-values less than 0.05 suggest a statistically significant departure from normality in the distribution of residuals for the model. For the Gresham and Milwaukie stations, all models failed to reject the null hypothesis of normally distributed residuals.

Linear regression models assume homoscedasticity (constant variance) of the resulting error distribution. Violations of the homoscedasticity assumption can result in inaccurate forecast error and prediction intervals. Violations also can result in too much weight given to a small subset of the data, such as the group of measurements with the largest SSC values. The Breusch-Pagan (BP) test can be used to detect heteroscedasticity in a linear regression model (Breusch and Pagan, 1979). The BP test examines the residuals of a model by regressing the squared residuals on the independent variables. Under the null hypothesis of homoscedasticity, the BP test statistic follows a chi-squared distribution with k degrees of freedom, where k is the number of independent variables. The P-values associated with the computed BP test statistic are shown in table 2. Values closer to 0 suggest a stronger departure from homoscedasticity in the distribution of residuals for the model.
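The two residual tests can be illustrated with a short Python sketch that uses only the standard library. It is a simplified illustration under stated assumptions, not the software used for the report. The JB function uses the closed-form chi-squared tail probability for two degrees of freedom. The BP function is limited to a single independent variable (k = 1) and uses the n·R² form of the statistic from the auxiliary regression; the report does not state which variant of the BP statistic was computed. Both function names are hypothetical.

```python
import math

def jarque_bera(residuals):
    """Return (JB statistic, P-value) for a list of model residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of the residual distribution.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Chi-squared upper-tail probability with 2 degrees of freedom is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

def breusch_pagan_single(residuals, x):
    """Return (LM statistic, P-value) for one independent variable x.

    Assumption: LM = n * R2 of the regression of squared residuals on x.
    """
    n = len(residuals)
    sq = [r * r for r in residuals]
    mean_x = sum(x) / n
    mean_sq = sum(sq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (si - mean_sq) for xi, si in zip(x, sq))
    syy = sum((si - mean_sq) ** 2 for si in sq)
    if syy == 0.0:
        # Squared residuals are constant: no evidence of heteroscedasticity.
        return 0.0, 1.0
    r2 = (sxy * sxy) / (sxx * syy)  # R2 of the auxiliary regression
    lm = n * r2
    # Chi-squared upper-tail probability with 1 degree of freedom.
    return lm, math.erfc(math.sqrt(lm / 2.0))
```

In both functions, a P-value below the chosen significance level (0.05 in this appendix) leads to rejection of the null hypothesis, which is normality for the JB test and homoscedasticity for the BP test.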
At the Gresham station, models 1, 2, and 5 failed to reject the null hypothesis of homoscedasticity at a significance level of 0.05. At the Milwaukie station, models 2, 3, and 5 failed to reject the null hypothesis at the same significance level. For all other models at each station, the null hypothesis is rejected, and the model residual distribution is considered heteroscedastic.

Selection of Model

Models 2 and 5 were eliminated from consideration because of their relatively poor results from the linear regression diagnostic statistics. No models were eliminated based on the JB test. The null hypothesis of homoscedasticity was rejected by the BP test for most models. Models with turbidity as an independent variable appear to be less homoscedastic in their error distributions (table 2). However, models with turbidity as an independent variable tend to provide more accurate estimates (that is, lower RMSE, MAE, and MAPE) than models not employing turbidity as an independent variable. The extra accuracy gained by including turbidity as a regression variable far outweighs any diminished accuracy in forecasts and prediction intervals resulting from heteroscedasticity. For each station, the models were ranked on each diagnostic linear regression statistic (for example, model 7 provided the lowest MAE value at the Gresham gaging station and was ranked first for that statistic), and the average rank for each model was computed. Model 6 was selected because it had the lowest (best) average rank.

References Cited

Breusch, T.S., and Pagan, A.R., 1979, Simple test for heteroscedasticity and random coefficient variation: Econometrica (The Econometric Society), v. 47, no. 5, p. 1,287–1,294.

Jarque, C.M., and Bera, A.K., 1980, Efficient tests for normality, homoscedasticity, and serial independence of regression residuals: Economics Letters, v. 6, no. 3, p. 255–259.
Ott, R.L., and Longnecker, M., 2001, An introduction to statistical methods and data analysis: Pacific Grove, Calif., Wadsworth Group, 1,152 p.
First posted October 3, 2012