USGS - science for a changing world

Scientific Investigations Report 2010–5008

Use of Continuous Monitors and Autosamplers to Predict Unmeasured Water-Quality Constituents in Tributaries of the Tualatin River, Oregon

Table 8. Goodness-of-fit statistics used for evaluation of regression model predictions.

[Abbreviations: NA, not applicable; n, number of samples]

Number of points compared
Acceptable range: NA; more points are better.
Explanation: Number of validation data points that have comparable predicted values from the regression. Analogous to n in the regression model; determines the degrees of freedom.

Mean error (ME)
Acceptable range: Near zero; the exact range depends on the constituent.
Explanation: Average difference between predicted values and laboratory values in the validation data set; a good measure of bias.

Mean absolute relative error (percent)
Acceptable range: 0–50
Explanation: Average absolute difference between predicted and laboratory values, normalized to the laboratory values and expressed as a percentage.

Root mean square error (RMSE)
Acceptable range: Depends on the constituent; a value closer to zero is better.
Explanation: The square root of the mean squared difference between predicted and laboratory values; equivalently, the square root of the squared mean error plus the squared standard deviation of the errors. If the mean error is zero, the RMSE equals the standard deviation of the errors, which makes it a good measure of the magnitude of the typical prediction error. A high R² can coexist with a poor fit as judged by the RMSE if the range of the data is large.

Coefficient of determination (R²)
Acceptable range: Approximately 0.6–1.0, although user defined.
Explanation: Analogous to the coefficient of determination for regression, but based on the predicted and known (laboratory) values of the dependent variable in the validation data set.

Nash-Sutcliffe coefficient
Acceptable range: Approximately 0.6–1.0, although user defined.
Explanation: Also called the Coefficient of Model Fit Efficiency; it is the proportion of the variance in the measured values that is explained by the predicted values, and it is a more rigorous fit statistic than the coefficient of determination. A value of 1.0 indicates a perfect fit. A value of 0 indicates that the model predictions are only as accurate as the mean of the observed data, and a value less than 0 means that the observed mean is a better predictor than the model. Note, however, that this coefficient depends strongly on the available validation data: if the validation data are insufficient to characterize the response variable, the coefficient may underrepresent the true fit of the model.

Number of negative differences
Acceptable range: Similar to the number of positive differences.
Explanation: Number of predicted values that are less than the corresponding laboratory value.

Number of positive differences
Acceptable range: Similar to the number of negative differences.
Explanation: Number of predicted values that are greater than the corresponding laboratory value.

Probability from sign test
Acceptable range: Typically greater than 0.05.
Explanation: From a sign test on the residuals, the probability of obtaining a split between positive and negative differences at least as uneven as the one observed if the errors were truly random in direction.

z-statistic from sign test
Acceptable range: Absolute value typically less than 1.96.
Explanation: From a sign test on the residuals, a standardized measure of how far the numbers of positive and negative differences depart from an even split; it indicates whether the observed numbers could have resulted if the errors were random in direction.
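The mean error, mean absolute relative error, and RMSE entries in Table 8 can be sketched in Python as follows. This is a minimal illustration, not the report's own code. It assumes differences are taken as predicted minus laboratory and that each absolute difference is divided by its own laboratory value; the function name `error_statistics` and the paired values are hypothetical. The final check shows why the RMSE equals the standard deviation of the errors when the mean error is zero.

```python
import math
import statistics


def error_statistics(predicted, laboratory):
    """Return (mean error, mean absolute relative error in percent, RMSE)."""
    n = len(predicted)
    if n == 0 or n != len(laboratory):
        raise ValueError("need two non-empty sequences of equal length")
    # Assumed sign convention: predicted minus laboratory, so a positive
    # mean error indicates over-prediction on average.
    errors = [p - o for p, o in zip(predicted, laboratory)]
    mean_error = sum(errors) / n
    # Assumption: each absolute difference is divided by its own laboratory
    # value, so laboratory values must be nonzero.
    mare = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, laboratory)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mean_error, mare, rmse


# Hypothetical paired values, not data from this report.
pred = [12.0, 8.0, 15.0, 10.0]
lab = [10.0, 9.0, 14.0, 12.0]
me, mare, rmse = error_statistics(pred, lab)
print(me, round(mare, 2), round(rmse, 3))  # 0.0 13.73 1.581

# RMSE squared equals the squared mean error plus the population variance
# (divisor n) of the errors, so RMSE equals their standard deviation when ME is 0.
errors = [p - o for p, o in zip(pred, lab)]
assert math.isclose(rmse ** 2, me ** 2 + statistics.pvariance(errors))
```

The identity holds exactly only for the population standard deviation (divisor n); with the sample standard deviation (divisor n − 1) it is approximate.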
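Table 8 does not give a formula for R². The sketch below assumes one common convention, the squared Pearson correlation between predicted and laboratory values; the function name `r_squared` and the example values are hypothetical. It illustrates why R² alone can mislead: predictions with a constant bias still produce a perfect R².

```python
def r_squared(predicted, laboratory):
    """Squared Pearson correlation between predicted and laboratory values."""
    n = len(predicted)
    if n < 2 or n != len(laboratory):
        raise ValueError("need two sequences of equal length with at least two values")
    mean_p = sum(predicted) / n
    mean_o = sum(laboratory) / n
    # Cross-product and sums of squares of deviations from each mean.
    sxy = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, laboratory))
    sxx = sum((p - mean_p) ** 2 for p in predicted)
    syy = sum((o - mean_o) ** 2 for o in laboratory)
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when either set of values is constant")
    return sxy * sxy / (sxx * syy)


# Hypothetical example: every prediction is 5 units too high.
lab = [10.0, 9.0, 14.0, 12.0]
biased = [o + 5.0 for o in lab]
print(r_squared(biased, lab))  # 1.0, a "perfect" R-squared despite the constant bias
```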
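The Nash-Sutcliffe coefficient described in Table 8 is commonly written as one minus the ratio of the sum of squared prediction errors to the sum of squared deviations of the observed values from their mean. The Python sketch below uses that standard form; the function name `nash_sutcliffe` and the example values are hypothetical. It reproduces the three reference points in the table: 1.0 for a perfect fit, 0 when the predictions are no better than the observed mean, and a negative value when the mean is the better predictor.

```python
def nash_sutcliffe(predicted, laboratory):
    """Nash-Sutcliffe coefficient: 1 - SSE / (sum of squared deviations from the mean)."""
    n = len(laboratory)
    if n == 0 or len(predicted) != n:
        raise ValueError("need two non-empty sequences of equal length")
    mean_lab = sum(laboratory) / n
    # Sum of squared prediction errors.
    sse = sum((o - p) ** 2 for p, o in zip(predicted, laboratory))
    # Sum of squared deviations of the laboratory values from their own mean.
    sst = sum((o - mean_lab) ** 2 for o in laboratory)
    if sst == 0:
        raise ValueError("undefined when all laboratory values are equal")
    return 1.0 - sse / sst


# Hypothetical laboratory values, not data from this report.
lab = [10.0, 9.0, 14.0, 12.0]
print(nash_sutcliffe(lab, lab))                     # 1.0, a perfect fit
print(nash_sutcliffe([11.25] * 4, lab))             # 0.0, no better than the observed mean
print(nash_sutcliffe([o + 5.0 for o in lab], lab))  # about -5.78, the mean predicts better
```

Because the denominator is the spread of the validation data themselves, a validation set that covers only a narrow range of the response variable yields a small denominator and a lower coefficient, which is the sensitivity noted in the table.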
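The last four entries in Table 8 come from a sign test on the residuals. The report does not state which variant was used, so the Python sketch below makes its assumptions explicit: ties (a prediction equal to its laboratory value) are dropped, the probability is the exact two-sided binomial value with p = 0.5, and the z-statistic is the normal approximation without a continuity correction. The function name `sign_test` and the example values are hypothetical.

```python
import math


def sign_test(predicted, laboratory):
    """Return (negative count, positive count, two-sided probability, z-statistic)."""
    # A negative difference means the prediction is below the laboratory value.
    neg = sum(1 for p, o in zip(predicted, laboratory) if p < o)
    pos = sum(1 for p, o in zip(predicted, laboratory) if p > o)
    n = neg + pos  # ties (p == o) are dropped, an assumed convention
    if n == 0:
        raise ValueError("every prediction equals its laboratory value")
    # Exact two-sided probability under Binomial(n, 0.5): the chance of a split
    # at least as uneven as the observed one if error directions were random.
    k = min(neg, pos)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    probability = min(1.0, 2.0 * tail)
    # Normal approximation without continuity correction: under the null the
    # positive count has mean n/2 and standard deviation sqrt(n)/2.
    z = (pos - neg) / math.sqrt(n)
    return neg, pos, probability, z


# Hypothetical errors: 8 of 10 predictions are above the laboratory value.
lab = [1.0] * 10
pred = [2.0] * 8 + [0.0] * 2
neg, pos, probability, z = sign_test(pred, lab)
print(neg, pos, probability, round(z, 3))  # 2 8 0.109375 1.897
```

Both statistics pass here (probability above 0.05, absolute z below 1.96). Near those thresholds the exact probability and the normal-approximation z can disagree slightly, so the two criteria are not strictly equivalent.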

First posted June 18, 2010

For additional information contact:
Director, Oregon Water Science Center
U.S. Geological Survey
2130 SW 5th Ave.
Portland, Oregon 97201
http://or.water.usgs.gov

URL: http://pubsdata.usgs.gov/pubs/sir/2010/5008/table8.html
Page Last Modified: Thursday, 10-Jan-2013 19:12:01 EST