Ohio Water Science Center
U.S. Geological Survey, Techniques and Methods 6–B5
By Donna S. Francy and Robert A. Darner
In Cooperation with the Cuyahoga County Board of Health, Northeast Ohio Regional Sewer District, Ohio Water Development Authority, and Ohio Lake Erie Office
This report is available as a 34-page PDF for viewing and printing.
The report cover (11" by 17" tabloid) is also available for viewing and printing.
State recreational water-quality standards are based on concentrations of indicator organisms, such as Escherichia coli (E. coli). Because the analytical methods for enumerating E. coli take at least 18–24 hours to complete, some agencies have turned to predictive modeling to obtain near-real-time estimates of recreational water quality. The USGS has been working with local agencies to develop empirical predictive models for five Lake Erie beaches in Ohio. One beach, Huntington, is used as an example in this report to describe, step by step, how data for models were collected and how models were developed and evaluated. These steps are not the only procedures that can be used to develop predictive models for beaches; rather, they are the methods used by the authors for the reported datasets.
The steps to develop predictive models are data collection; exploratory data analysis; model development, selection, and diagnosis; determination of model output values; and model validation and refinement. For Huntington, the predictive model was based on data collected during the recreational seasons of 2000–2004. The explanatory variables were wave height, weighted rainfall in the past 48 hours, and log10 turbidity; the model explained 38 percent of the variability in E. coli concentrations. Two outputs from the model were calculated: (1) the predicted E. coli concentration and (2) the probability that the E. coli single-sample maximum bathing-water standard of 235 colony-forming units per 100 milliliters (CFU/100 mL) will be exceeded. A threshold probability of 29 percent was established for the Huntington 2000–2004 model. The threshold probability is the probability associated with too great a risk to allow swimming and is established by examining historical data. The model was validated in 2005 and yielded more correct responses and better predicted exceedance of the bathing-water standard than did the current method for assessing recreational water quality (using the previous day’s E. coli concentration).
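The two model outputs described above can be illustrated with a short Python sketch. It is not the report's SAS or Fortran code. The regression coefficients and residual standard deviation below are hypothetical placeholders (the abstract does not give them), and the exceedance probability assumes normally distributed residuals on the log10 scale, which is a simplification. Only the 235 CFU/100 mL standard and the 29 percent threshold come from the text; treating a probability at or above the threshold as a "do not swim" response is also an assumption.

```python
import math

STANDARD_CFU = 235.0      # single-sample maximum standard, CFU/100 mL (from the report)
THRESHOLD_PROB = 0.29     # threshold probability, Huntington 2000-2004 model (from the report)

# Hypothetical values for illustration only; NOT the report's fitted parameters.
COEF = {"intercept": 1.0, "wave_height": 0.3, "rain_48h": 0.8, "log10_turbidity": 0.5}
RESIDUAL_SD = 0.5         # assumed residual standard deviation, log10 units


def predict_log10_ecoli(wave_height, weighted_rain_48h, turbidity):
    """Output 1: predicted log10 E. coli concentration from the three explanatory variables."""
    return (COEF["intercept"]
            + COEF["wave_height"] * wave_height
            + COEF["rain_48h"] * weighted_rain_48h
            + COEF["log10_turbidity"] * math.log10(turbidity))


def exceedance_probability(pred_log10):
    """Output 2: probability that the true concentration exceeds the standard,
    assuming normal residuals around the prediction."""
    z = (math.log10(STANDARD_CFU) - pred_log10) / RESIDUAL_SD
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # equals 1 - Phi(z)


def exceeds_threshold(prob):
    """True when the probability is at or above the threshold (assumed convention)."""
    return prob >= THRESHOLD_PROB


pred = predict_log10_ecoli(wave_height=2.0, weighted_rain_48h=0.5, turbidity=30.0)
prob = exceedance_probability(pred)
print(f"predicted E. coli: {10 ** pred:.0f} CFU/100 mL, "
      f"P(exceed) = {prob:.2f}, advisory = {exceeds_threshold(prob)}")
```

A prediction that falls exactly at the standard gives a probability of 0.5 under this assumption, so a threshold of 29 percent flags conditions in which the predicted concentration is still somewhat below 235 CFU/100 mL.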
The procedures described in this report can be used to develop and test predictive models at other beaches. Predictive modeling is a dynamic process meant to augment existing beach-monitoring programs, not to replace them. Models should be continuously validated and refined to improve predictions and better protect public health. If validation tests are successful, a beach manager may decide to develop an Internet-based system that provides model predictions to the beach-going public. This type of system, called “nowcasting,” was implemented at Huntington on May 30, 2006.
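As a rough illustration of the validation step, the sketch below scores a set of yes/no exceedance responses against measured results, and builds the responses of the current method from the previous day's concentration. The function names and sample concentrations are hypothetical, not taken from the report. Sensitivity and specificity use their standard definitions: the fraction of measured exceedances correctly predicted, and the fraction of measured non-exceedances correctly predicted.

```python
STANDARD_CFU = 235.0  # single-sample maximum standard, CFU/100 mL (from the report)


def score_responses(predicted_exceed, measured_exceed):
    """Count correct responses and compute sensitivity and specificity."""
    tp = fp = tn = fn = 0
    for predicted, measured in zip(predicted_exceed, measured_exceed):
        if predicted and measured:
            tp += 1
        elif predicted and not measured:
            fp += 1
        elif not predicted and measured:
            fn += 1
        else:
            tn += 1
    return {
        "correct": tp + tn,
        # None when the denominator is zero (no measured exceedances or non-exceedances)
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


def previous_day_responses(daily_concentrations):
    """Current method: predict today's exceedance from yesterday's measured value.
    Returns (predicted, measured) lists for days 2..n."""
    exceed = [c > STANDARD_CFU for c in daily_concentrations]
    return exceed[:-1], exceed[1:]


# Hypothetical daily E. coli concentrations, CFU/100 mL.
concentrations = [100, 300, 250, 50, 400]
predicted, measured = previous_day_responses(concentrations)
print(score_responses(predicted, measured))
```

The same `score_responses` function can be applied to model responses, for example a list of booleans indicating whether each day's exceedance probability reached the chosen threshold, so that the model and the current method are compared on identical days.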
Procedures for Developing Predictive Models
Exploratory Data Analysis
Model Diagnostics and Selection
Model Output and Validation
The Future of Predictive Modeling
Examples From Beach Studies at Huntington Reservation, Bay Village, Ohio
Appendix 1--SAS commands to determine the best 50 models and to obtain individual model parameters and Fortran program to determine the probability of exceeding the single-sample maximum bathing-water standard
Figures
1. Map showing locations of five Lake Erie beaches used to test development of predictive models.
2-8. Graphs showing:
2. Huntington, 2000–2004, relations between Escherichia coli concentrations and turbidity and day of the year.
3. Huntington, 2000–2004, Escherichia coli concentrations in water, by wave height and 24-hour wind direction.
4. Partial residual plots of explanatory variables for the Huntington 2000–2004 model: weighted 48-hour rainfall, wave height, and log10 turbidity.
5. Predicted Escherichia coli concentrations and residuals for the Huntington 2000–2004 model.
6. Measured and predicted Escherichia coli concentrations for the Huntington 2000–2004 model.
7. Establishment of the threshold probability for the Huntington 2000–2004 model.
8. Huntington 2005, performance of the Huntington 2000–2004 model in assessing recreational water quality: probability output and predicted Escherichia coli output compared to the current method.
Tables
1. Example of computing “wind direction 24” by vector addition of hourly wind directions and wind speeds for the 24-hour period preceding sampling for Huntington.
2. Summary statistics of Escherichia coli concentrations at Huntington, 2000–2005.
3. Pearson’s r correlations between log10 Escherichia coli concentrations and explanatory variables for Huntington, 2000–2005.
4. Pearson’s r correlations among explanatory variables for Huntington, 2000–2004.
5. List of possible models and the Mallows’ Cp test for Huntington 2000–2004.
6. Huntington 2000–2004 model, statistics and parameter estimates.
7. Huntington, numbers of correct responses and the sensitivities and specificities of model responses with indicated thresholds and predicted Escherichia coli (E. coli) concentrations compared to previous day’s E. coli concentrations.
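The “wind direction 24” variable named in the list above is described as a vector addition of hourly wind directions and speeds. The report's worked example is not reproduced on this page, so the following Python sketch shows only one common way to do such a sum. It assumes directions are given in degrees clockwise from north and that each hourly observation is weighted by its wind speed; the function name is hypothetical.

```python
import math


def resultant_wind_direction(directions_deg, speeds):
    """Speed-weighted vector sum of hourly winds.

    Returns the resultant direction in degrees [0, 360), or None when the
    hourly vectors cancel and no direction is defined.
    """
    east = sum(s * math.sin(math.radians(d)) for d, s in zip(directions_deg, speeds))
    north = sum(s * math.cos(math.radians(d)) for d, s in zip(directions_deg, speeds))
    if math.hypot(east, north) < 1e-9:
        return None
    return math.degrees(math.atan2(east, north)) % 360.0


# Hypothetical hourly observations; a real run would use the 24 hours before sampling.
hourly_directions = [350.0, 10.0, 20.0, 340.0]
hourly_speeds = [5.0, 5.0, 8.0, 8.0]
print(resultant_wind_direction(hourly_directions, hourly_speeds))
```

Summing vector components rather than averaging the angles avoids the wraparound problem at north, where a simple mean of 350 and 10 degrees would give 180 degrees instead of a direction near 0.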
Printable tabloid cover (839 KB) - 1 page (11" by 17" paper)
Whole report (4.0 MB) - 34 pages (8.5" by 11" paper)
Francy, D.S., and Darner, R.A., 2006, Procedures for Developing Models To Predict Exceedances of Recreational Water-Quality Standards at Coastal Beaches: U.S. Geological Survey Techniques and Methods 6–B5, 34 p.
U.S. Department of the Interior, U.S. Geological Survey
Persistent URL: http://pubs.water.usgs.gov/tm6b5
Last modified: Tuesday, 12-Dec-2006 15:52:44 EST