Site-Specific Estimation of Peak-Streamflow Frequency Using Generalized Least-Squares Regression for Natural Basins in Texas

Water-Resources Investigations Report 99-4172

By William H. Asquith and Raymond M. Slade, Jr.

Abstract
Introduction: Purpose and Scope; Hydrologic Regionalization; Basin Characteristics; Generalized Least-Squares and Other Regression Methods in Studies of Peak-Streamflow Frequency
Site-Specific Estimation of Peak-Streamflow Frequency: Site-Specific Approach; Computer Program; Example; Evaluation of Site-Specific Regionalization of Peak-Streamflow Frequency
Summary
Selected References
Appendix

Tables

Table 1. Summary of site-specific peak-streamflow frequency estimation for example in text
Table 2. Evaluation of site-specific peak-streamflow frequency options
Table 3. Weighted root-mean-square errors (RMSE) for options of site-specific peak-streamflow frequency
Table 4. Mean of the five largest station errors (M5LM) for options of site-specific peak-streamflow frequency
Table 5. Root-mean-square errors between peak-streamflow frequency analysis using the traditional regional regression approach and selected options of the site-specific peak-streamflow frequency approach

Computer Program

For Windows NT: txfreq2003.zip
For UNIX Systems: txfreq2003.tar.gz

U.S. DEPARTMENT OF THE INTERIOR
Bruce Babbitt, Secretary

U.S. GEOLOGICAL SURVEY
Charles G. Groat, Director

Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

For additional information write to:

District Chief
U.S. Geological Survey
8027 Exchange Dr.
Austin, TX 78754-4733
Email: dc_tx@usgs.gov

Copies of this report can be purchased from

U.S. Geological Survey
Branch of Information Services
Box 25286
Denver, CO 80225-0286
Email: infoservices@usgs.gov

ABSTRACT

The U.S. Geological Survey, in cooperation with the Texas Department of Transportation, has developed a computer program to estimate peak-streamflow frequency for ungaged sites in natural basins in Texas. Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures. The program estimates peak-streamflow frequency using a site-specific approach and a multivariate generalized least-squares linear regression. A site-specific approach differs from a traditional regional regression approach by developing unique equations to estimate peak-streamflow frequency specifically for the ungaged site. The stations included in the regression are selected using an informal cluster analysis that compares the basin characteristics of the ungaged site to the basin characteristics of all the stations in the data base. The program provides several choices for selecting the stations. Selecting the stations using cluster analysis ensures that the stations included in the regression will have the most pertinent information about flooding characteristics of the ungaged site and therefore provide the basis for potentially improved peak-streamflow frequency estimation. An evaluation of the site-specific approach in estimating peak-streamflow frequency for gaged sites indicates that the site-specific approach is at least as accurate as a traditional regional regression approach.

INTRODUCTION

Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures. In 1991, the U.S. Geological Survey (USGS), in cooperation with the Texas Department of Transportation, began an investigation of floods in Texas. The primary objective of the investigation was to develop techniques and procedures to estimate peak-streamflow frequency for streams with natural drainage basins in Texas. A natural basin is defined as a basin with less than 10 percent impervious cover, less than 10 percent of its drainage area controlled by reservoirs, and no other human-related factors such as land use changes that would measurably affect peak streamflow. Two techniques that accomplish the primary objective are available: The first technique involves "traditional" regional regression equations for 11 geographically defined regions in Texas (Asquith and Slade, 1997). The second technique, referred to as the site-specific approach, is the subject of this report.

Purpose and Scope

The purpose of this report is to document a computer program for site-specific peak-streamflow frequency estimation for ungaged sites in natural basins of Texas. The computer program regresses peak-streamflow frequency on basin characteristics of selected stations and produces various diagnostic calculations on the regression. The data base for the computer program, with the exception of climate factors, is documented in Asquith and Slade (1997). The data base contains the peak-streamflow frequency estimated with data through the 1993 USGS water year(1), selected basin characteristics, and ancillary information for each of 559 streamflow-gaging stations in Texas and for each of 105 stations in Arkansas, Louisiana, New Mexico, and Oklahoma. Both the in-state and out-of-state stations met the natural-basin criteria at the time that the data included in Asquith and Slade (1997) were collected. All stations have at least 8 years of annual peak-streamflow data from natural basins. Asquith and Slade (1997, pl. 1) provides a map of Texas showing the locations of the stations. Climate factors involved in development of the computer program were derived from Lichty and Karlinger (1990).

Hydrologic Regionalization

Hydrologic regionalization is the method by which peak-streamflow frequency information for similar stations is combined to (1) provide more information about the flooding characteristics, which improves the accuracy of peak-streamflow frequency estimates over those derived solely from the data for one station, and (2) provide procedures to estimate peak-streamflow frequency for ungaged stream sites. Traditionally, hydrologic regionalization of peak-streamflow frequency is used to establish regional hydrologic regression equations. The regression establishes the statistical relation between peak-streamflow frequency and selected basin characteristics. In regression, the 2-, 5-, 10-, 25-, 50-, and 100-year peak streamflows are used as dependent variables, and selected basin characteristics are used as independent variables. Logarithmic transformations of the dependent and independent variables commonly are done to increase the linearity between the dependent and independent variables. Numerous references are available describing background, theoretical aspects, application, and problems related to statistical peak-streamflow estimation. The reader is directed to the "Selected References" section. Excellent starting points in the literature are Cunnane (1989) and Stedinger and others (1993, p. 18.33).

Basin Characteristics

Eight independent variables (basin characteristics) were selected for investigation of hydrologic regression for Texas: the 2-year 24-hour precipitation, mean annual precipitation, contributing drainage area, basin shape factor, and stream slope; and the 2-, 25-, and 100-year climate factors (Lichty and Karlinger, 1990). The following basin characteristics are discussed in Asquith and Slade (1997, p. 7): 2-year 24-hour precipitation, mean annual precipitation, contributing drainage area, basin shape factor, and stream slope. The 2-year 24-hour precipitation (Hershfield, 1962) and mean annual precipitation (based on 1951-80 data [Larkin and Bomar, 1983]) were determined for the approximate centroid of each basin and are expressed in inches. The 2-year 24-hour precipitation and other precipitation variables for Texas have been updated (Asquith, 1998a), but the updates were too late to be included in the peak-streamflow frequency study by Asquith and Slade (1997) and in the current report. After extensive testing, the authors decided that the precipitation variables and climate factors would not be implemented in the user interface of the computer program for this application. However, they are retained in program logic, evaluated, and shown in tables 2 and 3 to keep the program adaptable for future applications. Some of the extensive testing is documented in "Evaluation of Site-Specific Regionalization of Peak-Streamflow Frequency" in this report.

The contributing drainage area is expressed in square miles. The basin shape factor is the ratio of the square of the stream length to the contributing drainage area, which mathematically represents the ratio of the longest stream length to the mean width of the basin. The stream length for the stream-slope calculation is defined from the longest mapped channel from the station to the headwaters, on the basis of USGS 1:100,000 quadrangle maps. The average stream slope, expressed in feet per mile, is the ratio of (1) the change in elevation of the longest mapped channel from the station to the headwaters to (2) the length of the longest mapped channel.
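
The two derived characteristics defined above reduce to simple ratios. The following Python sketch restates those definitions; the function names and the sample basin are hypothetical and are not taken from the report or its data base.

```python
def basin_shape_factor(stream_length_mi: float, drainage_area_mi2: float) -> float:
    """Square of the longest stream length divided by contributing drainage area."""
    return stream_length_mi ** 2 / drainage_area_mi2


def stream_slope(elevation_change_ft: float, stream_length_mi: float) -> float:
    """Elevation change along the longest mapped channel divided by its length (ft/mi)."""
    return elevation_change_ft / stream_length_mi


# Hypothetical basin: 20-mile channel, 100 square miles, 300 feet of fall.
length_mi, area_mi2, fall_ft = 20.0, 100.0, 300.0

shape = basin_shape_factor(length_mi, area_mi2)
mean_width_mi = area_mi2 / length_mi      # mean basin width implied by area and length
slope = stream_slope(fall_ft, length_mi)

print(shape, length_mi / mean_width_mi, slope)
```

For this basin the shape factor and the length-to-mean-width ratio are the same number, which is the equivalence stated in the text.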

Maps of 2-, 25-, and 100-year climate factors (Lichty and Karlinger, 1990) were developed for the United States east of the Rocky Mountains and south of approximately latitude 45° N using data from 71 long-term rainfall stations and regression analysis that related rainfall-runoff estimates of peak-streamflow frequency from 50 model calibrations to basin characteristics. The 2-, 25-, and 100-year climate factors were estimated for each of the 664 stations by interpolation of a gridded data base derived from climate-factor contour maps. The interpolation was done using Fortran algorithms provided by Gary D. Tasker (U.S. Geological Survey, written commun., 1996). These algorithms are incorporated in the computer program presented in this report. Only 2-, 25-, and 100-year climate factors are documented by Lichty and Karlinger (1990), but six recurrence intervals are discussed in this report. For program development, the 2-year climate factor was used for hydrologic regression of the 2- and 5-year peak streamflows. The 25-year climate factor was used for the 10- and 25-year peak streamflows, and the 100-year climate factor was used for the 50- and 100-year peak streamflows. However, for the final version of the program, the authors decided not to implement the climate factors in the user interface of the computer program for the application presented here.
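
The pairing of the three documented climate factors with the six recurrence intervals can be written as a lookup table. This Python sketch only restates the assignment described above; the dictionary name, the helper function, and the station values are hypothetical.

```python
# Recurrence interval of the peak streamflow (years) -> recurrence interval of the
# climate factor (years) that was paired with it during program development.
CLIMATE_FACTOR_FOR_INTERVAL = {2: 2, 5: 2, 10: 25, 25: 25, 50: 100, 100: 100}


def climate_factor_for(recurrence_interval, station_climate_factors):
    """Pick the climate factor paired with the requested recurrence interval."""
    return station_climate_factors[CLIMATE_FACTOR_FOR_INTERVAL[recurrence_interval]]


# Hypothetical climate factors for one station, keyed by recurrence interval.
station_cf = {2: 0.9, 25: 1.8, 100: 2.4}

for interval in (2, 5, 10, 25, 50, 100):
    print(interval, climate_factor_for(interval, station_cf))
```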

Generalized Least-Squares and Other Regression Methods in Studies of Peak-Streamflow Frequency

Generalized least-squares (GLS) regression (Stedinger and Tasker, 1985; Tasker and Stedinger, 1989) is the basis of the hydrologic regression in this report. GLS has advantages over the more widely known regression methods--ordinary least-squares (OLS) and weighted least-squares (WLS). One advantage of GLS regression over OLS is that stations with more data, hence more reliable peak-streamflow frequency estimates, are given more weight in the regression procedures. Another advantage of GLS regression over OLS and WLS is that further adjustments in weight are done on the basis of cross correlations in the data--thus, the undesirable effects of stations with related data (cross-correlated data) are reduced. Neither OLS nor WLS regression considers the effects of cross-correlated data.
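
For orientation, the three estimators can be compared in a common matrix form. This is the general textbook formulation and not notation taken from this report: Y is assumed to be the vector of log-transformed peak streamflows for one recurrence interval, X the matrix of log-transformed basin characteristics with a leading column of ones, W a diagonal weight matrix, and Λ the covariance matrix of the regression errors.

```latex
\begin{align}
\hat{\beta}_{\mathrm{OLS}} &= (X^{T} X)^{-1} X^{T} Y \\
\hat{\beta}_{\mathrm{WLS}} &= (X^{T} W X)^{-1} X^{T} W Y \\
\hat{\beta}_{\mathrm{GLS}} &= (X^{T} \Lambda^{-1} X)^{-1} X^{T} \Lambda^{-1} Y
\end{align}
```

In this form, OLS treats every station equally, WLS changes only the diagonal weights (for example, to favor stations with more data), and GLS also admits off-diagonal terms in Λ, which is where cross correlation between stations enters.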

Numerous studies of peak-streamflow frequency have been done for various regions of Texas. These studies, although differing in scope, all have used linear regression. The peak-streamflow frequency study for Texas by Schroeder and Massey (1977) used OLS regression. Peak-streamflow frequency studies using WLS have been done for several localities. Such studies include regional equations for the Highland Lakes area (Asquith and others, 1996), the entire State of Texas (Asquith and Slade, 1997), the tributaries of the lower Colorado River (Asquith, 1998b), and the Brazos River Basin (Raines, 1998). A peak-streamflow frequency study using GLS was done for Hays County (Slade and others, 1995). A more basic-research-oriented study of peak-streamflow frequency in Texas by Tasker and Slade (1994) used a GLS regression and focused on the performance of a site-specific regression analysis. Tasker and Slade (1994) demonstrated that the site-specific approach (referred to as "interactive" by Tasker and Slade [1994] and as "region-of-influence" by Hodge and Tasker [1995]) coupled with GLS regression has smaller root-mean-square errors than the traditional geographic regional approach.

SITE-SPECIFIC ESTIMATION OF PEAK-STREAMFLOW FREQUENCY

Comprehensive discussion of the various approaches and regionalization techniques of peak-streamflow frequency is beyond the scope of this report. The authors assume that the reader has some knowledge and experience with the various aspects of peak-streamflow frequency regionalization. The reader is directed to "Selected References" for further details.

Site-Specific Approach

The site-specific approach for peak-streamflow frequency regionalization, first proposed for Texas by Slade and Smith (1993), differs from the standard or traditional regional regression approach in that equations are developed to estimate peak-streamflow frequency specifically for the ungaged site. The equations are unique to the ungaged site and therefore have limited applicability or transferability to other ungaged sites. In the traditional approach, equations are carefully and purposefully constructed to be applicable for a broad range of sites within a common geographic region. The emphasis in the site-specific approach is on the peak-streamflow prediction and associated errors of prediction rather than the equation that produces them.

A major element of any hydrologic regionalization technique is how stations to be pooled together, or regionalized, are chosen. In the traditional regional approach, the stations that are pooled together are chosen on the basis of a commonality of geographic region. The region encompassing the stations is delineated on the basis of factors such as channel types, climatic characteristics, elevation, geology, physiography, proximity to regional water bodies, and relative topography. In an effort to develop equations that are applicable to a broad range of ungaged sites within a region, it is common to seek and include stations that provide a large range of basin characteristics. Accordingly, some of the stations can have considerably different basin characteristics. For example, in region 3 of Asquith and Slade (1997, p. 62), drainage areas of the included stations range from 11.8 to 14,635 square miles (mi²). Logic dictates that the annual peak-streamflow generation processes for the station with the 14,635-mi² basin have little relevance to the annual peak-streamflow generation processes for the station with the 11.8-mi² basin, even though the 11.8-mi² station is in the larger basin and is proximate to the 14,635 mi² station.

The regionalization in the site-specific approach is based on a fundamentally different line of reasoning. The regionalization is based on an informal cluster analysis that compares the basin characteristics of the ungaged site to the basin characteristics of all the stations in the data base. In the site-specific approach, the peak-streamflow frequency and basin characteristics used for the estimation of peak frequency at an ungaged site include only the data from stations whose basin and climatic characteristics are similar to those for the ungaged site.

Because of the restriction that the basin characteristics be similar, the site-specific approach has several advantages. One advantage of the approach is that the stations included in the regression will be hydrologically similar to the ungaged site. The basin characteristics of the ungaged site should lie within the multidimensional explanatory-variable cloud defined by the basin characteristics of the selected stations. Asquith and Slade (1997, p. 16-26) provide numerous two-dimensional representations of the multidimensional cloud as shaded areas encompassing clustered points on graphs of the relations between basin characteristics.

Several important consequences result when the ungaged site basin characteristics lie within and near the middle of the explanatory variable cloud. Because estimations of peak streamflows from the regression are made near the center of the space of the explanatory variables, extrapolation errors, or more specifically extrapolation biases, in the relation between peak-streamflow frequency and basin characteristics are reduced. Additionally, violations of the linearity assumption for the regression are less likely to cause problems for the linear regression (Tasker and Slade, 1994). The issue of linearity between peak-streamflow frequency and basin characteristics for Texas is discussed in Asquith and Slade (1997, p. 8-10).

Regions are not defined by fixed geographic boundaries. Therefore, the potential for substantially different estimates for similarly characterized watersheds on either side of such a boundary is greatly reduced. Although it is considered an advantage not to have fixed regional geographic boundaries, visualization of the region surrounding a particular ungaged site is sometimes difficult.

The site-specific approach can be easily updated in the future as more data are collected, as new statistical tools such as L-moments (Hosking and Wallis, 1997) improve peak-streamflow frequency estimation, and as more detailed basin-characteristic assessment becomes possible through sophisticated computer analysis (Randy L. Ulery, U.S. Geological Survey, oral commun., 1998).

The most important aspect of a site-specific approach is the selection of stations with characteristics similar to those of the ungaged site. Two techniques are identified for selection of suitable stations: (1) a proximity search with a drainage area range restriction and (2) a similarity search. Discussion of the application of these two techniques for Texas is presented later.

The first technique available for selecting suitable stations is a geographic proximity search in which the selected "nearest" stations are within a specified range in drainage area. The geographic distance D_ij (or proximity) of the ungaged site i to station j is calculated by the following:

$$D_{ij} = \sqrt{(X_i - X_j)^2 + (Y_i - Y_j)^2}, \qquad (1)$$

where
X_i = the easting coordinate of site i;
X_j = the easting coordinate of station j;
Y_i = the northing coordinate of site i; and
Y_j = the northing coordinate of station j.
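
As a minimal Python sketch, the geographic distance can be computed as the straight-line distance between two points in a planar easting and northing system. The function name and the sample coordinates are hypothetical, and the units of the result are whatever units the coordinates carry.

```python
import math


def geographic_distance(x_i, y_i, x_j, y_j):
    """Straight-line distance between site i and station j from easting/northing pairs."""
    return math.hypot(x_i - x_j, y_i - y_j)


# Hypothetical coordinates: the station lies 3 units east and 4 units north of the site.
d = geographic_distance(0.0, 0.0, 3.0, 4.0)
print(d)
```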

A subset of stations is chosen from the entire peak-streamflow frequency data base by selecting those stations (j) that are most proximate to the site (i) and whose drainage areas lie within the following range:

$$\log_{10}(A_i) - a \le \log_{10}(A_j) \le \log_{10}(A_i) + a, \qquad (2)$$

where
A_i = the drainage area of site i;
A_j = the drainage area of station j; and
a = the number of log cycles selected to define the range.
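
The first technique can be sketched in Python as a filter on drainage area followed by a sort on distance. This is an illustration under stated assumptions, not the program's Fortran logic: the function name, the dictionary keys, and the four stations are hypothetical, and the range test assumes that a station qualifies when its log₁₀ drainage area is within a log cycles of the site's.

```python
import math


def proximity_search(site, stations, n_stations, log_cycles):
    """Return the nearest stations whose drainage area is within +/- log_cycles of the site's."""
    log_area_site = math.log10(site["area"])
    eligible = [
        s for s in stations
        if abs(math.log10(s["area"]) - log_area_site) <= log_cycles
    ]
    # Sort the eligible stations from nearest to farthest.
    eligible.sort(key=lambda s: math.hypot(site["x"] - s["x"], site["y"] - s["y"]))
    return eligible[:n_stations]


# Hypothetical site and stations (areas in square miles, planar coordinates).
site = {"x": 0.0, "y": 0.0, "area": 839.0}
stations = [
    {"name": "A", "x": 10.0, "y": 0.0, "area": 500.0},
    {"name": "B", "x": 1.0, "y": 0.0, "area": 10.0},      # nearest, but area out of range
    {"name": "C", "x": 5.0, "y": 0.0, "area": 20000.0},
    {"name": "D", "x": 50.0, "y": 0.0, "area": 900.0},
]

selected = [s["name"] for s in proximity_search(site, stations, n_stations=2, log_cycles=1.5)]
print(selected)
```

Station B is the closest but is excluded because its drainage area differs from the site's by more than 1.5 log cycles, so the two stations returned are C and A.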

The second technique uses a measure of the similarity between the ungaged site and the stations in the data base. In the technique, only the most similar stations are chosen. A subset of size m is chosen from the entire data base by selecting those stations whose basin characteristics are the most similar (or the least dissimilar) to the ungaged site. A measure of the similarity between two stations, referred to as the Euclidean dissimilarity metric, is presented by Tasker and Slade (1994). Hereafter, this metric is referred to simply as similarity. An adaptation of the similarity from Tasker and Slade (1994) was developed for this report and includes geometric distance between the ungaged site and a station. The similarity between site i and station j is calculated by the following:

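$$d_{ij} = \sqrt{\left(\frac{D_{ij}}{sd_1}\right)^2 + \sum_{k=1}^{m} \left(\frac{B_{ik} - B_{jk}}{sd_2}\right)^2}, \qquad (3)$$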

where
d_ij = the similarity between site i and station j;
sd_{1 or 2} = standard deviations of (1) all the distances D_ij between the ungaged site and all the stations in the data base, and (2) the basin characteristic B_k in the data base;
B_ik = the log₁₀ of the basin characteristic k for site i;
B_jk = the log₁₀ of the basin characteristic k for station j; and
m = the number of basin characteristics included in the similarity measure.

Division by the standard deviation is necessary to remove the effects of disproportional units. The most suitable stations are selected on the basis of smallest values of similarity.
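
A similarity search along these lines can be sketched in Python. The sketch assumes a Euclidean form in which the geographic distance and each log₁₀ basin-characteristic difference are divided by the corresponding standard deviation, squared, summed, and square-rooted; it also assumes sample standard deviations. The function name, dictionary keys, and the three stations are hypothetical.

```python
import math
import statistics


def rank_by_similarity(site, stations, characteristics):
    """Return (similarity, name) pairs sorted from most to least similar."""
    distances = [math.hypot(site["x"] - s["x"], site["y"] - s["y"]) for s in stations]
    sd_distance = statistics.stdev(distances)
    # One standard deviation per log-transformed basin characteristic.
    # A characteristic with zero spread would need special handling (not shown).
    sd_char = {
        k: statistics.stdev([math.log10(s[k]) for s in stations]) for k in characteristics
    }
    ranked = []
    for s, dist in zip(stations, distances):
        total = (dist / sd_distance) ** 2
        for k in characteristics:
            diff = math.log10(site[k]) - math.log10(s[k])
            total += (diff / sd_char[k]) ** 2
        ranked.append((math.sqrt(total), s["name"]))
    return sorted(ranked)


# Hypothetical site and stations.
site = {"x": 0.0, "y": 0.0, "area": 100.0, "slope": 10.0}
stations = [
    {"name": "near_twin", "x": 1.0, "y": 0.0, "area": 110.0, "slope": 9.0},
    {"name": "far_twin", "x": 100.0, "y": 0.0, "area": 100.0, "slope": 10.0},
    {"name": "near_different", "x": 2.0, "y": 0.0, "area": 10000.0, "slope": 1.0},
]

ranking = rank_by_similarity(site, stations, ["area", "slope"])
print([name for _, name in ranking])
```

Taking the first few entries of the ranking gives the most suitable stations. Because every term is scaled by its own standard deviation, neither distance nor any single basin characteristic dominates merely because of its units.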

Computer Program

The computer program for site-specific estimation of peak-streamflow frequency is written in Fortran. The program does a step-backward GLS regression between peak-streamflow frequency and basin characteristics of the selected stations. The program also does a variety of diagnostic calculations on the regression. Step-backward regression involves iterative regressions beginning with all of the independent variables in the equation and, in successive regressions, dropping the least significant variable after each regression on the basis of p-values for the variables. Nonsignificant variables have p-values greater than about 0.10. The variables ultimately retained in the equation will have p-values less than about 0.10, which implies that there is about a 90 percent or greater probability that the regression coefficients of the retained independent variables are not zero. Therefore, the final equation contains only basin characteristics that are considered important for the ungaged site.
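
The step-backward procedure can be illustrated with a short Python sketch. Several simplifications are assumed and should not be read as the program's method: ordinary least squares stands in for GLS, p-values come from a normal approximation rather than the t distribution, and the data are synthetic. The optional always_keep argument mirrors the practice, described in this section, of always retaining drainage area. All names are hypothetical.

```python
import math


def solve_normal_equations(X, y):
    """Return OLS coefficients and the inverse of X'X by Gauss-Jordan elimination."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    # Augment X'X with the identity so that elimination yields the inverse.
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    inverse = [row[p:] for row in aug]
    beta = [sum(inverse[a][b] * xty[b] for b in range(p)) for a in range(p)]
    return beta, inverse


def step_backward(columns, y, names, always_keep=(), p_limit=0.10):
    """Drop the least significant variable until every remaining p-value is <= p_limit."""
    active = list(names)
    while True:
        X = [[1.0] + [columns[name][i] for name in active] for i in range(len(y))]
        beta, inverse = solve_normal_equations(X, y)
        residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
        sigma2 = sum(r * r for r in residuals) / (len(y) - len(beta))
        p_values = {}
        for k, name in enumerate(active, start=1):
            t = beta[k] / math.sqrt(sigma2 * inverse[k][k])
            p_values[name] = math.erfc(abs(t) / math.sqrt(2.0))  # normal approximation
        droppable = [n for n in active if n not in always_keep and p_values[n] > p_limit]
        if not droppable:
            return dict(zip(["intercept"] + active, beta)), p_values
        active.remove(max(droppable, key=p_values.get))


# Synthetic log10 data: y depends on log_area only; log_shape is uncorrelated with y
# by construction, so it should be the variable that is dropped.
columns = {
    "log_area": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
    "log_shape": [0.4, 0.8, 0.4, 0.8, 0.4, 0.8, 0.4, 0.8],
}
y = [2.65, 2.65, 3.15, 3.15, 3.75, 3.75, 4.45, 4.45]

coefficients, p_values = step_backward(
    columns, y, ["log_area", "log_shape"], always_keep=("log_area",)
)
print(coefficients)
```

With these data the shape term is removed in the first pass and the retained equation has an intercept near 2.0 and a drainage-area exponent near 0.6.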

Drainage area is well known to be a crucial factor in the rainfall-runoff process. Testing during development of the computer program indicated that, depending on selection criteria, drainage area might have a p-value only slightly larger than the p-value of the other independent variables, particularly the climate factor. Cases occurred in which drainage area would be rejected in favor of the other independent variables. In such cases, the resulting equation commonly produced estimates with unacceptably large standard errors of prediction, although the overall equation diagnostics (such as model error and sampling error) were acceptable. The emphasis of the site-specific approach is on the estimates and associated prediction errors for the site rather than on errors associated with the equation. To remedy this situation, the program was modified to always accept drainage area as the first explanatory variable. In a test involving the same stations, an equation using drainage area and an equation using climate factor instead had similar values of model error and sampling error, but the equation using drainage area produced site estimates with smaller standard errors of prediction.

Estimates from equations using drainage area rather than climate factor, even if climate factor was slightly more statistically significant, required less extrapolation than the estimates from equations using climate factor rather than drainage area. Therefore, the inclusion of at least drainage area in the equation reduces the chances of encountering extreme extrapolation biases.

The authors believe that always including drainage area produces a more reliable equation, meaning that the equation could be applied for nearby segments of the same stream as the ungaged site. Program testing in 1995 and 1996 indicated that the 2-year 24-hour precipitation and mean annual precipitation explain little of the variation of peak-streamflow frequency; therefore, the authors decided not to include those precipitation variables in the analysis associated with computer program development. Further testing in 1998 indicated that climate factors contributed little to peak-streamflow frequency estimation, although they might be useful for other hydrologic analyses. Therefore, climate factors also were excluded from the analysis for the computer program. Finally, three variables are used: contributing drainage area, basin shape factor, and stream slope.

The computer program gives the user flexibility in choosing the number of stations and how the stations are selected for a region from which site-specific peak-streamflow frequency values are estimated. With this flexibility, however, the user's experience and judgment become more important than they were for the traditional regional regression approach. Few options are available and few user decisions are needed for applying the regional equations of Asquith and Slade (1997). However, several options and user decisions are involved in making predictions using the computer program of this report, and the choices made can result in substantially different estimates of peak-streamflow frequency. Selection of the most appropriate program options is difficult at best; however, some guidelines are provided in "Evaluation of Site-Specific Regionalization of Peak-Streamflow Frequency" in this report.

The World Wide Web URL http://water.usgs.gov/lookup/get?wri994172 contains the report and the software required to install and run the program. The compiled program executables are available for Windows NT (txfreq.exe), Sun Solaris 2.6 (txfreq.sun), Silicon Graphics IRIX 6.4 (txfreq.sgi), or other compatible operating systems. The computer program requires no formal installation other than downloading the compressed archive (txfreq.zip for Windows NT or txfreq.tar.gz for Unix systems) and uncompressing it into a folder or directory on the hard disk. All the files, except the program executables, are in multiplatform-compatible ASCII format. The ASCII files can be read with a text editor, but modifying them can render them unusable by the program. The ASCII files are README, txfreq.for, cgrid.krg, rec664.txt, txfreq_info.txt, and tx664.dat. The README file documents the contents of the archives.

txfreq.for--The Fortran source code, provided for those who might wish to implement the program on other operating systems such as the freely available Linux.
cgrid.krg--Gridded values of the climate factors from Lichty and Karlinger (1990); this file is used by subroutine CFX.
rec664.txt--The concurrent or shared record-length matrix for each of the 664 stations in the data base. The concurrent record matrix is a 664 by 664 symmetrical matrix, although the file contains only the lower triangle and diagonal of the matrix. The records in rec664.txt are in the same order as in the tx664.dat file, described below.
txfreq_info.txt--The ASCII description of the regression errors and other diagnostics; its contents are written to one of the output files for the program. This feature was added to make the program output a comprehensive document suitable for archiving and report preparation.
tx664.dat--The main data-base file containing, in the following order, the USGS station number; latitude; longitude; weight factor; log₁₀- (log) transformed drainage area; mean annual precipitation; 2-year 24-hour precipitation; stream slope; basin shape factor; log-transformed 2-, 5-, 10-, 25-, 50-, and 100-year peak-streamflow frequency; and the 2-, 25-, and 100-year climate factors.
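
Because only the lower triangle and diagonal of the concurrent record-length matrix are stored, the full symmetric matrix has to be rebuilt before use. The Python sketch below shows one way to do that. The on-disk layout of rec664.txt is not described here, so the sketch assumes the rows have already been parsed into lists of increasing length; the function name and the 3-station values are hypothetical.

```python
def expand_lower_triangle(rows):
    """Build a full symmetric matrix from rows holding the lower triangle and diagonal."""
    n = len(rows)
    full = [[0] * n for _ in range(n)]
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            raise ValueError(f"row {i} should have {i + 1} entries, found {len(row)}")
        for j, value in enumerate(row):
            full[i][j] = value
            full[j][i] = value  # mirror across the diagonal
    return full


# Hypothetical 3-station case: diagonal entries are each station's years of record,
# and off-diagonal entries are the years that a pair of stations has in common.
lower = [[25], [10, 18], [0, 12, 40]]
full = expand_lower_triangle(lower)
print(full)
```

The same function would apply unchanged to a 664-row input.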

Asquith and Slade (1997) provide a complete discussion of the information contained within the tx664.dat file except for the climate factors. The cgrid.krg, rec664.txt, and tx664.dat files are used in the GLS regression routines. The GLS regression routines and the basic logic of the program were provided by Gary D. Tasker (U.S. Geological Survey, written commun., 1995, 1998) and are detailed in Tasker and Stedinger (1989). Three other files, ex_full.out, ex_sum.out, and ex_sta.out, are generated by the program example in this report (see "Example"). These files are included for illustration only; they are not used by the program.

Example

Suppose that an estimate of the peak-streamflow frequency is needed for the design of a bridge for a stream site located at latitude 29°58'10" and longitude 98°53'33". The basin characteristics for the stream site are a drainage area of 839 mi², a basin shape factor of 5.65, and a stream slope of 15.01 feet per mile. This "ungaged" site corresponds to station 519 in Asquith and Slade (1997) and in tx664.dat. Station 519 is Guadalupe River at Comfort, Tex. (USGS station 08167000).

The appendix contains the input and output of the program for the example. (1) Start the program by double-clicking on the txfreq.exe icon (Windows NT) or by typing and entering the executable file name at the command prompt (Windows NT or Unix). (2) The program prompts for whether the user wants to see two warning messages; enter 3 to continue analysis. (3) The program then prompts successively for the names of the output files containing full documentation of analysis, summary of analysis, and listing of the selected stations. For this example, enter (4) ex_full_test.out, (5) ex_sum_test.out, and (6) ex_sta_test.out at the respective prompts. Examples of these files (without the "_test") are contained in the archive. (7) The program then prompts for whether analysis is needed in English or metric units. (8) The user then inputs a site identifier (name), the latitude, the longitude, and the requested basin characteristics of the ungaged site; spaces are used to delimit the degrees, minutes, and seconds. (9) The program prompts for the number of stations to pool together; for the example, enter 30. (10) The program then prompts for whether just drainage area will be used for the final equations; this feature allows the user to get an estimate of peak-streamflow frequency where drainage area is the only available basin characteristic. Because the basin shape factor and stream slope are available, "n" is entered. Finally, (11) the program prompts for the method to select the most pertinent stations; for the example, choice 1 is a proximity-based search with a log drainage area range defined by ±1.5 log cycles. A summary of the results is printed to the screen. The summary contains the peak-streamflow frequency predictions and associated prediction errors and limits. Complete output of the program analysis is directed into an ASCII file named ex_full.out. At the end, the program creates the ASCII file ex_sta.out. This file lists the pooled stations along with the data used in the regression. Listed in the first and second columns are the similarity and geographic distance values for each station.
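
The latitude and longitude in the example are entered as degrees, minutes, and seconds separated by spaces. The Python sketch below converts such an entry to decimal degrees. It illustrates the arithmetic only: how the program handles the values internally, including any sign convention for west longitude, is not described here, and the function name is hypothetical.

```python
def dms_to_decimal(text):
    """Convert 'degrees minutes seconds' separated by spaces to decimal degrees."""
    degrees, minutes, seconds = (float(part) for part in text.split())
    return degrees + minutes / 60.0 + seconds / 3600.0


# Coordinates of the example site as they would be typed at the prompt.
latitude = dms_to_decimal("29 58 10")
longitude = dms_to_decimal("98 53 33")
print(round(latitude, 4), round(longitude, 4))
```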

The ex_full.out file contains each step in the step-backward regression, the final equation, diagnostics and errors associated with the equation as a whole, the estimate for the ungaged site, regression diagnostics, prediction errors, and selected confidence limits. The ex_full.out file for this example is included in the archive. A summary of the site-specific approach for the example is listed in table 1.
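
The final equations in table 1 can be checked by substituting the basin characteristics of the example site. The Python sketch below does this for the 2- and 5-year equations. It also converts a standard deviation in log₁₀ units to a percentage with a conversion commonly used for log₁₀ regressions; that this conversion relates the two error columns of table 1 is an assumption, although it matches the tabulated values to within rounding. Function names are hypothetical.

```python
import math

# Basin characteristics of the example site (Guadalupe River at Comfort, Tex.).
AREA_MI2 = 839.0     # contributing drainage area, square miles
SHAPE = 5.65         # basin shape factor


def q2(area):
    """2-year peak streamflow (ft3/s) from the table 1 equation."""
    return 267.2 * area ** 0.559


def q5(area, shape):
    """5-year peak streamflow (ft3/s) from the table 1 equation."""
    return 1193.0 * area ** 0.568 * shape ** -0.268


def standard_error_percent(sd_log10):
    """Convert a standard deviation in log10 units to a percentage of the estimate."""
    return 100.0 * math.sqrt(math.exp((math.log(10.0) * sd_log10) ** 2) - 1.0)


print(round(q2(AREA_MI2), -2), round(q5(AREA_MI2, SHAPE), -2))
print(round(standard_error_percent(0.1887), 1), round(standard_error_percent(0.1180), 1))
```

Small differences from the tabulated streamflows are expected because the published coefficients and exponents are rounded.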

Evaluation of Site-Specific Regionalization of Peak-Streamflow Frequency

Table 1. Summary of site-specific peak-streamflow frequency estimation for example in text
[No. of stations = 30. All four independent variables tested in regression. Stations selected on proximity and drainage area range of ±1.5 log₁₀ cycles--see text for details. ft³/s, cubic feet per second; A, contributing drainage area; SH, basin shape factor; SL, stream slope]
Recurrence interval	Estimate of peak streamflow (ft³/s)	Standard error of prediction (percent)	Square root of standard variance of prediction (log₁₀)	Square root of model error variance (log₁₀)	Final regression equation
2-year	11,500	45.6	0.1887	0.1818	Q₂ = 267.2A^0.559
5-year	34,300	27.7	0.1180	0.1097	Q₅ = 1193A^0.568SH^-0.268
10-year	56,600	24.5	0.1050	0.0956	Q₁₀ = 2187A^0.550SH^-0.260
25-year	95,500	26.1	0.1113	0.0789	Q₂₅ = 10^3.63A^0.532SH^-0.267
50-year	129,000	27.0	0.1153	0.1020	Q₅₀ = 10^6.37A^0.119SL^-1.11SH^-0.401
100-year	172,000	31.9	0.1350	0.1209	Q₁₀₀ = 10^6.96A^0.040SL^-1.29SH^-0.433
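
The percent and log₁₀ columns of table 1 can be related by a conversion commonly used for errors expressed in log₁₀ units. The conversion is not stated in this section, so its use here is an assumption; the sketch below (function name illustrative) applies it to the tabulated square roots of the variance of prediction and compares the results with the tabulated standard errors of prediction.

```python
import math


def log10_error_to_percent(sigma_log10):
    """Convert a standard error in log10 units to an average percent error."""
    sigma_ln = sigma_log10 * math.log(10.0)  # convert to natural-log units
    return 100.0 * math.sqrt(math.exp(sigma_ln ** 2) - 1.0)


# (recurrence interval, log10 error from table 1, percent error from table 1)
table_1 = [
    ("2-year", 0.1887, 45.6),
    ("5-year", 0.1180, 27.7),
    ("10-year", 0.1050, 24.5),
    ("25-year", 0.1113, 26.1),
    ("50-year", 0.1153, 27.0),
    ("100-year", 0.1350, 31.9),
]

for label, sigma, tabulated in table_1:
    computed = log10_error_to_percent(sigma)
    print(f"{label}: computed {computed:.2f} percent, tabulated {tabulated} percent")
```

Under this assumed conversion, the computed values agree with the tabulated percentages to within rounding of the last reported digit.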

Extensive evaluation of the site-specific approach for peak-streamflow frequency regionalization was done in 1995 and 1996 using the 559 Texas stations. The site-specific approach was compared to the traditional regional approach presented for Texas by Asquith and Slade (1995, 1997). Asquith and Slade used a subset of 527 stations from the 559 Texas stations within 11 hydrologic regions to develop regional regression equations. The 527 stations form the basis of the evaluation of the site-specific approach presented here. The 100-year peak streamflow was selected for comparison because this recurrence interval is often needed for hydraulic design. Similar evaluation of other recurrence intervals should produce results similar to those presented in this report.

The three values of the 100-year peak streamflow available for this evaluation are the station, regional, and site-specific estimates, which are defined as follows: (1) the station estimate is the 100-year peak streamflow from frequency analysis of the station data (annual peak streamflow); (2) the regional estimate is the 100-year peak streamflow from the traditional approach; and (3) the site-specific estimate is the 100-year peak streamflow from the site-specific approach.

The station streamflows for the entire data base are from Asquith and Slade (1997), whereas the regional estimates were generated from modification of the WLS regression equations in that same report. Hundreds of regional regression equations following the regions, data, and analysis style of Asquith and Slade (1997) were generated for this evaluation because a new equation was generated for each station. Each equation excluded the station for which the 100-year peak-streamflow estimate was needed, which ensures that the regional streamflow for a station is truly independent of the station streamflow.

The site-specific streamflows were generated using the computer program of this report and numerous search and sample-size options. As in the regional regressions, the station for which a 100-year peak streamflow was needed was removed from the GLS regressions. For the regional streamflows, 527 stations were available, whereas the entire data base of 664 stations (559 in Texas and 105 in nearby States) was available for each site-specific streamflow estimate.
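
Removing each station from the regression before estimating its streamflow is a leave-one-out evaluation. The sketch below illustrates the idea under simplifying assumptions: ordinary least squares on a single variable (log drainage area) stands in for the WLS and GLS regressions of the report, and the station data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


def leave_one_out_errors(areas, q100s):
    """For each station, fit on all other stations and return
    log10(estimate) minus log10(station value)."""
    log_a = [math.log10(a) for a in areas]
    log_q = [math.log10(q) for q in q100s]
    errors = []
    for i in range(len(areas)):
        train_x = log_a[:i] + log_a[i + 1:]  # station i is withheld
        train_y = log_q[:i] + log_q[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        errors.append((b0 + b1 * log_a[i]) - log_q[i])
    return errors


# Hypothetical drainage areas and 100-year peak streamflows.
areas = [10.0, 50.0, 120.0, 400.0, 900.0, 2500.0]
q100s = [4000.0, 12000.0, 21000.0, 52000.0, 80000.0, 150000.0]
errors = leave_one_out_errors(areas, q100s)
print([round(e, 3) for e in errors])
```

Because station i never contributes to the equation used to estimate it, each error is independent of that station's own streamflow.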

The search options of the site-specific approach that were considered are presented in table 2. Both the proximity-based and similarity-based methods were used to select stations to regionalize. The similarity search method is based on similarity as described by equation 3. The proximity search method is based on geographic distance and drainage area range restriction (eqs. 1, 2). In addition to varying the search options, the sample size was varied. The sample sizes used for each option were 10, 15, 20, 25, 30, 35, and 40 stations. In total, the consideration of the 13 options (table 2), each with seven sample sizes and 527 stations, yields 47,957 values of 100-year peak streamflows. These 100-year peak streamflows provide the basis of the evaluation.
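
A proximity search with a drainage area range restriction can be sketched as follows. Equations 1 and 2 are not reproduced in this section, so the sketch is a hedged illustration rather than the program's method: it assumes planar coordinates with straight-line distance, assumes the range is applied as plus or minus a number of log₁₀ cycles around the site's drainage area, and uses hypothetical station records and field names.

```python
import math


def select_by_proximity(site, stations, n_stations, log_cycles=1.5):
    """Return the n_stations nearest stations whose drainage area lies
    within +/- log_cycles (log10 units) of the site's drainage area."""
    site_log_area = math.log10(site["area"])
    eligible = [
        s for s in stations
        if abs(math.log10(s["area"]) - site_log_area) <= log_cycles
    ]
    # Straight-line distance in the (x, y) plane; an assumption of this sketch.
    eligible.sort(key=lambda s: math.hypot(s["x"] - site["x"], s["y"] - site["y"]))
    return eligible[:n_stations]


# Hypothetical site and stations (area in square miles, x and y in miles).
site = {"area": 100.0, "x": 0.0, "y": 0.0}
stations = [
    {"id": "A", "area": 100.0, "x": 1.0, "y": 0.0},
    {"id": "B", "area": 1.0e6, "x": 0.5, "y": 0.0},  # closest, but area out of range
    {"id": "C", "area": 10.0, "x": 3.0, "y": 0.0},
    {"id": "D", "area": 3000.0, "x": 2.0, "y": 0.0},
]
selected = select_by_proximity(site, stations, n_stations=2)
print([s["id"] for s in selected])
```

In this example station B is the nearest station but is excluded because its drainage area is four log cycles larger than that of the site.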

Table 2. Evaluation of site-specific peak-streamflow frequency options
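[Shaded areas indicate options used for corresponding option number. a, number of log₁₀ cycles to define the range of drainage area. As described in the text, options 1-3 are proximity based with a = 1, 1.5, and 2, respectively; options 4-9 are similarity based and include distance; options 10-13 are similarity based without distance]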
Option no.	Proximity and drainage area range a=(log₁₀)			Similarity
Option no.	1	1.5	2	Distance	Area	Shape	Slope	2-year 24-hour precipitation	Mean annual precipitation
1
2
3
4
5
6
7
8
9
10
11
12
13

For this evaluation, the authors assume that the 100-year peak-streamflow estimate derived from the station data (station streamflow) represents the "true" 100-year peak streamflow at each station. Therefore, the approach (regional or site-specific) that estimates a 100-year peak streamflow closer to the known station streamflow is considered the more favorable approach. The residual or error for each station is defined as the log station streamflow in cubic feet per second subtracted from the log estimated streamflow for that station. Thus, the error units are log cubic feet per second (log-ft³/s). A weighted root-mean-square error (RMSE) was used to represent the error for each regionalization procedure; the weights are based on the weight factors (equivalent years of record) at each station (Asquith and Slade, 1997). The RMSEs for the site-specific options tested are presented in table 3. The RMSE for the regional streamflows (based on the 11 regions in Asquith and Slade, 1997) is 0.29 log-ft³/s. If all 11 regions of Texas are combined and treated as a single region and streamflows are computed from WLS equations as in Asquith and Slade (1997), the RMSE is 0.34 log-ft³/s. Comparison of the regional RMSE (0.29) to the entire State RMSE (0.34) indicates that the delineation of the State into hydrologically similar regions results in more reliable equations than those treating the State as a single region.
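
A weighted RMSE of the kind described above can be sketched in a few lines. The text states only that the weights are based on equivalent years of record; normalizing by the sum of the weights is an assumption of this sketch, and the numbers are hypothetical.

```python
import math


def weighted_rmse(log_estimates, log_station_values, weights):
    """Weighted root-mean-square error of log10 estimates.

    Each error is the log estimate minus the log station value.
    """
    errors = [est - obs for est, obs in zip(log_estimates, log_station_values)]
    weighted_sum = sum(w * e * e for w, e in zip(weights, errors))
    return math.sqrt(weighted_sum / sum(weights))


# Hypothetical log10 streamflows and weights (equivalent years of record).
log_estimates = [4.0, 4.5, 5.0]
log_station_values = [4.1, 4.3, 5.0]
weights = [10.0, 30.0, 20.0]
rmse = weighted_rmse(log_estimates, log_station_values, weights)
print(round(rmse, 3))
```

With equal weights the expression reduces to the ordinary RMSE, so stations with longer equivalent records simply count more heavily in the average.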

The RMSEs of most site-specific options (table 3) are remarkably similar to the RMSE of the regional approach (0.29 log-ft³/s). Of the proximity-based options (1-3), option 2 results in the smallest overall RMSEs. Additionally, option 2 logically might be preferred over options 1 and 3. Option 1 (proximity, drainage area range defined by ±1 log cycle of drainage area) probably produces a drainage area range too small for a robust GLS regression, thus increasing the potential for large extrapolation errors as indicated by the greater RMSE. Option 3 (proximity, drainage area range defined by ±2 log cycles of drainage area) probably produces too large a range in drainage area. Asquith and Slade (1997, p. 8-10) show that the relation between log peak streamflow and log drainage area is curvilinear for ranges of drainage area exceeding about 2 log cycles. The increasing nonlinearity decreases the precision of linear regression by increasing extrapolation bias. Of the similarity options tested, the options that incorporate distance (4-9) produce similar RMSEs. Of the similarity options without distance (10-13), options 11-13 can be rejected because of relatively large errors. Conceptually, options 11-13 would be expected to produce larger errors because, without considering distance, stations on opposite sides of the State could be selected although their flood characteristics probably would be dissimilar. For sample sizes exceeding about 25 to 30 stations, the RMSE stabilizes. On the basis of RMSE, options 2, 3, and 4-9 are judged favorable for peak-streamflow frequency analysis. To simplify the evaluation, only options 2, 4, and 9 were considered for further analysis.

In addition to RMSEs, the mean of the five largest station errors (M5LE) was computed for each site-specific option and sample size (table 4). The M5LE represents a measure of the magnitude of the largest errors. Large M5LEs indicate that a few predictions are very different from the station streamflows. Therefore, an option is considered favorable if it does not produce large M5LEs. The M5LE for the regional streamflows based on the 11 regions in Asquith and Slade (1997) is 1.50 log-ft³/s.
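
The M5LE can be sketched as the mean of the five largest error magnitudes. Treating "largest" as largest in absolute value is an assumption of this sketch, and the errors shown are hypothetical.

```python
def mean_of_largest_errors(errors, count=5):
    """Mean of the `count` largest error magnitudes (absolute values)."""
    if not errors:
        raise ValueError("at least one error is required")
    largest = sorted((abs(e) for e in errors), reverse=True)[:count]
    return sum(largest) / len(largest)


# Hypothetical station errors in log10 cubic feet per second.
station_errors = [0.1, -1.2, 0.3, 1.5, -0.05, 0.9, -1.1, 0.2, 1.0]
m5le = mean_of_largest_errors(station_errors)
print(round(m5le, 2))
```

Unlike the RMSE, this statistic ignores the bulk of the stations and responds only to the worst few predictions.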

Table 3. Weighted root-mean-square errors (RMSE) for options of site-specific peak-streamflow frequency
[Option numbers correspond to those in table 2; errors in units of log₁₀ cubic feet per second.]
No. of stations	Option no.
No. of stations	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)	(12)	(13)
10	0.49	0.34	0.37	0.33	0.34	0.35	0.38	0.37	0.35	0.35	0.47	0.49	0.38
15	0.45	0.32	0.32	0.30	0.33	0.30	0.33	0.32	0.31	0.32	0.40	0.40	0.34
20	0.43	0.31	0.31	0.30	0.31	0.30	0.30	0.30	0.29	0.31	0.38	0.38	0.33
25	0.41	0.30	0.31	0.30	0.30	0.29	0.31	0.30	0.29	0.30	0.36	0.37	0.32
30	0.41	0.30	0.31	0.30	0.30	0.30	0.30	0.30	0.29	0.29	0.35	0.36	0.31
35	0.41	0.30	0.30	0.30	0.30	0.29	0.30	0.30	0.29	0.30	0.35	0.35	0.30
40	0.41	0.30	0.30	0.29	0.30	0.29	0.30	0.30	0.29	0.30	0.35	0.35	0.30

The M5LEs listed in table 4 show characteristics similar to the RMSEs in table 3--the M5LEs seem to be smallest for sample sizes of 25 to 30 stations. For 30 stations, the mean of the M5LEs for the three options considered the most favorable from RMSE analysis (2, 4, and 9) is 1.31 log-ft³/s. This value compares to the M5LE of 1.50 log-ft³/s for the WLS regression equations of the traditional regional approach and indicates that the site-specific procedure is less likely to produce unreasonable peak streamflows.
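
The comparison in the preceding paragraph can be checked directly from the 30-station row of table 4. The sketch below also expresses the difference between the two M5LEs as a multiplicative factor on streamflow, which is an added illustration based only on the definition of log₁₀ units.

```python
# M5LE values for 30 stations, options 2, 4, and 9 (table 4).
m5le_30_stations = {2: 1.29, 4: 1.29, 9: 1.35}
regional_m5le = 1.50  # regional approach, from the text

mean_m5le = sum(m5le_30_stations.values()) / len(m5le_30_stations)

# A difference d in log10 units corresponds to a factor of 10**d in streamflow.
factor = 10.0 ** (regional_m5le - mean_m5le)

print(round(mean_m5le, 2), round(factor, 2))
```

The mean rounds to the 1.31 log-ft³/s cited in the text, and the 0.19 log-unit difference corresponds to a factor of roughly 1.5 in streamflow.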

Table 4. Mean of the five largest station errors (M5LE) for options of site-specific peak-streamflow frequency
[Option numbers correspond to those in table 2; errors in units of log₁₀ cubic feet per second. Shaded errors represent most "favorable" options.]
No. of stations	Option no.
No. of stations	(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)	(12)	(13)
10	2.15	1.70	2.26	1.48	1.55	2.37	2.07	1.91	1.89	1.82	3.30	2.50	1.82
15	1.53	1.62	1.64	1.43	1.53	1.56	1.61	1.67	1.27	1.33	2.53	1.74	1.48
20	1.33	1.45	1.50	1.31	1.42	1.47	1.39	1.24	1.39	1.25	1.79	1.39	1.42
25	1.29	1.27	1.38	1.29	1.30	1.34	1.39	1.27	1.31	1.31	1.90	1.31	1.37
30	1.28	1.29	1.34	1.29	1.28	1.30	1.36	1.29	1.35	1.26	1.43	1.30	1.30
35	1.28	1.28	1.34	1.31	1.20	1.36	1.35	1.30	1.40	1.26	1.39	1.32	1.27
40	1.30	1.30	1.30	1.31	1.33	1.36	1.38	1.32	1.41	1.23	1.39	1.37	1.22

Table 5. Root-mean-square errors between peak-streamflow frequency analysis using the traditional regional regression approach and selected options of the site-specific peak-streamflow frequency approach
[The traditional regional regression approach follows Asquith and Slade (1997) and uses weighted least-squares; the site-specific approach uses generalized least-squares. The analysis was based on regression using 30 stations; a, number of log₁₀ cycles to define the range of drainage area]
Approach and applicable options	Traditional regional regression approach	Site-specific approach option (2)	Site-specific approach option (4)	Site-specific approach option (9)
Traditional regional regression approach	--	0.155	0.188	0.212
Site-specific approach: Proximity and drainage area range (a = 1.5) Option 2 from table 2	0.155	--	0.165	0.211
Site-specific approach: Similarity with distance and area Option 4 from table 2	0.188	0.165	--	0.205
Site-specific approach: Similarity all 5 independent variables Option 9 from table 2	0.212	0.211	0.205	--

The inter-model RMSEs between the regional equation estimates and the site-specific estimates using options 2, 4, and 9 (each with 30 stations) were computed (table 5). All the inter-model RMSEs are less than those for comparable options in table 3, which indicates that each of the four approaches produces estimates with average errors smaller than the inherent average error in the streamflows. Thus, all three methods for estimating the 100-year peak streamflow (from station streamflows, regional regression, or site-specific regression) produce average estimates similar to one another. However, the errors shown in table 5 are sufficiently large to conclude that the methods can produce appreciably different estimates.

No single option is preferable for selecting stations to include in the regression for the site-specific approach. However, several of the options (1, 11, 12, and 13) seem unfavorable from the RMSEs listed in table 3. Additionally, the RMSE stabilizes for all options for sample sizes of at least 30. The 30-station minimum, however, is not a strict rule; the program provides the user a choice of 20 to 50 stations. This choice permits the user of the program to select fewer stations for areas of the State with a low station density and more stations for areas of the State with a relatively high station density. The evaluation did not consider sample sizes larger than 40. In many areas in the northwestern, western, and southern parts of Texas, the density of stations is considerably less than the density in most of the central and eastern parts. Comparison of inter-model RMSEs indicates that option 2 (proximity, drainage area range defined by ±1.5 log cycles of drainage area) produces estimates more like those of the regional equations in Asquith and Slade (1997)--this is not surprising because both option 2 and the regional equations rely heavily on geographic distance and drainage area to select the stations. Because an optimum option for selecting the stations is not apparent, the program provides three different selection criteria--one based on the 1.5 log cycle proximity criterion and two based on similarity.

SUMMARY

The USGS, in cooperation with the Texas Department of Transportation, has developed a computer program to estimate peak-streamflow frequency for ungaged sites in natural basins in Texas. Peak-streamflow frequency refers to the peak streamflows for recurrence intervals of 2, 5, 10, 25, 50, and 100 years. Peak-streamflow frequency estimates are needed by planners, managers, and design engineers for flood-plain management; for objective assessment of flood risk; for cost-effective design of roads and bridges; and for the design of culverts, dams, levees, and other flood-control structures.

The data base that the program uses contains the peak-streamflow frequency, selected basin characteristics, and ancillary information for each of 559 streamflow-gaging stations in Texas as well as for each of 105 nearby stations in Arkansas, Louisiana, New Mexico, and Oklahoma. All of the stations included in this report have at least 8 years of annual peak-streamflow data from natural basins. A natural basin is defined as a basin with less than 10 percent impervious cover, less than 10 percent of its drainage area controlled by reservoirs, and no other human-related factors such as land use changes that would measurably affect peak streamflow.

Hydrologic regression establishes the statistical relation between peak-streamflow frequency and selected basin characteristics. In hydrologic regression, the 2-, 5-, 10-, 25-, 50-, and 100-year peak streamflows are used as dependent variables; and selected basin characteristics are used as independent variables. The program uses logarithmic transformations of the dependent and independent variables to increase the linearity between the dependent and independent variables. The program estimates peak-streamflow frequency using a site-specific approach and a multivariate generalized least-squares linear regression. A site-specific approach differs from a traditional regional regression approach by developing unique equations to estimate peak-streamflow frequency specifically for the ungaged site.

The stations included in the regression are selected using an informal cluster analysis by comparing basin characteristics of the ungaged site to the basin characteristics of all the stations in the data base. The program provides several choices for selecting the stations. Using cluster analysis to select the stations ensures that the stations included in the regression will have characteristics similar to those at the ungaged site and, therefore, should provide the basis for improved peak-streamflow frequency estimation. The basin characteristics of the ungaged site will, if data are available, lie within the multidimensional explanatory-variable cloud defined by the basin characteristics of the selected stations. Several important consequences result when the basin characteristics lie within and near the middle of the multidimensional explanatory variable cloud. Because estimates of peak streamflows from the regression are made near the center of the space of the explanatory-variables, extrapolation biases in the relation between peak-streamflow frequency and basin characteristics are expected to be reduced. Additionally, violations of the linearity assumption for the regression are less likely to affect the linear regression. Other advantages of the site-specific approach are that no fixed geographic boundaries define a region and that the site-specific approach can be easily updated in the future.

Evaluation of the site-specific approach suggests that no single option is preferable for selecting stations to include in the regression. The evaluation indicates that a sample size of at least 30 stations should be used; however, the 30-station threshold is not a strict rule. Hence, the computer program provides the user with a choice of selecting 20 to 50 stations. Because an optimum option for selecting the stations is not apparent, the computer program provides three different selection criteria.

SELECTED REFERENCES

Asquith, W.H., 1998a, Depth-duration frequency of precipitation for Texas: U.S. Geological Survey Water-Resources Investigations Report 98-4044, 107 p., 3 app.

______1998b, Peak-flow frequency for tributaries of the Colorado River downstream of Austin, Texas: U.S. Geological Survey Water-Resources Investigations Report 98-4015, 19 p., 1 app.

Asquith, W.H., and Slade, R.M., Jr., 1995, Documented and potential extreme peak discharges and relation between potential extreme peak streamflows and probable maximum flood peak discharges in Texas: U.S. Geological Survey Water-Resources Investigations Report 95-4249, 58 p.

______1997, Regional equations for estimation of peak-streamflow frequency for natural basins in Texas: U.S. Geological Survey Water-Resources Investigations Report 96-4307, 68 p.

Asquith, W.H., Slade, R.M., Jr., and Lanning-Rush, Jennifer, 1996, Peak-flow frequency and extreme flood potential for streams in the vicinity of the Highland Lakes, central Texas: U.S. Geological Survey Water-Resources Investigations Report 96-4072, 1 sheet.

Carr, J.T., 1967, The climate and physiography of Texas: Texas Water Development Board Report 53, 27 p.

Cook, R.D., and Weisberg, Sanford, 1982, Residuals and influence in regression: New York, Chapman and Hall, 230 p.

Cunnane, C., 1989, Statistical distributions for flood frequency analysis: Geneva, World Meteorological Organization Operational Hydrology Report No. 33, 73 p.

Helsel, D.R., and Hirsch, R.M., 1992, Studies in environmental science 49--Statistical methods in water resources: New York, Elsevier, 522 p.

Hershfield, D.M., 1962, Rainfall frequency atlas of the United States for durations from 30 minutes to 24 hours and return periods from 1 to 100 years: U.S. Weather Bureau Technical Paper 40, 61 p.

Hodge, S.A., and Tasker, G.D., 1995, Magnitude and frequency of floods in Arkansas: U.S. Geological Survey Water-Resources Investigations Report 95-4224, 52 p., 1 diskette.

Hosking, J.R.M., and Wallis, J.R., 1997, Regional frequency analysis--An approach based on L-moments: Cambridge University Press, 224 p.

Larkin, T.J., and Bomar, G.W., 1983, Climatic atlas of Texas: Texas Department of Water Resources LP 192, 151 p.

Lichty, R.W., and Karlinger, M.R., 1990, Climate factor for small-basin flood frequency: American Water Resources Association, Water Resources Bulletin, v. 26, no. 4, p. 577-586.

Raines, T.H., 1998, Peak-discharge frequency and potential extreme peak discharge for natural streams in the Brazos River Basin, Texas: U.S. Geological Survey Water-Resources Investigations Report 98-4178, 42 p.

Schroeder, E.E., and Massey, B.C., 1977, Technique for estimating the magnitude and frequency of floods in Texas: U.S. Geological Survey Water-Resources Investigations Report 77-110, 22 p.

Slade, R.M., Jr., and Asquith, W.H., 1996, Peak data for U.S. Geological Survey gaging stations, Texas network; and computer program to estimate peak-streamflow frequency: U.S. Geological Survey Open-File Report 96-148, 57 p., 1 diskette.

Slade, R.M., Jr., Asquith, W.H., and Tasker, G.D., 1995, Multiple-regression equations to estimate peak-flow frequency for streams in Hays County, Texas: U.S. Geological Survey Water-Resources Investigations Report 95-4019, 1 sheet.

Slade, R.M., Jr., and Smith, Peter, 1993, Use of a site-specific regional-analysis system to estimate peak-streamflow frequency for undeveloped basins in Texas, in Texas Sections Joint Spring Meeting and Texas Hydrology Roundup, Austin, Texas, 1993, Proceedings: Austin, Tex., American Institute of Hydrology and American Water Resources Association [variously paged].

Stedinger, J.R., and Tasker, G.D., 1985, Regional hydrologic analysis--Ordinary, weighted, and generalized least squares compared: Water Resources Research, v. 21, no. 9, p. 1,421-1,432.

Stedinger, J.R., Vogel, R.M., and Foufoula-Georgiou, Efi, 1993, Frequency analysis of extreme events, in Maidment, D.R., ed., Handbook of hydrology, chap. 18: New York, McGraw-Hill, p. 18.1-66.

Tasker, G.D., and Slade, R.M., Jr., 1994, An interactive regional regression approach to estimating flood quantiles, in Fontane, D.G., and Tuvel, H.N., eds., Water policy and management--Solving the problems: Proceedings of the 21st Annual Conference of the Water Resources Planning and Management Division, American Society of Civil Engineers, p. 782-785.

Tasker, G.D., and Stedinger, J.R., 1989, An operational GLS model for hydrologic regression: Journal of Hydrology, v. 111, p. 361-375.

Thomas, B.E., Hjalmarson, H.W., and Waltemeyer, S.D., 1994, Methods of estimating magnitude and frequency of floods in the southwestern United States: U.S. Geological Survey Open-File Report 93-419, 211 p.

Thomas, W.O., and Landers, M.N., 1989, Regionalization of flood characteristics, in National Conference on Hydraulic Engineering '89, New Orleans, Aug. 14-18, 1989, Proceedings: American Society of Civil Engineers, p. 372-378.

U.S. Geological Survey, 1986, National water summary 1985--Hydrologic events and surface-water resources: U.S. Geological Survey Water-Supply Paper 2300, 506 p.

Footnotes

(1) A water year is the 12-month period October 1 through September 30, designated by the calendar year in which it ends.

