Censored and uncensored generalized additive models (GAMs) were developed using streamflow data from 941 U.S. Geological Survey streamflow-gaging stations (streamgages) to predict decadal statistics of daily streamflow for streams draining to the Gulf of Mexico. The modeled decadal statistics comprise no-flow fractions and L-moments of logarithms of nonzero streamflow for six decades (1950–2009). These statistics represent metrics of decadal flow-duration curves (dFDCs) derived from about 10 million daily mean streamflows. The L-moments comprise the mean, coefficient of L-variation, and the third through fifth L-moment ratios. The GAMs were fit to the statistics from 941 streamgages and 2,750 streamgage-decades by using watershed properties such as basin area and slope, decadal precipitation and temperature, and decadal values of flood storage and urban development percentages. The GAMs then estimated decadal statistics for 9,220 prediction locations (stream reaches) coincident with outlets of level-12 hydrologic unit codes. Both entire dataset (whole model) and leave-one-watershed-out model results are reported. No-flow fractions are censored data, and Tobit extensions to GAMs were used to model ephemeral streamflow conditions. Conversely, uncensored GAMs were used for estimation of the L-moments. The GAMs are shown, by coverage probabilities, to construct reliable 95-percent prediction limits. An example shows how no-flow fractions and L-moments may be used to approximate dFDCs by using selected probability distributions (mathematical formulas) including the asymmetric exponential power, generalized normal, and kappa distributions.

Asquith, W.H., Knight, R.R., and Crowley-Ornelas, E.R., 2020, RESTORE/fdclmrpplo—Source code for estimation of L-moments and percent no-flow conditions for decadal flow-duration curves and estimation at level-12 hydrologic unit codes: U.S. Geological Survey software release, version 1.0.2,

Robinson, A.L., Asquith, W.H., Crowley-Ornelas, E., and Knight, R.R., 2021, Estimated quantiles of decadal flow-duration curves using selected probability distributions fit to no-flow fractions and L-moments predicted for streamgages and for pour points of level-12 hydrologic unit codes in the southeastern United States, 1950–2010: U.S. Geological Survey data release,

Robinson, A.L., Asquith, W.H., and Knight, R.R., 2019, Summary of decadal no-flow fractions and decadal L-moments of nonzero streamflow flow-duration curves for National Hydrography Dataset, version 2 catchments in the southeastern United States, 1950–2010: U.S. Geological Survey data release,

Robinson, A.L., Worland, S.C., and Rodgers, K.D., 2020, Estimated daily mean streamflows for HUC12 pour points in the southeastern United States, 1950–2009: U.S. Geological Survey data release,

U.S. Geological Survey, 2019, USGS water data for the Nation: U.S. Geological Survey National Water Information System database,


Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner.

This research was funded by the Gulf Coast Ecosystem Restoration Council (RESTORE Council). The work by Scott Worland was done while serving as a Hydrologist with the U.S. Geological Survey.

Multiply | By | To obtain |
---|---|---|
Length | | |
meter (m) | 3.281 | foot (ft) |
kilometer (km) | 0.6214 | mile (mi) |
meter (m) | 1.094 | yard (yd) |
Area | | |
square kilometer (km^{2}) | 247.1 | acre |
square kilometer (km^{2}) | 0.3861 | square mile (mi^{2}) |
Flow rate | | |
cubic meter per second (m^{3}/s) | 70.07 | acre-foot per day (acre-ft/d) |
cubic meter per second (m^{3}/s) | 35.31 | cubic foot per second (ft^{3}/s) |
cubic meter per second (m^{3}/s) | 22.83 | million gallons per day (Mgal/d) |

Horizontal coordinate information is referenced to the North American Datum of 1983 (NAD 83).

Asymmetric exponential power distribution

Censoring-extended generalized additive model

Drainage-area ratio

Decadal flow-duration curve

U.S. Environmental Protection Agency

Flow-duration curve

Generalized additive model

Generalized logistic distribution

Generalized normal distribution

Kappa distribution

Leave-one-out

Multilinear regression

National Hydrography Dataset Plus

Nash-Sutcliffe model efficiency coefficient

National Water Information System

Resources and Ecosystems Sustainability, Tourist Opportunities, and Revived Economies of the Gulf Coast States Act

Root mean square error

Support vector machine

U.S. Geological Survey

The U.S. Geological Survey (USGS) and the U.S. Environmental Protection Agency (EPA) are collaborating on a project, in cooperation with the

Hydrologic alteration of FDCs has been documented in more than 86 percent of monitored streams nationally, including much of the Gulf Coast region (

An assessment of temporal and spatial trends in streamflow delivery to Gulf Coast estuaries can improve the understanding of potential drivers of change in estuarine health. Estimated streamflows as expressed by estimated L-moments and estimated FDCs at unmonitored (ungaged) stream locations for decadal time scales are critical to trend assessments. Estimated streamflows for the RESTORE Act funded Baseline Flow project are based in part on combinations of either regional regression (this study) or machine-learning (

In contrast to the aforementioned RESTORE Act project publications, the current (2022) study is focused on FDC estimation through the estimation of L-moments of the nonzero streamflow with an innovative method for the handling of no-flow fractions. The L-moments can be used to fit probability distributions, and such fits in turn provide FDC quantiles. Streamflow estimation at ungaged locations is based on the statistical coupling between observed FDC statistics and myriad potential predictor variables or watershed properties. The properties considered include physical, physiographic, and land-use categories as well as generalized hydrometeorologic, hydrogeologic, and water-resources development (dams and reservoirs) characteristics. The FDC statistics of primary description in this report are the no-flow (zero-flow) fractions and L-moments (

The purposes of this report are (1) to describe results of generalized additive models (GAMs) for FDC quantile estimation through GAM regionalization of L-moments and fitting selected probability distributions to the estimated L-moments and (2) to accompany supporting algorithms and computations documented in and sourced from the RESTORE/fdclmrpplo software repository (

The study area (^{2}) and is coincident at the finest scale to HUC12 boundaries of the Watershed Boundary Dataset (USGS, 2018). The map (

Study area, stream network (U.S. Environmental Protection Agency, 2018a), locations of 956 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) (

The RESTORE/fdclmrpplo software storage location (repository) (

The data on the maps shown in this report are limited to the 2000s. The maps have analogs within the RESTORE/fdclmrpplo repository through comprehensive map-based visualization as algorithm diagnostics for the six decades of streamgage-specific estimates and predictions at HUC12s (

FDCs provide an efficient means of describing the distribution of streamflow at a site and are commonly used to depict central tendency, variation (interquartile range), extremely low- or high-flow regimes, and, depending on the site, the percentage of no-flow days. An FDC depicts the probability with which a given streamflow for a study period will be either greater than or less than a given value. For example, if a decade of flows for a site has a 30th percentile of 100 cubic meters per second (m^{3}/s), then for 30 percent of the decade the streamflow was less than or equal to 100 m^{3}/s, and for 70 percent of the decade the streamflow was equal to or greater than 100 m^{3}/s. Myriad complex and interrelated hydrometeorologic, hydrogeologic, and anthropogenic processes influence streamflow (

FDCs have many applications for researchers and resource managers and “are popular tools to estimate the amount of water available in a [watershed]” (

For this study, there is an explicit distinction between the percentages of no-flow and the remainder (nonzero streamflow) of dFDCs. No-flow conditions, synonymous with zero-flow conditions and indicative of ephemeral or intermittent streams, are inherently more problematic to regionalize than overall mean annual streamflow because very low (including zero) streamflows tend to be greatly influenced by local hydrologic conditions in places lacking abundant rainfall and by basin area. The occurrence of no-flow days varies greatly for streams across the study area; this variation is attributable in part to the very humid (east) to semi-arid (west) breadth of climate across the study area.

The no-flow continuum ranges through (1) brief occurrences of zero flow during flow reversals in low-sloped or near-coastal streamgages, (2) continuous presence of water in the stream channel that is not passing downstream through the control cross section at the streamgage, (3) disconnected pools of water, and (4) long stretches of dry channels. ^{3}/s) (native units), and days with mean flow less than (<) 0.005 ft^{3}/s are reported as a zero.
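The two kinds of decadal statistics discussed in this report, the no-flow fraction and the L-moments of the logarithms of nonzero streamflow, can be illustrated with a minimal Python sketch. The function names and the tiny example record are hypothetical and are not taken from the RESTORE/fdclmrpplo repository; the L-moment formulas are the standard unbiased estimators based on probability-weighted moments.

```python
import math
from math import comb


def no_flow_fraction(daily_flows):
    """Fraction of days in the record that are reported as zero flow."""
    return sum(1 for q in daily_flows if q == 0) / len(daily_flows)


def sample_lmoments(values):
    """Return (L1, T2, T3, T4, T5) from unbiased probability-weighted moments.

    L1 is the mean, T2 is the coefficient of L-variation (lambda2 / lambda1),
    and T3 through T5 are L-moment ratios (lambda_r / lambda2).
    """
    x = sorted(values)
    n = len(x)
    if n < 5:
        raise ValueError("at least five values are needed for five L-moments")
    # b[r] is the r-th sample probability-weighted moment (j is rank minus 1).
    b = [
        sum(comb(j, r) * x[j] for j in range(n)) / (n * comb(n - 1, r))
        for r in range(5)
    ]
    lam1 = b[0]
    lam2 = 2 * b[1] - b[0]
    lam3 = 6 * b[2] - 6 * b[1] + b[0]
    lam4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    lam5 = 70 * b[4] - 140 * b[3] + 90 * b[2] - 20 * b[1] + b[0]
    return lam1, lam2 / lam1, lam3 / lam2, lam4 / lam2, lam5 / lam2


# Hypothetical eight-day record (any consistent flow unit); zeros are no-flow days.
flows = [0.0, 0.0, 1.0, 10.0, 100.0, 1000.0, 10000.0, 0.0]
pplo = no_flow_fraction(flows)
log_nonzero = [math.log10(q) for q in flows if q > 0]
l1, t2, t3, t4, t5 = sample_lmoments(log_nonzero)
print(pplo, l1, t2, t3)
```

Separating the zeros before taking logarithms is what allows the logarithmic transformation to be used even when a record contains no-flow days.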

Empirical FDCs can be constructed with data having any number of no-flow observations, sometimes greater than 90 percent, depending on the physical setting of the basin. Logarithmic transformation is common; however, such transformation is problematic when the empirical FDC contains no-flows. No-flow fractions are important to consider because stream locations with large fractions of zero flows were prevalent in many parts of the study area. Not rigorously accounting for no-flows could potentially bias results for basins having a likelihood of no-flow conditions and negatively affect statistical syntheses for water managers. It is important to note that not all regional studies of FDCs account for no-flow conditions. For example,

In the study by

The statistical methods described herein address the potential bias when including nonzero flow through censored statistical techniques. Basic background information on censored data and working with such data can be found in

For the current (2022) study, no-flows are viewed as a type of censored data for two reasons: (1) such streamflows are arguably censored values because each is technically <0.01 ft^{3}/s (<2.83×10^{−4} m^{3}/s, in scientific notation) though the USGS National Water Information System (NWIS) database reports zero (USGS, 2019); and (2) conceptually the idea of “zero flow” itself represents a continuum of hydrologic conditions. One statistical approach for censored-data analysis is “survival regression” (

Locations for prediction were identified by using a combination of the National Hydrography Dataset Plus (NHDPlus) (EPA, 2018a) and the Watershed Boundary Dataset (USGS, 2018). Specific locations for estimating FDCs were identified by using location identifiers (COMIDs) of stream reaches coincident with the pour points of HUC12 watersheds, which have a median accumulated drainage area of about 200 km^{2}. The study area has approximately 10,000 HUC12s, but estimation is restricted to about 9,000 prediction locations. A COMID is a unique identification number of a stream reach, and ideally each HUC12 pour point has a unique COMID. Some COMIDs are associated with the pour points of multiple HUC12s, however, and HUC12-addressing schemes occasionally have been revised (

Additional data screening was made for the 956 streamgages, and attendant disclosures specific for this study are listed in RESTORE/fdclmrpplo (

The 941 streamgages and 2,750 decades of streamflow data were used with the watershed properties and other characteristics to construct the final GAMs presented by the RESTORE/fdclmrpplo software repository (

Relations between observed no-flow fractions and potential covariates are complex, which is caused in part by nonlinear processes functioning at various scales and interactions with physical factors such as rainfall patterns and other atmospheric influences, land-use and streamflow regulation patterns, and other difficult-to-identify factors. GAMs (

A GAM uses relations between a response variable and an additive combination of various parametric terms and smooth terms (smooth functions) (

$$y_i = \mathbf{X}_i\boldsymbol{\theta} + f_1(x_{1i}) + f_2(x_{2i}) + \cdots + f_{EN}(E_i, N_i) + \varepsilon_i$$

where

$y_i$ is the response variable (base_{10} logarithmic transformation) for the $i$th observation;

$\mathbf{X}_i$ is a model vector including an optional intercept for strictly parametric and suitably transformed predictor variables;

$\boldsymbol{\theta}$ is a parameter matrix;

$f_1(x_{1i})$ is a smooth function of the predictor variable $x_{1i}$;

$f_2(x_{2i}), \ldots$ represent additional smooth terms as needed;

$f_{EN}(E_i, N_i)$ is the smooth on the easting ($E_i$) and northing ($N_i$); and

$\varepsilon_i$ are errors.

The _{i}θ

Whole-sample modeling has a potential for underestimating prediction uncertainties because model performance is assessed on its fit to the “training data” as opposed to performance on independent data. An elementary approach to assess model performance is out-of-sample or leave-one-out (LOO) testing. This is a form of cross validation, and specifically for this study, leave-one-watershed-out was used. Watersheds (the streamgage without regard to the number of decades available) in this study are treated as the fundamental sampling unit; therefore, watersheds were selected one by one for removal, the same GAM form was fit, and predictions for the streamgage and its decades were recorded.
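The leave-one-watershed-out procedure can be sketched as follows. This is a minimal illustration under loud assumptions: a simple least-squares line stands in for the GAM, and the record layout `(site_id, x, y)` and the function names are hypothetical. The point shown is only the grouping logic, in which all decades of one streamgage are withheld together.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope (a stand-in for the GAM fit)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y - slope * mean_x, slope


def leave_one_watershed_out(records):
    """Return out-of-sample predictions, one per record.

    records is a list of (site_id, x, y) tuples; a site may appear in several
    decades, and every decade of the withheld site is predicted from a model
    fit without that site.
    """
    predictions = [None] * len(records)
    for site in sorted({r[0] for r in records}):
        train = [(x, y) for s, x, y in records if s != site]
        intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])
        for i, (s, x, _) in enumerate(records):
            if s == site:
                predictions[i] = intercept + slope * x
    return predictions


# Hypothetical streamgage-decade records: site, predictor, response.
records = [("A", 1.0, 3.0), ("A", 2.0, 5.0), ("B", 3.0, 7.0),
           ("C", 4.0, 9.0), ("C", 5.0, 11.0)]
loo_predictions = leave_one_watershed_out(records)
print(loo_predictions)
```

Withholding by watershed rather than by individual streamgage-decade keeps decades from the same streamgage out of the training data when that streamgage is being predicted.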

For flexible modeling techniques, such as GAMs with many smooths, there usually are no simple formulas to compute the expected LOO fit. Practitioners commonly use cross validation to assess model performance on independent data by using rigorous numerical computation in place of theoretical analysis; however, GAMs are in a class of modeling technique similar to regression for which some theoretical analysis is applicable (

Given intent to provide estimates for the 9,220 prediction locations in the study area, documentation of prediction limits and coverage probabilities of the limits is important. Highly technical disclosures regarding the coverage probabilities are reported herein and 95-percent prediction limits for the streamgage-decade records and the prediction locations are reported in

The algorithms by

The computed prediction limits of the flowtime from the censored GAM of the no-flow fraction (cGAM-PPLO) and later for other GAMs of the L-moments were based on the 0.05 probability (95-percent limits) of the quantile of the t-distribution (±_{α/2,d}_{α}_{,d}

Coverage probability is the proportion of cases in which a computed interval contains the true value of interest. For this study, coverage probabilities were computed for the whole-model and LOO-model results by using the 95-percent prediction limits previously described. If the general assumptions underlying the computation of prediction intervals are met, then the coverage probability will equal the nominal level of the prediction limits (95 percent). The coverage probabilities are compared to the nominal 95-percent level as part of the discussion of model diagnostics.
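A coverage probability is straightforward to compute once observed values and prediction limits are in hand. The sketch below is illustrative only; the function name and the four example values are hypothetical.

```python
def coverage_probability(observed, lower, upper):
    """Proportion of observed values that fall within their prediction limits."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Hypothetical observed values with lower and upper prediction limits.
observed = [2.1, 3.0, 1.5, 2.8]
lower = [1.9, 2.0, 1.6, 2.5]
upper = [2.5, 2.9, 2.0, 3.2]
coverage = coverage_probability(observed, lower, upper)
print(coverage)
```

For well-calibrated 95-percent limits and a large sample, the returned value would be expected to be near 0.95.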

The decadal no-flow fractions were conceptualized as product lifetimes and hence as a survival analysis problem. For this study, the days of nonzero streamflow (“flowtimes”) are the response variable in the model and are right-tail censored. The variable _{i}

The censored GAM for this study (

Summary list of output from mgcv::gam(…) (

For development of a given GAM including flowtimes or other statistics, decisions are required including which variables to use, which variables are to be parametric or smoothed, and which smoothing parameter settings to use. The decisions are based not only on performance for no-flow regionalization but also for regionalization of the L-moments of the nonzero streamflow FDC. When justified, structural similarity among all the models described herein is desirable to achieve some qualitative consistency in the ensemble of predictions.

The authors consider it useful to avoid smoothing terms when a parametric term can be used to ease interpretive burden. In an effort to avoid smoothing when possible, the properties measured in percentiles (

High percentages of development (fraction urbanization,

The decade coefficients generally have probability values (

The bedrock permeability (bedperm) (_{10}(days) for the latter three classes than for sandstone.

Bedrock permeability classes at 9,220 prediction locations (stream reaches) and locations of 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009. The classes for the majority of the watershed are acc_bedperm_1, not a principal aquifer; acc_bedperm_2, sandstone; acc_bedperm_3, semiconsolidated sand; acc_bedperm_5, sandstone and carbonate rocks; acc_bedperm_6, unconsolidated sand and gravel; and acc_bedperm_7, carbonate rock.

An effort was made to select parallel variables for the different GAMs because it is important that the individual GAMs for each L-moment of nonzero streamflow be structurally similar. Structural similarity helps the L-moment predictions for each prediction location remain meaningful as a set for later probability-distribution fitting. Of the non-censored L-moments, the mean nonzero streamflow was treated first and built with similarity to the cGAM-PPLO, but the bedrock permeability and grassland variables were no longer significant. The model (

Summary list of output from mgcv::gam(…) (

Whole- and LOO-model estimates by the GAM-L1 for the decadal mean nonzero streamflows by streamgage-decade records are available in the data release by

The Duan retransformation bias correction (

A summary of the GAM-T2 model is shown in

Summary list of output from mgcv::gam(…) (

The addition of flood storage (

Flood storage values for the 2000s at 9,220 prediction locations (stream reaches) and locations of 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

A summary of the GAM-T3 model is shown in

Summary list of output from mgcv::gam(…) (

A summary of the GAM-T4 model is shown in

Summary list of output from mgcv::gam(…) (

A summary of the GAM-T5 model is shown in

Summary list of output from mgcv::gam(…) (

Selected model diagnostics are listed in

[NSE, Nash-Sutcliffe model efficiency coefficient; RMSE, root mean square error; LOO, leave-one-watershed-out cross validation; cGAM, censored generalized additive model; GAM, uncensored generalized additive model; L1, mean of nonzero part of decadal flow-duration curve (dFDC); T2, dimensionless coefficient of L-variation (second L-moment ratio) of dFDC; T3, dimensionless L-skew (third L-moment ratio) of dFDC; T4, dimensionless L-kurtosis (fourth L-moment ratio) of dFDC; T5, dimensionless fifth L-moment ratio of dFDC; PPLO, probability or fraction of decadal no-flow; Note, modeling based on 2,750 streamgage-decade records]

Statistical model | Whole-model NSE | Whole-model RMSE | LOO-model RMSE | Whole-model 95-percent coverage probabilities | LOO-model 95-percent coverage probabilities |
---|---|---|---|---|---|
GAM-L1^{a} | 0.975 | 0.129 | 0.137 | 0.945 | 0.936 |
GAM-T2 | 0.661 | 0.087 | 0.092 | 0.958 | 0.947 |
GAM-T3 | 0.693 | 0.081 | 0.086 | 0.958 | 0.948 |
GAM-T4 | 0.730 | 0.087 | 0.092 | 0.947 | 0.939 |
GAM-T5 | 0.732 | 0.085 | 0.090 | 0.946 | 0.934 |
cGAM-PPLO^{b} | 0.303 | 0.225 | 0.226 | 0.944 | 0.942 |
Additional diagnostics for the cGAM-PPLO model | | | | | |
cGAM-PPLO | Fraction of whole-model absolute errors equal to zero = 0.698 | | | | |
cGAM-PPLO | Fraction of whole-model absolute errors less than 0.02 = 0.783 | | | | |
cGAM-PPLO | Fraction of whole-model absolute errors less than 0.05 = 0.844 | | | | |
cGAM-PPLO | Fraction of whole-model absolute errors less than 0.10 = 0.905 | | | | |
cGAM-PPLO | Fraction of LOO-model absolute errors equal to zero = 0.693 | | | | |
cGAM-PPLO | Fraction of LOO-model absolute errors less than 0.02 = 0.779 | | | | |
cGAM-PPLO | Fraction of LOO-model absolute errors less than 0.05 = 0.836 | | | | |
cGAM-PPLO | Fraction of LOO-model absolute errors less than 0.10 = 0.893 | | | | |

^{a}The NSE and RMSE are computed from base_{10} logarithms of cubic meters per second (GAM-L1 model), and all other L-moment models (GAM-T2, T3, T4, and T5) listed are untransformed.

^{b}The NSE and RMSE are computed from base_{10} logarithms of flowtime, where a decade with zero no-flows (a perennial decade) would have a flowtime of about log_{10}(3,653 days) = 3.563. The reported statistics, however, are misleading because of the extreme censored nature of the data. The 95-percent confidence limit coverage probabilities also are computed in the flowtime domain.

The NSE can range from −∞ to 1. An NSE of 1 indicates a perfect fit between simulated and measured data. An NSE of 0 indicates that the model predictions are only as accurate as the mean of the measured data, and an NSE of less than 0 indicates that the mean of the measured data is a better predictor than the model. RMSEs succinctly measure accuracy and retain the units of the response data on which the models were constructed. RMSE values are never negative, and it is unclear how comparable RMSEs are across the dimensionless models (GAM-T2 through GAM-T5).
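The two summary diagnostics can be written compactly. The following Python sketch uses the standard definitions of the Nash-Sutcliffe efficiency and root mean square error; the function names and example values are hypothetical.

```python
import math


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def rmse(observed, simulated):
    """Root mean square error, in the units of the response data."""
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    return math.sqrt(sse / len(observed))


# Hypothetical observed and simulated responses.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.1, 1.9, 3.2, 3.8]
print(nse(obs, sim), rmse(obs, sim))
```

Passing the mean of the observations as every simulated value returns an NSE of 0, which matches the interpretation given in the text.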

The cGAM and GAMs also were evaluated by using assessment of their prediction-limit coverage probabilities. cGAM performance was further evaluated by using (1) assessment restricted to just those predictions of no-flow, (2) assessment of counts of perennial and ephemeral predictions, and (3) independent check using a support vector machine (SVM) prediction method. SVMs are a type of machine learning with completely different foundational mathematics than GAMs (

Coverage probabilities are a diagnostic as to whether the GAM algorithms correctly estimate their respective prediction errors. The coverage probabilities for the cGAM-PPLO model are listed in

There are other ways to describe cGAM-PPLO performance. Of the 2,750 streamgage-decade records, about 27.2 percent have at least one no-flow day, which can be compared to the whole-model estimate of 18.1 percent and the LOO-model estimate of 18.0 percent. The mean no-flow fraction of 2,750 records is 0.033; the whole-model mean is 0.025, and LOO-model mean is 0.025. These differences imply that cGAM-PPLO slightly underestimates no-flow prevalence among stations and on average. A binary diagnostic is the number of correct decisions (perennial flow observed and predicted or ephemeral flow observed and predicted). The whole model is correct 84.9 percent of the time, and the LOO model is correct 83.9 percent of the time, compared to 72.8 percent of streamgage-decade records with perennial flow.
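The binary diagnostic can be sketched as follows, assuming that a record counts as perennial when its no-flow fraction is exactly zero and as ephemeral otherwise. The function names and example fractions are hypothetical. The second function gives the share of perennial records, which is the reference value against which the correct-decision rate is compared.

```python
def correct_decision_rate(observed, predicted):
    """Share of records where observed and predicted no-flow fractions agree
    on perennial (fraction equal to 0) versus ephemeral (fraction above 0)."""
    hits = sum(1 for o, p in zip(observed, predicted) if (o > 0) == (p > 0))
    return hits / len(observed)


def perennial_share(observed):
    """Share of records with perennial flow; also the correct-decision rate
    of a naive rule that always predicts perennial flow."""
    return sum(1 for o in observed if o == 0) / len(observed)


# Hypothetical decadal no-flow fractions.
observed = [0.0, 0.0, 0.0, 0.12, 0.40]
predicted = [0.0, 0.0, 0.03, 0.10, 0.25]
rate = correct_decision_rate(observed, predicted)
baseline = perennial_share(observed)
print(rate, baseline)
```

A model adds information for this diagnostic only to the extent that its correct-decision rate exceeds the perennial share.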

The estimates of the decadal no-flow fractions for the 2000s for the prediction locations are shown in

Estimated no-flow fractions (probabilities) for the 2000s at 9,220 prediction locations (stream reaches) and at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

A preponderance of perennial streamflow occurs and is predicted in the eastern half of the study area (

Many streamgage-decade records for Arkansas, northwest Louisiana, Oklahoma, and Texas show perennial or near-perennial flow. Throughout this region, however, many prediction locations have no-flow fractions greater than about 0.2. This could be a basis for an assertion that streamgages tend to operate on streams that mostly flow in contrast to the stream reaches expected to be at no-flow conditions much of the time. The assertion would represent a type of bias in the operational footprint of the streamgage network in western parts of the study area; rigorous testing, however, could be formidable.

The estimates of the decadal mean nonzero (L1) streamflow for the 2000s at the prediction locations are shown in

Log-transformed estimated mean nonzero streamflow for the 2000s at 9,220 prediction locations (stream reaches) and observed values at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

The L1 estimates (

The interdecade differences of GAM-L1 results (values in

The estimates of the decadal coefficient of L-variation (T2) values for the 2000s are shown in

Estimated coefficients of L-variation of nonzero streamflow for the 2000s at 9,220 prediction locations (stream reaches) and observed values at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

The T2 estimates (

The estimates of the decadal L-skew (T3) values for the 2000s are shown in

Estimated values of L-skew of nonzero streamflow for the 2000s at 9,220 prediction locations (stream reaches) and observed values at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

The T3 estimates (

The estimates of the decadal L-kurtosis (T4) for the 2000s are shown in

Estimated values of L-kurtosis of nonzero streamflow for the 2000s at 9,220 prediction locations (stream reaches) and observed values at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

The T4 estimates are more difficult to interpret than the T3 values because T4 values express, though not uniquely, a type of peakedness of the dFDCs. Warmer colors continue to depict hydrologic regimes having less T4. Broad regions throughout the study area show a narrow range of T4, with the exceptions being the main stems of major river systems that can be seen by the red to near-red colors in the figure. The primary use of T4 values is thought to be in assessments of fit when using 3-parameter distributions (

The estimates of the decadal fifth L-moment ratio (T5) for the 2000s are shown in

Estimated values of fifth L-moment ratios of nonzero streamflow for the 2000s at 9,220 prediction locations (stream reaches) and observed values at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

It is difficult to interpret the T5 estimates as geometric measures of the shape of the dFDCs, but T5 measures asymmetry similar to T3. In broad regions throughout the study area, this moment shows a narrow range of T5 values, with the exceptions being the main stems of major river systems that can be seen by the red to near-red colors in the figures. The primary use of the T5 is thought to be in assessments of probability distribution fit (

The term “overall mean streamflow” is the volumetric yield of a watershed. In other words, overall mean streamflow is the mean that includes the zero flows and nonzero flows. The estimates of the decadal overall mean streamflows for the 2000s are shown in

Here is a simplified example. Suppose for the 1960s that the no-flow fraction is 0.11 from the cGAM-PPLO, that the estimated nonzero mean streamflow is 2.01 m^{3}/s from the GAM-L1, and that the Duan retransformation bias correction from GAM-L1 is about 1.046, which is the bias_corr in file all_gage_looest_L1.csv. The overall mean streamflow is then (1 − 0.11) × 2.01 × 1.046, or about 1.87 m^{3}/s. Predictions from GAM-L1 represent median response, but for the circumstances here, the mean response itself is desired and hence the retransformation bias correction is used.
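The arithmetic of the simplified example can be sketched in Python. The function and argument names are hypothetical; the three input values are those stated in the example.

```python
def overall_mean_streamflow(no_flow_fraction, mean_nonzero_flow, bias_correction):
    """Overall mean = (1 - no-flow fraction) x mean nonzero flow x bias correction."""
    return (1.0 - no_flow_fraction) * mean_nonzero_flow * bias_correction


# Values from the simplified 1960s example, in cubic meters per second.
overall = overall_mean_streamflow(0.11, 2.01, 1.046)
print(round(overall, 2))
```

The factor (1 − no-flow fraction) weights the nonzero mean by the share of days with flow, so the no-flow days contribute zero to the overall mean.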

Log-transformed estimates of the overall mean streamflow for the 2000s at 9,220 prediction locations (stream reaches) and observed values at 941 U.S. Geological Survey (USGS) streamflow-gaging stations (streamgages) with at least one complete decade of record during 1950–2009.

The 95-percent prediction limits of the overall mean streamflows were computed for whole- and LOO-model results. First, the lower limit was computed from the upper limit for the no-flow fraction, the lower limit for mean nonzero streamflow, and the retransformation bias correction. Second, the upper limit was computed from the lower limit of no-flow fraction, the upper limit for mean nonzero streamflow, and the retransformation bias correction. This computation is structurally a 1 minus no-flow fraction term times mean nonzero streamflow times the bias correction. Third, by using coverage probabilities of the models (cGAM-PPLO and GAM-L1) that are listed in

The compression or adjustment of the prediction limits for the overall mean streamflow is deemed acceptable because the stochastic errors in no-flow fractions and mean nonzero streamflows are not perfectly correlated (simultaneous with each other), meaning the largest no-flow fraction does not exclusively occur with the smallest mean nonzero streamflow. As a result, the initial computation in the first and second parts stated previously are intuitively too large. Finally, the estimated overall mean streamflows for the streamgage-decade records and for the prediction locations are available in

In order to review overall performance of estimation of mean streamflow using cGAM-PPLO and GAM-L1, it is informative to estimate overall mean streamflow for all streamgage-decade records and to compare those to observed data (

Observed overall decadal mean flow values and leave-one-watershed-out estimates from generalized additive models of decadal mean nonzero streamflow (GAM-L1) with retransformation bias correction and decadal no-flow fractions from the censored generalized additive model (cGAM-PPLO).

In

Data from USGS streamgage 08167000, Guadalupe River at Comfort, Texas, were selected to plot in ^{3}/s. The whole- and LOO-model overall mean estimates from ^{3}/s, respectively. Conversely, the 2000s are the wettest observed decade, with an observed mean of 8.729 m^{3}/s and whole- and LOO-model overall mean estimates of 7.149 and 7.124 m^{3}/s, respectively.

A censored GAM of no-flow fractions (cGAM-PPLO) was created, and subsequent predictions at HUC12s were deemed reliable as part of the regional study of dFDCs. For the nonzero part of the dFDCs, uncensored GAMs were created to predict mean nonzero streamflow (GAM-L1), its coefficient of L-variation (GAM-T2), and higher L-moment ratios (GAM-T3, GAM-T4, GAM-T5). The watershed properties implemented in the statistical models include basin area (immutable), basin slope (immutable), decadal precipitation and temperature, and decadal percentages of grassland and urban development. Other variables include mean 1998–2009 solar radiation and the projected coordinates (immutable) of the streamgages. General bedrock permeability (immutable), one of two categorical variables used in the models, is used only for the cGAM-PPLO.

The other categorical variable (decade) has six classes for the six decades (1950–2009) of streamflow data, and because the coefficients for the decades are considered small (

A demonstration shows how the no-flow and L-moment GAMs are used to produce dFDC quantiles by using selected probability distributions. The demonstration shows probability distributions fit to the estimated L-moments with left-tail truncation accommodating the estimated no-flow fraction. A HUC12 pour point near streamgage 08167000 (

A HUC12 pour point (ungaged location; COMID 3588922) is located on a minor tributary to the river monitored by streamgage 08167000 (COMID 3589508). The respective basin areas are 2,181.28 km^{2} (streamgage) and 188.54 km^{2} (ungaged location). The observed dFDCs for the streamgage are transferred to the prediction location by (188.54/2,181.28)^{0.9} = 0.110. The result of this calculation indicates that the projected flows are estimated to be 11 percent of those observed at the nearby streamgage. For example, the observed (gaged) 1950s overall mean streamflow was 2.996 m^{3}/s, which will be transferred to the prediction location as 0.3296 m^{3}/s.
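The drainage-area ratio transfer described above can be sketched in Python. The function name is hypothetical; the basin areas, the 0.9 exponent, and the 1950s mean streamflow are the values given in the text.

```python
def drainage_area_ratio_factor(area_ungaged_km2, area_gaged_km2, exponent=0.9):
    """Scaling factor applied to gaged streamflow to estimate ungaged streamflow."""
    return (area_ungaged_km2 / area_gaged_km2) ** exponent


factor = drainage_area_ratio_factor(188.54, 2181.28)
# The text rounds the factor to three decimals before applying it.
transferred_mean = 2.996 * round(factor, 3)
print(round(factor, 3), round(transferred_mean, 4))
```

An exponent less than 1 makes the transferred streamflow decline less than proportionally with basin area.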

The dFDCs transferred from the streamgage to the prediction location are shown in

Example computation of the flow-duration curve defined by the asymmetric exponential power distribution for six decades during 1950–2009 for an ungaged prediction location on a tributary of the Guadalupe River, compared with streamflow values from nearby U.S. Geological Survey (USGS) streamflow-gaging station (streamgage) 08167000 on the Guadalupe River that were scaled by the drainage-area ratio method.

Example computation of the flow-duration curve defined by the generalized normal distribution for six decades during 1950–2009 for an ungaged prediction location on a tributary of the Guadalupe River, compared with streamflow values from nearby U.S. Geological Survey (USGS) streamflow-gaging station (streamgage) 08167000 on the Guadalupe River that were scaled by the drainage-area ratio method.

Example computation of the flow-duration curve defined by the kappa distribution for six decades during 1950–2009 for an ungaged prediction location on a tributary of the Guadalupe River, compared with streamflow values from nearby U.S. Geological Survey (USGS) streamflow-gaging station (streamgage) 08167000 on the Guadalupe River that were scaled by the drainage-area ratio method.

Three probability distributions were selected, and each is fit to the decadal L1 corrected for retransformation bias, the T2 estimates, and higher L-moment ratios. The probability distributions are shown in

The KAP is a 4-parameter distribution and is fit with L1 and T2 through T4 (

The AEP4 is another 4-parameter distribution and, similar to the KAP, is fit with L1 and T2 through T4 (

For the fitting process to the L-moments, the retransformation bias correction was used, and truncation of the distribution to the estimated no-flow fraction was made unless the fitted distribution itself had its own limiting value to zero that happened to be at a higher no-flow fraction. The no-flow truncation is seen in the GNO curves in ^{3}/s. The truncation can further be seen by other distribution fits becoming vertical at a streamflow of about 0.001 m^{3}/s.
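The left-tail truncation can be illustrated with a short sketch. It assumes one common formulation, in which the nonexceedance probability is rescaled to the nonzero part of the record, and it uses a normal distribution in log10 space as a hypothetical stand-in for the fitted GNO, KAP, or AEP4 quantile function. The function name and all parameter values are illustrative and are not taken from the study, and the case in which the fitted distribution reaches zero at a higher no-flow fraction is not handled.

```python
from statistics import NormalDist


def truncated_fdc_quantile(f, no_flow_fraction, nonzero_log10_quantile):
    """Streamflow at nonexceedance probability f for a dFDC with no-flow mass.

    nonzero_log10_quantile maps a probability in (0, 1) to the log10 of
    nonzero streamflow (hypothetical stand-in for a fitted distribution).
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must be strictly between 0 and 1")
    if f <= no_flow_fraction:
        return 0.0  # inside the no-flow part of the curve
    # Rescale the probability to the nonzero part of the record.
    g = (f - no_flow_fraction) / (1.0 - no_flow_fraction)
    return 10.0 ** nonzero_log10_quantile(g)


# Hypothetical values: 10 percent no flow; log10 flows ~ Normal(0.5, 0.4).
stand_in = NormalDist(mu=0.5, sigma=0.4)
pplo = 0.10
flows = [truncated_fdc_quantile(f, pplo, stand_in.inv_cdf)
         for f in (0.05, 0.10, 0.55, 0.95)]
print(flows)
```

Probabilities at or below the no-flow fraction return zero flow, which corresponds to the vertical drop of the fitted curves described above.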

The three distributional forms show some differences. Although truly fit to all four L-moments in this example, the AEP4 upper tail (

The fitted and truncated distributions can be numerically integrated to estimate an overall mean streamflow independently of the direct method of proration between the no-flow fraction and L1. Hence, verification checks were made by comparing the overall mean streamflow from no-flow fractions and mean nonzero streamflow with retransformation bias correction (PPLO-ubL1) to numerical integration of the distribution-approximated dFDCs. Decadal estimated overall mean streamflow values of the GNO, KAP, and AEP4 distributions for streamgage 08167000 (not the prediction location used in ^{3}/s. The numerical congruency could be used to identify situations in which the FDC by a given distribution should not be reported for a streamgage-decade record. Such checks also could be made if FDCs are estimated for the prediction locations by using the selected probability distribution formulas.
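The idea behind the verification check can be sketched as follows. The overall mean is the integral of the truncated quantile function over nonexceedance probabilities from 0 to 1. The sketch uses a midpoint rule and a normal distribution in log10 space as a hypothetical stand-in for a fitted distribution, and the parameter values are illustrative. Because the stand-in has a known arithmetic mean, the prorated and integrated means agree here by construction, apart from integration error; with real fitted distributions the two estimates can differ.

```python
import math
from statistics import NormalDist


def overall_mean_by_integration(no_flow_fraction, nonzero_log10_quantile,
                                n=200_000):
    """Midpoint-rule integral of the truncated quantile function on (0, 1)."""
    total = 0.0
    for i in range(n):
        f = (i + 0.5) / n
        if f > no_flow_fraction:  # the no-flow part contributes zero flow
            g = (f - no_flow_fraction) / (1.0 - no_flow_fraction)
            total += 10.0 ** nonzero_log10_quantile(g)
    return total / n


# Hypothetical values: 10 percent no flow; log10 flows ~ Normal(0.5, 0.3).
stand_in = NormalDist(mu=0.5, sigma=0.3)
pplo = 0.10

# Analytic arithmetic mean of the lognormal stand-in (nonzero flows only).
sigma_ln = stand_in.stdev * math.log(10.0)
mean_nonzero = 10.0 ** stand_in.mean * math.exp(0.5 * sigma_ln ** 2)

prorated = (1.0 - pplo) * mean_nonzero  # proration by the no-flow fraction
integrated = overall_mean_by_integration(pplo, stand_in.inv_cdf)
relative_difference = abs(integrated - prorated) / prorated
print(f"prorated = {prorated:.4f}; integrated = {integrated:.4f}")
```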

[m^{3}/s, cubic meters per second; PPLO-ubL1, overall mean streamflow computed by estimates of no-flow fraction (PPLO) from the cGAM-PPLO model (censored generalized additive model [cGAM]) and the unbiased (UB) mean nonzero streamflow (L1) from the GAM-L1 model (uncensored generalized additive model [GAM]) with the retransformation bias correction; GNO, overall mean streamflow by numerical integration of generalized normal distribution approximation of the flow-duration curve (FDC); KAP, overall mean streamflow by numerical integration of kappa distribution approximation of the FDC; AEP4, overall mean streamflow by numerical integration of asymmetric exponential power distribution approximation of the FDC]

Decade | PPLO-ubL1 (m^{3}/s) | GNO distribution (m^{3}/s) | KAP distribution (m^{3}/s) | AEP4 distribution (m^{3}/s) |

1950 | 3.883 | 3.991 | 3.903 | 4.045 |

1960 | 4.927 | 5.061 | 4.960 | 5.122 |

1970 | 8.040 | 8.246 | 8.127 | 8.351 |

1980 | 7.432 | 7.633 | 7.465 | 7.736 |

1990 | 9.160 | 9.412 | 9.226 | 9.534 |

2000 | 7.149 | 7.352 | 7.176 | 7.449 |
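The congruency among the four estimates in the table can be summarized as percent differences relative to PPLO-ubL1. The following sketch uses only the tabulated values; the variable names are illustrative, and the choice of percent difference as the summary measure is an assumption rather than a method stated in the report.

```python
# Overall mean streamflow (m^3/s) by decade: PPLO-ubL1, GNO, KAP, AEP4.
overall_means = {
    1950: (3.883, 3.991, 3.903, 4.045),
    1960: (4.927, 5.061, 4.960, 5.122),
    1970: (8.040, 8.246, 8.127, 8.351),
    1980: (7.432, 7.633, 7.465, 7.736),
    1990: (9.160, 9.412, 9.226, 9.534),
    2000: (7.149, 7.352, 7.176, 7.449),
}
labels = ("GNO", "KAP", "AEP4")

# Percent difference of each distribution-based mean from PPLO-ubL1.
percent_diff = {
    decade: {label: 100.0 * (value - row[0]) / row[0]
             for label, value in zip(labels, row[1:])}
    for decade, row in overall_means.items()
}

# Largest absolute percent difference for each distribution.
max_abs_diff = {
    label: max(abs(percent_diff[decade][label]) for decade in percent_diff)
    for label in labels
}

for decade, diffs in percent_diff.items():
    print(decade, {label: round(d, 2) for label, d in diffs.items()})
print("max:", {label: round(d, 2) for label, d in max_abs_diff.items()})
```

For these tabulated values, every distribution-based mean exceeds PPLO-ubL1, and the KAP means are the closest to it.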

General estimates of the dFDC quantiles (

Reliable predictions of overall mean streamflows (

Hydrologic alteration has been documented in the majority of monitored streams in the United States and is thought to be the primary cause of impairment in riverine ecosystems. Study of the timing and quantity of freshwater inflows can be used to more precisely restore the water quality, marine habitats, and biological resources in the Gulf of Mexico. This study focused on providing flow-duration curve (FDC) quantiles using probability distributions fit to L-moments of nonzero streamflow and the no-flow fractions. Generalized additive models (GAMs) were created by using streamflow data from 941 streamgages and watershed properties to estimate streamflow at ungaged locations. Decadal L-moments were studied by using GAMs, and no-flow fractions were studied by using censor-extended GAMs (cGAMs). The L-moments studied using GAMs were the mean of nonzero streamflow (L1), coefficient of L-variation (T2), L-skew (T3), L-kurtosis (T4), and fifth L-moment ratio (T5). No-flow conditions were considered type I censored data, and GAMs that incorporate Tobit regression methods were used to alleviate bias of censored data in the cGAM. Decadal FDC quantiles were estimated by fitting selected probability distributions to the L-moment estimates obtained from GAM regionalization.

The reliability of the models was extensively tested. Whole-sample modeling can underestimate prediction uncertainties; therefore, leave-one-out (LOO) testing was performed on the GAMs and the cGAM. Nash-Sutcliffe efficiency coefficient, root mean square error, and coverage probabilities for the whole and LOO models were used to evaluate the GAMs and cGAM. The Nash-Sutcliffe efficiency coefficient and root mean square error were never negative, indicating that the models performed better than the mean of the measured data. LOO-modeling coverage probabilities verified whole-model predictions of GAMs, and the cGAM uncertainties are reliable 95-percent prediction limits. The predicted L-moment GAMs and no-flow cGAM can be used to reliably produce regionalized decadal FDC (dFDC) quantiles. Therefore, dFDC estimations can be made for ungaged stream reaches in the study area. For example, the level-12 hydrologic unit code pour point (COMID 3588922) near U.S. Geological Survey streamgage 08167000 was used to compare an estimated dFDC (using the GAMs and cGAM) at the ungaged location to an observed dFDC at the streamgage. The observed dFDC was projected from the streamgage to the ungaged location for the comparison by using the drainage-area ratio method. The example highlights the ability to approximate dFDCs at ungaged locations from the no-flow fraction and L-moments using asymmetric exponential power, generalized normal, and kappa probability distributions.
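The three evaluation measures named above have standard definitions that can be sketched in a few lines. The sample values are hypothetical and serve only to exercise the functions; they are not results from the study, and the functions are not the study's own implementation.

```python
import math


def nash_sutcliffe(observed, predicted):
    """1 minus the ratio of squared-error sum to variance about the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def root_mean_square_error(observed, predicted):
    """Square root of the mean squared difference."""
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def coverage_probability(observed, lower, upper):
    """Fraction of observations inside their prediction limits."""
    hits = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return hits / len(observed)


# Hypothetical observations, predictions, and prediction limits.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.1, 1.9, 3.2, 3.8]
lower = [p - 0.15 for p in pred]
upper = [p + 0.15 for p in pred]

nse = nash_sutcliffe(obs, pred)
rmse = root_mean_square_error(obs, pred)
coverage = coverage_probability(obs, lower, upper)
print(f"NSE = {nse:.3f}; RMSE = {rmse:.3f}; coverage = {coverage:.2f}")
```

A Nash-Sutcliffe value above zero indicates predictions that outperform the mean of the observations, and a coverage probability near 0.95 is the target for 95-percent prediction limits.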

Asymmetric exponential power probability distribution (4-parameter) (

The censored GAM of the no-flow fraction (PPLO).

An identification number for a stream reach (

Drainage-area ratio method (

Flow-duration curve. Can be modified by prepending a time period, such as “period-of-record FDC,” “decadal FDC,” or “2000s FDC.”

The GAM of the decadal mean nonzero streamflow (L1).

The GAM of the decadal coefficient of L-variation of nonzero streamflow (T2).

The GAM of the decadal L-skew of nonzero streamflow (T3).

The GAM of the decadal L-kurtosis of nonzero streamflow (T4).

The GAM of the decadal fifth L-moment ratio of nonzero streamflow (T5).

Generalized logistic distribution (3-parameter) which is the upper L-kurtosis (T4) bounds of the kappa distribution (

Generalized normal probability distribution (3-parameter) and equivalent to 3-parameter log-normal distribution (

Hydrologic unit code, a unique address of a spatial region of a river system.

Kappa probability distribution (4-parameter) (

Decadal mean nonzero streamflow.

No-flow fraction or equivalently, the percentage of decadal no-flow.

Shorthand for overall mean streamflow (watershed yield) computed by proration of the no-flow fraction (PPLO) and mean nonzero streamflow (L1) using a Duan retransformation bias correction (ub) (

A minor repository (

A large repository of statistical software and documentation (

Decadal coefficient of L-variation of nonzero streamflow.

Decadal L-skew of nonzero streamflow.

Decadal L-kurtosis of nonzero streamflow.

Decadal fifth L-moment ratio of nonzero streamflow.

For more information about this publication, contact

Director, Lower Mississippi-Gulf Water Science Center

U.S. Geological Survey

640 Grassmere Park, Suite 100

Nashville, TN 37211

For additional information, visit

Publishing support provided by

Lafayette Publishing Service Center