Random Forest Regression Models for Estimating Low-Streamflow Statistics at Ungaged Locations in New York, Excluding Long Island

Scientific Investigations Report 2025-5060
Prepared in cooperation with the New York State Department of Environmental Conservation
By: T.J. Stagnitta, J.C. Woda, and A.P. Graziano

Abstract

Models to estimate low-streamflow statistics at ungaged locations in New York (excluding Long Island, but including hydrologically connected basins from bordering States) were developed for the first time by the U.S. Geological Survey, in cooperation with the New York State Department of Environmental Conservation. A total of 224 basin characteristics were developed for 213 unaltered streamgages (locations where human effects on streamflow were limited) across the following categories: basin geometry, climate, land cover, soils, surficial geology, and other characteristics. The basins with unaltered streamgages were evaluated for potential redundancy: streamgages in close proximity and with similar drainage areas were flagged and removed from the testing and cross-validation datasets to prevent data leakage from the training dataset to the testing dataset.
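The redundancy screening described above can be illustrated with a minimal sketch. It assumes each streamgage is a record with a latitude, longitude, and drainage area, and it flags pairs that are both close together and similar in size. The function names, the field names, and the two thresholds (`max_km`, `max_area_ratio`) are hypothetical and chosen only for illustration; the abstract does not state the criteria actually used in the report.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    earth_radius_km = 6371.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def flag_redundant(gages, max_km=10.0, max_area_ratio=1.25):
    """Return the IDs of gages that have a nearby neighbor of similar drainage area.

    Both thresholds are illustrative assumptions, not values from the report.
    """
    flagged = set()
    for a, b in combinations(gages, 2):
        dist = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
        ratio = max(a["area"], b["area"]) / min(a["area"], b["area"])
        if dist <= max_km and ratio <= max_area_ratio:
            flagged.update((a["id"], b["id"]))
    return flagged


# Made-up example records: A and B are near-duplicates, C is far away.
gages = [
    {"id": "A", "lat": 42.00, "lon": -75.00, "area": 100.0},
    {"id": "B", "lat": 42.01, "lon": -75.01, "area": 110.0},
    {"id": "C", "lat": 44.00, "lon": -74.00, "area": 105.0},
]
flagged = flag_redundant(gages)

# Flagged gages are kept out of the held-out pool so that a near-duplicate of a
# training basin cannot inflate the apparent test performance.
test_pool = [g["id"] for g in gages if g["id"] not in flagged]
```

In this sketch both members of a flagged pair are excluded from the held-out pool; other policies, such as excluding only one member, are equally plausible readings of the text.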

Random forest regression models were created by using basin characteristics as predictor variables and by developing a workflow to train, tune, and test the models. The models estimate, for ungaged locations, the lowest annual 7-day and 30-day average streamflows that occur (on average) once every 10 years (7Q10 and 30Q10). The top four basin characteristics used for the 7Q10 and 30Q10 models were drainage area, total stream length, perimeter of the basin, and length of the longest flow path. The 7Q10 and 30Q10 models had coefficients of determination (R²) of 0.796 and 0.853, respectively. The model results were bias-corrected for ungaged locations across New York and are available within the interactive StreamStats tool.
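To make the definition of 7Q10 and 30Q10 concrete, the sketch below computes the annual minimum n-day average flow for each year of a daily record and then takes an empirical quantile at a non-exceedance probability of 1/10. This is a deliberate simplification under stated assumptions: operational estimates usually fit a probability distribution to the annual minima, and the abstract does not describe how the statistics at the streamgages were computed. All function names and the sample values are hypothetical.

```python
def rolling_mean(values, window):
    """Means of every run of `window` consecutive values."""
    out = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the value that left the window
        if i >= window - 1:
            out.append(total / window)
    return out


def annual_min_n_day(flows_by_year, n_days):
    """Lowest n-day average flow in each year that has at least n_days of data."""
    return {
        year: min(rolling_mean(flows, n_days))
        for year, flows in flows_by_year.items()
        if len(flows) >= n_days
    }


def empirical_low_flow(annual_minima, return_period_years=10):
    """Linearly interpolated quantile at non-exceedance probability 1/T.

    Assumption: a plain empirical quantile stands in for a fitted
    frequency distribution.
    """
    xs = sorted(annual_minima)
    pos = (1.0 / return_period_years) * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# Made-up annual minimum 7-day flows for 11 years, in arbitrary flow units.
minima_7day = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
q7_10 = empirical_low_flow(minima_7day, return_period_years=10)
```

Passing `n_days=30` to `annual_min_n_day` gives the corresponding input for a 30Q10-style estimate.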

Suggested Citation

Stagnitta, T.J., Woda, J.C., and Graziano, A.P., 2025, Random forest regression models for estimating low-streamflow statistics at ungaged locations in New York, excluding Long Island: U.S. Geological Survey Scientific Investigations Report 2025–5060, 23 p., https://doi.org/10.3133/sir20255060.

ISSN: 2328-0328 (online)

Table of Contents

  • Abstract
  • Introduction
  • Study Area and Supporting Work
  • Methods
  • Results
  • Discussion
  • StreamStats Web Application for Modeled Results in Ungaged Locations
  • Summary
  • Acknowledgments
  • References Cited

Publication type: Report
Publication subtype: USGS Numbered Series
Title: Random forest regression models for estimating low-streamflow statistics at ungaged locations in New York, excluding Long Island
Series title: Scientific Investigations Report
Series number: 2025-5060
DOI: 10.3133/sir20255060
Publication date: August 01, 2025
Year published: 2025
Language: English
Publisher: U.S. Geological Survey
Publisher location: Reston, VA
Contributing office(s): New York Water Science Center
Description: Report: v, 23 p.; 2 Data Releases
Country: United States
State: New York
Other geospatial: New York excluding Long Island
Online only (Y/N): Y
Additional online files (Y/N): N