<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book SYSTEM "BITS-book2.dtd">
<book xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" dtd-version="2.0" xml:lang="EN">
<collection-meta collection-type="series"><title-group>
<title>U.S. Geological Survey Scientific Investigations Report</title>
<alt-title alt-title-type="pub-short-title">Scientific Investigations Report</alt-title>
<alt-title alt-title-type="pub-acronym-title">SIR</alt-title>
</title-group><contrib-group content-type="secretary-director"><contrib><string-name><given-names>David</given-names><surname>Bernhardt</surname></string-name><aff><institution>U.S. Department of the Interior</institution></aff><role>Secretary</role></contrib><contrib><string-name><given-names>James F.</given-names><surname>Reilly</surname><suffix>II</suffix></string-name><aff><institution>U.S. Geological Survey</institution></aff><role>Director</role></contrib></contrib-group><issn publication-format="print">2328-031X</issn><issn publication-format="online">2328-0328</issn></collection-meta>
<book-meta><book-id book-id-type="publisher-id">2021-5093</book-id><book-id book-id-type="doi">10.3133/sir20215093</book-id><book-title-group>
<book-title>A Machine Learning Approach to Modeling Streamflow with Sparse Data in Ungaged Watersheds on the Wyoming Range, Wyoming, 2012&#x2013;17</book-title>
<alt-title alt-title-type="sentence-case">A machine learning approach to modeling streamflow with sparse data in ungaged watersheds on the Wyoming Range, Wyoming, 2012&#x2013;17</alt-title>
<alt-title alt-title-type="running-head">A Machine Learning Approach to Modeling Streamflow with Sparse Data in Ungaged Watersheds on the Wyoming Range</alt-title>
</book-title-group><contrib-group content-type="authors">
<contrib contrib-type="author"><string-name><x>By</x><x> </x><given-names>Ryan R.</given-names><x> </x><surname>McShane</surname></string-name><x> and </x></contrib>
<contrib contrib-type="author"><string-name><given-names>Cheryl A.</given-names><x> </x><surname>Eddy-Miller</surname></string-name></contrib>
</contrib-group><pub-date date-type="pub"><year>2021</year></pub-date><book-volume-number/><publisher>
<publisher-name>U.S. Geological Survey</publisher-name>
<publisher-loc>Reston, Virginia</publisher-loc>
</publisher><edition/><abstract>
<title>Abstract</title>
<p>Scant availability of streamflow data can impede the utility of streamflow as a variable in ecological models of aquatic and terrestrial species, especially when studying small streams in watersheds that lack streamgages. Streamflow data at fine resolution and broad extent were needed by collaborators for ecological research on small streams in several ungaged watersheds of southwestern Wyoming, where streamflow data are sparse.</p>
<p>To improve the utility of sparse streamflow data for ecological research in ungaged watersheds, we developed a machine learning approach in R for modeling spatially and temporally continuous monthly streamflow from 2012 through 2017 in three semiarid montane-steppe watersheds (with drainage areas of 26&#x2013;55&#x00A0;square miles and mean elevations of 8,031&#x2013;8,455&#x00A0;feet) on the Wyoming Range in the upper Green River Basin. A machine learning streamflow (MLFLOW) model was calibrated and validated with 971&#x00A0;discrete streamflow observations and 24&#x00A0;static and dynamic predictor variables derived from geospatial and time series data on climatic, physiographic, and anthropogenic characteristics affecting streamflow. The predictor variables were temporally and spatially conditioned to strengthen their relation to monthly streamflow.</p>
<p>The MLFLOW model had satisfactory agreement between observed and predicted streamflow (coefficient of determination [<italic>R</italic><sup>2</sup>]=0.80, Nash-Sutcliffe efficiency [NSE]=0.79, NSE with log-transformed data [logNSE]=0.82, and percent bias [PBIAS]=0.7&#x00A0;percent). NSE and logNSE indicated the MLFLOW model performed about equally well for high and low flows, and PBIAS indicated the MLFLOW model did not substantially overpredict or underpredict monthly streamflow. Streamflow predictions seemed to represent the annual hydrograph well within the study area during the study period.</p>
<p>The most important variables (statistically important in the MLFLOW model) for explaining monthly streamflow were temporally and spatially conditioned dynamic climatic variables, mostly precipitation and snow water equivalent. Importance of the static and dynamic variables did not differ substantially among the three watersheds but differed considerably among the 6&#x00A0;years. Monthly streamflow increased with increasing precipitation, snow water equivalent, and drainage area but decreased with increasing forest cover, elevation, evapotranspiration, and temperature.</p>
<p>The MLFLOW model was most sensitive to selection of dynamic climatic variables. Unconditioned dynamic climatic variables alone explained 54&#x00A0;percent of the variance (<italic>R</italic><sup>2</sup>=0.54) in monthly streamflow, whereas adding static physiographic and anthropogenic variables explained only 12&#x00A0;percent more of the variance (<italic>R</italic><sup>2</sup>=0.66). Also, spatial conditioning of all variables together with temporal conditioning of dynamic variables increased the variance explained in the MLFLOW model by another 14&#x00A0;percent (<italic>R</italic><sup>2</sup>=0.80). The MLFLOW model also had greater sensitivity to temporal than to spatial differences in the data. Whether the model was trained with observations from all watersheds and years or with one watershed or 1&#x00A0;year left out sequentially, performance was better in testing on observations from each watershed than from each year separately. Also, performance was better for models fitted to fewer sites than to fewer months of observations.</p>
<p>The greatest utility of the modeling approach is the ease of use and the speed of processing input data, running the model, and interpreting the model output, whereas the greatest limitation is the need for spatially and temporally representative streamflow observations to drive the model. Although familiarity with R is necessary, only a working knowledge of hydrology (for selecting appropriate predictor variables and evaluating the quality of streamflow observations) and a rudimentary understanding of machine learning models are needed. Therefore, this modeling approach is practicable for other scientists who work with water but who are not hydrologists.</p>
</abstract><custom-meta-group>
<custom-meta><meta-name>Online Only</meta-name><meta-value>True</meta-value></custom-meta>
</custom-meta-group><notes notes-type="further-information"><p>For more information on the USGS&#x2014;the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment&#x2014;visit <ext-link>https://www.usgs.gov</ext-link> or call 1&#x2013;888&#x2013;ASK&#x2013;USGS.</p></notes><notes notes-type="overview"><p>For an overview of USGS information products, including maps, imagery, and publications, visit <ext-link>https://store.usgs.gov/</ext-link>.</p></notes><notes notes-type="disclaimer"><p>Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.</p></notes><notes notes-type="permissions"><p>Although this information product, for the most part, is in the public domain, it also may contain copyrighted materials as noted in the text. Permission to reproduce copyrighted items must be secured from the copyright owner.</p></notes></book-meta>
<front-matter>
<ack>
<title>Acknowledgments</title>
<p>Max Kuhn is thanked for creating the &#x201C;caret&#x201D; package in R, which greatly simplified calibration and validation of the machine learning streamflow model. Many universities and State and Federal agencies are greatly appreciated for producing time series and geospatial data for use in modeling streamflow at fine resolution and broad extent.</p>
<p>Annika Walters, with the Wyoming Cooperative Fish and Wildlife Research Unit (University of Wyoming and U.S.&#x00A0;Geological Survey), and her graduate students, Richard Walker and Carlin Girard, were instrumental in providing streamflow observations used to fit the machine learning streamflow model. Several U.S.&#x00A0;Geological Survey colleagues provided reviews that improved the report and accompanying data release.</p>
</ack>
<front-matter-part book-part-type="Conversion-Factors">
<book-part-meta>
<title-group>
<title>Conversion Factors</title>
</title-group>
</book-part-meta>
<named-book-part-body>
<table-wrap id="ta" position="float">
<caption>
<title>U.S. customary units to International System of Units</title>
</caption>
<table rules="groups">
<col width="41.88%"/>
<col width="13.71%"/>
<col width="44.41%"/>
<thead>
<tr>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Multiply</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">By</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">To obtain</td>
</tr>
</thead>
<tbody>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Length</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">inch (in.)</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">2.54</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">centimeter (cm)</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">inch (in.)</td>
<td valign="top" align="left">25.4</td>
<td valign="top" align="left">millimeter (mm)</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">foot (ft)</td>
<td valign="top" align="left">0.3048</td>
<td valign="top" align="left">meter (m)</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">mile (mi)</td>
<td valign="top" align="left">1.609</td>
<td valign="top" align="left">kilometer (km)</td>
</tr>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Area</th>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt" scope="row">square mile (mi<sup>2</sup>)</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">2.590</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">square kilometer (km<sup>2</sup>)</td>
</tr>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Volume</th>
</tr>
<tr>
<td valign="top" align="left" scope="row">cubic foot (ft<sup>3</sup>)</td>
<td valign="top" align="left">0.02832</td>
<td valign="top" align="left">cubic meter (m<sup>3</sup>)</td>
</tr>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Flow rate</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="row">cubic foot per second (ft<sup>3</sup>/s)</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.02832</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">cubic meter per second (m<sup>3</sup>/s)</td>
</tr>
</tbody></table></table-wrap>
<table-wrap id="tb" position="float">
<caption>
<title>International System of Units to U.S. customary units</title>
</caption>
<table rules="groups">
<col width="44.79%"/>
<col width="14.39%"/>
<col width="40.82%"/>
<thead>
<tr>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Multiply</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">By</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">To obtain</td>
</tr>
</thead>
<tbody>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Length</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">centimeter (cm)</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">0.3937</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">inch (in.)</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">millimeter (mm)</td>
<td valign="top" align="left">0.03937</td>
<td valign="top" align="left">inch (in.)</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">meter (m)</td>
<td valign="top" align="left">3.281</td>
<td valign="top" align="left">foot (ft)</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt" scope="row">kilometer (km)</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">0.6214</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">mile (mi)</td>
</tr>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Area</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">square kilometer (km<sup>2</sup>)</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">0.3861</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">square mile (mi<sup>2</sup>)</td>
</tr>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Volume</th>
</tr>
<tr>
<td valign="top" align="left" scope="row">cubic meter (m<sup>3</sup>)</td>
<td valign="top" align="left">35.31</td>
<td valign="top" align="left">cubic foot (ft<sup>3</sup>)</td>
</tr>
<tr>
<th colspan="3" valign="top" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Flow rate</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="row">cubic meter per second (m<sup>3</sup>/s)</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">35.31</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">cubic foot per second (ft<sup>3</sup>/s)</td>
</tr>
</tbody></table></table-wrap>
<p>Temperature in degrees Celsius (&#x00B0;C) may be converted to degrees Fahrenheit (&#x00B0;F) as follows: &#x00B0;F = (1.8 &#x00D7; &#x00B0;C) + 32.</p>
<p>Temperature in degrees Fahrenheit (&#x00B0;F) may be converted to degrees Celsius (&#x00B0;C) as follows: &#x00B0;C = (&#x00B0;F &#x2013; 32) / 1.8.</p>
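<p>As a minimal sketch of how the multipliers in the two tables and the temperature formulas above are applied, the following Python snippet converts a few values quoted in the abstract and study-area description. The dictionary and function names are illustrative assumptions and are not part of the report or its data release.</p>
<code language="python" xml:space="preserve">
```python
# Illustrative only: the names below are hypothetical, not from the data release.
# Factors are the U.S. customary to SI multipliers listed in the first table.
US_TO_SI = {
    "ft_to_m": 0.3048,
    "mi_to_km": 1.609,
    "mi2_to_km2": 2.590,
    "ft3s_to_m3s": 0.02832,
}


def convert(value, factor_name):
    """Multiply a value by the named conversion factor."""
    return value * US_TO_SI[factor_name]


def fahrenheit_to_celsius(deg_f):
    """Apply C = (F - 32) / 1.8."""
    return (deg_f - 32.0) / 1.8


def celsius_to_fahrenheit(deg_c):
    """Apply F = (1.8 * C) + 32."""
    return 1.8 * deg_c + 32.0


# Values quoted in the report text.
area_km2 = convert(55, "mi2_to_km2")     # largest drainage area, 55 square miles
elev_m = convert(8031, "ft_to_m")        # lowest mean elevation, 8,031 feet
temp_c = fahrenheit_to_celsius(38.1)     # 38.1 degrees Fahrenheit

print(area_km2, elev_m, temp_c)
```
</code>
<p>Because the tabulated factors are rounded to four significant figures, converting a value to SI units and back with the second table may not return exactly the starting value.</p>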
</named-book-part-body>
</front-matter-part>
<front-matter-part book-part-type="Datum">
<book-part-meta>
<title-group>
<title>Datum</title>
</title-group>
</book-part-meta>
<named-book-part-body>
<p>Vertical coordinate information is referenced to the North American Vertical Datum of 1988, unless noted otherwise.</p>
<p>Horizontal coordinate information is referenced to the North American Datum of 1983.</p>
<p>Elevation, as used in this report, refers to distance above the vertical datum.</p>
</named-book-part-body>
</front-matter-part>
<glossary content-type="Abbreviations">
<title>Abbreviations</title>
<def-list>
<def-item><term>DEM</term><def><p>digital elevation model</p></def></def-item>
<def-item><term>HUC12</term><def><p>12-digit hydrologic unit code</p></def></def-item>
<def-item><term>logNSE</term><def><p>Nash-Sutcliffe efficiency with log-transformed data</p></def></def-item>
<def-item><term>MAD</term><def><p>median absolute deviation</p></def></def-item>
<def-item><term>MLFLOW</term><def><p>machine learning streamflow</p></def></def-item>
<def-item><term>NSE</term><def><p>Nash-Sutcliffe efficiency</p></def></def-item>
<def-item><term>PBIAS</term><def><p>percent bias</p></def></def-item>
<def-item><term><italic>r</italic></term><def><p>Pearson correlation coefficient</p></def></def-item>
<def-item><term><italic>R</italic><sup>2</sup></term><def><p>coefficient of determination</p></def></def-item>
<def-item><term>USGS</term><def><p>U.S. Geological Survey</p></def></def-item>
<def-item><term>WLCI</term><def><p>Wyoming Landscape Conservation Initiative</p></def></def-item>
</def-list>
</glossary>
</front-matter>
<book-body>
<book-part>
<body>
<sec>
<title>Introduction</title>
<p>Streamflow is a necessary ecological resource for many animal and plant species. However, scant availability of streamflow data can impede the utility of streamflow as a variable in ecological models of aquatic and terrestrial species, especially when studying small streams (low stream order or low flow, or both) in watersheds that lack streamgages. Much ecological research on small streams is concentrated on species-habitat relations throughout the channel network of a watershed, but streamflow data, needed at fine resolution and broad extent for this research, are typically sparse. For instance, collaborators with the Wyoming Cooperative Fish and Wildlife Research Unit (University of Wyoming and U.S.&#x00A0;Geological Survey [USGS]) needed more detailed streamflow information for researching the effects of multiple stressors (including low flows) on fish and invertebrates in several ungaged watersheds in the upper Green River Basin (not shown) in southwestern Wyoming (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>), where streamflow data are sparse (<xref ref-type="bibr" rid="r13">Girard and Walters, 2018</xref>; <xref ref-type="bibr" rid="r53">Walters and others, 2019</xref>; <xref ref-type="bibr" rid="r52">Walker and others, 2020</xref>).</p>
<fig id="fig01" position="float" fig-type="figure"><label>Figure 1</label><caption><p>The study area on the Wyoming Range, including the watersheds, channel networks, and sampling sites used in modeling monthly streamflow and the U.S.&#x00A0;Geological Survey streamgage closest to the study area.</p><p content-type="toc">Figure 1.&#x2003;Map showing the study area on the Wyoming Range, including the watersheds, channel networks, and sampling sites used in modeling monthly streamflow and the U.S. Geological Survey streamgage closest to the study area.</p></caption>
<long-desc>Sampling sites and channel networks are shown within the 3 watersheds. The streamgage is to the south on Fontenelle Creek.</long-desc><graphic xlink:href="rol21-0070_fig01"/></fig>
<p>Several approaches to modeling streamflow at various scales have been developed that may improve the utility of sparse streamflow data for ecological research in small streams. One approach is a water-balance model that estimates various elements of the hydrologic cycle (including runoff generation), such as the USGS Thornthwaite water-balance model (<xref ref-type="bibr" rid="r26">McCabe and Markstrom, 2007</xref>). Another approach is a physically based hydrologic (rainfall-runoff) model that simulates water and energy fluxes between the atmosphere and the land surface, such as the USGS Precipitation-Runoff Modeling System (<xref ref-type="bibr" rid="r22">Leavesley and others, 1983</xref>; <xref ref-type="bibr" rid="r24">Markstrom and others, 2015</xref>). However, these two approaches require streamflow data of adequate quality and in sufficient quantity, typically from USGS streamgages, that may not be available for the area or period of interest. The approaches also require expertise in hydrologic modeling, which may be impractical for a scientist without much knowledge of hydrology or experience using complicated streamflow modeling applications.</p>
<p>A third streamflow modeling approach is a machine learning model that can fit potentially complex relations between streamflow observations and environmental predictor variables. Machine learning models are increasingly used in the water sciences in general (<xref ref-type="bibr" rid="r42">Shen and others, 2018</xref>) and specifically for modeling streamflow (<xref ref-type="bibr" rid="r4">Bellos and Carbajal, 2020</xref>). However, many applications of machine learning to streamflow modeling rely on streamflow data from streamgages for model calibration and validation. These data are typically more available for larger, higher-order streams, and calibration and validation of models on low-order streams have been uncommon. For example, <xref ref-type="bibr" rid="r28">Miller and others (2018)</xref> used a random forest model fitted to USGS streamgage data (with few data from low-order streams) to predict monthly streamflow for the conterminous United States. In contrast to this reliance on streamgage data, <xref ref-type="bibr" rid="r17">Jaeger and others (2019)</xref> used a random forest model fitted to thousands of observations of wet or dry stream conditions to predict the annual probability of a channel maintaining streamflow throughout the year in the Columbia River Basin (not shown).</p>
<p>Of the three approaches to modeling streamflow, machine learning models may be most adaptable to the needs of ecological researchers working in ungaged watersheds and may have fewer initial learning impediments to developing a working streamflow model. Moreover, using a more complex water-balance model or physically based hydrologic model, even if the requisite data are available, does not necessarily result in better streamflow predictions than using a simpler machine learning model. Therefore, we determined that a machine learning model was the most appropriate streamflow modeling approach for this study.</p>
<p>The primary objective of this study was to develop an approach to modeling streamflow with sparse data in ungaged watersheds in southwestern Wyoming, predicting spatially and temporally continuous monthly streamflow for the entire study area and study period (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>). A secondary objective was to explain the environmental drivers of monthly streamflow within the study area during the study period. We accomplished these two objectives by developing a machine learning streamflow (MLFLOW) model calibrated and validated with discrete streamflow observations and static and dynamic predictor variables derived from geospatial and time series data on climatic, physiographic, and anthropogenic characteristics affecting streamflow. The resulting model predictions and explanation of streamflow apply at the reach scale (tens of meters), which is particularly useful on small streams.</p>
<p>This study also is a contribution to the Wyoming Landscape Conservation Initiative (WLCI), a program with the mission &#x201C;to implement a long-term, science-based program of assessing, conserving, and enhancing fish and wildlife habitats while facilitating responsible energy and other development through local collaboration and partnerships&#x201D; (<xref ref-type="bibr" rid="r5">Bowen and others, 2014</xref>, p.&#x00A0;2). Much of the WLCI area in southwestern Wyoming consists of small streams in watersheds without streamgages, which complicates management of species dependent on these streams. Therefore, the modeling approach developed in this study can support species management decisions throughout the WLCI area.</p>
<sec>
<title>Description of Study Area</title>
<p>The channel networks of 3&#x00A0;watersheds and 125&#x00A0;streamflow sampling sites on the Wyoming Range in southwestern Wyoming (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) were used for modeling monthly streamflow. The three watersheds are identified by the following USGS 12-digit hydrologic unit codes (HUC12; <xref ref-type="bibr" rid="r47">U.S.&#x00A0;Geological Survey, 2016</xref>): 140401010907 (Lower South Piney Creek), 140401011102 (North Fork Dry Piney Creek), and 140401011103 (Dry Piney Creek). In this report, these three HUC12s are referred to as watersheds&#x00A0;1&#x2013;3, respectively. The three watersheds are in the western WLCI area, which covers much of southwestern Wyoming (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>).</p>
<p>The three watersheds are in a sparsely populated region and have many similar physical properties. Watersheds&#x00A0;1 and 2 are similar in drainage area (34 and 26&#x00A0;square miles [mi<sup>2</sup>], respectively) and mean elevation (8,428 and 8,455&#x00A0;feet, respectively), whereas watershed&#x00A0;3 is larger (55&#x00A0;mi<sup>2</sup>) and lower (8,031&#x00A0;feet) (<xref ref-type="bibr" rid="r49">U.S.&#x00A0;Geological Survey, 2019a</xref>). The watersheds have a similar semiarid climate characterized by warm summers with precipitation from occasional thunderstorms and by cold winters with snow typically covering the montane-steppe landscape from November through March. Watersheds&#x00A0;1 and 2 have a mean annual temperature of 38.1&#x00A0;degrees Fahrenheit and a total annual precipitation of 10.6&#x00A0;inches, whereas watershed&#x00A0;3 is cooler (37.2&#x00A0;degrees Fahrenheit) and wetter (13.7&#x00A0;inches) (<xref ref-type="bibr" rid="r36">PRISM Climate Group, 2018</xref>). Geology of the watersheds is mostly sedimentary rocks of Quaternary and Tertiary age (<xref ref-type="bibr" rid="r34">Oriel and Platt, 1980</xref>). The watersheds are on the eastern Overthrust Belt (not shown), and the surficial units in the upper elevations and Cretaceous Mountain (not shown; to the east of watersheds&#x00A0;1 and 2) are much older, dating to the Cambrian. Additionally, the Overthrust Belt has many surficial and buried faults that transmit groundwater into the watersheds through springs and seeps (<xref ref-type="bibr" rid="r59">Zielinski and others, 1985</xref>). Land cover in the watersheds is mostly shrub/scrub (66&#x2013;72&#x00A0;percent) and forest (20&#x2013;25&#x00A0;percent), and irrigated land use ranges from about 0.5 to 2.5&#x00A0;percent (<xref ref-type="bibr" rid="r30">Multi-Resolution Land Characteristics Consortium, 2017</xref>). Oil and gas development began in the region in the early 1900s and is ongoing. 
The energy development footprint increases from watershed&#x00A0;1 to 3 (<xref ref-type="bibr" rid="r53">Walters and others, 2019</xref>). Total length of streams in the three watersheds is 372&#x00A0;miles, and about 75&#x00A0;percent (277&#x00A0;miles) consists of first- and second-order streams (<xref ref-type="bibr" rid="r49">U.S.&#x00A0;Geological Survey, 2019a</xref>). Annual hydrographs are characterized by snowmelt high flows in late spring and early summer (May and June) followed by low flows from summer through the following spring; the lowest flows are in fall and winter.</p>
<p>The only USGS streamgage close to the study area and operating during the study period (2012&#x2013;17) was on Fontenelle Creek near Herschler Ranch, near Fontenelle, Wyoming (USGS&#x00A0;09210500; <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>), which has a drainage area of 152&#x00A0;mi<sup>2</sup> and an elevation of 6,950&#x00A0;feet (above the National Geodetic Vertical Datum of 1929; <xref ref-type="bibr" rid="r51">U.S.&#x00A0;Geological Survey, 2021</xref>). The watershed monitored by this streamgage has a larger drainage area and lower mean elevation than all three watersheds of the study area. These differences in drainage area and elevation, together with the distance of the streamgage from the study area, made the streamgage unsuitable for calibrating and validating a streamflow model.</p>
</sec>
<sec>
<title>Purpose and Scope</title>
<p>A machine learning approach was developed in R (<xref ref-type="bibr" rid="r37">R&#x00A0;Core Team, 2021</xref>) for modeling monthly streamflow from 2012 through 2017 in three watersheds on the Wyoming Range. This study was intended to investigate the utility and limitations of applying a machine learning approach to modeling streamflow with sparse data (for example, streamflow measurements made at different times and places instead of a time series of streamflow observations made at a streamgage). This study was not intended to investigate performance differences among various machine learning models used for modeling streamflow. Methodology of the modeling approach is described, including temporal and spatial conditioning processes applied to the input data and calibration and validation of the MLFLOW model. Results of the MLFLOW model are presented with discussion of the model output, including predictive performance at the sampling sites (with comparisons to other streamflow modeling studies), variable importance for explaining monthly streamflow, sensitivity to the predictor variables and streamflow observations used in fitting the model, and monthly streamflow predictions on the channel networks. The input data, model output, and R scripts for the MLFLOW model are available in an accompanying data release (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>).</p>
</sec>
</sec>
<sec>
<title>Methods for Machine Learning Approach to Modeling Streamflow</title>
<p>The MLFLOW model was developed to predict monthly streamflow using static and dynamic variables derived from geospatial and time series data and to explain the multidimensional relation of monthly streamflow to the predictor variables. The following sections describe characteristics of the input data (streamflow observations and predictor variables) and procedures used to process data temporally and spatially for use in the MLFLOW model, development of the MLFLOW model, evaluation of the model output for predictive performance and variable importance, and assessment of model sensitivity to how the data were used in fitting the MLFLOW model. All data preparation, streamflow modeling, and model analysis were done by using geoprocessing tools in ArcGIS (<xref ref-type="bibr" rid="r10">Esri, 2019</xref>) and TauDEM (<xref ref-type="bibr" rid="r46">Tarboton, 2016</xref>) and by using several packages in R (<xref ref-type="bibr" rid="r37">R&#x00A0;Core Team, 2021</xref>).</p>
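<p>The performance statistics reported in the abstract (<italic>R</italic><sup>2</sup>, NSE, logNSE, and PBIAS) can be sketched in a few lines. The snippet below is written in Python for illustration and is not the authors&#x2019; R code. It assumes that <italic>R</italic><sup>2</sup> is the squared Pearson correlation coefficient, that logNSE uses natural logarithms of strictly positive flows, and that PBIAS is positive when the model overpredicts; sign conventions for PBIAS differ among sources. The observed and predicted values are made-up numbers, not data from this study.</p>
<code language="python" xml:space="preserve">
```python
import math


def mean(values):
    return sum(values) / len(values)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / SST, where SST is the sum of
    squared deviations of the observations from their mean."""
    obs_mean = mean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def log_nse(obs, pred):
    """NSE on log-transformed flows (assumes every flow is positive)."""
    return nse([math.log(o) for o in obs], [math.log(p) for p in pred])


def r_squared(obs, pred):
    """Squared Pearson correlation (an assumed definition of R2)."""
    om, pm = mean(obs), mean(pred)
    cov = sum((o - om) * (p - pm) for o, p in zip(obs, pred))
    var_o = sum((o - om) ** 2 for o in obs)
    var_p = sum((p - pm) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def pbias(obs, pred):
    """Percent bias; positive means overprediction under this convention."""
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)


# Hypothetical monthly flows in cubic feet per second (not from the report).
observed = [0.5, 1.2, 8.0, 15.0, 4.0, 1.0]
predicted = [0.6, 1.0, 7.0, 16.5, 4.4, 0.9]

print(r_squared(observed, predicted), nse(observed, predicted),
      log_nse(observed, predicted), pbias(observed, predicted))
```
</code>
<p>Because squared errors are dominated by the largest values, NSE mostly reflects agreement at high flows, whereas the log transformation gives low flows more weight. That is why the two statistics are reported together.</p>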
<sec>
<title>Streamflow Observations</title>
<p>No USGS streamgages with streamflow time series were available in the study area for use in developing the MLFLOW model. One nearby streamgage (less than 20&#x00A0;miles from the study area) was operational during the study period (2012&#x2013;17), but the streamgage was on a stream with a larger drainage area and lower mean elevation than any of the three watersheds of the study area. Therefore, this study relied on available discrete streamflow observations to calibrate and validate the MLFLOW model.</p>
<p>Streamflow measurements were made at 125&#x00A0;sites (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) in 35&#x00A0;months during 2012&#x2013;17, totaling 971&#x00A0;discrete observations (<xref ref-type="table" rid="t01">table&#x00A0;1</xref>); no site was measured more than once in any month. These observations were not made on a consistent spatiotemporal basis but together are representative of the annual hydrograph on a monthly timestep, including high flows in late spring and early summer (<xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>B</italic>&#x2013;<italic>E</italic></xref>) and low flows from late summer through early spring (<xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>F</italic>&#x2013;<italic>H</italic></xref>, <italic>A</italic>). Therefore, the observations were assumed to qualitatively represent monthly streamflow. Ecologists with the Wyoming Cooperative Fish and Wildlife Research Unit (University of Wyoming and USGS) made 763&#x00A0;observations at 101&#x00A0;sites in 30&#x00A0;months during 2012&#x2013;17 for various studies of aquatic species distribution and abundance. Most (736) of these observations were made during May through August of 2012&#x2013;17, and the remainder were made during April of 2012 and 2017 and September of 2013&#x2013;16. USGS hydrologists made 208&#x00A0;observations at 24&#x00A0;sites in 10&#x00A0;months during 2015&#x2013;17 for research on groundwater/surface-water relations. Most (184) of these observations were made during June, August, and November of 2015 and 2016 and July and September of 2017; the remainder were made during April 2016 and February 2017.
Streamflow measurements were made following standard USGS techniques and methods for streamflow measurement and computation using current-velocity meters, portable flumes, or pressure transducers (<xref ref-type="bibr" rid="r38">Rantz and others, 1982</xref>; <xref ref-type="bibr" rid="r40">Sauer and Turnipseed, 2010</xref>). The monthly streamflow observation data are available in the accompanying data release (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>).</p>
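<p>As a cross-check on the totals reported above, the following minimal Python sketch tallies the observation counts transcribed from <xref ref-type="table" rid="t01">table&#x00A0;1</xref>. The dictionary and function names are hypothetical and are not part of the MLFLOW model or the data release; months shown as not sampled in the table are entered as 0.</p>
<code language="python">
```python
# Observation counts transcribed from table 1. Columns are February, April,
# May, June, July, August, September, and November; a month shown as not
# sampled ("--") is entered as 0.
by_watershed = {
    "Watershed 1": [3, 1, 16, 87, 96, 69, 20, 22],
    "Watershed 2": [3, 5, 9, 80, 91, 73, 11, 9],
    "Watershed 3": [7, 8, 11, 131, 119, 69, 16, 15],
}
by_year = {
    2012: [0, 1, 5, 45, 40, 79, 0, 0],
    2013: [0, 0, 5, 74, 92, 28, 3, 0],
    2014: [0, 0, 7, 19, 55, 11, 6, 0],
    2015: [0, 0, 8, 53, 45, 35, 8, 22],
    2016: [0, 11, 7, 83, 19, 31, 7, 24],
    2017: [13, 2, 4, 24, 55, 27, 23, 0],
}


def row_totals(table):
    """Total observations for each row (watershed or year)."""
    return {key: sum(counts) for key, counts in table.items()}


def column_totals(table):
    """Total observations for each month, summed over all rows."""
    return [sum(column) for column in zip(*table.values())]


watershed_totals = row_totals(by_watershed)
year_totals = row_totals(by_year)

print(watershed_totals)
print(year_totals)
print(sum(watershed_totals.values()), sum(year_totals.values()))
# The monthly totals should agree whichever way the rows are grouped.
print(column_totals(by_watershed) == column_totals(by_year))
```
</code>
<p>With the counts as transcribed, both groupings sum to 971&#x00A0;observations, and the monthly totals agree between the watershed rows and the year rows.</p>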
<table-wrap id="t01" position="float">
<label>Table 1</label><caption><title>Number of sites and months sampled and number of streamflow observations in each month sampled, by watershed and by year, 2012&#x2013;17.</title>
<p content-type="toc">Table 1.&#x2003;Number of sites and months sampled and number of streamflow observations in each month sampled, by watershed and by year, 2012&#x2013;17.</p>
<p>[--, not sampled]</p>
</caption>
<table rules="groups">
<col width="13.48%"/>
<col width="9.57%"/>
<col width="9.7%"/>
<col width="9.98%"/>
<col width="6.46%"/>
<col width="5.89%"/>
<col width="6.92%"/>
<col width="6.92%"/>
<col width="8.49%"/>
<col width="11.51%"/>
<col width="11.08%"/>
<thead>
<tr>
<td rowspan="2" valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Watershed<break/>or year</td>
<td rowspan="2" valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Number<break/>of<break/>sites sampled</td>
<td rowspan="2" valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Number of months sampled</td>
<td colspan="8" valign="top" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Number of streamflow observations</td>
</tr>
<tr>
<td valign="middle" colspan="1" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">February</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">April</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">May</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">June</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">July</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">August</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">September</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">November</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">Watershed 1</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">37</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">32</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">3</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">1</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">16</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">87</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">96</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">69</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">20</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">22</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">Watershed 2</td>
<td valign="top" align="char" char=".">37</td>
<td valign="top" align="char" char=".">33</td>
<td valign="top" align="char" char=".">3</td>
<td valign="top" align="char" char=".">5</td>
<td valign="top" align="char" char=".">9</td>
<td valign="top" align="char" char=".">80</td>
<td valign="top" align="char" char=".">91</td>
<td valign="top" align="char" char=".">73</td>
<td valign="top" align="char" char=".">11</td>
<td valign="top" align="char" char=".">9</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">Watershed 3</td>
<td valign="top" align="char" char=".">51</td>
<td valign="top" align="char" char=".">34</td>
<td valign="top" align="char" char=".">7</td>
<td valign="top" align="char" char=".">8</td>
<td valign="top" align="char" char=".">11</td>
<td valign="top" align="char" char=".">131</td>
<td valign="top" align="char" char=".">119</td>
<td valign="top" align="char" char=".">69</td>
<td valign="top" align="char" char=".">16</td>
<td valign="top" align="char" char=".">15</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2012</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">80</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">5</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">1</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">5</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">45</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">40</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">79</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2013</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">95</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">5</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">5</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">74</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">92</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">28</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">3</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2014</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">71</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">5</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">7</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">19</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">55</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">11</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">6</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2015</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">95</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">6</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">8</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">53</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">45</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">35</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">8</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">22</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2016</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">95</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">7</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">11</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">7</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">83</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">19</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">31</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">7</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">24</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">2017</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">94</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">7</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">13</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">2</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">4</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">24</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">55</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">27</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">23</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">--</td>
</tr>
</tbody></table></table-wrap>
<fig id="fig02" position="float" fig-type="figure"><label>Figure 2</label><caption><p>Observed streamflow against drainage area of sampling sites by watershed and year for each month sampled during 2012&#x2013;17. <italic>A</italic>,&#x00A0;February; <italic>B</italic>,&#x00A0;April; <italic>C</italic>,&#x00A0;May; <italic>D</italic>,&#x00A0;June; <italic>E</italic>,&#x00A0;July; <italic>F</italic>,&#x00A0;August; <italic>G</italic>,&#x00A0;September; and <italic>H</italic>,&#x00A0;November.</p><p content-type="toc">Figure 2.&#x2003;Graphs showing observed streamflow against drainage area of sampling sites by watershed and year for each month sampled during 2012&#x2013;17.</p></caption>
<long-desc>Observed streamflow varies against drainage area of sampling sites by watershed and year for each month sampled.</long-desc><graphic xlink:href="rol21-0070_fig02"/></fig>
<p>The number of sites where streamflow measurements were made differed among the three watersheds and among the 6&#x00A0;years of sampling, whereas the number of months when streamflow measurements were made was similar among watersheds and among years. On average, sampling involved 42&#x00A0;sites and 33&#x00A0;months per watershed and 88&#x00A0;sites and 6&#x00A0;months per year (<xref ref-type="table" rid="t01">table&#x00A0;1</xref>). No site was sampled in every month, but every site was sampled in at least 2&#x00A0;months; on average, a site was sampled in 8 of the 35&#x00A0;months sampled during 2012&#x2013;17 (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>). The sites sampled in the greatest number of months were sites&#x00A0;25, 27, and 31 (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>), which were sampled in 28, 25, and 28&#x00A0;months, respectively. Additionally, the watershed (3) and the year (2013) with the most streamflow observations each had about 100 more observations than the watershed (2) and the year (2014) with the fewest streamflow observations (<xref ref-type="table" rid="t02">table&#x00A0;2</xref>).</p>
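<p>The per-watershed and per-year averages quoted above can be reproduced from the first two data columns of <xref ref-type="table" rid="t01">table&#x00A0;1</xref>. The sketch below is illustrative only; the variable names are hypothetical, and rounding to whole sites and months is an assumption about how the averages were reported.</p>
<code language="python">
```python
from statistics import mean

# Values transcribed from table 1 (number of sites and months sampled).
sites_per_watershed = [37, 37, 51]              # watersheds 1, 2, 3
months_per_watershed = [32, 33, 34]
sites_per_year = [80, 95, 71, 95, 95, 94]       # 2012 through 2017
months_per_year = [5, 5, 5, 6, 7, 7]

# Round each mean to the nearest whole site or month (assumed convention).
averages = {
    "sites per watershed": round(mean(sites_per_watershed)),
    "months per watershed": round(mean(months_per_watershed)),
    "sites per year": round(mean(sites_per_year)),
    "months per year": round(mean(months_per_year)),
}

for label, value in averages.items():
    print(label, value)
```
</code>
<p>The rounded means are 42&#x00A0;sites and 33&#x00A0;months per watershed and 88&#x00A0;sites and 6&#x00A0;months per year. The months sampled per year also sum to the 35&#x00A0;months sampled during 2012&#x2013;17.</p>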
<table-wrap id="t02" position="float">
<label>Table 2</label><caption><title>Summary statistics of streamflow observations by watershed and year.</title>
<p content-type="toc">Table 2.&#x2003;Summary statistics of streamflow observations by watershed and year.</p>
</caption>
<table rules="groups">
<col width="16.59%"/>
<col width="15.64%"/>
<col width="18.66%"/>
<col width="17.72%"/>
<col width="14.6%"/>
<col width="16.79%"/>
<thead>
<tr>
<td rowspan="2" valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Watershed<break/>or year</td>
<td rowspan="2" valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Number of streamflow observations</td>
<td colspan="4" valign="top" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Summary statistics of streamflow observations<break/>(cubic foot per second)</td>
</tr>
<tr>
<td valign="middle" colspan="1" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Minimum</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Mean</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Maximum</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">Watershed 1</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">314</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.05</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">1.81</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">2.68</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">19.67</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">Watershed 2</td>
<td valign="top" align="char" char=".">281</td>
<td valign="top" align="char" char=".">0</td>
<td valign="top" align="char" char=".">1.40</td>
<td valign="top" align="char" char=".">2.53</td>
<td valign="top" align="char" char=".">17.08</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">Watershed 3</td>
<td valign="top" align="char" char=".">376</td>
<td valign="top" align="char" char=".">0</td>
<td valign="top" align="char" char=".">1.13</td>
<td valign="top" align="char" char=".">2.05</td>
<td valign="top" align="char" char=".">19.54</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2012</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">170</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.40</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.62</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">4.34</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2013</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">202</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.46</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.84</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">9.97</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2014</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">98</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.05</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">2.32</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">2.97</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">11.49</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2015</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">171</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.13</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">2.01</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">2.89</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">11.90</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">2016</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">182</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.04</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">1.98</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">2.72</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">9.39</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">2017</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">148</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">4.01</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">5.21</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">19.67</td>
</tr>
</tbody></table></table-wrap>
<p>Observed flows at the sampling sites ranged from 0 to 19.67&#x00A0;cubic feet per second (ft<sup>3</sup>/s) and averaged 2.39&#x00A0;ft<sup>3</sup>/s (<xref ref-type="table" rid="t02">table&#x00A0;2</xref>), and the drainage areas of the sampling sites ranged from 0.64 to 78.1&#x00A0;mi<sup>2</sup> and averaged 14.3&#x00A0;mi<sup>2</sup> (<xref ref-type="fig" rid="fig02">fig.&#x00A0;2</xref>). A robust streamflow/drainage-area relation among the sampling sites was not readily apparent in every month sampled (<xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>A</italic>&#x2013;<italic>H</italic></xref>), most likely because sites with different drainage areas were sampled in different months, which confounded the relation between streamflow and drainage area with the relation between streamflow and month. Sites with the largest drainage areas (greater than 67&#x00A0;mi<sup>2</sup>) were sampled only in watershed&#x00A0;3 and only during summer (June through August, <xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>D</italic>&#x2013;<italic>F</italic></xref>). Furthermore, sites with smaller drainage areas (less than 41&#x00A0;mi<sup>2</sup>) were sampled in every month (<xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>A</italic>&#x2013;<italic>H</italic></xref>) but in only 1 to 3&#x00A0;years in February, April, and November (<xref ref-type="table" rid="t01">table&#x00A0;1</xref>; <xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>A</italic>, <italic>B</italic>, <italic>H</italic></xref>). Additionally, in some months (for example, May, <xref ref-type="fig" rid="fig02">fig.&#x00A0;2<italic>C</italic></xref>), the streamflow/drainage-area relation seemed to be more robust by watershed, where streamflow increased with increasing drainage area, than by year, where streamflow decreased with increasing drainage area in some years but increased in others.</p>
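<p>The overall mean streamflow can be approximated from the grouped statistics in <xref ref-type="table" rid="t02">table&#x00A0;2</xref> as an observation-weighted mean of the group means. The sketch below assumes that approach; the function and variable names are hypothetical, and the result is only approximate because the tabulated means are rounded to two decimal places.</p>
<code language="python">
```python
# (number of observations, mean streamflow in cubic feet per second),
# transcribed from table 2.
by_watershed = {
    "Watershed 1": (314, 2.68),
    "Watershed 2": (281, 2.53),
    "Watershed 3": (376, 2.05),
}
by_year = {
    2012: (170, 0.62),
    2013: (202, 0.84),
    2014: (98, 2.97),
    2015: (171, 2.89),
    2016: (182, 2.72),
    2017: (148, 5.21),
}


def pooled_mean(groups):
    """Observation-weighted mean of the group means."""
    n_total = sum(n for n, _ in groups.values())
    weighted_sum = sum(n * group_mean for n, group_mean in groups.values())
    return weighted_sum / n_total


print(round(pooled_mean(by_watershed), 2))
print(round(pooled_mean(by_year), 2))
```
</code>
<p>The two groupings give slightly different estimates (about 2.39 and 2.40&#x00A0;ft<sup>3</sup>/s) because of rounding in the tabulated means; both are consistent with the reported overall mean of 2.39&#x00A0;ft<sup>3</sup>/s.</p>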
<p>The distribution of streamflow values varied more among the 6&#x00A0;years of sampling than among the three watersheds. For example, mean streamflow differed by only 31&#x00A0;percent among watersheds but by 740&#x00A0;percent among years (<xref ref-type="table" rid="t02">table&#x00A0;2</xref>). In general, streamflow values were low: median streamflow ranged from 0.40 to 4.01&#x00A0;ft<sup>3</sup>/s among watersheds and years, and two watersheds and 3&#x00A0;years had zero-flow observations (<xref ref-type="table" rid="t02">table&#x00A0;2</xref>). Climatically, 2012 and 2013 were dry years, whereas 2017 was a wet year. These 2&#x00A0;dry years and 1&#x00A0;wet year had very low and very high mean streamflow, respectively, compared with the other 3&#x00A0;years (2014&#x2013;16, <xref ref-type="table" rid="t02">table&#x00A0;2</xref>).</p>
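<p>The percent differences in mean streamflow quoted above are consistent with expressing the range of the group means as a percentage of the smallest mean. The report does not state the formula, so the sketch below is an assumption that happens to reproduce the reported values from <xref ref-type="table" rid="t02">table&#x00A0;2</xref>; the function name is hypothetical.</p>
<code language="python">
```python
def percent_difference(values):
    """Percent by which the largest value exceeds the smallest value."""
    low, high = min(values), max(values)
    return 100 * (high - low) / low


# Mean streamflow in cubic feet per second, transcribed from table 2.
watershed_means = [2.68, 2.53, 2.05]                  # watersheds 1, 2, 3
year_means = [0.62, 0.84, 2.97, 2.89, 2.72, 5.21]     # 2012 through 2017

print(round(percent_difference(watershed_means)))
print(round(percent_difference(year_means)))
```
</code>
<p>Under this assumption, the difference is about 31&#x00A0;percent among watersheds and about 740&#x00A0;percent among years.</p>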
</sec>
<sec>
<title>Predictor Variables</title>
<p>Based on the potential to affect streamflow, 24&#x00A0;variables describing physiographic, anthropogenic, and climatic characteristics were chosen as predictors in the MLFLOW model: 20&#x00A0;variables described static physiographic and anthropogenic conditions, and 4&#x00A0;variables described dynamic climatic conditions (<xref ref-type="table" rid="t03">table&#x00A0;3</xref>). Vector data (surficial geology permeability index and number of surficial geology contacts, bedrock geology faults, springs, water diversions, roads, and water wells, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) were converted to raster data. Raster data describing topography (drainage area, elevation, slope-area ratio, and slope, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) were derived from the 30-meter (m) resolution National Elevation Dataset digital elevation model (DEM) in the USGS National Hydrography Dataset (<xref ref-type="bibr" rid="r49">U.S.&#x00A0;Geological Survey, 2019a</xref>). All variables were snapped to the 30-m DEM extent. Any variable with lower spatial resolution than the 30-m DEM, including base-flow index, depth to soil restrictive layer, depth to water table, and every climate variable (<xref ref-type="table" rid="t03">table&#x00A0;3</xref>), was resampled to the DEM resolution so that the data could be processed for use in the MLFLOW model. This resampling produced data with higher spatial resolution than the original data but with minimal change in the distribution of values. Each climate variable was provided as a monthly value: the monthly total for evapotranspiration and precipitation, the monthly mean for temperature, and the value on the first day of the month for snow water equivalent. 
Values of each predictor variable were produced throughout the channel networks of the three watersheds, which consisted of every cell of the flow accumulation grid in the medium-resolution (30&#x00A0;m) National Hydrography Dataset with a value of 100 or more, equivalent to a drainage area of 0.09&#x00A0;square kilometer or more. Data on the predictor variables are available in the accompanying data release (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>), including raw values of the variables and values of the variables with temporal and spatial conditioning; the conditioning processes are described in the following section.</p>
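<p>The equivalence between the flow accumulation threshold and the minimum drainage area follows from the cell size. The sketch below assumes that the accumulation value counts contributing 30-m cells; the constant and function names are hypothetical, and the example accumulation values are invented for illustration.</p>
<code language="python">
```python
CELL_SIZE_M = 30.0              # grid resolution stated in the text
MIN_ACCUMULATION = 100          # channel threshold stated in the text
KM2_PER_MI2 = 2.589988110336    # square kilometers in one square mile


def cells_to_km2(n_cells, cell_size_m=CELL_SIZE_M):
    """Drainage area, in square kilometers, of n_cells square grid cells."""
    return n_cells * cell_size_m ** 2 / 1_000_000


def is_channel_cell(accumulation, threshold=MIN_ACCUMULATION):
    """True if a cell meets the channel-network threshold."""
    return accumulation >= threshold


threshold_km2 = cells_to_km2(MIN_ACCUMULATION)
threshold_mi2 = threshold_km2 / KM2_PER_MI2
print(threshold_km2)
print(round(threshold_mi2, 4))

# Hypothetical accumulation values for four cells (not from the data release).
example_cells = [12, 99, 100, 4500]
print([is_channel_cell(value) for value in example_cells])
```
</code>
<p>One hundred cells of 900&#x00A0;square meters each give 0.09&#x00A0;square kilometer, which matches the minimum drainage area stated above.</p>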
<table-wrap id="t03" orientation="landscape" position="float">
<label>Table 3</label><caption><title>Static physiographic and anthropogenic variables and dynamic climatic variables used in the machine learning streamflow model.</title>
<p content-type="toc">Table 3.&#x2003;Static physiographic and anthropogenic variables and dynamic climatic variables used in the machine learning streamflow model.</p>
<p>[Suffixes of variable codes (following an underscore) are defined as follows: a, area-averaged spatial conditioning; d, distance-decayed spatial conditioning; 00, 01, 03, 06, 09, 12, temporal conditioning of dynamic climatic variables with a moving average of the current month (00) and the prior 1, 3, 6, 9, or 12 months (01&#x2013;12), respectively. DEM, digital elevation model]</p>
</caption>
<table rules="groups">
<col width="9.34%"/>
<col width="25.2%"/>
<col width="27.53%"/>
<col width="37.93%"/>
<thead>
<tr>
<td valign="middle" colspan="2" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Variable</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Code</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Source</td>
</tr>
</thead>
<tbody>
<tr>
<th valign="middle" colspan="4" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="col">Static physiographic and anthropogenic variables used to fit models</th>
</tr>
<tr>
<td rowspan="8" valign="top" align="left" style="border-top: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">Geology</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">Soil bulk density</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">bulkDens_a</td>
<td valign="top" align="left" style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">Chaney and others (2019); POLARIS (2019)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Number of surficial geology contacts</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">contacts_d</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Wyoming State Geological Survey (2015)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Number of bedrock geology faults</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">faults_d</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Wyoming State Geological Survey (2014)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Soil saturated hydraulic conductivity</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">hydCond_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Chaney and others (2019); POLARIS (2019)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Depth to soil restrictive layer</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">restrLayer_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Soil Survey Staff (2016, 2017)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Surficial geology permeability index</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">surfPerm_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Stoeser and others (2005); Bartolino and others (2019)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Soil saturated water content</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">waterCont_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Chaney and others (2019); POLARIS (2019)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Depth to water table</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">waterTable_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Soil Survey Staff (2016, 2017)</td>
</tr>
<tr>
<td rowspan="2" valign="top" align="left" scope="row">Hydrology</td>
<td valign="top" align="left">Base-flow index</td>
<td valign="top" align="left">baseFlow_a</td>
<td valign="top" align="left">Wolock (2003)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" scope="row">Number of springs</td>
<td valign="top" align="left">springs_d</td>
<td valign="top" align="left">U.S. Geological Survey (2019b)</td>
</tr>
<tr>
<td rowspan="4" valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">Topography</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Drainage area</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">area</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Computed with 30-meter DEM (U.S. Geological Survey, 2019a) using TauDEM (Tarboton, 2016)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Elevation</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">elevation_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">30-meter DEM (U.S. Geological Survey, 2019a)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Slope</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">slope_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Computed with 30-meter DEM (U.S. Geological Survey, 2019a) using TauDEM (Tarboton, 2016)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Slope-area ratio</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">slopeArea_a</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Computed with 30-meter DEM (U.S. Geological Survey, 2019a) using TauDEM (Tarboton, 2016)</td>
</tr>
<tr>
<td rowspan="3" valign="top" align="left" scope="row">Vegetation</td>
<td valign="top" align="left">Cropland cover</td>
<td valign="top" align="left">cropland_a</td>
<td valign="top" align="left">Massey and others (2017)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" scope="row">Forest cover</td>
<td valign="top" align="left">forest_a</td>
<td valign="top" align="left">Homer and others (2015); Multi-Resolution Land Characteristics Consortium (2017)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" scope="row">Wetland cover</td>
<td valign="top" align="left">wetland_a</td>
<td valign="top" align="left">Homer and others (2015); Multi-Resolution Land Characteristics Consortium (2017)</td>
</tr>
<tr>
<td rowspan="3" valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">Human</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Number of water diversions</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">diversions_d</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Wyoming Water Development Office (2007)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Number of roads</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">roads_d</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">O&#x2019;Donnell and others (2014)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="background-color:rgb(242,242,242)" scope="row">Number of water wells</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">wells_d</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">Wyoming State Engineer&#x2019;s Office (2016)</td>
</tr>
<tr>
<th valign="middle" colspan="4" align="center" style="border-top: solid 0.50pt" scope="col">Dynamic climatic variables used to fit models</th>
</tr>
<tr>
<td rowspan="4" valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="row">Climate</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">Evapotranspiration</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">eta_00_a, eta_01_a, eta_03_a, eta_06_a, eta_09_a, eta_12_a</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">Senay and others (2013); U.S. Geological Survey (2018)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" scope="row">Precipitation</td>
<td valign="top" align="left">ppt_00_a, ppt_01_a, ppt_03_a, ppt_06_a, ppt_09_a, ppt_12_a</td>
<td valign="top" align="left">Daly and others (2008); PRISM Climate Group (2018)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" scope="row">Snow water equivalent</td>
<td valign="top" align="left">swe_00_a, swe_01_a, swe_03_a, swe_06_a, swe_09_a, swe_12_a</td>
<td valign="top" align="left">Barrett (2003); National Operational Hydrologic Remote Sensing Center (2004)</td>
</tr>
<tr>
<td valign="top" colspan="1" align="left" style="border-bottom: solid 0.50pt" scope="row">Temperature</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">tmp_00_a, tmp_01_a, tmp_03_a, tmp_06_a, tmp_09_a, tmp_12_a</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">Daly and others (2008); PRISM Climate Group (2018)</td>
</tr>
</tbody></table></table-wrap>
<sec>
<title>Processes of Temporal and Spatial Conditioning</title>
<p>Streamflow is affected by antecedent climatic conditions. For example, precipitation during an earlier period can have an enduring effect on streamflow. Therefore, to represent lagged effects of climatic conditions on streamflow over a range of time scales, a temporal conditioning process was applied to the climate variables. The process consisted of computing moving averages of the time series data over windows that ranged from the prior month to the prior year. Data on evapotranspiration, precipitation, snow water equivalent, and temperature were temporally conditioned into five variants for each climate variable, and every variant was used in fitting the MLFLOW model. The five variants were moving-average (mean) values of the current month together with the previous 1, 3, 6, 9, or 12&#x00A0;months (codes with suffix &#x201C;01&#x201D; through &#x201C;12,&#x201D; <xref ref-type="table" rid="t03">table&#x00A0;3</xref>). Data on the four climate variables without conditioning (raw values of the data; codes with suffix &#x201C;00,&#x201D; <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) also were used in model fitting. Using May 2017 as an example, variants for snow water equivalent were generated to describe, for any cell, unconditioned (raw) snow water equivalent for the current month (<xref ref-type="fig" rid="fig03">fig.&#x00A0;3<italic>A</italic></xref>); mean snow water equivalent for the current and prior month (<xref ref-type="fig" rid="fig03">fig.&#x00A0;3<italic>B</italic></xref>); and mean snow water equivalent for the current and prior 3, 6, 9, and 12&#x00A0;months (<xref ref-type="fig" rid="fig03">fig.&#x00A0;3<italic>C</italic>&#x2013;<italic>F</italic></xref>, respectively). 
For any cell in the resulting grids, the different conditioned variants (<xref ref-type="fig" rid="fig03">fig.&#x00A0;3<italic>B</italic>&#x2013;<italic>F</italic></xref>) could have a value greater or less than the raw value of the current month (<xref ref-type="fig" rid="fig03">fig.&#x00A0;3<italic>A</italic></xref>), which might be more explanatory, in the MLFLOW model, of streamflow in the current month.</p>
<fig id="fig03" position="float" fig-type="figure"><label>Figure 3</label><caption><p>An example of a dynamic climatic variable, snow water equivalent, shown before and after temporal conditioning with a moving average of cell values (figure area, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>). <italic>A</italic>,&#x00A0;current month May&#x00A0;2017; <italic>B</italic>,&#x00A0;current and prior month; <italic>C</italic>,&#x00A0;current and prior 3&#x00A0;months; <italic>D</italic>,&#x00A0;current and prior 6&#x00A0;months; <italic>E</italic>,&#x00A0;current and prior 9&#x00A0;months; and <italic>F</italic>,&#x00A0;current and prior 12&#x00A0;months.</p><p content-type="toc">Figure 3.&#x2003;Diagrams showing an example of a dynamic climatic variable, snow water equivalent, shown before and after temporal conditioning with a moving average of cell values.</p></caption>
<long-desc>Snow water equivalent ranges from 0 to 350 millimeters for different moving averages of months.</long-desc><graphic xlink:href="rol21-0070_fig03"/></fig>
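<p>The temporal conditioning described above can be illustrated with a minimal sketch. The sketch is written in Python for illustration only and is not the code used in this study; the function name, the single-cell list of monthly values, and the snow water equivalent values are hypothetical. The sketch computes the unconditioned value and the five trailing means (current month together with the prior 1, 3, 6, 9, or 12&#x00A0;months) for one cell.</p>
<code language="python">
```python
from statistics import mean

# Number of prior months averaged together with the current month.
# Keys follow the two-digit codes in table 3 ("00" is the unconditioned value).
PRIOR_MONTHS = {"00": 0, "01": 1, "03": 3, "06": 6, "09": 9, "12": 12}


def temporal_variants(series, index):
    """Return the raw value and the trailing means for one cell at month `index`.

    `series` is a hypothetical list of monthly values for a single grid cell,
    ordered oldest to newest; `index` needs at least 12 prior months.
    """
    if 12 > index:
        raise ValueError("need at least 12 prior months")
    variants = {}
    for code, n_prior in PRIOR_MONTHS.items():
        # Window holds the prior n_prior months plus the current month.
        window = series[index - n_prior : index + 1]
        variants[code] = mean(window)
    return variants


# Made-up monthly snow water equivalent values (millimeters) ending in a melt month.
swe = [0, 0, 10, 40, 90, 150, 210, 260, 300, 320, 280, 180, 60]
result = temporal_variants(swe, index=12)
for code, value in result.items():
    print(code, round(value, 1))
```
</code>
<p>In this made-up series, every conditioned variant exceeds the raw value of the final month because the trailing windows retain the earlier snowpack, which is the kind of lagged information the conditioning is intended to carry.</p>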
<p>The relation of a predictor variable to streamflow depends on the area upslope of a channel and the distance to a channel downslope. Therefore, two geospatial processes were used to account for area and distance effects on streamflow. An area-averaged accumulation of the spatial data generated upstream effects of a variable. A distance-decayed accumulation of the spatial data generated more localized effects of a variable. Practical software tools for implementing these geospatial (flow-conditioning) processes are available in <xref ref-type="bibr" rid="r1">Barnhart and others (2020)</xref>.</p>
<p>The process for area-averaged accumulation involved weighting a flow accumulation grid with values of a variable and dividing by the total area of all upslope grid cells. The resulting grid described the average value of the variable for the area upslope of each grid cell. Area-averaged conditioning was most applicable to more continuous variables, such as vegetation and climate, whose effect on streamflow was anticipated mostly from areas upstream. Spatial conditioning with an area-averaged accumulation was applied to 17&#x00A0;variables (codes with suffix &#x201C;a,&#x201D; <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) as follows: 6&#x00A0;geology (soil bulk density, soil saturated hydraulic conductivity, depth to soil restrictive layer, surficial geology permeability index, soil saturated water content, and depth to water table); 1&#x00A0;hydrology (base-flow index); 3&#x00A0;topography (elevation, slope, and slope-area ratio); 3&#x00A0;vegetation (cropland cover, forest cover, and wetland cover); and 4&#x00A0;climate (evapotranspiration, precipitation, snow water equivalent, and temperature). Using forest cover as an example, the unconditioned value of forest cover for any cell was 0 or 100&#x00A0;percent (<xref ref-type="fig" rid="fig04">fig.&#x00A0;4<italic>A</italic></xref>), but area-averaged conditioning produced a continuous value for every cell based on the averaged downslope accumulation of forest cover (<xref ref-type="fig" rid="fig04">fig.&#x00A0;4<italic>B</italic></xref>). Streamflow would be less affected by the forest cover at a specific grid cell on the channel network than by the general forest cover upstream from the cell.</p>
<fig id="fig04" position="float" fig-type="figure"><label>Figure 4</label><caption><p>An example of a static physiographic variable, forest cover, shown before and after spatial conditioning with an area-averaged accumulation of cell values (figure area, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>). <italic>A</italic>,&#x00A0;unconditioned and <italic>B</italic>,&#x00A0;with area-averaged conditioning.</p><p content-type="toc">Figure 4.&#x2003;Diagrams showing an example of a static physiographic variable, forest cover, shown before and after spatial conditioning with an area-averaged accumulation of cell values.</p></caption>
<long-desc>Forest cover ranges from 0 to 100 percent. Averaged accumulation of forest cover ranges from 0 to 100 percent.</long-desc><graphic xlink:href="rol21-0070_fig04"/></fig>
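<p>A minimal sketch of area-averaged accumulation follows, written in Python for illustration only; it is not the TauDEM workflow or the flow-conditioning tools cited above. The sketch assumes a hypothetical five-cell network in which each cell drains to one downslope cell, and it assumes the accumulation at a cell includes the cell itself. A value-weighted accumulation is divided by the count of contributing cells to give the upslope average.</p>
<code language="python">
```python
def area_averaged(downstream, value):
    """Average `value` over each cell and every cell that drains to it.

    `downstream` maps each cell to the cell it drains to (None at the outlet).
    """
    # Number of upslope neighbors that have not yet passed their totals on.
    pending = {cell: 0 for cell in downstream}
    for target in downstream.values():
        if target is not None:
            pending[target] += 1

    total = dict(value)                        # value-weighted accumulation
    count = {cell: 1 for cell in downstream}   # unweighted accumulation (cell count)
    ready = [cell for cell, n in pending.items() if n == 0]  # cells with nothing upslope
    while ready:
        cell = ready.pop()
        target = downstream[cell]
        if target is None:
            continue
        total[target] += total[cell]
        count[target] += count[cell]
        pending[target] -= 1
        if pending[target] == 0:
            ready.append(target)
    return {cell: total[cell] / count[cell] for cell in downstream}


# Hypothetical hillslope: a and b drain to c; c and d drain to e (the outlet).
downstream = {"a": "c", "b": "c", "c": "e", "d": "e", "e": None}
forest = {"a": 100, "b": 100, "c": 0, "d": 0, "e": 0}  # percent cover, 0 or 100
averaged = area_averaged(downstream, forest)
print(averaged)
```
</code>
<p>Cells c and e have no forest cover themselves but receive continuous values from the forested cells upslope, which mirrors the change from the unconditioned grid to the conditioned grid described above.</p>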
<p>The process for distance-decayed accumulation involved weighting a flow accumulation grid with values of a variable and multiplying by a decay grid, which was implemented as a negative exponential function of the distance of any grid cell to the nearest channel downslope (traveling along the flow direction grid). The resulting grid described the total value of the variable for all cells upslope of each grid cell attenuated by distance. Distance-decayed conditioning was most applicable to discrete variables, such as human infrastructure or geologic structures, which were expected to affect streamflow more in proximate channels. Spatial conditioning with a distance-decayed accumulation was applied to six variables (codes with suffix &#x201C;d,&#x201D; <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) as follows: two geology (number of surficial geology contacts and number of bedrock geology faults); one hydrology (number of springs); and three human (number of water diversions, number of roads, and number of water wells). Using surficial geology contacts as an example, the unconditioned value of a geologic contact for any cell was 0 or 1 (<xref ref-type="fig" rid="fig05">fig.&#x00A0;5<italic>A</italic></xref>), but distance-decayed conditioning produced a continuous value for every cell based on the decayed downslope accumulation of geologic contacts (<xref ref-type="fig" rid="fig05">fig.&#x00A0;5<italic>B</italic></xref>). Streamflow would be most affected in a channel closest to the geologic contact, decreasing in effect farther downstream; moreover, a geologic contact would have less effect on streamflow as upslope distance from a channel increased.</p>
<fig id="fig05" position="float" fig-type="figure"><label>Figure 5</label><caption><p>An example of a static physiographic variable, number of surficial geology contacts, shown before and after spatial conditioning with a distance-decayed accumulation of cell values (figure area, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>). <italic>A</italic>,&#x00A0;unconditioned and <italic>B</italic>,&#x00A0;with distance-decayed conditioning.</p><p content-type="toc">Figure 5.&#x2003;Diagrams showing an example of a static physiographic variable, number of surficial geology contacts, shown before and after spatial conditioning with a distance-decayed accumulation of cell values.</p></caption>
<long-desc>Surficial geology contacts range from 0 to 1. Decayed accumulation of surficial geology contacts ranges from 0 to 6.</long-desc><graphic xlink:href="rol21-0070_fig05"/></fig>
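<p>A minimal sketch of distance-decayed accumulation follows, written in Python for illustration only; it is not the code used in this study. The decay coefficient <italic>k</italic>, the four-cell flow path, and the distances are hypothetical. The sketch also simplifies the process by attenuating each value only by the distance from its source cell to the nearest channel downslope, so it does not reproduce any further attenuation along the channel.</p>
<code language="python">
```python
import math


def distance_decayed(downstream, value, channel_distance, k=0.01):
    """Accumulate value * exp(-k * distance) from each cell to all cells downslope.

    `channel_distance` is the flow-path distance from a cell to the nearest
    channel downslope; `k` is a hypothetical decay coefficient.
    """
    total = {cell: 0.0 for cell in downstream}
    for source in downstream:
        weighted = value[source] * math.exp(-k * channel_distance[source])
        cell = source
        while cell is not None:       # walk the flow path to the outlet
            total[cell] += weighted
            cell = downstream[cell]
    return total


# Hypothetical flow path: h2 drains to h1, h1 to c1, c1 to c2 (outlet).
# Cells c1 and c2 are channel cells, so their distance to a channel is zero.
downstream = {"h2": "h1", "h1": "c1", "c1": "c2", "c2": None}
contacts = {"h2": 1, "h1": 0, "c1": 1, "c2": 0}   # geologic contact present (1) or absent (0)
channel_distance = {"h2": 60.0, "h1": 30.0, "c1": 0.0, "c2": 0.0}  # meters
decayed = distance_decayed(downstream, contacts, channel_distance, k=0.01)
print({cell: round(v, 3) for cell, v in decayed.items()})
```
</code>
<p>The contact located on the channel contributes its full value, whereas the contact 60&#x00A0;meters upslope contributes a smaller, decayed value to the same channel cells.</p>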
</sec>
<sec>
<title>Evaluation of Predictor Variable Conditioning</title>
<p>The dynamic and static variables were temporally and spatially conditioned to strengthen the relation of the predictor variables to the response variable (monthly streamflow). The Pearson correlation coefficient (<italic>r</italic>) was used to evaluate the effect of temporal and spatial conditioning on the predictor-response relation. Values of <italic>r</italic> were compared between a variable without conditioning and the equivalent variable with temporal or spatial conditioning (for example, unconditioned forest cover and forest cover with area-averaged spatial conditioning).</p>
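<p>The comparison of <italic>r</italic> between an unconditioned variable and its conditioned equivalent can be sketched as follows. The sketch is in Python for illustration only, and all values are made up; they are not data from this study.</p>
<code language="python">
```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Made-up values at five hypothetical sites.
streamflow = [1.0, 2.0, 3.0, 4.0, 5.0]   # cubic feet per second
forest_raw = [0, 100, 0, 100, 100]       # unconditioned cover at the cell, percent
forest_avg = [10, 30, 45, 70, 90]        # area-averaged cover upslope, percent

r_raw = pearson_r(forest_raw, streamflow)
r_avg = pearson_r(forest_avg, streamflow)
print(round(r_raw, 3), round(r_avg, 3))
```
</code>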
</sec>
</sec>
<sec>
<title>Development of Machine Learning Streamflow Model</title>
<p>The MLFLOW model used a gradient boosting machine (<xref ref-type="bibr" rid="r11">Friedman, 2001</xref>, <xref ref-type="bibr" rid="r12">2002</xref>; also known as boosted trees or generalized boosted models) because of the potential for the machine learning model to fit complex relations between the predictor variables (24&#x00A0;geology, hydrology, topography, vegetation, human, and climate variables, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) and the response variable (monthly streamflow; streamflow observations, <xref ref-type="table" rid="t01">table&#x00A0;1</xref>). Gradient boosting machines are one of many machine learning models, including random forests, support vector machines, and neural networks, that are used in various hydrologic modeling applications (<xref ref-type="bibr" rid="r42">Shen and others, 2018</xref>), including modeling of streamflow (<xref ref-type="bibr" rid="r4">Bellos and Carbajal, 2020</xref>). Gradient boosting machines were applied in this study using the &#x201C;caret&#x201D; (<xref ref-type="bibr" rid="r19">Kuhn, 2008</xref>, <xref ref-type="bibr" rid="r20">2020</xref>) and &#x201C;gbm&#x201D; (<xref ref-type="bibr" rid="r14">Greenwell and others, 2020</xref>; <xref ref-type="bibr" rid="r39">Ridgeway, 2020</xref>) packages in R (<xref ref-type="bibr" rid="r37">R&#x00A0;Core Team, 2021</xref>).</p>
<p>A gradient boosting machine is a machine learning model used to solve regression or classification problems. The gradient boosting machine produces a predictive model that is an ensemble of many weaker models, which are typically implemented as decision trees. Gradient boosting is treated statistically as a numerical optimization problem with an objective of minimizing the loss of the model by adding weak learners using a functional gradient descent (<xref ref-type="bibr" rid="r12">Friedman, 2002</xref>). Gradient boosting machines are applied with the following features in R (<xref ref-type="bibr" rid="r39">Ridgeway, 2020</xref>): a loss function (squared error) to optimize; a weak learner (decision trees) for predicting; and an additive model (stochastic gradient descent) for adding weak learners to minimize the loss function. However, gradient boosting is a greedy algorithm, meaning gradient boosting can readily overfit the data (<xref ref-type="bibr" rid="r11">Friedman, 2001</xref>). To keep the model from overfitting the data, the following four hyperparameters of gradient boosting machines can be tuned using packages in R (<xref ref-type="bibr" rid="r20">Kuhn, 2020</xref>; <xref ref-type="bibr" rid="r14">Greenwell and others, 2020</xref>): (1)&#x00A0;the learning rate (shrinkage) of the decision trees; (2)&#x00A0;the maximum interaction depth (number of branches, or intermediate nodes) of a tree; (3)&#x00A0;the minimum number of observations per leaf, or terminal node, of a tree; and (4)&#x00A0;the number of iterations (trees) fit. The hyperparameters of the models in this study were tuned over the following candidate values: 0.005, 0.01, or 0.05 for shrinkage; 1 or 5 for maximum interaction depth; 5 or 10 for minimum number of observations; and 500 (a single value) for number of trees. More detail on modeling with gradient boosting machines in general and specifically in R is available in <xref ref-type="bibr" rid="r21">Kuhn and Johnson (2013)</xref>.</p>
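<p>The candidate hyperparameter values listed above define a small tuning grid, which can be enumerated as in the following sketch. The sketch is in Python for illustration only and does not fit a model; the dictionary keys are hypothetical names that loosely mirror the tuning parameters exposed for &#x201C;gbm&#x201D; models in the &#x201C;caret&#x201D; package (n.trees, interaction.depth, shrinkage, and n.minobsinnode).</p>
<code language="python">
```python
from itertools import product

# Candidate values stated in the text; key names are hypothetical.
grid = {
    "shrinkage": [0.005, 0.01, 0.05],
    "interaction_depth": [1, 5],
    "min_obs_in_node": [5, 10],
    "n_trees": [500],
}

# Every combination of candidate values is one configuration to evaluate.
candidates = [dict(zip(grid, combo)) for combo in product(*grid.values())]

print(len(candidates))
for candidate in candidates[:3]:
    print(candidate)
```
</code>
<p>With three, two, two, and one candidate values, the grid contains 12&#x00A0;configurations, each of which would be evaluated by resampling before one is selected.</p>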
<p>Development of the MLFLOW model proceeded in an iterative process of model calibration and validation. Streamflow observations (monthly streamflow) used to fit the model were split into training and testing samples using repeated <italic>k</italic>-fold cross-validation (<xref ref-type="bibr" rid="r15">Hastie and others, 2009</xref>). The data were randomly split into 10&#x00A0;folds, with each fold containing 10&#x00A0;percent of the data. The model was iteratively trained with 9&#x00A0;folds and tested against 1&#x00A0;fold left out. The splitting of data into 10&#x00A0;folds was repeated 5&#x00A0;times, totaling 50&#x00A0;iterations (resamples) of training and testing. For each resample, the model fitted 500&#x00A0;trees, with each tree iteratively learning from the previously fitted tree.</p>
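<p>The resampling scheme described above (10&#x00A0;folds repeated 5&#x00A0;times) can be sketched as follows. The sketch is in Python for illustration only and is not the resampling code of the &#x201C;caret&#x201D; package; the function name, the random seed, and the number of observations (200) are hypothetical.</p>
<code language="python">
```python
import random


def repeated_kfold(n_obs, k=10, repeats=5, seed=0):
    """Return (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    resamples = []
    for _ in range(repeats):
        order = list(range(n_obs))
        rng.shuffle(order)                         # new random split for each repeat
        folds = [order[i::k] for i in range(k)]    # k folds of near-equal size
        for held_out in range(k):
            test = folds[held_out]
            train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
            resamples.append((train, test))
    return resamples


resamples = repeated_kfold(n_obs=200)   # 200 is a made-up number of observations
print(len(resamples))
```
</code>
<p>Each of the 50&#x00A0;resamples trains on 90&#x00A0;percent of the observations and tests on the remaining 10&#x00A0;percent, and every observation is held out exactly once per repeat.</p>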
<sec>
<title>Evaluation of Predictive Performance and Variable Importance</title>
<p>Predictive performance of the MLFLOW model was evaluated with the following four goodness-of-fit metrics: coefficient of determination (<italic>R</italic><sup>2</sup>), Nash-Sutcliffe efficiency (NSE), NSE with log-transformed data (logNSE), and percent bias (PBIAS). <italic>R</italic><sup>2</sup> measures the proportion of the data variance that can be explained by the model; values greater than 0.5 may be considered acceptable (<xref ref-type="bibr" rid="r23">Legates and McCabe, 1999</xref>). NSE measures the relation of the residual variance to the data variance (<xref ref-type="bibr" rid="r31">Nash and Sutcliffe, 1970</xref>). Because NSE is computed with squared values of predictions and observations, the metric is biased toward predictions agreeing with observations at larger values. Log transformation of data reduces the difference between large and small values, so logNSE is more responsive to agreement between predictions and observations at smaller values. PBIAS measures the average tendency that predicted values are greater or less than observed values (<xref ref-type="bibr" rid="r29">Moriasi and others, 2007</xref>). For the purposes of hydrologic modeling of monthly streamflow, values of NSE and logNSE greater than 0.5 and absolute values of PBIAS less than 25&#x00A0;percent are considered satisfactory (<xref ref-type="bibr" rid="r29">Moriasi and others, 2007</xref>).</p>
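<p>The four goodness-of-fit metrics can be sketched as follows. The sketch is in Python for illustration only and rests on stated assumptions: <italic>R</italic><sup>2</sup> is computed as the squared Pearson correlation, logNSE uses natural logarithms of strictly positive flows, and PBIAS is signed so that positive values indicate overprediction (some references use the opposite sign). The observed and predicted values are made up.</p>
<code language="python">
```python
import math


def r_squared(obs, pred):
    """Squared Pearson correlation between observations and predictions (assumed form)."""
    n = len(obs)
    mean_o, mean_p = sum(obs) / n, sum(pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov ** 2 / (var_o * var_p)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over data variance."""
    mean_o = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - residual / variance


def log_nse(obs, pred):
    """NSE of log-transformed values; assumes all flows are greater than zero."""
    return nse([math.log(o) for o in obs], [math.log(p) for p in pred])


def pbias(obs, pred):
    """Percent bias; sign convention assumed here: positive means overprediction."""
    return 100.0 * sum(p - o for o, p in zip(obs, pred)) / sum(obs)


# Made-up monthly streamflow values, in cubic feet per second.
obs = [1.0, 2.0, 4.0, 8.0]
pred = [1.2, 1.8, 4.4, 7.0]
metrics = {
    "R2": r_squared(obs, pred),
    "NSE": nse(obs, pred),
    "logNSE": log_nse(obs, pred),
    "PBIAS": pbias(obs, pred),
}
# Thresholds stated in the text for monthly streamflow.
satisfactory = (
    metrics["NSE"] > 0.5 and metrics["logNSE"] > 0.5 and 25 > abs(metrics["PBIAS"])
)
print({name: round(v, 3) for name, v in metrics.items()}, satisfactory)
```
</code>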
<p>The MLFLOW model was developed not only as a predictive model but also as an explanatory model. The explanatory power of predictor variables in the model was evaluated with variable importance, which was computed as the reduction in squared error in any decision tree fitted using a predictor variable, averaged for every tree that included the variable (<xref ref-type="bibr" rid="r11">Friedman, 2001</xref>). Variable importance was evaluated for the MLFLOW model fitted to all data and for models fitted to data from each watershed or year separately. Moreover, the relation between predictor variables and monthly streamflow was interpreted with partial dependence plots, which emphasize the effect of each predictor variable by marginalizing the effect of all other predictor variables in the model (<xref ref-type="bibr" rid="r11">Friedman, 2001</xref>).</p>
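<p>The marginalizing step behind a partial dependence plot can be sketched as follows. The sketch is in Python for illustration only; the function <code>toy_model</code> is a hypothetical stand-in for a fitted model, not the MLFLOW model, and the predictor values are made up. For each value of the predictor of interest, that value is substituted into every observation, predictions are made, and the predictions are averaged.</p>
<code language="python">
```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction at each grid value of `feature`, other predictors as observed."""
    curve = []
    for value in grid:
        # Replace the feature of interest in every row, keep the other predictors.
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model; coefficients are made up."""
    return 0.5 * row["swe_03_a"] + 0.1 * row["ppt_01_a"]


rows = [
    {"swe_03_a": 10.0, "ppt_01_a": 20.0},
    {"swe_03_a": 50.0, "ppt_01_a": 40.0},
]
curve = partial_dependence(toy_model, rows, "swe_03_a", [0.0, 100.0])
print(curve)
```
</code>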
</sec>
<sec>
<title>Explanation and Prediction of Streamflow</title>
<p>The MLFLOW model was initially fitted to all data. This initial model was used to explain monthly streamflow in relation to the predictor variables for the entire study area and study period (2012&#x2013;17), and predictive performance and variable importance (including partial dependence plots) were evaluated. Furthermore, predictive performance was evaluated independently for every site with at least 8&#x00A0;streamflow observations, which was the median number of observations per site of all 125&#x00A0;sites sampled. Nine additional models were fitted, one model for each watershed (using data from all years in each model) and one model for each year (using data from all watersheds in each model). These additional models were used to explain spatial or temporal variation in the relation of monthly streamflow to the predictor variables, and variable importance of each model was evaluated. Lastly, the initial model, fitted to all data, was used to predict monthly streamflow at every cell on the channel networks in every month of 2012&#x2013;17. The predictions were assumed to qualitatively represent natural streamflow because the MLFLOW model had no predictor variables or mechanism for quantifying the effect of human infrastructure on streamflow. The monthly streamflow prediction data for each cell (17,518&#x00A0;cells) and month (72&#x00A0;months) are available in the accompanying data release (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>). Further models were fitted for qualitatively assessing model sensitivity, which is explained in the following section.</p>
</sec>
<sec>
<title>Assessment of Model Sensitivity</title>
<p>Multiple models were fitted to qualitatively assess sensitivity of the MLFLOW model to variations of the predictor variables and streamflow observations. The MLFLOW model was applied using the following three variations of the data: (1)&#x00A0;dynamic or static predictor variables with or without temporal or spatial conditioning, (2)&#x00A0;streamflow observations differentially grouped by watershed or year, and (3)&#x00A0;streamflow observations progressively reduced by percentage of sites or months available. Models developed using these data variations were qualitatively assessed using goodness-of-fit metrics (<italic>R</italic><sup>2</sup>, NSE, logNSE, and PBIAS).</p>
<p>Models fitted to different combinations of predictor variables were used to assess sensitivity to selection and conditioning of the predictor variables&#x2014;static versus dynamic; conditioned versus unconditioned. The following five combinations of the variables were used to fit models: (1)&#x00A0;unconditioned climatic (dynamic) variables (4&#x00A0;climate variables); (2)&#x00A0;dynamic variables with temporal conditioning (20&#x00A0;moving-average variants of the climate variables); (3)&#x00A0;unconditioned dynamic and physiographic and anthropogenic (static) variables (24&#x00A0;variables); (4)&#x00A0;dynamic and static variables with spatial conditioning (24&#x00A0;variables); and (5)&#x00A0;static variables with spatial conditioning and dynamic variables with spatial and temporal conditioning (44&#x00A0;variables, including the 20&#x00A0;moving-average variants of the climate variables) (<xref ref-type="table" rid="t03">table&#x00A0;3</xref>).</p>
<p>Model sensitivity to fitting data from a different area (watershed) or period (year) was assessed by differentially grouping streamflow observations by watershed or year for use in model training and testing. The following two approaches were used in grouping the observations: (1)&#x00A0;training with observations from all groups (all watersheds and years) and testing on observations from each watershed (1, 2, or 3) or year (2012, 2013, 2014, 2015, 2016, or 2017) separately and (2)&#x00A0;training with observations from all groups except one group left out sequentially and testing on observations from the left-out group; for example, training with data from watersheds&#x00A0;1 and 2 and testing on data from watershed&#x00A0;3, or training with data from 2012 through 2016 and testing on data from 2017.</p>
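<p>The second grouping approach (training with all groups except one and testing on the left-out group) can be sketched as follows. The sketch is in Python for illustration only; the function name and the list of years assigned to eight hypothetical observations are made up.</p>
<code language="python">
```python
def leave_one_group_out(groups):
    """Map each group label to (train, test) indexes with that group held out."""
    splits = {}
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        splits[held_out] = (train, test)
    return splits


# Year of each of eight hypothetical streamflow observations.
years = [2012, 2012, 2013, 2014, 2015, 2016, 2017, 2017]
splits = leave_one_group_out(years)
train_2017, test_2017 = splits[2017]
print(train_2017, test_2017)
```
</code>
<p>The same function applies to watersheds by passing a watershed label (1, 2, or 3) for each observation instead of a year.</p>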
<p>Model sensitivity to the quantity of data fitted was assessed by progressively reducing streamflow observations by percentage of sites or months available for use in model fitting. Models were fitted using observations randomly sampled, with 50&#x00A0;iterations, in two percentages as follows: (1)&#x00A0;observations from only 50&#x00A0;percent of the sites or months (63&#x00A0;sites or 18&#x00A0;months on average per random sample) and (2)&#x00A0;observations from only 15&#x00A0;percent of the sites or months (about 19&#x00A0;sites or 6&#x00A0;months on average per random sample).</p>
</sec>
</sec>
</sec>
<sec>
<title>Results of Machine Learning Approach to Modeling Streamflow</title>
<p>The MLFLOW model performed satisfactorily&#x2014;streamflow predictions generally agreed with observations at low and high flows. Most of the important predictors of monthly streamflow were temporally conditioned dynamic climatic variables. Temporal and spatial conditioning of the predictor variables improved the explanatory power of the variables in the MLFLOW model. The next three sections discuss predictive performance, variable importance, and model sensitivity to variations of the predictor variables and streamflow observations (including comparison of the MLFLOW model with other streamflow modeling approaches). Spatial and temporal variability of the streamflow predictions and utility and limitations of the MLFLOW model are discussed in the last two sections.</p>
<sec>
<title>Predictive Performance</title>
<p>The MLFLOW model fitted to all data had satisfactory agreement between observed and predicted streamflow (<italic>R</italic><sup>2</sup>=0.80, NSE=0.79, logNSE=0.82, and PBIAS=0.7 percent, <xref ref-type="fig" rid="fig06">fig.&#x00A0;6</xref>). The similarity of NSE (0.79) and logNSE (0.82) indicated the MLFLOW model performed about equally well for high and low flows. PBIAS (0.7&#x00A0;percent) indicated the MLFLOW model did not systematically overpredict or underpredict monthly streamflow in general. For streamflow observations less than 10&#x00A0;ft<sup>3</sup>/s, the model on average overpredicted by less than 1&#x00A0;percent. For streamflow observations toward 20&#x00A0;ft<sup>3</sup>/s, the model on average underpredicted by about 5&#x00A0;percent; however, the model had fewer streamflow observations to fit toward the upper range of values.</p>
<fig id="fig06" position="float" fig-type="figure"><label>Figure 6</label><caption><p>Relation between observed and predicted streamflow.</p><p content-type="toc">Figure 6.&#x2003;Graph showing relation between observed and predicted streamflow.</p></caption>
<long-desc>Observed and predicted streamflow are consistent and range from 0 to 10 cubic feet per second for most sampling units.</long-desc><graphic xlink:href="rol21-0070_fig06"/></fig>
<p>The MLFLOW model performed about equally well for all months with streamflow observations. For each month, the mean of the distribution of residuals (observed minus predicted streamflow) of the model was about zero (<xref ref-type="fig" rid="fig07">fig.&#x00A0;7</xref>), indicating model predictions for months with fewer observations (such as February and April) were not biased by the model potentially overfitting data for months with more observations (such as June and July). For most months, the median of model predictions was higher than the median of streamflow observations, and the distribution of model predictions for all months was narrower than the distribution of streamflow observations (<xref ref-type="fig" rid="fig07">fig.&#x00A0;7</xref>), suggesting the MLFLOW model was fitting toward the mean of the data.</p>
<fig id="fig07" position="float" fig-type="figure"><label>Figure 7</label><caption><p>Distribution of observed and predicted streamflow and residual values (observed minus predicted) for each month sampled during 2012&#x2013;17.</p><p content-type="toc">Figure 7.&#x2003;Boxplots showing distribution of observed and predicted streamflow and residual values for each month sampled during 2012&#x2013;17.</p></caption>
<long-desc>Observed and predicted streamflow are consistent for all months and range from 0 to 20 cubic feet per second.</long-desc><graphic xlink:href="rol21-0070_fig07"/></fig>
<p>Goodness-of-fit metrics computed independently for every site with at least eight streamflow observations also indicated acceptable performance of the MLFLOW model (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>). Medians of the goodness-of-fit metrics for the 85&#x00A0;sites analyzed were 0.81 for <italic>R</italic><sup>2</sup>, 0.74 for NSE, 0.77 for logNSE, and 2.3&#x00A0;percent for PBIAS (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>), not much different from the metrics for the MLFLOW model fitted to all data (<italic>R</italic><sup>2</sup>=0.80, NSE=0.79, logNSE=0.82, and PBIAS=0.7 percent, <xref ref-type="fig" rid="fig06">fig.&#x00A0;6</xref>). Median absolute deviation (MAD) of the metrics indicated some variation in model performance among the sites analyzed (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>). However, the range of median plus or minus MAD for PBIAS (&#x2212;14.5 to 19.1&#x00A0;percent) remained within plus or minus 25&#x00A0;percent, so the MLFLOW model performed satisfactorily even for sites toward the tails of the distribution (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>). For the other three metrics, median minus MAD was greater than or equal to 0.5 toward the lower distributional tail, also indicating satisfactory model performance (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>).</p>
<table-wrap id="t04" position="float">
<label>Table 4</label><caption><title>Distribution of goodness-of-fit metrics from repeated cross-validation of the machine learning streamflow model, fitted to all streamflow observations, computed for each site independently.</title>
<p content-type="toc">Table 4.&#x2003;Distribution of goodness-of-fit metrics from repeated cross-validation of the machine learning streamflow model, fitted to all streamflow observations, computed for each site independently.</p>
<p>[Metrics were analyzed for 85 sites with at least 8 streamflow observations, which was the median number of observations per site of all 125 sites sampled. <italic>R</italic><sup>2</sup>, coefficient of determination; NSE, Nash-Sutcliffe efficiency; logNSE, NSE with log-transformed data; PBIAS, percent bias; MAD, median absolute deviation]</p>
</caption>
<table rules="groups">
<col width="17.95%"/>
<col width="11.05%"/>
<col width="9.92%"/>
<col width="11.76%"/>
<col width="8.98%"/>
<col width="11.19%"/>
<col width="8.98%"/>
<col width="11.19%"/>
<col width="8.98%"/>
<thead>
<tr>
<td rowspan="2" valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Number of<break/>sites analyzed</td>
<td colspan="2" valign="top" align="center" scope="colgroup" style="border-top: solid 0.50pt"><italic>R</italic><sup>2</sup></td>
<td colspan="2" valign="top" align="center" scope="colgroup" style="border-top: solid 0.50pt">NSE</td>
<td colspan="2" valign="top" align="center" scope="colgroup" style="border-top: solid 0.50pt">logNSE</td>
<td colspan="2" valign="top" align="center" scope="colgroup" style="border-top: solid 0.50pt">PBIAS</td>
</tr>
<tr>
<td valign="top" colspan="1" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">MAD</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">MAD</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">MAD</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median</td>
<td valign="top" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">MAD</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="row">85</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.81</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.18</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.74</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.24</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.77</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.21</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">2.3</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">16.8</td>
</tr>
</tbody></table></table-wrap>
<p>Model performance at the three sites with the most streamflow observations in this study (sites&#x00A0;25, 27, and 31, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) was acceptable (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>A</italic>&#x2013;<italic>C</italic></xref>). All four goodness-of-fit metrics indicated site&#x00A0;31 (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>C</italic></xref>) had the best performance in general (<italic>R</italic><sup>2</sup>=0.88, NSE=0.88, logNSE=0.91, and PBIAS=0.8&#x00A0;percent). Site&#x00A0;25 (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>A</italic></xref>) had <italic>R</italic><sup>2</sup> of 0.89 and NSE of 0.84, comparable with values for site&#x00A0;31, but had worse logNSE (0.77) and PBIAS (&#x2212;3.6&#x00A0;percent) than for site&#x00A0;31. Site&#x00A0;27 (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>B</italic></xref>) had the best PBIAS (&#x2212;0.6&#x00A0;percent) and had logNSE (0.75) similar to the value for site&#x00A0;25, but <italic>R</italic><sup>2</sup> (0.62) and NSE (0.61) were more than 25&#x00A0;percent less than for sites&#x00A0;25 or 31. Qualitatively, the MLFLOW model tended to underpredict the highest flows in most years, whereas lower flows were overpredicted in some years but underpredicted in others (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>A</italic>&#x2013;<italic>C</italic></xref>). On average for the three sites, NSE (0.78) was lower than logNSE (0.81) (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>A</italic>&#x2013;<italic>C</italic></xref>), indicating performance was only about 3&#x00A0;percent better for low flows than for high flows, the same as for the MLFLOW model fitted to all data. The tendency to underpredict the highest flows probably resulted from fewer observations of high flows than of low flows available for use in fitting the MLFLOW model.</p>
<fig id="fig08" position="float" fig-type="figure"><label>Figure 8</label><caption><p>Observed and predicted streamflow at the three sites (<xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) with the most streamflow observations. <italic>A</italic>,&#x00A0;site 25; <italic>B</italic>,&#x00A0;site 27; and <italic>C</italic>,&#x00A0;site 31.</p><p content-type="toc">Figure 8.&#x2003;Graphs showing observed and predicted streamflow at the three sites with the most streamflow observations.</p></caption>
<long-desc>For the 3 sites, observed and predicted streamflow ranges from 0 to 20 cubic feet per second and is highest in 2017.</long-desc><graphic xlink:href="rol21-0070_fig08"/></fig>
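<p>The goodness-of-fit metrics compared above can be illustrated with a minimal sketch. The sketch is not the code used in this study; it assumes that <italic>R</italic><sup>2</sup> is the squared Pearson correlation coefficient, that logNSE uses natural logarithms of positive flows, and that positive PBIAS indicates overprediction. The function names and the flow values are hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def r_squared(obs, sim):
    """Coefficient of determination, ASSUMED here to be the squared Pearson r."""
    return pearson_r(obs, sim) ** 2


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over variance of obs."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def log_nse(obs, sim):
    """NSE of natural-log-transformed flows (ASSUMES all flows are positive)."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


def pbias(obs, sim):
    """Percent bias; ASSUMED sign convention: positive means overprediction."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical monthly flows, in cubic feet per second (not data from this study).
observed = [2.0, 4.0, 8.0, 16.0, 8.0, 4.0]
predicted = [2.5, 4.5, 7.0, 12.0, 8.5, 4.5]

metrics = {
    "R2": r_squared(observed, predicted),
    "NSE": nse(observed, predicted),
    "logNSE": log_nse(observed, predicted),
    "PBIAS": pbias(observed, predicted),
}
```
]]></code>
<p>In this hypothetical example, the underpredicted peak flow lowers NSE (about 0.86) more than logNSE (about 0.93) and produces a negative PBIAS (about &#x2212;7.1&#x00A0;percent) under the assumed sign convention.</p>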
<p>In comparison, <xref ref-type="bibr" rid="r28">Miller and others (2018)</xref> used a random forest model fitted to USGS streamgage data to predict monthly streamflow for the conterminous United States. The approach of <xref ref-type="bibr" rid="r28">Miller and others (2018)</xref> included temporal conditioning of dynamic climatic predictor variables, as in this study. Median NSE ranged from 0.5 to 0.9 and median PBIAS ranged from &#x2212;15 to 5&#x00A0;percent (<xref ref-type="bibr" rid="r28">Miller and others, 2018</xref>), compared with median NSE of 0.74 and median PBIAS of 2.3&#x00A0;percent for every site with at least eight observations in this study (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>). In addition, the model in <xref ref-type="bibr" rid="r28">Miller and others (2018)</xref> was fitted to few data from low-order streams, unlike the MLFLOW model in this study.</p>
<p>Additionally, the Precipitation-Runoff Modeling System, a physically based hydrologic model, was used to model monthly streamflow in seven watersheds (about HUC12 in size) in Montana (<xref ref-type="bibr" rid="r8">Chase and others, 2016</xref>). USGS streamgage data were used by <xref ref-type="bibr" rid="r8">Chase and others (2016)</xref> to calibrate and validate the model, whereas discrete streamflow observations were used in this study. The model performance in <xref ref-type="bibr" rid="r8">Chase and others (2016)</xref> ranged from negative values for NSE to about 0.75, whereas in this study, median NSE was 0.74 at sites with at least eight observations (<xref ref-type="table" rid="t04">table&#x00A0;4</xref>) and NSE was 0.84 or more at the two sites with the most observations (<xref ref-type="fig" rid="fig08">fig.&#x00A0;8<italic>A</italic>, <italic>C</italic></xref>).</p>
</sec>
<sec>
<title>Variable Importance</title>
<p>The most important variables (statistically important in the MLFLOW model) for explaining monthly streamflow were the 6-month moving average of precipitation and the 3-month moving average of snow water equivalent (<xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>). The top 20&#x00A0;most important variables included 14&#x00A0;moving-average variants: 4&#x00A0;precipitation variants, 4&#x00A0;snow water equivalent variants, 3&#x00A0;evapotranspiration variants, and 3&#x00A0;temperature variants. Forest cover was the only static variable among the top five variables (<xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>). Five other static variables (elevation, drainage area, depth to water table, number of diversions, and number of surficial geology contacts) also were important variables for explaining monthly streamflow.</p>
<fig id="fig09" position="float" fig-type="figure"><label>Figure 9</label><caption><p>Relative importance of the top 20&#x00A0;variables (code, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) used in the model fitted to all data.</p><p content-type="toc">Figure 9.&#x2003;Graph showing relative importance of the top 20 variables used in the model fitted to all data.</p></caption>
<long-desc>Relative importance of top 20 variables ranges from 5 to 100 percent. Among top 20 are mostly dynamic climatic variables.</long-desc><graphic xlink:href="rol21-0070_fig09"/></fig>
<p>The 20&#x00A0;most important variables in the MLFLOW model had simple to more complex relations with monthly streamflow as interpreted with partial dependence plots. Many of the relations between the predictor variables and monthly streamflow were intuitive. Monthly streamflow increased with increasing drainage area (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>O</italic></xref>) and number of surficial geology contacts (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>P</italic></xref>) and decreased with increasing elevation (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>R</italic></xref>) and depth to water table (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>T</italic></xref>). Monthly streamflow also increased with increasing 6-, 9-, and 12-month moving averages of precipitation (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>E</italic>&#x2013;<italic>G</italic></xref>, respectively) and 3- and 6-month moving averages of snow water equivalent (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>J</italic>, <italic>K</italic></xref>, respectively) and decreased with increasing 9-month moving averages of evapotranspiration (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>C</italic></xref>) and temperature (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>N</italic></xref>). Other relations between the predictor variables and monthly streamflow were counterintuitive. Monthly streamflow decreased with increasing number of water diversions (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>Q</italic></xref>) and forest cover (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>S</italic></xref>). However, in the study area, most forest cover was at higher elevations where channels had smaller drainage areas, and most water diversions were on channels with larger drainage areas at lower elevations. 
Additionally, each climate variable had relations with monthly streamflow that reversed when proceeding from shorter to longer moving-average variants. For example, monthly streamflow increased with increasing current month and 1-month moving average of evapotranspiration (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>A</italic>, <italic>B</italic></xref>, respectively) and current month and 6-month moving average of temperature (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>L</italic>, <italic>M</italic></xref>, respectively), but monthly streamflow decreased with 9-month moving averages of evapotranspiration (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>C</italic></xref>) and temperature (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>N</italic></xref>). In addition, intermediate values of some predictor variables produced the highest monthly streamflow (such as depth to water table, <xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>T</italic></xref>) or the lowest (such as 1-month moving average of snow water equivalent, <xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>I</italic></xref>), whereas for other predictor variables, the relation with monthly streamflow had more than one inflection. For example, as current month of snow water equivalent increased (<xref ref-type="fig" rid="fig10">fig.&#x00A0;10<italic>H</italic></xref>), monthly streamflow increased, then decreased, and then increased again.</p>
<fig id="fig10" position="float" fig-type="figure"><label>Figure 10</label><caption><p>Partial dependence of streamflow on the top 20&#x00A0;most important variables (<xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>) used in the model fitted to all data. <italic>A</italic>&#x2013;<italic>N</italic>,&#x00A0;dynamic climatic variables; <italic>O</italic>&#x2013;<italic>T</italic>,&#x00A0;static physiographic and anthropogenic variables (code, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>).</p><p content-type="toc">Figure 10.&#x2003;Graphs showing partial dependence of streamflow on the top 20 most important variables used in the model fitted to all data.</p></caption>
<long-desc>Partial dependence of streamflow on top 20 most important variables differs among the variables.</long-desc><graphic xlink:href="rol21-0070_fig10"/></fig>
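<p>Partial dependence, as plotted in figure&#x00A0;10, is commonly computed by fixing one predictor variable at a chosen value for every observation, averaging the model predictions, and repeating for a range of values. The following minimal sketch assumes that standard definition; the software used for this study is not shown here, and the function name, the variable names, and the stand-in model are hypothetical.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original row is unchanged
            modified[feature] = value  # fix the feature of interest
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted model (NOT the MLFLOW model)."""
    return 0.5 * row["area_mi2"] - 2.0 * row["elevation_kft"] + 20.0


# Hypothetical observations; names and values are illustrative only.
rows = [
    {"area_mi2": 5.0, "elevation_kft": 7.0},
    {"area_mi2": 12.0, "elevation_kft": 8.0},
    {"area_mi2": 30.0, "elevation_kft": 9.0},
]

grid = [0.0, 10.0, 20.0]
curve = partial_dependence(toy_model, rows, "area_mi2", grid)
```
]]></code>
<p>Because the stand-in model is additive, the curve rises by 0.5 for each unit increase in the hypothetical drainage-area variable; a fitted machine learning model can produce the nonlinear curves described above.</p>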
<p>The most important variables in models fitted to data from each watershed did not differ substantially among the watersheds or from those in the MLFLOW model fitted to all data. For each watershed, moving-average variants of precipitation and snow water equivalent were still the most important variables (<xref ref-type="fig" rid="fig11">fig.&#x00A0;11<italic>A</italic>&#x2013;<italic>C</italic></xref>), suggesting little spatial variability in the relation between monthly streamflow and the predictor variables. The effect of precipitation and snow water equivalent on monthly streamflow was apparently as important across small or large areas (each watershed modeled separately; <xref ref-type="fig" rid="fig11">fig.&#x00A0;11<italic>A</italic>&#x2013;<italic>C</italic></xref>) as across the study area (all watersheds modeled together; <xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>). However, forest cover, contrary to its ranking as the third most important variable in the model fitted to all data (<xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>), was the fifth most important variable for watershed&#x00A0;1 (<xref ref-type="fig" rid="fig11">fig.&#x00A0;11<italic>A</italic></xref>) and was not among the top five important variables for the other two watersheds (<xref ref-type="fig" rid="fig11">fig.&#x00A0;11<italic>B</italic>, <italic>C</italic></xref>). In contrast, the 9-month moving average of snow water equivalent was not among the top 20&#x00A0;important variables in the model fitted to all data (<xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>) but was the third most important variable for watershed&#x00A0;3 (<xref ref-type="fig" rid="fig11">fig.&#x00A0;11<italic>C</italic></xref>). 
Additionally, drainage area was among the top five important variables for watersheds&#x00A0;2 and 3 (<xref ref-type="fig" rid="fig11">fig.&#x00A0;11<italic>B</italic>, <italic>C</italic></xref>, respectively), but drainage area was only the 11th&#x00A0;most important variable in the model fitted to all data (<xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>). The increased importance of drainage area was understandable for watershed&#x00A0;3, which has the largest area (55&#x00A0;mi<sup>2</sup>) of the three watersheds, but the increased importance of drainage area was surprising for watershed&#x00A0;2, which has the smallest area (26&#x00A0;mi<sup>2</sup>) (<xref ref-type="bibr" rid="r49">U.S.&#x00A0;Geological Survey, 2019a</xref>).</p>
<fig id="fig11" position="float" fig-type="figure"><label>Figure 11</label><caption><p>Relative importance of the top five variables (code, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) used in models fitted to data only from each watershed. <italic>A</italic>,&#x00A0;watershed 1; <italic>B</italic>,&#x00A0;watershed 2; and <italic>C</italic>,&#x00A0;watershed 3.</p><p content-type="toc">Figure 11.&#x2003;Graphs showing relative importance of the top five variables used in models fitted to data only from each watershed.</p></caption>
<long-desc>For the 3 watersheds, relative importance of top 5 variables ranges from 25 to 100 percent.</long-desc><graphic xlink:href="rol21-0070_fig11"/></fig>
<p>The most important variables in models fitted to data from each year differed considerably among the years and from the MLFLOW model fitted to all data. Six static variables (cropland cover, number of surficial geology contacts, number of water diversions, forest cover, depth to water table, and drainage area)&#x2014;from one to three variables for any year&#x2014;were among the top five most important variables (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>A</italic>&#x2013;<italic>F</italic></xref>), which suggests considerable temporal variability in the relation between the predictor variables and monthly streamflow. In addition, moving-average variants of precipitation (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>A</italic>, <italic>C</italic></xref>) and snow water equivalent (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>D</italic>, <italic>F</italic></xref>) were among the top five important variables in only 2&#x00A0;years, whereas moving-average variants of evapotranspiration and temperature were among the top five important variables in all but 1&#x00A0;year (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>B</italic>&#x2013;<italic>F</italic></xref>). During a shorter period (each year modeled separately; <xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>A</italic>&#x2013;<italic>F</italic></xref>), the effect of evapotranspiration and temperature seemed more important, whereas the effect of precipitation and snow water equivalent seemed more important during the longer study period (all years modeled together; <xref ref-type="fig" rid="fig09">fig.&#x00A0;9</xref>). 
For 3&#x00A0;years, a dynamic variable was the most important&#x2014;the 1-month moving average of snow water equivalent for 2015 (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>D</italic></xref>) and the 9-month moving average of evapotranspiration for 2016 (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>E</italic></xref>) and 2017 (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>F</italic></xref>). However, for the other 3&#x00A0;years, a static variable was the most important&#x2014;cropland cover for 2012 (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>A</italic></xref>) and 2013 (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>B</italic></xref>) and forest cover for 2014 (<xref ref-type="fig" rid="fig12">fig.&#x00A0;12<italic>C</italic></xref>). Although the varying importance of static variables from 1&#x00A0;year to another year may seem illogical, several reasons are possible. Static variables can interact differently with dynamic variables from year to year, resulting in the increased importance of a static variable. Dynamic variables alone can decrease in importance from year to year, which relatively increases the importance of a static variable. Dynamic variables may vary less within 1&#x00A0;year than during several years, resulting in the decreased importance of dynamic variables for a single year. Lastly, one static variable may substitute, in relative importance, for many correlated dynamic variables.</p>
<fig id="fig12" position="float" fig-type="figure"><label>Figure 12</label><caption><p>Relative importance of the top five variables (code, <xref ref-type="table" rid="t03">table&#x00A0;3</xref>) used in models fitted to data only from each year. <italic>A</italic>,&#x00A0;2012; <italic>B</italic>,&#x00A0;2013; <italic>C</italic>,&#x00A0;2014; <italic>D</italic>,&#x00A0;2015; <italic>E</italic>,&#x00A0;2016; and <italic>F</italic>,&#x00A0;2017.</p><p content-type="toc">Figure 12.&#x2003;Graphs showing relative importance of the top five variables used in models fitted to data only from each year.</p></caption>
<long-desc>For 2012 through 2017, relative importance of top 5 variables ranges from 10 to 100 percent.</long-desc><graphic xlink:href="rol21-0070_fig12"/></fig>
</sec>
<sec>
<title>Model Sensitivity</title>
<p>Temporal and spatial conditioning intensified the relation of many predictor variables with monthly streamflow, resulting in more information that the MLFLOW model could use for predicting streamflow. For many of the variables, such as forest cover or surficial geology contacts, unconditioned values of many cells on the channel networks were zero (<xref ref-type="fig" rid="fig04">figs.&#x00A0;4<italic>A</italic></xref>, <xref ref-type="fig" rid="fig05">5<italic>A</italic></xref>), but with spatial conditioning, values of most cells on the channel networks were nonzero (<xref ref-type="fig" rid="fig04">figs.&#x00A0;4<italic>B</italic></xref>, <xref ref-type="fig" rid="fig05">5<italic>B</italic></xref>), providing the MLFLOW model with more diversely valued variables that might better explain variation in streamflow. Temporal conditioning increased the magnitude of <italic>r</italic> for the dynamic variables by as much as 0.34 compared with the equivalent variable without conditioning (<xref ref-type="table" rid="t05">table&#x00A0;5</xref>). For example, current month of precipitation had <italic>r</italic> of &#x2212;0.02 (ppt_00, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>), whereas the 6-month moving average of precipitation had <italic>r</italic> of 0.36 (ppt_06, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>). Spatial conditioning increased the magnitude of <italic>r</italic> for the moving-average variants of the dynamic variables by as much as 0.25 (<xref ref-type="table" rid="t05">table&#x00A0;5</xref>). For example, the 1-month moving average of snow water equivalent without conditioning had <italic>r</italic> of 0.10 (swe_01, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>), but the 1-month moving average of snow water equivalent with conditioning had <italic>r</italic> of 0.35 (swe_01_a, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>). 
For the static variables, spatial conditioning increased the magnitude of <italic>r</italic> by as much as 0.15, such as for surficial geology contacts (contacts, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>). Also, for some static variables, such as water diversions (diversions, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>), the correlation between the unconditioned variable and monthly streamflow could not be computed because the variable was zero at every streamflow observation. However, spatial conditioning produced a correlation of the variable with streamflow; for example, <italic>r</italic> of 0.09 for water diversions (diversions_d, <xref ref-type="table" rid="t05">table&#x00A0;5</xref>).</p>
<table-wrap id="t05" position="float">
<label>Table 5</label><caption><title>Relation of streamflow to some dynamic and static variables before and after temporal or spatial conditioning.</title>
<p content-type="toc">Table 5.&#x2003;Relation of streamflow to some dynamic and static variables before and after temporal or spatial conditioning.</p>
<p>[<italic>r</italic>, Pearson correlation coefficient; --, not possible to compute]</p>
</caption>
<table rules="groups">
<col width="27.38%"/>
<col width="20.5%"/>
<col width="34.22%"/>
<col width="17.9%"/>
<thead>
<tr>
<td valign="middle" colspan="2" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Variable without conditioning</td>
<td valign="middle" colspan="2" align="center" scope="colgroup" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Equivalent variable with conditioning</td>
</tr>
<tr>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Code (<xref ref-type="table" rid="t03">table 3</xref>)</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt"><italic>r</italic></td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Code (<xref ref-type="table" rid="t03">table 3</xref>)</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt"><italic>r</italic></td>
</tr>
</thead>
<tbody>
<tr>
<th valign="middle" colspan="4" align="char" char="." style="border-top: solid 0.50pt" scope="col">&#x00A0;&#x00A0;Temporal conditioning of climatic (dynamic) variables</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">eta_00</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.26</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">eta_01</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.16</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">ppt_00</td>
<td valign="top" align="char" char=".">&#x2212;0.02</td>
<td valign="top" align="left">ppt_06</td>
<td valign="top" align="char" char=".">0.36</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">swe_00</td>
<td valign="top" align="char" char=".">&#x2212;0.06</td>
<td valign="top" align="left">swe_03</td>
<td valign="top" align="char" char=".">0.33</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">tmp_00</td>
<td valign="top" align="char" char=".">0.01</td>
<td valign="top" align="left">tmp_06</td>
<td valign="top" align="char" char=".">&#x2212;0.26</td>
</tr>
<tr>
<th valign="middle" colspan="4" align="char" char="." style="border-top: solid 0.50pt" scope="col">&#x00A0;&#x00A0;Spatial conditioning of dynamic and physiographic and anthropogenic (static) variables</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">eta_12</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.13</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">eta_12_a</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.20</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">ppt_09</td>
<td valign="top" align="char" char=".">0.31</td>
<td valign="top" align="left">ppt_09_a</td>
<td valign="top" align="char" char=".">0.51</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">swe_01</td>
<td valign="top" align="char" char=".">0.10</td>
<td valign="top" align="left">swe_01_a</td>
<td valign="top" align="char" char=".">0.35</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">tmp_01</td>
<td valign="top" align="char" char=".">&#x2212;0.10</td>
<td valign="top" align="left">tmp_01_a</td>
<td valign="top" align="char" char=".">&#x2212;0.12</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">contacts</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">&#x2212;0.02</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">contacts_d</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.17</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">diversions</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">--</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">diversions_d</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.09</td>
</tr>
<tr>
<td valign="top" align="left" style="background-color:rgb(242,242,242)" scope="row">forest</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">&#x2212;0.07</td>
<td valign="top" align="left" style="background-color:rgb(242,242,242)">forest_a</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">&#x2212;0.20</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">waterTable</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.04</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">waterTable_a</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">&#x2212;0.06</td>
</tr>
</tbody></table></table-wrap>
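<p>The increases in <italic>r</italic> reported above appear to correspond to changes in the absolute value of <italic>r</italic> in table&#x00A0;5. The following minimal sketch reproduces that arithmetic under this assumed reading; the values are transcribed from table&#x00A0;5, the function names are hypothetical, and water diversions are omitted because the unconditioned <italic>r</italic> could not be computed.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
# Pearson r from table 5, stored as (r without conditioning, r with conditioning).
temporal = {
    ("eta_00", "eta_01"): (0.26, 0.16),
    ("ppt_00", "ppt_06"): (-0.02, 0.36),
    ("swe_00", "swe_03"): (-0.06, 0.33),
    ("tmp_00", "tmp_06"): (0.01, -0.26),
}
spatial_dynamic = {
    ("eta_12", "eta_12_a"): (0.13, 0.20),
    ("ppt_09", "ppt_09_a"): (0.31, 0.51),
    ("swe_01", "swe_01_a"): (0.10, 0.35),
    ("tmp_01", "tmp_01_a"): (-0.10, -0.12),
}
spatial_static = {
    ("contacts", "contacts_d"): (-0.02, 0.17),
    ("forest", "forest_a"): (-0.07, -0.20),
    ("waterTable", "waterTable_a"): (0.04, -0.06),
}


def magnitude_gain(pair):
    """Change in |r| after conditioning (ASSUMED reading of 'increased r')."""
    before, after = pair
    return round(abs(after) - abs(before), 2)


def largest_gain(table):
    """Largest change in |r| among the variable pairs in one group."""
    return max(magnitude_gain(pair) for pair in table.values())


gains = {
    "temporal": largest_gain(temporal),
    "spatial_dynamic": largest_gain(spatial_dynamic),
    "spatial_static": largest_gain(spatial_static),
}
```
]]></code>
<p>Under this reading, the largest gains are 0.34 for temporal conditioning (ppt_06), 0.25 for spatial conditioning of dynamic variables (swe_01_a), and 0.15 for spatial conditioning of static variables (contacts_d).</p>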
<p>Models fitted to different combinations of predictor variables were used to qualitatively assess sensitivity of the MLFLOW model to selection (static versus dynamic) and conditioning (conditioned versus unconditioned) of the predictor variables. Performance improved progressively from model&#x00A0;1 to model&#x00A0;5 (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>). All models had acceptable performance, although model&#x00A0;1 was marginal for three goodness-of-fit metrics (<italic>R</italic><sup>2</sup>=0.54, NSE=0.52, and logNSE=0.49, <xref ref-type="table" rid="t06">table&#x00A0;6</xref>). Model&#x00A0;5 (the model fitted to all data) had the best performance overall (<italic>R</italic><sup>2</sup>=0.80, NSE=0.79, logNSE=0.82, and PBIAS=0.7&#x00A0;percent, <xref ref-type="table" rid="t06">table&#x00A0;6</xref>).</p>
<table-wrap id="t06" position="float">
<label>Table 6</label><caption><title>Goodness-of-fit metrics for models fitted to different combinations of static and dynamic variables before or after spatial or temporal conditioning, indicating qualitative model sensitivity to selection and conditioning of the predictor variables.</title>
<p content-type="toc">Table 6.&#x2003;Goodness-of-fit metrics for models fitted to different combinations of static and dynamic variables before or after spatial or temporal conditioning, indicating qualitative model sensitivity to selection and conditioning of the predictor variables.</p>
<p>[Metrics are the mean of 50 resamples (tenfold cross-validation with 5 repetitions) per model. <italic>R</italic><sup>2</sup>, coefficient of determination; NSE, Nash-Sutcliffe efficiency; logNSE, NSE with log-transformed data; PBIAS, percent bias]</p>
</caption>
<table rules="groups">
<col width="8.83%"/>
<col width="49.29%"/>
<col width="10.47%"/>
<col width="6.66%"/>
<col width="6.66%"/>
<col width="9.52%"/>
<col width="8.57%"/>
<thead>
<tr>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Model<break/>number</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Selection and conditioning of variables<break/>used to fit model</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Number<break/>of variables</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt"><italic>R</italic><sup>2</sup></td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">NSE</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">logNSE</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">PBIAS</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt" scope="row">1</td>
<td valign="top" align="left" style="border-top: solid 0.50pt">Unconditioned climatic (dynamic) variables</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">4</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.54</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.52</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.49</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.6</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">2</td>
<td valign="top" align="left">Dynamic variables with temporal conditioning</td>
<td valign="top" align="char" char=".">20</td>
<td valign="top" align="char" char=".">0.60</td>
<td valign="top" align="char" char=".">0.58</td>
<td valign="top" align="char" char=".">0.58</td>
<td valign="top" align="char" char=".">1.2</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">3</td>
<td valign="top" align="left">Unconditioned dynamic and physiographic and anthropogenic (static) variables</td>
<td valign="top" align="char" char=".">24</td>
<td valign="top" align="char" char=".">0.66</td>
<td valign="top" align="char" char=".">0.65</td>
<td valign="top" align="char" char=".">0.64</td>
<td valign="top" align="char" char=".">1.2</td>
</tr>
<tr>
<td valign="top" align="left" scope="row">4</td>
<td valign="top" align="left">Dynamic and static variables with spatial conditioning</td>
<td valign="top" align="char" char=".">24</td>
<td valign="top" align="char" char=".">0.69</td>
<td valign="top" align="char" char=".">0.67</td>
<td valign="top" align="char" char=".">0.69</td>
<td valign="top" align="char" char=".">0.6</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt" scope="row">5</td>
<td valign="top" align="left" style="border-bottom: solid 0.50pt">Static variables with spatial conditioning and dynamic variables with spatial and temporal conditioning</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt">44</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt">0.80</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt">0.79</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt">0.82</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt">0.7</td>
</tr>
</tbody></table></table-wrap>
<p>Related to this study, <xref ref-type="bibr" rid="r17">Jaeger and others (2019)</xref> used a machine learning approach to modeling streamflow permanence (presence or absence of streamflow) in the Columbia River Basin. <xref ref-type="bibr" rid="r17">Jaeger and others (2019)</xref> also applied spatial conditioning by area-averaged accumulation to the predictor variables, as in this study. Model performance was about 80&#x00A0;percent (out-of-bag error rate of 20&#x00A0;percent; <xref ref-type="bibr" rid="r17">Jaeger and others, 2019</xref>), which was comparable with this study (<italic>R</italic><sup>2</sup>=0.80, NSE=0.79, and logNSE=0.82, <xref ref-type="table" rid="t06">table&#x00A0;6</xref>). Updated streamflow permanence modeling has been progressing in the upper Missouri River Basin (not shown), including temporal conditioning of dynamic predictor variables and spatial conditioning by distance-decayed accumulation (Roy Sando, U.S.&#x00A0;Geological Survey, oral commun., 2020), similar to this study.</p>
<p>Different combinations of static or dynamic variables and unconditioned or temporally or spatially conditioned variables, or both, were used to fit the MLFLOW model, which resulted in considerable improvements in performance. Temporal conditioning of dynamic variables in model&#x00A0;2 increased <italic>R</italic><sup>2</sup>, NSE, and logNSE by as much as 0.09 compared with the unconditioned dynamic variables in model&#x00A0;1 (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>). Adding unconditioned physiographic and anthropogenic variables in model&#x00A0;3 increased <italic>R</italic><sup>2</sup>, NSE, and logNSE by as much as 0.15 compared with just the climatic variables in model&#x00A0;1 (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>). Spatial conditioning of all variables in model&#x00A0;4 increased <italic>R</italic><sup>2</sup>, NSE, and logNSE by as much as 0.05 compared with the unconditioned dynamic and static variables in model&#x00A0;3 (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>). Spatial conditioning of all variables together with temporal conditioning of dynamic variables in model&#x00A0;5 increased <italic>R</italic><sup>2</sup>, NSE, and logNSE by as much as 0.13 compared with just the spatial conditioning in model&#x00A0;4 (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>). PBIAS varied less than 1&#x00A0;percent among the five models (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>), indicating the MLFLOW model did not overpredict or underpredict monthly streamflow in general within the study area during the study period. In addition, logNSE was worse than NSE by 0.03 in model&#x00A0;1 but was better by 0.03 in model&#x00A0;5 (<xref ref-type="table" rid="t06">table&#x00A0;6</xref>). This 0.06&#x00A0;improvement of logNSE relative to NSE indicates selection of all variables with all temporal and spatial conditioning improved prediction of low flows relative to prediction of high flows.</p>
<p>The MLFLOW model was most sensitive to selection of dynamic climatic variables. Unconditioned dynamic climatic variables (model&#x00A0;1) alone explained 54&#x00A0;percent of the variance (<italic>R</italic><sup>2</sup>=0.54, <xref ref-type="table" rid="t06">table&#x00A0;6</xref>) in monthly streamflow, whereas adding static physiographic and anthropogenic variables (model&#x00A0;3) explained only 12&#x00A0;percent more of the variance (<italic>R</italic><sup>2</sup>=0.66, <xref ref-type="table" rid="t06">table&#x00A0;6</xref>), indicating the greater importance of the time series data. Also, spatial conditioning of all variables together with temporal conditioning of dynamic variables (model&#x00A0;5) increased the variance explained in the MLFLOW model by another 14&#x00A0;percent (<italic>R</italic><sup>2</sup>=0.80, <xref ref-type="table" rid="t06">table&#x00A0;6</xref>). <xref ref-type="bibr" rid="r18">Kratzert and others (2019)</xref> used another machine learning approach (long short-term memory networks) with different combinations of dynamic and static variables that performed similarly to this study. Models developed with only dynamic climatic variables performed well (NSE=0.63) but performed moderately better with the inclusion of static physiographic variables (NSE=0.74) (<xref ref-type="bibr" rid="r18">Kratzert and others, 2019</xref>). The greater explanatory power of the dynamic variables than of the static variables in the MLFLOW model was not surprising because time series data vary temporally as well as spatially, which gives the model more variation to use in fitting the streamflow observations, which also are innately variable, temporally and spatially.</p>
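<p>The four goodness-of-fit metrics compared in this section can be illustrated with a minimal sketch. The formulas below are common textbook forms and are assumptions, not necessarily the definitions used to produce the tables: <italic>R</italic><sup>2</sup> is taken as the squared Pearson correlation, logNSE adds a small constant (the hypothetical <monospace>offset</monospace> parameter) so that zero flows can be log-transformed, and PBIAS is written so that positive values indicate overprediction. The flow values are invented for illustration.</p>
<code language="python">
```python
import math


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (ss_obs * ss_sim)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variability."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def log_nse(obs, sim, offset=0.01):
    """NSE of log-transformed flows; offset is an assumed constant for zero flows."""
    log_obs = [math.log(o + offset) for o in obs]
    log_sim = [math.log(s + offset) for s in sim]
    return nse(log_obs, log_sim)


def pbias(obs, sim):
    """Percent bias; assumed sign convention: positive means overprediction."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


# Hypothetical monthly flows in cubic feet per second (not data from the report).
obs = [0.0, 0.5, 1.0, 9.0, 17.0, 2.0]
sim = [0.1, 0.4, 1.2, 8.0, 15.0, 2.5]
print(r_squared(obs, sim), nse(obs, sim), log_nse(obs, sim), pbias(obs, sim))
```
</code>
<p>Because the log transformation compresses large values, logNSE weights errors at low flows more heavily than NSE does, which is why a gain in logNSE relative to NSE is read as improved prediction of low flows.</p>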
<p>Sensitivity of the MLFLOW model to using data from a different area (watershed) or period (year) was qualitatively assessed by differentially grouping streamflow observations by watershed or year for use in model training and testing. For models trained with all watersheds and years, performance was better in testing on observations from each watershed than from each year separately (<xref ref-type="table" rid="t07">table&#x00A0;7</xref>), indicating the MLFLOW model had greater sensitivity to temporal than to spatial differences in the data. Means of the goodness-of-fit metrics were 0.79 for <italic>R</italic><sup>2</sup>, 0.79 for NSE, 0.82 for logNSE, and 1.4&#x00A0;percent for absolute value of PBIAS for models trained with all watersheds and tested on each watershed but were 0.68 for <italic>R</italic><sup>2</sup>, 0.67 for NSE, 0.67 for logNSE, and 3.5&#x00A0;percent for absolute value of PBIAS for models trained with all years and tested on each year (<xref ref-type="table" rid="t07">table&#x00A0;7</xref>). In contrast, models trained with all except 1&#x00A0;year left out sequentially and tested on the left-out year performed better than models trained with all except one watershed left out sequentially and tested on the left-out watershed. Means of the goodness-of-fit metrics were 0.51 for <italic>R</italic><sup>2</sup>, 0.49 for NSE, 0.50 for logNSE, and 11&#x00A0;percent for absolute value of PBIAS for models trained with a watershed left out but were 0.56 for <italic>R</italic><sup>2</sup>, 0.53 for NSE, 0.56 for logNSE, and 12&#x00A0;percent for absolute value of PBIAS for models trained with 1&#x00A0;year left out (<xref ref-type="table" rid="t07">table&#x00A0;7</xref>).</p>
<table-wrap id="t07" position="float">
<label>Table 7</label><caption><title>Goodness-of-fit metrics for models trained with observations from all watersheds and years and tested on observations from each watershed or year separately or for models trained with observations from all except one watershed or 1 year left out sequentially and tested on observations from the left-out watershed or year, indicating qualitative model sensitivity to data used from a different area or period.</title>
<p content-type="toc">Table 7.&#x2003;Goodness-of-fit metrics for models trained with observations from all watersheds and years and tested on observations from each watershed or year separately or for models trained with observations from all except one watershed or 1 year left out sequentially and tested on observations from the left-out watershed or year, indicating qualitative model sensitivity to data used from a different area or period.</p>
<p>[Metrics are the mean of 50 resamples (tenfold cross-validation with 5 repetitions) per model. <italic>R</italic><sup>2</sup>, coefficient of determination; NSE, Nash-Sutcliffe efficiency; logNSE, NSE with log-transformed data; PBIAS, percent bias]</p>
</caption>
<table rules="groups">
<col width="28.67%"/>
<col width="26.78%"/>
<col width="9.79%"/>
<col width="9.77%"/>
<col width="13.04%"/>
<col width="11.95%"/>
<thead>
<tr>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Groups used to train model</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Group used to test model</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt"><italic>R</italic><sup>2</sup></td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">NSE</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">logNSE</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">PBIAS</td>
</tr>
</thead>
<tbody>
<tr>
<th valign="middle" colspan="6" align="center" style="border-top: solid 0.50pt" scope="col">Data grouped by watershed</th>
</tr>
<tr>
<td rowspan="3" valign="top" align="left" style="border-top: solid 0.50pt" scope="row">All watersheds</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">Watershed 1</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.82</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.82</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.84</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">&#x2212;1.3</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">Watershed 2</td>
<td valign="top" align="char" char=".">0.79</td>
<td valign="top" align="char" char=".">0.79</td>
<td valign="top" align="char" char=".">0.84</td>
<td valign="top" align="char" char=".">3.0</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">Watershed 3</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.0</td>
</tr>
<tr>
<td rowspan="3" valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">All except the test watershed</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">Watershed 1</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.49</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.48</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.56</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">&#x2212;8.1</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="background-color:rgb(242,242,242)" scope="row">Watershed 2</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.54</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.52</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.56</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">5.1</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">Watershed 3</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.50</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.46</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.37</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">19.9</td>
</tr>
<tr>
<th valign="middle" colspan="6" align="center" style="border-top: solid 0.50pt" scope="col">Data grouped by year</th>
</tr>
<tr>
<td rowspan="6" valign="top" align="left" style="border-top: solid 0.50pt" scope="row">All years</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">2012</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.68</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.63</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">0.58</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt">9.3</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">2013</td>
<td valign="top" align="char" char=".">0.70</td>
<td valign="top" align="char" char=".">0.69</td>
<td valign="top" align="char" char=".">0.65</td>
<td valign="top" align="char" char=".">2.5</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">2014</td>
<td valign="top" align="char" char=".">0.56</td>
<td valign="top" align="char" char=".">0.56</td>
<td valign="top" align="char" char=".">0.63</td>
<td valign="top" align="char" char=".">2.8</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">2015</td>
<td valign="top" align="char" char=".">0.62</td>
<td valign="top" align="char" char=".">0.62</td>
<td valign="top" align="char" char=".">0.65</td>
<td valign="top" align="char" char=".">&#x2212;1.4</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">2016</td>
<td valign="top" align="char" char=".">0.78</td>
<td valign="top" align="char" char=".">0.77</td>
<td valign="top" align="char" char=".">0.74</td>
<td valign="top" align="char" char=".">2.5</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." scope="row">2017</td>
<td valign="top" align="char" char=".">0.75</td>
<td valign="top" align="char" char=".">0.75</td>
<td valign="top" align="char" char=".">0.75</td>
<td valign="top" align="char" char=".">&#x2212;2.2</td>
</tr>
<tr>
<td rowspan="6" valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">All except the test year</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">2012</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.61</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.59</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.55</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">9.9</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="background-color:rgb(242,242,242)" scope="row">2013</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.54</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.51</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.37</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">18.6</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="background-color:rgb(242,242,242)" scope="row">2014</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.59</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.58</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.63</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">6.9</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="background-color:rgb(242,242,242)" scope="row">2015</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.57</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.55</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.63</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">&#x2212;10.3</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="background-color:rgb(242,242,242)" scope="row">2016</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.53</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.50</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">0.57</td>
<td valign="top" align="char" char="." style="background-color:rgb(242,242,242)">9.4</td>
</tr>
<tr>
<td valign="top" colspan="1" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">2017</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.49</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.46</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.62</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">&#x2212;14.6</td>
</tr>
</tbody></table></table-wrap>
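<p>The group means quoted in the text can be re-derived from the rounded entries in table&#x00A0;7. The sketch below is only an arithmetic check; the dictionary keys are hypothetical labels, and because the inputs are already rounded, the recomputed means may differ from the reported means in the last digit.</p>
<code language="python">
```python
from statistics import mean

# Rows transcribed from table 7 as (R2, NSE, logNSE, PBIAS in percent).
table7 = {
    "all_tested_by_watershed": [
        (0.82, 0.82, 0.84, -1.3), (0.79, 0.79, 0.84, 3.0), (0.77, 0.77, 0.77, 0.0),
    ],
    "watershed_left_out": [
        (0.49, 0.48, 0.56, -8.1), (0.54, 0.52, 0.56, 5.1), (0.50, 0.46, 0.37, 19.9),
    ],
    "all_tested_by_year": [
        (0.68, 0.63, 0.58, 9.3), (0.70, 0.69, 0.65, 2.5), (0.56, 0.56, 0.63, 2.8),
        (0.62, 0.62, 0.65, -1.4), (0.78, 0.77, 0.74, 2.5), (0.75, 0.75, 0.75, -2.2),
    ],
    "year_left_out": [
        (0.61, 0.59, 0.55, 9.9), (0.54, 0.51, 0.37, 18.6), (0.59, 0.58, 0.63, 6.9),
        (0.57, 0.55, 0.63, -10.3), (0.53, 0.50, 0.57, 9.4), (0.49, 0.46, 0.62, -14.6),
    ],
}


def group_means(rows):
    """Mean R2, NSE, and logNSE, and mean absolute PBIAS, over one group of rows."""
    r2, nse_vals, lognse, pbias_vals = zip(*rows)
    return (mean(r2), mean(nse_vals), mean(lognse), mean([abs(p) for p in pbias_vals]))


summary = {name: group_means(rows) for name, rows in table7.items()}
for name, values in summary.items():
    print(name, [round(v, 2) for v in values])
```
</code>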
<p>In general, performance for models trained with all except one watershed or 1&#x00A0;year left out sequentially was satisfactory for all the test watersheds and years (<xref ref-type="table" rid="t07">table&#x00A0;7</xref>). Model performance was unsatisfactory (for one goodness-of-fit metric) only for watershed&#x00A0;3 (logNSE=0.37) and 2013 (logNSE=0.37) (<xref ref-type="table" rid="t07">table&#x00A0;7</xref>), indicating the relations between the predictor variables and monthly streamflow at low flows in watershed&#x00A0;3 and in 2013 were difficult to explain with data from the other watersheds or years. Less similarity of data among the three watersheds than of data among the 6&#x00A0;years may explain the better performance of the model in testing on 1&#x00A0;year left out from model training than on one watershed left out from model training. However, another explanation may be that fewer data were excluded from model training by leaving out 1&#x00A0;year than by leaving out one watershed, and consequently, more data (with more variance) were included in model testing on the left-out watershed than on the left-out year. On average, each watershed had 324&#x00A0;observations and each year had 162&#x00A0;observations.</p>
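<p>The leave-one-group-out design described above can be sketched as follows. This is a generic illustration, not the code used for the MLFLOW model; the function name and the small set of records are hypothetical. Each group (a watershed or a year) is held out in turn for testing while all remaining observations are used for training. The last two lines reproduce the average group sizes from the 971&#x00A0;observations.</p>
<code language="python">
```python
from collections import defaultdict


def leave_one_group_out(groups):
    """Yield (held_out_label, train_indices, test_indices) for each distinct group."""
    by_group = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for label, idx in by_group.items() if label != held_out for i in idx]
        yield held_out, train_idx, test_idx


# Hypothetical (watershed, year) labels, one pair per streamflow observation.
records = [(1, 2012), (1, 2013), (2, 2012), (2, 2014), (3, 2013), (3, 2014)]

by_watershed = list(leave_one_group_out([w for w, _ in records]))
by_year = list(leave_one_group_out([y for _, y in records]))

# Average observations per held-out group, from the 971 observations in the study.
per_watershed = round(971 / 3)
per_year = round(971 / 6)
print(per_watershed, per_year)
```
</code>
<p>Holding out a watershed removes about twice as many observations from training as holding out a year, which is the alternative explanation raised above.</p>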
<p>Sensitivity of the MLFLOW model to the quantity of data used was qualitatively assessed by progressively reducing streamflow observations by percentage of sites or months available for use in model fitting. Performance was better for models fitted to fewer sites than to fewer months of observations (<xref ref-type="table" rid="t08">table&#x00A0;8</xref>), which indicated the MLFLOW model was more sensitive to temporal than to spatial differences in the data. The goodness-of-fit metrics for models fitted to 50&#x00A0;percent fewer months were worse than for models fitted to 50&#x00A0;percent fewer sites by 0.12 for <italic>R</italic><sup>2</sup>, 0.14 for NSE, 0.17 for logNSE, and 7.1&#x00A0;percent for absolute value of PBIAS (<xref ref-type="table" rid="t08">table&#x00A0;8</xref>). In addition, further reduction in sites and months from 50 to 15&#x00A0;percent affected the goodness-of-fit metrics more for models fitted to fewer months than to fewer sites. On average, for three goodness-of-fit metrics (<italic>R</italic><sup>2</sup>, NSE, and logNSE), values decreased by 0.21 for models fitted to 85&#x00A0;percent fewer sites but decreased by 0.27 for models fitted to 85&#x00A0;percent fewer months (<xref ref-type="table" rid="t08">table&#x00A0;8</xref>). Performance was still satisfactory for models fitted to 50&#x00A0;percent of sites and was marginally satisfactory for models fitted to 50&#x00A0;percent of months but was no longer satisfactory for models fitted to only 15&#x00A0;percent of sites or months (<xref ref-type="table" rid="t08">table&#x00A0;8</xref>).</p>
<table-wrap id="t08" position="float">
<label>Table 8</label><caption><title>Goodness-of-fit metrics for the model fitted to all observations or for models fitted to observations progressively reduced by percentage of sites or months available, indicating qualitative model sensitivity to quantity of the data used.</title>
<p content-type="toc">Table 8.&#x2003;Goodness-of-fit metrics for the model fitted to all observations or for models fitted to observations progressively reduced by percentage of sites or months available, indicating qualitative model sensitivity to quantity of the data used.</p>
<p>[Metrics are the mean of 50 resamples (tenfold cross-validation with 5 repetitions) per model. <italic>R</italic><sup>2</sup>, coefficient of determination; NSE, Nash-Sutcliffe efficiency; logNSE, NSE with log-transformed data; PBIAS, percent bias]</p>
</caption>
<table rules="groups">
<col width="22.96%"/>
<col width="12.54%"/>
<col width="13.53%"/>
<col width="14.58%"/>
<col width="8.27%"/>
<col width="8.27%"/>
<col width="10.38%"/>
<col width="9.47%"/>
<thead>
<tr>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Sampling of data<break/>used to fit model</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median number of sites per sample</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median number of months per sample</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">Median number of observations per sample</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt"><italic>R</italic><sup>2</sup></td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">NSE</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">logNSE</td>
<td valign="middle" align="center" scope="col" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">PBIAS</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt" scope="row">All sites and months</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">125</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">35</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">971</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.80</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.79</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.82</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; border-bottom: solid 0.50pt">0.7</td>
</tr>
<tr>
<th valign="middle" colspan="8" align="center" style="border-top: solid 0.50pt; background-color:rgb(255,255,255)" scope="col">Data reduced by site</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">50 percent of sites</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">63</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">34</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">492</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">0.58</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">0.56</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">0.62</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">1.5</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">15 percent of sites</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">19</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">32</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">148</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.38</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.36</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.39</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">1.9</td>
</tr>
<tr>
<th valign="middle" colspan="8" align="center" style="border-top: solid 0.50pt; border-bottom: solid 0.50pt; background-color:rgb(255,255,255)" scope="col">Data reduced by month</th>
</tr>
<tr>
<td valign="top" align="left" style="border-top: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">50 percent of months</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">125</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">18</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">510</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">0.46</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">0.42</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">0.45</td>
<td valign="top" align="char" char="." style="border-top: solid 0.50pt; background-color:rgb(242,242,242)">8.6</td>
</tr>
<tr>
<td valign="top" align="left" style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)" scope="row">15 percent of months</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">97</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">6</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">155</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.23</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.16</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">0.14</td>
<td valign="top" align="char" char="." style="border-bottom: solid 0.50pt; background-color:rgb(242,242,242)">&#x2212;1.8</td>
</tr>
</tbody></table></table-wrap>
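<p>The comparisons of <italic>R</italic><sup>2</sup>, NSE, and logNSE drawn from table&#x00A0;8 can be checked with simple arithmetic. The sketch below uses the rounded table entries, so results are approximate; the variable and function names are hypothetical.</p>
<code language="python">
```python
from statistics import mean

# (R2, NSE, logNSE) transcribed from table 8.
sites_50 = (0.58, 0.56, 0.62)
sites_15 = (0.38, 0.36, 0.39)
months_50 = (0.46, 0.42, 0.45)
months_15 = (0.23, 0.16, 0.14)


def differences(first, second):
    """Element-wise difference (first minus second) between two metric tuples."""
    return [a - b for a, b in zip(first, second)]


# Gap between reducing by site and reducing by month at the 50-percent level.
gap_at_50 = [round(d, 2) for d in differences(sites_50, months_50)]

# Mean decrease in the three metrics when going from 50 to 15 percent.
mean_drop_sites = mean(differences(sites_50, sites_15))
mean_drop_months = mean(differences(months_50, months_15))

print(gap_at_50, round(mean_drop_sites, 2), round(mean_drop_months, 2))
```
</code>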
<p>The MLFLOW model fitted to all data performed very well (<italic>R</italic><sup>2</sup>=0.80, NSE=0.79, logNSE=0.82, and PBIAS=0.7&#x00A0;percent, <xref ref-type="table" rid="t08">table&#x00A0;8</xref>), but performance in general progressively decreased for models fitted to progressively reduced percentages of sites or months (<xref ref-type="table" rid="t08">table&#x00A0;8</xref>). However, the reductions in sites and months were applied using random samples of the data, which may explain the poor performance of models fitted to only 15&#x00A0;percent of sites or months. The 15-percent samples of sites or months used to train the MLFLOW model may not have been adequately representative of the variance in the remaining 85&#x00A0;percent of the data used to test the model. If sites could be sampled more uniformly across a watershed, focusing on proportional representation of small, medium, and large streams (by drainage area), or if months could be sampled more intentionally throughout the year, focusing on key moments of the annual hydrograph for streams at different elevations (low, medium, and high), then data for fewer sites and months might be sufficient for the MLFLOW model to perform as well as the model performed with 971&#x00A0;streamflow observations.</p>
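<p>One way to sample sites more uniformly by stream size, as suggested above, is stratified random sampling. The sketch below is an assumption about how such sampling might be set up, not a method from this study: sites are ranked by drainage area, split into three equal-count classes (small, medium, and large), and the same fraction is drawn from each class. The function name, site identifiers, and drainage areas are hypothetical.</p>
<code language="python">
```python
import random


def stratified_site_sample(drainage_area_by_site, fraction, seed=0):
    """Draw about `fraction` of sites from each drainage-area tercile."""
    rng = random.Random(seed)
    ranked = sorted(drainage_area_by_site, key=drainage_area_by_site.get)
    n = len(ranked)
    cuts = [0, n // 3, 2 * n // 3, n]  # boundaries of small, medium, large classes
    sample = []
    for start, stop in zip(cuts[:-1], cuts[1:]):
        stratum = ranked[start:stop]
        if not stratum:
            continue
        # Keep at least one site per class so no size class is dropped entirely.
        k = max(1, round(fraction * len(stratum)))
        sample.extend(rng.sample(stratum, k))
    return sample


# Hypothetical drainage areas (square miles) for 125 sites.
areas = {f"site_{i:03d}": 0.5 * (i + 1) for i in range(125)}
sample = stratified_site_sample(areas, fraction=0.15)
print(len(sample), sorted(sample)[:3])
```
</code>
<p>A simple random sample of the same size could, by chance, omit one size class almost entirely; stratifying guarantees each class is represented in proportion to its share of sites.</p>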
</sec>
<sec>
<title>Streamflow Predictions</title>
<p>The MLFLOW model predicted spatially and temporally continuous monthly streamflow for the study area (17,518&#x00A0;cells; <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) and study period (72&#x00A0;months; 2012&#x2013;17), using 971&#x00A0;discrete streamflow observations (<xref ref-type="table" rid="t01">table&#x00A0;1</xref>). Spatial and temporal variations in the streamflow predictions are discussed in this section using, as an example, a subset of the study area (part of the channel network of watershed&#x00A0;2; figure area, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) and a subset of the study period (every month of 2017 and every year in August).</p>
<p>Intra-annual variation in streamflow was simulated realistically by the MLFLOW model, and seasonality of streamflow was well characterized. Predicted flows in 2017 were much lower from January through March (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>A</italic>&#x2013;<italic>C</italic></xref>) and from August through December (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>H</italic>&#x2013;<italic>L</italic></xref>), months before or after the snowmelt high flows during April through June (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>D</italic>&#x2013;<italic>F</italic></xref>). Using February, May, August, and November to represent winter, spring, summer, and fall, respectively, mean predicted flow for cells on the channel network was 8.9&#x00A0;ft<sup>3</sup>/s in May (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>E</italic></xref>) but only 1.0, 1.8, and 0.51&#x00A0;ft<sup>3</sup>/s in February, August, and November, respectively (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>B</italic>, <italic>H</italic>, <italic>K</italic></xref>, respectively). Furthermore, in May, some cells had predicted flows of almost 17&#x00A0;ft<sup>3</sup>/s (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>E</italic></xref>), whereas in February and November, many cells on the channel networks had predicted flows of 0&#x00A0;ft<sup>3</sup>/s (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>B</italic>, <italic>K</italic></xref>, respectively). However, because 2017 was a wetter year, no cells on the channel network in August had predicted flows of 0&#x00A0;ft<sup>3</sup>/s (zero flow; <xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>F</italic></xref>), whereas in August of 2012, which was a drier year, about 45&#x00A0;percent of cells on smaller channels had zero-flow predictions (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>A</italic></xref>).
In February, predicted flows were higher in smaller channels than in larger channels (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>B</italic></xref>), but in May, larger channels had considerably higher predicted flows (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>E</italic></xref>), and in August and November, streamflow receded, remaining higher in more intermediate channels (<xref ref-type="fig" rid="fig13">fig.&#x00A0;13<italic>H</italic>, <italic>K</italic></xref>, respectively). Higher flows were observed at sites with smaller drainage areas and higher snow water equivalent in February, at sites with larger drainage areas in May, and at sites with larger drainage areas or higher precipitation in August and November (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>).</p>
<fig id="fig13" position="float" fig-type="figure"><label>Figure 13</label><caption><p>Intra-annual variation in streamflow predictions on part of a channel network (figure area, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) for each month of 2017. <italic>A</italic>,&#x00A0;January; <italic>B</italic>,&#x00A0;February; <italic>C</italic>,&#x00A0;March; <italic>D</italic>,&#x00A0;April; <italic>E</italic>,&#x00A0;May; <italic>F</italic>,&#x00A0;June; <italic>G</italic>,&#x00A0;July; <italic>H</italic>,&#x00A0;August; <italic>I</italic>,&#x00A0;September; <italic>J</italic>,&#x00A0;October; <italic>K</italic>,&#x00A0;November; and <italic>L</italic>,&#x00A0;December.</p><p content-type="toc">Figure 13.&#x2003;Diagrams showing intra-annual variation in streamflow predictions on part of a channel network for each month of 2017.</p></caption>
<long-desc>Predicted streamflow varies intra-annually and ranges from 0 to 18 cubic feet per second for each month of 2017.</long-desc><graphic xlink:href="rol21-0070_fig13"/></fig>
<fig id="fig14" position="float" fig-type="figure"><label>Figure 14</label><caption><p>Interannual variation in streamflow predictions on part of a channel network (figure area, <xref ref-type="fig" rid="fig01">fig.&#x00A0;1</xref>) for each year in August. <italic>A</italic>,&#x00A0;2012; <italic>B</italic>,&#x00A0;2013; <italic>C</italic>,&#x00A0;2014; <italic>D</italic>,&#x00A0;2015; <italic>E</italic>,&#x00A0;2016; and <italic>F</italic>,&#x00A0;2017.</p><p content-type="toc">Figure 14.&#x2003;Diagrams showing interannual variation in streamflow predictions on part of a channel network for each year in August.</p></caption>
<long-desc>Predicted streamflow varies interannually and ranges from 0 to 5 cubic feet per second for each year in August.</long-desc><graphic xlink:href="rol21-0070_fig14"/></fig>
<p>The MLFLOW model also realistically simulated interannual variation in streamflow and represented yearly hydroclimatic conditions well. Predicted flows in August were low in 2012 and 2013 (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>A</italic>, <italic>B</italic></xref>, respectively), which were drier years, although 2012 was warmer and 2013 was cooler (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>), whereas predicted flows were high in 2014 and 2017 (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>C</italic>, <italic>F</italic></xref>, respectively), which were wetter years with more normal temperature, with more rainfall in 2014 and more snowpack in 2017 (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>). Although 2015 was a drier, warmer year similar to 2012, predicted flows were higher in 2015 (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>D</italic></xref>) than in 2012 (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>A</italic></xref>), most likely because 2015 followed 2014, a wetter year. Similarly, although 2016 was a wetter year similar to 2014 and 2017, predicted flows in 2016 were more intermediate (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>E</italic></xref>), similar to 2015 (<xref ref-type="fig" rid="fig14">fig.&#x00A0;14<italic>D</italic></xref>), most likely because the preceding year, 2015, was drier. With 2012 and 2017 representing a range of hydroclimatic conditions from warmer and drier to cooler and wetter, respectively, mean predicted flow in August was only 0.12&#x00A0;ft<sup>3</sup>/s in 2012 but 1.8&#x00A0;ft<sup>3</sup>/s in 2017. Moreover, maximum predicted flow in 2012 was only 0.61&#x00A0;ft<sup>3</sup>/s, whereas minimum predicted flow in 2017 was 0.19&#x00A0;ft<sup>3</sup>/s.
In 2017, a year with more snowpack, mean predicted flow in August was 1.8&#x00A0;ft<sup>3</sup>/s, whereas in 2014, a year with more rainfall, mean predicted flow in August was 2.4&#x00A0;ft<sup>3</sup>/s, indicating the responsiveness of the model to recent-month precipitation versus earlier-year snow water equivalent (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>). In August, observed flows at sites on larger channels were lower in 2012 than in 2013 because these sites had lower snow water equivalent in 2012 than in 2013. Observed flows were higher at sites on larger channels in 2017 but were higher at sites on more intermediate channels in 2014. This difference resulted from higher snow water equivalent at sites on larger channels in 2017 than in 2014 and higher precipitation at sites on more intermediate channels in 2014 than in 2017 (<xref ref-type="bibr" rid="r27">McShane and Eddy-Miller, 2021</xref>).</p>
</sec>
<sec>
<title>Utility and Limitations of Modeling Approach</title>
<p>The greatest utility of the modeling approach is the ease of use and the speed of processing input data, running the model, and interpreting the model output, whereas the greatest limitation is the need for spatially and temporally representative streamflow observations to drive the model. Both the quantity and the quality of observations are important limitations. Streamflow measurements might be made by poorly trained persons, and errors could be proportionally larger at low flows than at high flows; however, in this study, all measurements were made by well-trained personnel following standard USGS techniques and methods (<xref ref-type="bibr" rid="r38">Rantz and others, 1982</xref>; <xref ref-type="bibr" rid="r40">Sauer and Turnipseed, 2010</xref>). In addition, the MLFLOW model performed well with the available streamflow measurements, which were made at a spatially representative number of sites and in a temporally representative number of months.</p>
<p>Streamflow predictions seemed to represent the annual hydrograph well within the study area during the study period. However, streamflow measurements were not made in January, March, October, or December, so model performance was not computable for those 4&#x00A0;months. In addition, for the other 8&#x00A0;months, streamflow measurements were not made at every site in every month of all 6&#x00A0;years, so the evaluation of predictive performance was more limited than would be possible for a model developed using time series data from a USGS streamgage. Also, model predictions in this study were assumed to be representative of natural streamflow. Although three anthropogenic predictor variables (water diversions, roads, and water wells) were used in the MLFLOW model, the variables were merely informative of potential hydrologic alteration in a qualitative sense. Therefore, the MLFLOW model did not quantitatively characterize any actual streamflow regulation caused by diversion, release, or return of water.</p>
<p>The streamflow observations, although single measurements in a month, were treated as representative of mean monthly streamflow for use in the MLFLOW model. Because the annual hydrograph of the study area is characterized by a relatively short period (about 3&#x00A0;months) of higher flows and a much longer period (about 9&#x00A0;months) of lower flows, a streamflow measurement made on any day of the month is likely representative of the mean monthly streamflow for most months of the year. However, during the rise and fall of the high-flow period, measurements made earlier and later in a month may be quite different from each other. Measurements may be lower in early May than in late May, and measurements may be higher in early June than in late June. Therefore, the magnitude of the highest flow may not be accurately represented with the streamflow observations, but the magnitude of the shorter high-flow period (2&#x00A0;months) relative to the magnitude of the longer low-flow period (8&#x00A0;months) was evident with the available streamflow observations.</p>
<p>The MLFLOW model, fitted as is with the data used in this study, is not transferable to another watershed unless the other watershed has similar statistical distributions of the physiographic, anthropogenic, and climatic variables. Transfer of the model to another watershed with values of these variables at the margins of the variables&#x2019; statistical distributions would require judgment on the reliability of the model&#x2019;s application. However, because of the data-driven nature of the machine learning approach in this study, new streamflow observations in addition to the current observations can be used to refit the model. Nonetheless, transferring the MLFLOW model, fitted as is, to a watershed with a different hydroclimatic regime would not be as reasonable as fitting a new model to data for the new regime.</p>
<p>The modeling approach in this study was not a process-based model with mathematical functions derived from first principles or empirical research. Instead, temporal and spatial conditioning processes were applied to the predictor variables as a means to substitute simple properties of the data&#x2014;temporally and spatially averaged or decayed values of climatic, physiographic, and anthropogenic variables&#x2014;for some of the complex process-based functions (requiring much parameter optimization) for climate and land surface elements in a physically based hydrologic model.</p>
<p>Temporal conditioning was an effective and efficient way of increasing the information content of the dynamic climatic variables. A moving average was computationally simple and provided a simplified means for simulating short to long periods of the water storage change in a runoff model. For monthly streamflow as modeled in this study, the effect of climate is less immediate than in an event-based model. Generating a progression of moving average values of the time series data provided the model with more dimensionality for exploring the relation of climate to streamflow. However, the multiplicity of moving-average variants of the dynamic variables may have increased the use of these variables in the MLFLOW model because of the greater abundance of the dynamic variables relative to the static variables. In addition, too many moving-average variants of the climate variables can complicate evaluation of variable importance, including interpretation of partial dependence plots.</p>
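<p>The moving-average conditioning described above can be illustrated with a short sketch. The following Python example is only an illustration under stated assumptions and is not the R code used in this study: the function name <monospace>trailing_means</monospace>, the default window lengths (1, 3, 6, and 12&#x00A0;months), the choice to end each window at the current month, and the precipitation values are all hypothetical.</p>
<code language="python"><![CDATA[
```python
from statistics import fmean


def trailing_means(series, windows=(1, 3, 6, 12)):
    """Trailing moving averages of a monthly series.

    Returns {window: list}, where each list holds the mean of the `window`
    values ending at that month, or None until enough months are available.
    Window lengths are hypothetical, not those used in the report.
    """
    conditioned = {}
    for w in windows:
        column = []
        for i in range(len(series)):
            if i + 1 < w:
                column.append(None)  # not enough prior months yet
            else:
                column.append(fmean(series[i + 1 - w:i + 1]))
        conditioned[w] = column
    return conditioned


# Made-up monthly precipitation totals, for illustration only.
precip = [10, 20, 30, 40]
variants = trailing_means(precip, windows=(1, 2, 3))
print(variants[2])  # 2-month trailing means
```
]]></code>
<p>Each window length yields a separate predictor column, which is one way the number of dynamic-variable variants can come to exceed the number of static variables, as noted above.</p>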
<p>Spatial conditioning also had an important effect on the information content of the dynamic and static variables. Spatial conditioning was especially important to the static variables with binary data because few or no nonzero values of water diversions, springs, roads, water wells, surficial geology contacts, or bedrock geology faults corresponded to any sampling site, meaning these six variables (without spatial conditioning) had little or no variance for the MLFLOW model to explain. However, spatial conditioning of the unconditioned variables altered them from discrete (0 or 1) to continuous (ranging from 0 to 1) data for each cell (with nonzero values only on the channel network). Consequently, all sampling sites had a nonzero value of the six spatially conditioned variables; that is, variance was generated in these variables for the MLFLOW model to explain. Area-averaged or distance-decayed accumulation of the spatial data not only increased explanatory power of the variables but also modified incompatible spatially discrete variables into applicable spatially continuous variables.</p>
<p>However, conditioning of the variables was less important than selection of dynamic variables for use in the MLFLOW model. The variance of the time series data was more attuned to the variance of the streamflow observations because every monthly streamflow observation had a corresponding monthly value of evapotranspiration, precipitation, snow water equivalent, and temperature. Therefore, the selection of additional dynamic variables might increase performance more than any static variable already selected for use in the MLFLOW model.</p>
<p>The gradient boosting machine used in this study is just one of many available machine learning models, and a different model might perform better under other circumstances. Most machine learning models, such as neural networks, support vector machines, and random forests, can be readily applied in R. The &#x201C;caret&#x201D; package in R has more than 100&#x00A0;different models (<xref ref-type="bibr" rid="r20">Kuhn, 2020</xref>). <xref ref-type="bibr" rid="r6">Carlisle and others (2016)</xref> evaluated five machine learning models, including random forests and gradient boosting machines, and determined that random forests performed better than the other four machine learning models. In this study, however, the MLFLOW model also was developed with a random forest, and its performance was not as good as that of the MLFLOW model developed with the gradient boosting machine.</p>
<p>The modeling approach in this study is relatively simple to apply. Other predictor variables can be used in the MLFLOW model, and variables that were most important in the model are available for larger areas and longer periods than were applicable to this study. The R&#x00A0;scripts are adaptable&#x2014;other machine learning models can be substituted for the gradient boosting machine used in the MLFLOW model. Although familiarity with R is necessary, only a working knowledge of hydrology (for selecting appropriate predictor variables and evaluating the quality of streamflow observations) and a rudimentary understanding of machine learning models are needed. Therefore, this modeling approach is practicable for other scientists who work with water but who are not hydrologists.</p>
</sec>
</sec>
<sec>
<title>Summary</title>
<p>Streamflow is a necessary ecological resource for many animal and plant species. However, scant availability of streamflow data can impede the utility of streamflow as a variable in ecological models of aquatic and terrestrial species, especially when studying small streams in watersheds that lack streamgages. Much ecological research on small streams is concentrated on species-habitat relations throughout the channel network of a watershed, but streamflow data, needed at fine resolution and broad extent for this research, are typically sparse. For instance, collaborators with the Wyoming Cooperative Fish and Wildlife Research Unit needed more detailed streamflow information for researching the effects of multiple stressors on fish and invertebrates in several ungaged watersheds in the upper Green River Basin in southwestern Wyoming, where streamflow data are sparse.</p>
<p>Several approaches to modeling streamflow at various scales have been developed that may improve the utility of sparse streamflow data to ecological research in small streams, including machine learning models that can fit potentially complex relations between streamflow observations and environmental predictor variables. We developed a machine learning approach in R for modeling spatially and temporally continuous monthly streamflow from 2012 through 2017 in three semiarid montane-steppe watersheds (with drainage areas of 26&#x2013;55&#x00A0;square miles and mean elevations of 8,031&#x2013;8,455&#x00A0;feet) on the Wyoming Range.</p>
<p>A machine learning streamflow (MLFLOW) model was developed to predict monthly streamflow using static and dynamic variables derived from geospatial and time series data and to explain the multidimensional relation of monthly streamflow to the predictor variables. Streamflow measurements were made at 125&#x00A0;sites in 35&#x00A0;months during 2012&#x2013;17, totaling 971&#x00A0;discrete observations; only a single measurement was made at any site in any month. Based on the potential to affect streamflow, 24&#x00A0;variables describing physiographic, anthropogenic, and climatic characteristics were chosen as predictors in the MLFLOW model&#x2014;20&#x00A0;variables described static physiographic and anthropogenic conditions, and 4&#x00A0;variables described dynamic climatic conditions.</p>
<p>The dynamic and static variables were temporally and spatially conditioned to amplify the relation of predictor variables to monthly streamflow. To generate a diverse lagged effect of climatic conditions on streamflow, a temporal conditioning process was applied to the climate variables. The process consisted of moving averages of the time series data that ranged from the prior month to the prior year. Two geospatial processes were used to account for area and distance effects on streamflow. An area-averaged accumulation of the spatial data generated upstream effects of a variable. A distance-decayed accumulation of the spatial data generated more localized effects of a variable.</p>
<p>The MLFLOW model used a gradient boosting machine, one of many machine learning models used in hydrologic modeling, including modeling of streamflow. A gradient boosting machine is a machine learning model used to solve regression or classification problems. The gradient boosting machine produces a predictive model that is an ensemble of many weaker models, which are typically implemented as decision trees. Development of the MLFLOW model proceeded in an iterative process of model calibration and validation. Streamflow observations used to fit the model were split into training and testing samples using repeated <italic>k</italic>-fold cross-validation.</p>
<p>The MLFLOW model was initially fitted to all data to explain monthly streamflow in relation to the predictor variables for the entire study area and study period. Additional models were fitted, one for each watershed and one for each year, to explain spatial or temporal variation in the relation of monthly streamflow to the predictor variables. Multiple models were fitted to qualitatively assess sensitivity of the MLFLOW model to variations of the predictor variables and streamflow observations. Models fitted to different combinations of predictor variables were used to assess sensitivity to selection and conditioning of the predictor variables. Model sensitivity to fitting data from a different area or period was assessed by differentially grouping streamflow observations by watershed or year for use in model training and testing. Model sensitivity to the quantity of data fitted was assessed by progressively reducing streamflow observations by percentage of sites or months available for use in model fitting.</p>
<p>The MLFLOW model fitted to all data had satisfactory agreement between observed and predicted streamflow (coefficient of determination [<italic>R</italic><sup>2</sup>]=0.80, Nash-Sutcliffe efficiency [NSE]=0.79, NSE with log-transformed data [logNSE]=0.82, and percent bias [PBIAS]=0.7&#x00A0;percent). The similarity of NSE (0.79) and logNSE (0.82) indicated the MLFLOW model performed about equally well for high and low flows. PBIAS (0.7&#x00A0;percent) indicated the MLFLOW model did not overpredict or underpredict monthly streamflow in general. The MLFLOW model performed equally well for all months with streamflow observations.</p>
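<p>For readers less familiar with these performance metrics, the following Python sketch shows how they are commonly computed from paired observed and predicted values. It is an illustration under stated assumptions, not the computation used in this study: the function names, the small offset <monospace>eps</monospace> added before log transformation (needed when flows can be zero), the sign convention for PBIAS (positive values meaning overprediction), and the sample values are all assumptions.</p>
<code language="python"><![CDATA[
```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def log_nse(obs, sim, eps=0.01):
    """NSE on log-transformed flows; eps is an assumed offset for zero flows."""
    return nse([math.log(o + eps) for o in obs],
               [math.log(s + eps) for s in sim])


def pbias(obs, sim):
    """Percent bias; positive means overprediction (assumed sign convention)."""
    return 100.0 * sum(s - o for o, s in zip(obs, sim)) / sum(obs)


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and predicted values."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Made-up flows in cubic feet per second, for illustration only.
observed = [0.0, 0.5, 1.8, 8.9]
predicted = [0.1, 0.4, 2.0, 8.5]
print(nse(observed, predicted), log_nse(observed, predicted))
print(pbias(observed, predicted), r_squared(observed, predicted))
```
]]></code>
<p>Because NSE is dominated by errors at high flows and the log transformation gives more weight to low flows, comparing the two values is a common way to judge whether a model performs similarly across the flow range.</p>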
<p>The most important variables (statistically important in the MLFLOW model) for explaining monthly streamflow were moving averages of precipitation and snow water equivalent. Importance of the static and dynamic variables did not differ substantially among the three watersheds but differed considerably among the 6&#x00A0;years. The 20&#x00A0;most important variables in the MLFLOW model had simple to more complex relations with monthly streamflow as interpreted with partial dependence plots. Monthly streamflow increased with increasing moving averages of precipitation and snow water equivalent and decreased with increasing moving averages of evapotranspiration and temperature. Monthly streamflow also increased with increasing drainage area and decreased with increasing forest cover and elevation.</p>
<p>Temporal and spatial conditioning intensified the relation of many predictor variables with monthly streamflow, resulting in more information that the MLFLOW model could use for predicting streamflow. However, conditioning of the variables was less important than selection of dynamic variables for use in the MLFLOW model.</p>
<p>The MLFLOW model was most sensitive to selection of dynamic climatic variables. Unconditioned dynamic climatic variables alone explained 54&#x00A0;percent of the variance (<italic>R</italic><sup>2</sup>=0.54) in monthly streamflow, whereas adding static physiographic and anthropogenic variables explained only 12&#x00A0;percent more of the variance (<italic>R</italic><sup>2</sup>=0.66), indicating the greater importance of the time series data. Also, spatial conditioning of all variables together with temporal conditioning of dynamic variables increased the variance explained in the MLFLOW model by another 14&#x00A0;percent (<italic>R</italic><sup>2</sup>=0.80).</p>
<p>For models trained with all watersheds and years, performance was better in testing on observations from each watershed than from each year separately, indicating the MLFLOW model had greater sensitivity to temporal than to spatial differences in the data. In contrast, models trained with all except 1&#x00A0;year left out sequentially and tested on the left-out year performed better than models trained with all except one watershed left out sequentially and tested on the left-out watershed.</p>
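<p>The leave-one-out grouping described above can be sketched as follows. This Python example is a minimal illustration, not the R code used in this study: the function name <monospace>leave_one_group_out</monospace>, the record layout, the watershed labels, and the flow values are hypothetical, and the model fitting step is omitted.</p>
<code language="python"><![CDATA[
```python
def leave_one_group_out(records, key):
    """Yield (held_out, train, test) with one group held out at a time.

    `records` is a list of dicts; `key` names the grouping field,
    for example "year" or "watershed".
    """
    for held_out in sorted({rec[key] for rec in records}):
        train = [rec for rec in records if rec[key] != held_out]
        test = [rec for rec in records if rec[key] == held_out]
        yield held_out, train, test


# Made-up observations; watershed labels "A" and "B" are placeholders.
observations = [
    {"watershed": "A", "year": 2012, "flow": 0.2},
    {"watershed": "A", "year": 2013, "flow": 0.4},
    {"watershed": "B", "year": 2012, "flow": 0.1},
    {"watershed": "B", "year": 2013, "flow": 0.3},
]

# Hold out each year in turn; a model would be fitted to `train`
# and evaluated on `test` at this point.
for year, train, test in leave_one_group_out(observations, "year"):
    print(year, len(train), len(test))
```
]]></code>
<p>Passing <monospace>"watershed"</monospace> instead of <monospace>"year"</monospace> as the grouping key gives the corresponding spatial split, so the same routine can serve both comparisons.</p>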
<p>Performance was better for models fitted to fewer sites than to fewer months of observations, which indicated the MLFLOW model was more sensitive to temporal than to spatial differences in the data. The MLFLOW model fitted to all data performed very well, but performance in general progressively decreased for models fitted to progressively reduced percentages of sites or months.</p>
<p>Streamflow predictions seemed to represent the annual hydrograph well within the study area during the study period. Intra-annual variation in streamflow was simulated realistically by the MLFLOW model, and seasonality of streamflow was well characterized. The MLFLOW model also realistically simulated interannual variation in streamflow and represented yearly hydroclimatic conditions well.</p>
<p>The greatest utility of the modeling approach is the ease of use and the speed of processing input data, running the model, and interpreting the model output, whereas the greatest limitation is the need for spatially and temporally representative streamflow observations to drive the model. Although familiarity with R is necessary, only a working knowledge of hydrology (for selecting appropriate predictor variables and evaluating the quality of streamflow observations) and a rudimentary understanding of machine learning models are needed. Therefore, this modeling approach is practicable for other scientists who work with water but who are not hydrologists.</p>
</sec>
</body>
</book-part>
</book-body>
<book-back>
<ref-list>
<title>References Cited</title>
<ref id="r1"><mixed-citation publication-type="other">Barnhart, T.B., Sando, R., Siefken, S.A., McCarthy, P.M., and Rea, A.H., 2020, Flow-conditioned parameter grid tools: U.S. Geological Survey software release, accessed May 6, 2021, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5066/P9W8UZ47">https://doi.org/10.5066/P9W8UZ47</ext-link>.</mixed-citation></ref>
<ref id="r2"><mixed-citation publication-type="web">Barrett, A.P., 2003, National operational hydrologic remote sensing center snow data assimilation system (SNODAS) products at NSIDC: National Snow and Ice Data Center Special Report 11, 19 p., accessed March 11, 2018, at <ext-link ext-link-type="uri" xlink:href="https://nsidc.org/sites/nsidc.org/files/technical-references/nsidc_special_report_11.pdf">https://nsidc.org/sites/nsidc.org/files/technical-references/nsidc_special_report_11.pdf</ext-link>.</mixed-citation></ref>
<ref id="r3"><mixed-citation publication-type="other">Bartolino, J.R., Konrad, C.P., Sando, R., and Hockman-Wert, D.P., 2019, RockType to permeability crosswalk table, Northwest U.S.: U.S. Geological Survey data release, accessed March 12, 2019, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5066/P95DIXXT">https://doi.org/10.5066/P95DIXXT</ext-link>.</mixed-citation></ref>
<ref id="r4"><mixed-citation publication-type="web">Bellos, V., and Carbajal, J.P., 2020, Machine learning applied to hydraulic and hydrological modelling: special issue <italic>of</italic> Water, accessed March 13, 2020, at <ext-link ext-link-type="uri" xlink:href="https://www.mdpi.com/journal/water/special_issues/Machine_Learning_Hydraulic_Hydrological">https://www.mdpi.com/journal/water/special_issues/Machine_Learning_Hydraulic_Hydrological</ext-link>.</mixed-citation></ref>
<ref id="r5"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Bowen</surname>, <given-names>Z.H.</given-names></string-name>, <string-name><surname>Aldridge</surname>, <given-names>C.L.</given-names></string-name>, <string-name><surname>Anderson</surname>, <given-names>P.J.</given-names></string-name>, <string-name><surname>Assal</surname>, <given-names>T.J.</given-names></string-name>, <string-name><surname>Bern</surname>, <given-names>C.R.</given-names></string-name>, <string-name><surname>Biewick</surname>, <given-names>L.R.H.</given-names></string-name>, <string-name><surname>Boughton</surname>, <given-names>G.K.</given-names></string-name>, <string-name><surname>Carr</surname>, <given-names>N.B.</given-names></string-name>, <string-name><surname>Chalfoun</surname>, <given-names>A.D.</given-names></string-name>, <string-name><surname>Chong</surname>, <given-names>G.W.</given-names></string-name>, <string-name><surname>Clark</surname>, <given-names>M.L.</given-names></string-name>, <string-name><surname>Fedy</surname>, <given-names>B.C.</given-names></string-name>, <string-name><surname>Foster</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Garman</surname>, <given-names>S.L.</given-names></string-name>, <string-name><surname>Germaine</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Hethcoat</surname>, <given-names>M.G.</given-names></string-name>, <string-name><surname>Homer</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Kauffman</surname>, <given-names>M.J.</given-names></string-name>, <string-name><surname>Keinath</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Latysh</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Manier</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>McDougal</surname>, 
<given-names>R.R.</given-names></string-name>, <string-name><surname>Melcher</surname>, <given-names>C.P.</given-names></string-name>, <string-name><surname>Miller</surname>, <given-names>K.A.</given-names></string-name>, <string-name><surname>Montag</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Potter</surname>, <given-names>C.J.</given-names></string-name>, <string-name><surname>Schell</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Shafer</surname>, <given-names>S.L.</given-names></string-name>, <string-name><surname>Smith</surname>, <given-names>D.B.</given-names></string-name>, <string-name><surname>Sweat</surname>, <given-names>M.J.</given-names></string-name>, and <string-name><surname>Wilson</surname>, <given-names>A.B.</given-names></string-name></person-group>, <year>2014</year>, <source>U.S. Geological Survey science for the Wyoming Landscape Conservation Initiative&#x2014;2012 annual report</source>: <series>U.S. Geological Survey Open-File Report</series> <volume>2014&#x2013;1093</volume>, <size units="page">71</size>&#x00A0;p. <comment>[Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/ofr20141093">https://doi.org/10.3133/ofr20141093</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r6"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Carlisle</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Wolock</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Howard</surname>, <given-names>J.K.</given-names></string-name>, <string-name><surname>Grantham</surname>, <given-names>T.E.</given-names></string-name>, <string-name><surname>Fesenmyer</surname>, <given-names>K.</given-names></string-name>, and <string-name><surname>Wieczorek</surname>, <given-names>M.</given-names></string-name></person-group>, <year>2016</year>, <source>Estimating natural monthly stream flows in California and the likelihood of anthropogenic modification</source>: <series>U.S. Geological Survey Open-File Report</series> <volume>2016&#x2013;1189</volume>, <size units="page">27</size>&#x00A0;p. <comment>[Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/ofr20161189">https://doi.org/10.3133/ofr20161189</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r7"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chaney</surname>, <given-names>N.W.</given-names></string-name>, <string-name><surname>Minasny</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Herman</surname>, <given-names>J.D.</given-names></string-name>, <string-name><surname>Nauman</surname>, <given-names>T.W.</given-names></string-name>, <string-name><surname>Brungard</surname>, <given-names>C.W.</given-names></string-name>, <string-name><surname>Morgan</surname>, <given-names>C.L.S.</given-names></string-name>, <string-name><surname>McBratney</surname>, <given-names>A.B.</given-names></string-name>, <string-name><surname>Wood</surname>, <given-names>E.F.</given-names></string-name>, and <string-name><surname>Yimam</surname>, <given-names>Y.</given-names></string-name></person-group>, <year>2019</year>, <article-title>POLARIS soil properties&#x2014;30-m probabilistic maps of soil properties over the contiguous United States</article-title>: <source>Water Resources Research</source>, v.&#x00A0;<volume>55</volume>, no.&#x00A0;<issue>4</issue>, p.&#x00A0;<fpage>2916</fpage>&#x2013;<lpage>2938</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1029/2018WR022797">https://doi.org/10.1029/2018WR022797</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r8"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chase</surname>, <given-names>K.J.</given-names></string-name>, <string-name><surname>Haj</surname>, <given-names>A.E.</given-names></string-name>, <string-name><surname>Regan</surname>, <given-names>R.S.</given-names></string-name>, and <string-name><surname>Viger</surname>, <given-names>R.J.</given-names></string-name></person-group>, <year>2016</year>, <article-title>Potential effects of climate change on streamflow for seven watersheds in eastern and central Montana</article-title>: <source>Journal of Hydrology. Regional Studies</source>, v.&#x00A0;<volume>7</volume>, p.&#x00A0;<fpage>69</fpage>&#x2013;<lpage>81</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.ejrh.2016.06.001">https://doi.org/10.1016/j.ejrh.2016.06.001</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r9"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Daly</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Halbleib</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Smith</surname>, <given-names>J.I.</given-names></string-name>, <string-name><surname>Gibson</surname>, <given-names>W.P.</given-names></string-name>, <string-name><surname>Doggett</surname>, <given-names>M.K.</given-names></string-name>, <string-name><surname>Taylor</surname>, <given-names>G.H.</given-names></string-name>, <string-name><surname>Curtis</surname>, <given-names>J.</given-names></string-name>, and <string-name><surname>Pasteris</surname>, <given-names>P.P.</given-names></string-name></person-group>, <year>2008</year>, <article-title>Physiographically sensitive mapping of climatological temperature and precipitation across the conterminous United States</article-title>: <source>International Journal of Climatology</source>, v.&#x00A0;<volume>28</volume>, no.&#x00A0;<issue>15</issue>, p.&#x00A0;<fpage>2031</fpage>&#x2013;<lpage>2064</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/joc.1688">https://doi.org/10.1002/joc.1688</ext-link><comment>.]</comment> </mixed-citation></ref>
<ref id="r10"><mixed-citation publication-type="book">Esri, 2019, ArcGIS Desktop: Redlands, Calif., Environmental Systems Research Institute, ver. 10.6.1.</mixed-citation></ref>
<ref id="r11"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Friedman</surname>, <given-names>J.H.</given-names></string-name></person-group>, <year>2001</year>, <article-title>Greedy function approximation&#x2014;A gradient boosting machine</article-title>: <source>Annals of Statistics</source>, v.&#x00A0;<volume>29</volume>, no.&#x00A0;<issue>5</issue>, p.&#x00A0;<fpage>1189</fpage>&#x2013;<lpage>1232</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1214/aos/1013203451">https://doi.org/10.1214/aos/1013203451</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r12"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Friedman</surname>, <given-names>J.H.</given-names></string-name></person-group>, <year>2002</year>, <article-title>Stochastic gradient boosting</article-title>: <source>Computational Statistics &amp; Data Analysis</source>, v.&#x00A0;<volume>38</volume>, no.&#x00A0;<issue>4</issue>, p.&#x00A0;<fpage>367</fpage>&#x2013;<lpage>378</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/S0167-9473(01)00065-2">https://doi.org/10.1016/S0167-9473(01)00065-2</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r13"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Girard</surname>, <given-names>C.E.</given-names></string-name>, and <string-name><surname>Walters</surname>, <given-names>A.W.</given-names></string-name></person-group>, <year>2018</year>, <article-title>Evaluating relationships between native fishes and habitat in streams affected by oil and natural gas development</article-title>: <source>Fisheries Management and Ecology</source>, v.&#x00A0;<volume>25</volume>, no.&#x00A0;<issue>5</issue>, p.&#x00A0;<fpage>366</fpage>&#x2013;<lpage>379</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/fme.12303">https://doi.org/10.1111/fme.12303</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r14"><mixed-citation publication-type="web">Greenwell, B., Boehmke, B., and Cunningham, J., 2020, gbm&#x2014;Generalized boosted regression models: R package, ver. 2.1.8, accessed August 18, 2020, at <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/gbm">https://cran.r-project.org/web/packages/gbm</ext-link>.</mixed-citation></ref>
<ref id="r15"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Hastie</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Tibshirani</surname>, <given-names>R.</given-names></string-name>, and <string-name><surname>Friedman</surname>, <given-names>J.</given-names></string-name></person-group>, <year>2009</year>, <source>The elements of statistical learning&#x2014;Data mining, inference, and prediction</source> (2d&#x00A0;ed.): <publisher-loc>New York</publisher-loc>, <publisher-name>Springer</publisher-name>, <size units="page">745</size>&#x00A0;p.</mixed-citation></ref>
<ref id="r16"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Homer</surname>, <given-names>C.</given-names></string-name>, <string-name><surname>Dewitz</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Yang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Jin</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Danielson</surname>, <given-names>P.</given-names></string-name>, <string-name><surname>Xian</surname>, <given-names>G.</given-names></string-name>, <string-name><surname>Coulston</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Herold</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Wickham</surname>, <given-names>J.</given-names></string-name>, and <string-name><surname>Megown</surname>, <given-names>K.</given-names></string-name></person-group>, <year>2015</year>, <article-title>Completion of the 2011 national land cover database for the conterminous United States&#x2014;Representing a decade of land cover change information</article-title>: <source>Photogrammetric Engineering and Remote Sensing</source>, v.&#x00A0;<volume>81</volume>, no.&#x00A0;<issue>5</issue>, p.&#x00A0;<fpage>345</fpage>&#x2013;<lpage>354</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://www.ingentaconnect.com/content/asprs/pers/2015/00000081/00000005/art00002">https://www.ingentaconnect.com/content/asprs/pers/2015/00000081/00000005/art00002</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r17"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jaeger</surname>, <given-names>K.L.</given-names></string-name>, <string-name><surname>Sando</surname>, <given-names>R.</given-names></string-name>, <string-name><surname>McShane</surname>, <given-names>R.R.</given-names></string-name>, <string-name><surname>Dunham</surname>, <given-names>J.B.</given-names></string-name>, <string-name><surname>Hockman-Wert</surname>, <given-names>D.P.</given-names></string-name>, <string-name><surname>Kaiser</surname>, <given-names>K.E.</given-names></string-name>, <string-name><surname>Hafen</surname>, <given-names>K.</given-names></string-name>, <string-name><surname>Risley</surname>, <given-names>J.C.</given-names></string-name>, and <string-name><surname>Blasch</surname>, <given-names>K.W.</given-names></string-name></person-group>, <year>2019</year>, <article-title>Probability of streamflow permanence model (PROSPER)&#x2014;A spatially continuous model of annual streamflow permanence throughout the Pacific Northwest</article-title>: <source>Journal of Hydrology: X</source>, v.&#x00A0;<volume>2</volume>, p.&#x00A0;<elocation-id>100005</elocation-id>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.hydroa.2018.100005">https://doi.org/10.1016/j.hydroa.2018.100005</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r18"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kratzert</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Klotz</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Herrnegger</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Sampson</surname>, <given-names>A.K.</given-names></string-name>, <string-name><surname>Hochreiter</surname>, <given-names>S.</given-names></string-name>, and <string-name><surname>Nearing</surname>, <given-names>G.S.</given-names></string-name></person-group>, <year>2019</year>, <article-title>Toward improved predictions in ungauged basins&#x2014;Exploiting the power of machine learning</article-title>: <source>Water Resources Research</source>, v.&#x00A0;<volume>55</volume>, no.&#x00A0;<issue>12</issue>, p.&#x00A0;<fpage>11344</fpage>&#x2013;<lpage>11354</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1029/2019WR026065">https://doi.org/10.1029/2019WR026065</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r19"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kuhn</surname>, <given-names>M.</given-names></string-name></person-group>, <year>2008</year>, <article-title>Building predictive models in R using the caret package</article-title>: <source>Journal of Statistical Software</source>, v.&#x00A0;<volume>28</volume>, no.&#x00A0;<issue>5</issue>, p.&#x00A0;<fpage>1</fpage>&#x2013;<lpage>26</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18637/jss.v028.i05">https://doi.org/10.18637/jss.v028.i05</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r20"><mixed-citation publication-type="web">Kuhn, M., 2020, caret&#x2014;Classification and regression training: R package, ver. 6.0&#x2013;86, accessed April 14, 2020, at <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/caret">https://cran.r-project.org/web/packages/caret</ext-link>.</mixed-citation></ref>
<ref id="r21"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Kuhn</surname>, <given-names>M.</given-names></string-name>, and <string-name><surname>Johnson</surname>, <given-names>K.</given-names></string-name></person-group>, <year>2013</year>, <source>Applied predictive modeling</source>: <publisher-loc>New York</publisher-loc>, <publisher-name>Springer</publisher-name>, <size units="page">600</size>&#x00A0;p.</mixed-citation></ref>
<ref id="r22"><mixed-citation publication-type="other">Leavesley, G.H., Lichty, R.W., Troutman, B.M., and Saindon, L.G., 1983, Precipitation-runoff modeling system&#x2014;User&#x2019;s manual: U.S. Geological Survey Water-Resources Investigations Report 83&#x2013;4238, 207 p. [Also available at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/wri834238">https://doi.org/10.3133/wri834238</ext-link>.]</mixed-citation></ref>
<ref id="r23"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Legates</surname>, <given-names>D.R.</given-names></string-name>, and <string-name><surname>McCabe</surname>, <given-names>G.J.</given-names>, <suffix>Jr</suffix></string-name></person-group>., <year>1999</year>, <article-title>Evaluating the use of &#x201C;goodness-of-fit&#x201D; measures in hydrologic and hydroclimatic model validation</article-title>: <source>Water Resources Research</source>, v.&#x00A0;<volume>35</volume>, no.&#x00A0;<issue>1</issue>, p.&#x00A0;<fpage>233</fpage>&#x2013;<lpage>241</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1029/1998WR900018">https://doi.org/10.1029/1998WR900018</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r24"><mixed-citation publication-type="book">Markstrom, S.L., Regan, R.S., Hay, L.E., Viger, R.J., Webb, R.M.T., Payn, R.A., and LaFontaine, J.H., 2015, PRMS&#x2013;IV, the precipitation-runoff modeling system, version 4: U.S. Geological Survey Techniques and Methods, book 6, chap. B7, 158 p. [Also available at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/tm6B7">https://doi.org/10.3133/tm6B7</ext-link>.]</mixed-citation></ref>
<ref id="r25"><mixed-citation publication-type="other">Massey, R., Sankey, T.T., Yadav, K., Congalton, R.G., Tilton, J.C., and Thenkabail, P.S., 2017, Making Earth system data records for use in research environments (MEaSUREs) global food security-support analysis data (GFSAD) cropland extent 2010 North America 30 m: National Aeronautics and Space Administration, accessed May 14, 2019, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5067/MEaSUREs/GFSAD/GFSAD30NACE.001">https://doi.org/10.5067/MEaSUREs/GFSAD/GFSAD30NACE.001</ext-link>.</mixed-citation></ref>
<ref id="r26"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>McCabe</surname>, <given-names>G.J.</given-names></string-name>, and <string-name><surname>Markstrom</surname>, <given-names>S.L.</given-names></string-name></person-group>, <year>2007</year>, <source>A monthly water-balance model driven by a graphical user interface</source>: <series>U.S. Geological Survey Open-File Report</series> <volume>2007&#x2013;1088</volume>, <size units="page">6</size>&#x00A0;p. <comment>[Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/ofr20071088">https://doi.org/10.3133/ofr20071088</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r27"><mixed-citation publication-type="other">McShane, R.R., and Eddy-Miller, C.A., 2021, Input data, model output, and R scripts for a machine learning streamflow model on the Wyoming Range, Wyoming, 2012&#x2013;17: U.S. Geological Survey data release, <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5066/P9XCP1AE">https://doi.org/10.5066/P9XCP1AE</ext-link>.</mixed-citation></ref>
<ref id="r28"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Miller</surname>, <given-names>M.P.</given-names></string-name>, <string-name><surname>Carlisle</surname>, <given-names>D.M.</given-names></string-name>, <string-name><surname>Wolock</surname>, <given-names>D.M.</given-names></string-name>, and <string-name><surname>Wieczorek</surname>, <given-names>M.</given-names></string-name></person-group>, <year>2018</year>, <article-title>A database of natural monthly streamflow estimates from 1950 to 2015 for the conterminous United States</article-title>: <source>Journal of the American Water Resources Association</source>, v.&#x00A0;<volume>54</volume>, no.&#x00A0;<issue>6</issue>, p.&#x00A0;<fpage>1258</fpage>&#x2013;<lpage>1269</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/1752-1688.12685">https://doi.org/10.1111/1752-1688.12685</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r29"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Moriasi</surname>, <given-names>D.N.</given-names></string-name>, <string-name><surname>Arnold</surname>, <given-names>J.G.</given-names></string-name>, <string-name><surname>Van Liew</surname>, <given-names>M.W.</given-names></string-name>, <string-name><surname>Bingner</surname>, <given-names>R.L.</given-names></string-name>, <string-name><surname>Harmel</surname>, <given-names>R.D.</given-names></string-name>, and <string-name><surname>Veith</surname>, <given-names>T.L.</given-names></string-name></person-group>, <year>2007</year>, <article-title>Model evaluation guidelines for systematic quantification of accuracy in watershed simulations</article-title>: <source>Transactions of the ASABE</source>, v.&#x00A0;<volume>50</volume>, no.&#x00A0;<issue>3</issue>, p.&#x00A0;<fpage>885</fpage>&#x2013;<lpage>900</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.13031/2013.23153">https://doi.org/10.13031/2013.23153</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r30"><mixed-citation publication-type="web">Multi-Resolution Land Characteristics Consortium, 2017, National Land Cover Database (NLCD) 2011: Multi-Resolution Land Characteristics Consortium, accessed July 14, 2017, at <ext-link ext-link-type="uri" xlink:href="https://www.mrlc.gov">https://www.mrlc.gov</ext-link>.</mixed-citation></ref>
<ref id="r31"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nash</surname>, <given-names>J.E.</given-names></string-name>, and <string-name><surname>Sutcliffe</surname>, <given-names>J.V.</given-names></string-name></person-group>, <year>1970</year>, <article-title>River flow forecasting through conceptual models, part 1&#x2014;A discussion of principles</article-title>: <source>Journal of Hydrology (Amsterdam)</source>, v.&#x00A0;<volume>10</volume>, no.&#x00A0;<issue>3</issue>, p.&#x00A0;<fpage>282</fpage>&#x2013;<lpage>290</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/0022-1694(70)90255-6">https://doi.org/10.1016/0022-1694(70)90255-6</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r32"><mixed-citation publication-type="other">National Operational Hydrologic Remote Sensing Center, 2004, Snow data assimilation system (SNODAS) data: National Snow and Ice Data Center, accessed March 11, 2018, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.7265/N5TB14TC">https://doi.org/10.7265/N5TB14TC</ext-link>.</mixed-citation></ref>
<ref id="r33"><mixed-citation publication-type="other">O&#x2019;Donnell, M.S., Fancher, T.S., Freeman, A.T., Ziegler, A.E., Bowen, Z.H., and Aldridge, C.L., 2014, Large scale Wyoming transportation data&#x2014;A resource planning tool: U.S. Geological Survey Data Series 821, 21 p., accessed March 13, 2020, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/ds821">https://doi.org/10.3133/ds821</ext-link>.</mixed-citation></ref>
<ref id="r34"><mixed-citation publication-type="other">Oriel, S.S., and Platt, L.B., 1980, Geologic map of the Preston 1 degree x 2 degree quadrangle, southeastern Idaho and western Wyoming: U.S. Geological Survey Miscellaneous Investigations Series Map I&#x2013;1127, 1 sheet. [Also available at <ext-link ext-link-type="uri" xlink:href="https://ngmdb.usgs.gov/Prodesc/proddesc_8978.htm">https://ngmdb.usgs.gov/Prodesc/proddesc_8978.htm</ext-link>.]</mixed-citation></ref>
<ref id="r35"><mixed-citation publication-type="web">POLARIS, 2019, 30-m probabilistic maps of soil properties over the contiguous United States: Durham, N.C., Duke University, accessed June 15, 2019, at <ext-link ext-link-type="uri" xlink:href="http://www.polaris.earth">www.polaris.earth</ext-link>.</mixed-citation></ref>
<ref id="r36"><mixed-citation publication-type="web">PRISM Climate Group, 2018, Parameter-elevation regressions on independent slopes model (PRISM) climate data: Corvallis, Oreg., Oregon State University, accessed March 11, 2018, at <ext-link ext-link-type="uri" xlink:href="https://prism.oregonstate.edu">https://prism.oregonstate.edu</ext-link>.</mixed-citation></ref>
<ref id="r37"><mixed-citation publication-type="web">R Core Team, 2021, R&#x2014;A language and environment for statistical computing: Vienna, Austria, R Foundation for Statistical Computing, ver. 4.0.5, accessed March 4, 2021, at <ext-link ext-link-type="uri" xlink:href="https://www.R-project.org">https://www.R-project.org</ext-link>.</mixed-citation></ref>
<ref id="r38"><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Rantz</surname>, <given-names>S.E.</given-names></string-name>, and <etal>others</etal></person-group>, <year>1982</year>, <source>Measurement and computation of streamflow</source>: <series>U.S. Geological Survey Water-Supply Paper</series> <volume>2175</volume>, <size units="page">631</size>&#x00A0;p. <comment>[Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/wsp2175">https://doi.org/10.3133/wsp2175</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r39"><mixed-citation publication-type="web">Ridgeway, G., 2020, Generalized boosted models&#x2014;A guide to the gbm package: R vignette, accessed August 18, 2020, at <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/gbm/vignettes">https://cran.r-project.org/web/packages/gbm/vignettes</ext-link>.</mixed-citation></ref>
<ref id="r40"><mixed-citation publication-type="book">Sauer, V.B., and Turnipseed, D.P., 2010, Stage measurement at gaging stations: U.S. Geological Survey Techniques and Methods, book 3, chap. A7, 45 p. [Also available at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/tm3A7">https://doi.org/10.3133/tm3A7</ext-link>.]</mixed-citation></ref>
<ref id="r41"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Senay</surname>, <given-names>G.B.</given-names></string-name>, <string-name><surname>Bohms</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Singh</surname>, <given-names>R.K.</given-names></string-name>, <string-name><surname>Gowda</surname>, <given-names>P.H.</given-names></string-name>, <string-name><surname>Velpuri</surname>, <given-names>N.M.</given-names></string-name>, <string-name><surname>Alemu</surname>, <given-names>H.</given-names></string-name>, and <string-name><surname>Verdin</surname>, <given-names>J.P.</given-names></string-name></person-group>, <year>2013</year>, <article-title>Operational evapotranspiration mapping using remote sensing and weather datasets&#x2014;A new parameterization for the SSEB approach</article-title>: <source>Journal of the American Water Resources Association</source>, v.&#x00A0;<volume>49</volume>, no.&#x00A0;<issue>3</issue>, p.&#x00A0;<fpage>577</fpage>&#x2013;<lpage>591</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/jawr.12057">https://doi.org/10.1111/jawr.12057</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r42"><mixed-citation publication-type="web">Shen, C., Elshorbagy, A., Gupta, H., and Nearing, G., 2018, Big data and machine learning in water sciences&#x2014;Recent progress and their use in advancing science: special section <italic>of</italic> Water Resources Research, accessed March 13, 2020, at <ext-link ext-link-type="uri" xlink:href="https://agupubs.onlinelibrary.wiley.com/doi/toc/10.1002/(ISSN)1944-7973.MACHINELEARN">https://agupubs.onlinelibrary.wiley.com/doi/toc/10.1002/(ISSN)1944-7973.MACHINELEARN</ext-link>.</mixed-citation></ref>
<ref id="r43"><mixed-citation publication-type="web">Soil Survey Staff, 2016, U.S. general soil map (STATSGO2): Natural Resources Conservation Service, accessed February 11, 2019, at <ext-link ext-link-type="uri" xlink:href="https://gdg.sc.egov.usda.gov">https://gdg.sc.egov.usda.gov</ext-link>.</mixed-citation></ref>
<ref id="r44"><mixed-citation publication-type="web">Soil Survey Staff, 2017, Gridded soil survey geographic (gSSURGO) database for the conterminous United States: Natural Resources Conservation Service, accessed May 14, 2019, at <ext-link ext-link-type="uri" xlink:href="https://gdg.sc.egov.usda.gov">https://gdg.sc.egov.usda.gov</ext-link>.</mixed-citation></ref>
<ref id="r45"><mixed-citation publication-type="web">Stoeser, D.B., Green, G.N., Morath, L.C., Heran, W.D., Wilson, A.B., Moore, D.W., and Van Gosen, B.S., 2005, Preliminary integrated geologic map databases for the United States&#x2014;Central States&#x2014;Montana, Wyoming, Colorado, New Mexico, Kansas, Oklahoma, Texas, Missouri, Arkansas, and Louisiana: U.S. Geological Survey Open-File Report 2005&#x2013;1351, digital data, accessed March 10, 2017, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/ofr20051351">https://doi.org/10.3133/ofr20051351</ext-link>.</mixed-citation></ref>
<ref id="r46"><mixed-citation publication-type="web">Tarboton, D.G., 2016, Terrain analysis using digital elevation models (TauDEM): Logan, Utah, Utah State University, ver. 5.3.7, accessed December 19, 2017, at <ext-link ext-link-type="uri" xlink:href="https://hydrology.usu.edu/taudem">https://hydrology.usu.edu/taudem</ext-link>.</mixed-citation></ref>
<ref id="r47"><mixed-citation publication-type="web">U.S. Geological Survey, 2016, Watershed boundary dataset: U.S. Geological Survey digital data, accessed September 15, 2016, at <ext-link ext-link-type="uri" xlink:href="https://nrcs.app.box.com/v/gateway/folder/18546994164">https://nrcs.app.box.com/v/gateway/folder/18546994164</ext-link>.</mixed-citation></ref>
<ref id="r48"><mixed-citation publication-type="web">U.S. Geological Survey, 2018, Simplified surface energy balance (SSEBop) actual evapotranspiration data for the conterminous U.S.: U.S. Geological Survey data release, accessed April 12, 2018, at <ext-link ext-link-type="uri" xlink:href="https://www.sciencebase.gov/catalog/item/54dd5d21e4b08de9379b38b6">https://www.sciencebase.gov/catalog/item/54dd5d21e4b08de9379b38b6</ext-link>.</mixed-citation></ref>
<ref id="r49"><mixed-citation publication-type="web">U.S. Geological Survey, 2019a, National Hydrography Dataset Plus: U.S. Geological Survey digital data, accessed February 11, 2019, at <ext-link ext-link-type="uri" xlink:href="https://www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus">https://www.epa.gov/waterdata/nhdplus-national-hydrography-dataset-plus</ext-link>.</mixed-citation></ref>
<ref id="r50"><mixed-citation publication-type="web">U.S. Geological Survey, 2019b, National Hydrography Dataset Plus High Resolution: U.S. Geological Survey digital data, accessed May 14, 2019, at <ext-link ext-link-type="uri" xlink:href="https://www.usgs.gov/core-science-systems/ngp/national-hydrography/access-national-hydrography-products">https://www.usgs.gov/core-science-systems/ngp/national-hydrography/access-national-hydrography-products</ext-link>.</mixed-citation></ref>
<ref id="r51"><mixed-citation publication-type="book">U.S. Geological Survey, 2021, USGS surface-water data for Wyoming, <italic>in</italic> USGS water data for the Nation: U.S. Geological Survey National Water Information System database, accessed June 16, 2020, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.5066/F7P55KJN">https://doi.org/10.5066/F7P55KJN</ext-link>. [State information directly accessible at <ext-link ext-link-type="uri" xlink:href="https://waterdata.usgs.gov/wy/nwis/sw">https://waterdata.usgs.gov/wy/nwis/sw</ext-link>.]</mixed-citation></ref>
<ref id="r52"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Walker</surname>, <given-names>R.H.</given-names></string-name>, <string-name><surname>Girard</surname>, <given-names>C.E.</given-names></string-name>, <string-name><surname>Alford</surname>, <given-names>S.L.</given-names></string-name>, and <string-name><surname>Walters</surname>, <given-names>A.W.</given-names></string-name></person-group>, <year>2020</year>, <article-title>Anthropogenic land-use change intensifies the effect of low flow on stream fishes</article-title>: <source>Journal of Applied Ecology</source>, v.&#x00A0;<volume>57</volume>, no.&#x00A0;<issue>1</issue>, p.&#x00A0;<fpage>149</fpage>&#x2013;<lpage>159</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1111/1365-2664.13517">https://doi.org/10.1111/1365-2664.13517</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r53"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Walters</surname>, <given-names>A.W.</given-names></string-name>, <string-name><surname>Girard</surname>, <given-names>C.E.</given-names></string-name>, <string-name><surname>Walker</surname>, <given-names>R.H.</given-names></string-name>, <string-name><surname>Farag</surname>, <given-names>A.M.</given-names></string-name>, and <string-name><surname>Alvarez</surname>, <given-names>D.A.</given-names></string-name></person-group>, <year>2019</year>, <article-title>Multiple approaches to surface water quality assessment provide insight for small streams experiencing oil and natural gas development</article-title>: <source>Integrated Environmental Assessment and Management</source>, v.&#x00A0;<volume>15</volume>, no.&#x00A0;<issue>3</issue>, p.&#x00A0;<fpage>385</fpage>&#x2013;<lpage>397</lpage>.<comment> [Also available at </comment><ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1002/ieam.4118">https://doi.org/10.1002/ieam.4118</ext-link><comment>.]</comment></mixed-citation></ref>
<ref id="r54"><mixed-citation publication-type="web">Wolock, D.M., 2003, Base-flow index grid for the conterminous United States: U.S. Geological Survey Open-File Report 2003&#x2013;263, digital data, accessed February 10, 2018, at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3133/ofr03263">https://doi.org/10.3133/ofr03263</ext-link>.</mixed-citation></ref>
<ref id="r55"><mixed-citation publication-type="web">Wyoming State Engineer&#x2019;s Office, 2016, SEO Wells: Wyoming State Engineer&#x2019;s Office, accessed July 15, 2018, at <ext-link ext-link-type="uri" xlink:href="https://sites.google.com/a/wyo.gov/seo/seo-files">https://sites.google.com/a/wyo.gov/seo/seo-files</ext-link>.</mixed-citation></ref>
<ref id="r56"><mixed-citation publication-type="web">Wyoming State Geological Survey, 2014, Wyoming bedrock geology: Wyoming State Geological Survey digital data, accessed May 14, 2019, at <ext-link ext-link-type="uri" xlink:href="https://www.wsgs.wyo.gov/pubs-maps/gis.aspx">https://www.wsgs.wyo.gov/pubs-maps/gis.aspx</ext-link>.</mixed-citation></ref>
<ref id="r57"><mixed-citation publication-type="web">Wyoming State Geological Survey, 2015, Wyoming surficial geology: Wyoming State Geological Survey digital data, accessed May 14, 2019, at <ext-link ext-link-type="uri" xlink:href="https://www.wsgs.wyo.gov/pubs-maps/gis.aspx">https://www.wsgs.wyo.gov/pubs-maps/gis.aspx</ext-link>.</mixed-citation></ref>
<ref id="r58"><mixed-citation publication-type="web">Wyoming Water Development Office, 2007, Points of diversions: Wyoming Water Development Office digital data, accessed July 15, 2018, at <ext-link ext-link-type="uri" xlink:href="https://waterplan.state.wy.us/plan/statewide/2007/gis">https://waterplan.state.wy.us/plan/statewide/2007/gis</ext-link>.</mixed-citation></ref>
<ref id="r59"><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zielinski</surname>, <given-names>G.W.</given-names></string-name>, <string-name><surname>DeCoursey</surname>, <given-names>G.M.</given-names></string-name>, <string-name><surname>Drahovzal</surname>, <given-names>J.A.</given-names></string-name>, and <string-name><surname>Ruperto</surname>, <given-names>J.M.</given-names></string-name></person-group>, <year>1985</year>, <article-title>Hydrothermics in the Wyoming Overthrust Belt&#x2014;United States</article-title>: <source>The American Association of Petroleum Geologists Bulletin</source>, v.&#x00A0;<volume>69</volume>, no.&#x00A0;<issue>5</issue>, p.&#x00A0;<fpage>699</fpage>&#x2013;<lpage>709</lpage>.</mixed-citation></ref>
</ref-list>
<notes notes-type="colophon">
<sec>
<p>For more information about this publication, contact:</p>
<p>Director, USGS Wyoming-Montana Water Science Center</p>
<p>3162 Bozeman Avenue</p>
<p>Helena, MT 59601</p>
<p>406&#x2013;457&#x2013;5900</p>
<p>For additional information, visit: <ext-link ext-link-type="uri" xlink:href="https://www.usgs.gov/centers/wy-mt-water/">https://www.usgs.gov/centers/wy-mt-water/</ext-link></p>
<p>Publishing support provided by the</p>
<p>Lafayette and Rolla Publishing Service Centers</p>
</sec>
</notes>
</book-back>
</book>
