USGS Scientific Investigations Report 2010–5201 - Empirical Models of Wind Conditions on Upper Klamath Lake, Oregon

Results of Gap-Filling Wind Models

The MARS algorithm was used to determine which explanatory variables were the most significant (based on optimized values of nk, table 1) and should be kept in each model. These selected variables were then used as inputs to both the ANN and MARS models. The optimized, nondefault values of parameters size and decay for the ANN models are provided in table 1.

A different set of the most significant explanatory variables was determined for each dependent variable and for each of the preprocessing methods used (table 2). Some features of the different sets are consistent with the geographic context. For example, site BLB is in the sets of explanatory variables for MDN models more than in the sets for MDL models. Similarly, site SSHR is in the sets of variables for MDL models more than in the sets for MDN models. These results reflect the closer proximity of BLB to MDN and SSHR to MDL. In general, the sets of variables in table 2 are notable for the number and variety of variables that contribute significantly to the models. The set for each dependent variable contains data from all sites around the lake, as well as the east-west and north-south components.

Goodness-of-fit statistics for the MARS models (MARS_PPM1, MARS_PPM2, and MARS_NOPPM) and the neural network models (ANN_PPM1, ANN_PPM2, and ANN_NOPPM) are shown in table 3. The Nash-Sutcliffe error (NASH; Weglarczyk, 1998) is defined as $\mathrm{NASH} = 1 - \mathrm{MSE}/\sigma_o^2$, where MSE is the mean-squared-error of the computed values and $\sigma_o^2$ is the variance of the observed values. NASH is a measure of the error between the simulated and observed time series normalized by the variability in the original time series, and approaches 1 as the mean-squared-error approaches zero. In general, NASH values greater than zero indicate a model fit that is more valuable than simply using the mean of the measured data. Conversely, NASH values less than zero indicate that a more accurate model could be achieved by using the mean of the measured data. The mean error (BIAS), defined as the difference between the mean of simulated values ($m_c$) and the mean of observed values ($m_o$), was calculated as a measure of the relative skewness of model results.
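The two statistics described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study. It assumes the standard Nash-Sutcliffe form (1 minus the ratio of the mean-squared-error to the variance of the observations), a population variance, and a mean error taken as simulated minus observed. The function names are hypothetical, and the sign convention for the mean error is an assumption.

```python
from statistics import fmean


def nash_sutcliffe(observed, simulated):
    """Return 1 - MSE / variance(observed) for two equal-length series."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mean_obs = fmean(observed)
    # Mean-squared-error of the simulated values against the observations.
    mse = fmean([(s - o) ** 2 for o, s in zip(observed, simulated)])
    # Population variance of the observations (assumption: divide by n, not n - 1).
    var_obs = fmean([(o - mean_obs) ** 2 for o in observed])
    if var_obs == 0:
        raise ValueError("observed series has zero variance")
    return 1.0 - mse / var_obs


def mean_error(observed, simulated):
    """Return m_c - m_o (assumed sign convention: simulated minus observed)."""
    return fmean(simulated) - fmean(observed)
```

A perfect simulation gives a value of 1, and a simulation that always returns the observed mean gives a value of 0, which matches the interpretation given in the text.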

During calibration, ANN and MARS models resulted in similar fit statistics, whereas during validation, ANN models resulted in better statistical fits (indicated by higher NASH values and lower absolute BIAS values) than MARS models (table 3). The difference in NASH values between validation and calibration ranged from -0.07 to 0.05 for all ANN models, and from -0.26 to 0.06 for MARS models, indicating that ANN models resulted in a more consistent fit between validation and calibration periods than MARS models. This trend was most evident in the MARS models of the MDL north-south component during validation, where all three MARS models resulted in higher BIAS values than the ANN models of the same component. Overall, NASH values were higher (indicating more accurate fits) for models of the MDL north-south component than for the MDL east-west component, but the opposite was true for the models of the MDN north-south and east-west components. At site MDL, the north-south component was larger in magnitude than the east-west component, and at site MDN, the east-west component was larger in magnitude than the north-south component (figs. 6 and 7). Thus, the models generally do a better job of fitting the dominant component of the wind vector at both sites. Among all models, peak winds generally were underpredicted. Comparisons of the three most accurate models are shown in figures 6 and 7. Among these three models, peak winds tended to be simulated more accurately by the ANN_NOPPM model at both MDN and MDL, which captures the high-frequency signal better than the other models. Possibly indicative of this ability to model the high-frequency signal, the NOPPM models had a greater number of significant inputs and greater input variable diversity than the PPM1 and PPM2 models (table 2).

One objective of the gap-filling models is to generate wind data at the raft sites on the lake for use in a spatial interpolation that incorporates data from these sites as well as sites on the shoreline into a spatially variable wind field to drive the hydrodynamic model. A measure of how well the gap-filling models meet this objective is how well the hydrodynamic model, using the wind forcing created using the simulated winds over the lake, is able to simulate observed water currents. The most accurate gap-filling models (ANN models) were used to simulate the winds at sites MDN and MDL from mid-July through August 2006. These simulated winds were then used in combination with the measured winds at the other four sites around the lake to create a spatially variable wind field. The three-dimensional UnTRIM hydrodynamic model of the lake, which is described in Wood and others (2008), was then used to simulate water currents and water temperature with this new wind forcing. The period July 26–August 31, 2006, was selected because this was the period selected previously for model validation (Wood and others, 2008). Goodness-of-fit statistics were calculated for currents simulated in two ways: using only observed winds, and using observed winds at shoreline sites in combination with winds simulated with the gap-filling models. Both sets of simulated currents were compared to measurements at site ADCP1 (table 4, fig. 1), where measurements from an Acoustic Doppler Current Profiler (ADCP) were available (Gartner and others, 2007).

The root-mean-square error and particularly the bias as measured by the mean error increased substantially with the use of the simulated winds at MDN and MDL. From the comparison of the simulated currents (fig. 8), it seems that the increased error largely is attributable to weaker peak currents resulting from the model forced with simulated winds. This is true on a daily basis, but particularly is noticeable during times of high velocities. Simulated currents derived from observed winds also underpredicted measured currents, but the error is greater when the simulated winds are used to force the hydrodynamic model. The increase in the mean error (bias) when the best ANN wind model is used to simulate the winds is 0.7 cm/s, and the increase in the root-mean-squared error is about 1.2 cm/s. In general, the gap-filling models seem to be adequate to fill data gaps over short periods of time (a few days or less). Given the increase in the error statistics, the use of the gap-filling models over periods of many days should be evaluated in the context of the problem being addressed.
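The comparison of error statistics between the two wind forcings can be sketched as follows. This is a minimal illustration, not the procedure used in the study. It assumes three aligned current-speed series at one site (measured, simulated with observed winds, and simulated with gap-filled winds) and a mean error taken as simulated minus observed. All function and argument names are hypothetical.

```python
import math


def rmse(observed, simulated):
    """Root-mean-square error between two equal-length series."""
    squared = [(s - o) ** 2 for o, s in zip(observed, simulated)]
    return math.sqrt(sum(squared) / len(squared))


def mean_error(observed, simulated):
    """Mean of (simulated - observed); negative values indicate underprediction."""
    errors = [s - o for o, s in zip(observed, simulated)]
    return sum(errors) / len(errors)


def error_change(observed, baseline, alternative):
    """Change in error magnitude when 'alternative' replaces 'baseline'.

    Positive values mean the alternative run fits the observations worse.
    """
    return {
        "rmse_change": rmse(observed, alternative) - rmse(observed, baseline),
        "bias_change": abs(mean_error(observed, alternative))
        - abs(mean_error(observed, baseline)),
    }
```

Here `baseline` would hold the currents simulated with observed winds and `alternative` the currents simulated with gap-filled winds at the raft sites, so the returned values correspond to the increases in error discussed above.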

The accurate simulation of the water currents in the lake is important in part because the transport of water-quality constituents and passively drifting larval fish is determined by the water currents. Errors in the simulation of currents will propagate into errors in the simulation of transport. A numerical tracer was used to assess the difference in simulated transport resulting from the use of the simulated winds instead of the observed winds at the two raft sites. The numerical tracer was initialized to zero at the beginning of the simulation. The only source of the tracer during the simulation was the Williamson River, where the tracer was put into the lake at a concentration of 10 (units are arbitrary). The time evolution of the concentration of the tracer at two sites in the lake, in Goose Bay (GBE) and north of Buck Island (NBI; fig. 1), is shown in figure 9. Overall, the tracers derived from simulated winds closely matched those derived from observed winds. Specifically, the mean absolute difference over the 47-day simulation between the concentration of the tracer as simulated using only observed winds and the concentration of the tracer as simulated using simulated winds at the raft sites was 0.052, 0.031, and 0.047 (units are arbitrary) at site GBE for ANN_NOPPM, ANN_PPM1, and ANN_PPM2, respectively. At site NBI, the mean absolute difference was 0.022, 0.023, and 0.011 for ANN_NOPPM, ANN_PPM1, and ANN_PPM2, respectively. As important as the mean absolute difference is the lack of error propagation in the simulated tracers through time. That is, small differences occurring earlier in the tracer concentration time series do not compound into much larger differences later in the simulation.
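The mean absolute difference between two tracer time series, and a simple check on whether that difference grows through time, can be sketched as follows. The windowed calculation is an illustrative diagnostic assumed here, not a method described in the report, and the function names are hypothetical.

```python
def mean_absolute_difference(series_a, series_b):
    """Mean of |a - b| over two equal-length, non-empty series."""
    if len(series_a) != len(series_b) or not series_a:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(series_a, series_b)) / len(series_a)


def windowed_mad(series_a, series_b, window):
    """Mean absolute difference in consecutive, non-overlapping windows.

    If errors compounded through time, later windows would show
    systematically larger values than earlier ones.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        mean_absolute_difference(series_a[i:i + window], series_b[i:i + window])
        for i in range(0, len(series_a), window)
    ]
```

Applied to the tracer concentrations from the two runs at one site, `mean_absolute_difference` gives a single summary value of the kind reported above, and `windowed_mad` gives one value per window so that any growth through the simulation is visible. The final window may be shorter than the others if the series length is not a multiple of `window`.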