Scientific Investigations Report 2010–5201

Empirical Models of Wind Conditions on Upper Klamath Lake, Oregon

Statistical Methods and Model Algorithms

Two statistical modeling algorithms were used and compared in both the gap-filling and historical models: Artificial Neural Networks (ANN) and Multivariate Adaptive Regression Splines (MARS). ANNs provide a flexible method of relating input and output variables through an interconnected mesh of nonlinear transfer functions. The R-package nnet provided the feed-forward, single-hidden-layer ANN that was used in this work (Venables and Ripley, 2002). Nondefault parameters used in the nnet algorithm were size (number of units in the hidden layer) and decay (weight decay). The number of units in the hidden layer (size) generally determines the complexity of the neural-network model and depends on the complexity of the dataset that is used to train (calibrate) the model. The effect of the weight decay (decay) depends on size: decay has little effect until size is sufficiently large. Therefore, the minimum number of hidden-layer units (size) was determined first, as it primarily determined the complexity of the ANN. After size was determined, the optimum decay value was determined while holding size constant. The best model results were achieved through an iterative trial-and-error optimization in which size and decay were adjusted until model-fit statistics from the calibration and validation time periods matched as closely as possible. The goal of this optimization was to obtain parameter values that resulted in models capable of accurately simulating a wide range of wind conditions.
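The two-stage search described above (size first, then decay with size held constant) can be sketched as follows. This is a minimal illustration, not the procedure used in the report: the report describes manual trial and error, whereas the sketch substitutes a simple search over candidate lists. The callable `evaluate`, the candidate lists, and `base_decay` are hypothetical names introduced only for this example.

```python
def tune_size_then_decay(evaluate, sizes, decays, base_decay=0.0):
    """Two-stage search: choose size first, then decay with size held fixed.

    `evaluate(size, decay)` is a hypothetical callable that trains a model
    with those settings and returns (calibration_stat, validation_stat).
    The selection criterion is the absolute difference between the two
    fit statistics (smaller means the periods match more closely).
    """
    def gap(size, decay):
        cal_stat, val_stat = evaluate(size, decay)
        return abs(cal_stat - val_stat)

    # Stage 1: pick the size; sorting means ties go to the smaller network.
    best_size = min(sorted(sizes), key=lambda s: gap(s, base_decay))
    # Stage 2: pick the decay while holding size constant.
    best_decay = min(sorted(decays), key=lambda d: gap(best_size, d))
    return best_size, best_decay
```

In practice `evaluate` would wrap the model training and the computation of a fit statistic over each time period; which statistic to use is left open here.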

MARS is a multiple-regression technique that allows for nonlinearity between the dependent and independent variables described by a discrete change in slope between linear segments. MARS can achieve a closer fit to measured data using a larger number of input variables than least-squares regression while still utilizing recursive linear-regression techniques (Friedman, 1991). The R-package earth incorporates the MARS algorithms and has the ability to sort through a large initial set of input variables by removing input variables stepwise from this initial model framework until the most statistically effective subset of variables is left, resulting in a model that is not overfit (Milborrow, 2009). The nondefault parameter in the earth package that was manipulated for this work was nk, which is one of the criteria used to limit the number of input variables that are used to predict the dependent variable in the MARS model construction process. The value of nk was set to the smallest possible value (thereby limiting the complexity of the model) that did not show a substantial degradation in performance over a more complex model (one with a higher nk value). This iterative process was guided by minimizing the difference between fit statistics calculated over the calibration and validation time periods.
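The rule for choosing nk (the smallest value that does not substantially degrade performance) can be sketched as below. This is an illustration under stated assumptions: the report does not quantify "substantial", so the relative `tolerance` is a hypothetical threshold, and `errors_by_nk` is a hypothetical mapping from each trial nk value to an error statistic where lower is better.

```python
def smallest_adequate_nk(errors_by_nk, tolerance=0.05):
    """Return the smallest nk whose error is within `tolerance` of the best.

    `errors_by_nk` maps each trial nk to an error statistic (lower is
    better). `tolerance` is a relative margin standing in for "no
    substantial degradation"; its value is an assumption.
    """
    best_error = min(errors_by_nk.values())
    for nk in sorted(errors_by_nk):
        if errors_by_nk[nk] <= best_error * (1.0 + tolerance):
            return nk
```

The loop always returns a value, because the nk with the lowest error satisfies the condition.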

Gap-Filling Wind Models

The gap-filling models had four dependent variables: the east-west and north-south components of the wind at two sites, MDN and MDL. The independent (explanatory) variables were the east-west and north-south components of the wind at six sites (HDB, BLB, WMR, SSHR, AGKO, KFLO) and air temperature at one site (WMR; fig. 1). Each time series was passed to both ANN and MARS models in three ways: with no preprocessing (NOPPM) and with two types of preprocessing that applied different degrees of smoothing. The two preprocessing methods are denoted as preprocessing method 1 (PPM1), in which the original data were decomposed into low- and high-frequency components, and preprocessing method 2 (PPM2), in which an eigenvector filter was used to smooth the data (fig. 2).

The first preprocessing method (PPM1) consisted of decomposing each measured wind time series into low- and high-frequency components. The low-frequency component was a 24-hour moving average of the original data. The high-frequency component was obtained by subtracting the low-frequency component from the original data. This high‑frequency component was smoothed using a 3-hour moving average. The low- and high-frequency components were passed to separate models that were used to simulate the low- and high-frequency components of the dependent variables with both ANN and MARS. Low-frequency model output was added to the list of possible inputs that were used in the high-frequency models. The final output time series was obtained by adding the two simulated components together (fig. 3).
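The PPM1 decomposition can be sketched as follows, assuming hourly data so that window lengths in samples equal window lengths in hours. The report does not state whether the moving averages were centered or trailing, or how the ends of the series were handled; the centered window and the shrinking window at the ends are assumptions made for this illustration, and the function names are hypothetical.

```python
def moving_average(values, window):
    """Centered moving average over `window` samples (assumed centered).

    Near the ends of the series the window is clipped to the available
    samples, so the output has the same length as the input.
    """
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - (window - 1) // 2)
        hi = min(n, i + window // 2 + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def decompose_ppm1(hourly, low_window=24, high_window=3):
    """Split an hourly series into low- and high-frequency components."""
    # Low-frequency component: 24-hour moving average of the original data.
    low = moving_average(hourly, low_window)
    # High-frequency component: original minus low-frequency component,
    # then smoothed with a 3-hour moving average.
    residual = [x - l for x, l in zip(hourly, low)]
    high = moving_average(residual, high_window)
    return low, high
```

Because the high-frequency component is smoothed, the sum of the two components is not identical to the original series; some of the fastest fluctuations are removed.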

The second method of preprocessing the input data made use of an eigenvector filter, essentially performing a principal component analysis on the original time series and lagged copies of the time series. The function decevf in the R-package pastecs was used to perform this calculation (Ibañez and others, 2009). The parameter lag, in this filtering algorithm, was set to 4 hours after a trial-and-error optimization process. The goal of this filter optimization was to smooth each time series while minimizing the loss of important high-frequency fluctuations. The time series was then reconstructed using only the two most important eigenvectors, resulting in a smoothed version of the original time series. The preprocessed time series was then passed to both ANN and MARS models (fig. 2).

In order to incorporate the large-scale spatial features of the wind over the lake, time series of estimates of the two‑dimensional divergence (the amount of spreading of the wind vectors) and curl (the amount of rotation of the wind vectors) of the wind were calculated and used as two additional inputs to the wind models. Based on a two-dimensional representation (at ground level) of the wind at sites WME (the northernmost lake sites) and SSHR (the southernmost lake sites), the divergence (div) and curl (curl) of the wind field W (x,y) were calculated at each observation time as follows:


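$$\mathrm{div}\,W = \frac{\partial W_x}{\partial x} + \frac{\partial W_y}{\partial y}, \qquad (1)$$

and

$$\mathrm{curl}\,W = \frac{\partial W_y}{\partial x} - \frac{\partial W_x}{\partial y}, \qquad (2)$$

where W_x and W_y are the east-west and north-south components of W.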

The final preprocessing step was to scale the magnitude of each time series to the interval [0,1]. This ensured that the range of all model inputs was the same and, in the case of the ANN model, that the range matched the range of the internal ANN output units (Venables and Ripley, 2002). Model inputs were scaled linearly using a method adopted from Rajee and others (2009):

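$$x_s = \frac{x - \min(x)}{\max(x) - \min(x)}, \qquad (3)$$

where x is the original time series and x_s is the scaled time series. This scaling was repeated for each time series; thus, the values of min(x) and max(x) were unique to each time series. Furthermore, these values were calculated from the calibration dataset and used unchanged for the validation dataset. The output of all models was postprocessed (rescaled to the original units) using equation 4:

$$x = x_s \left[ \max(x) - \min(x) \right] + \min(x). \qquad (4)$$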
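The linear scaling to [0,1] and the rescaling of model output can be sketched as follows. The function names are hypothetical and introduced only for this illustration; the key point from the text is that the minimum and maximum come from the calibration data and are reused unchanged for the validation data.

```python
def fit_minmax(calibration):
    """Return (min, max) of the calibration series."""
    return min(calibration), max(calibration)


def scale(values, lo, hi):
    """Scale values linearly so that lo maps to 0 and hi maps to 1."""
    return [(v - lo) / (hi - lo) for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to return model output to original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

Because the validation data reuse the calibration minimum and maximum, scaled validation values can fall slightly outside [0,1] when the validation period contains more extreme values than the calibration period.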

In addition to preprocessing, lags were applied to each input variable to create additional model inputs that were mathematically independent. All variables were lagged by -6, +6, +12, or +24 hours, where a positive lag indicates that the time series was shifted forward in time and a negative lag indicates that it was shifted backward in time. All gap-filling models were calibrated using data from May 12, 2007, to September 29, 2007, and validated with data from May 12, 2006, to September 30, 2006. These dates bracketed the period when the rafts were located at sites MDN and MDL.
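Creating the lagged inputs can be sketched as below, assuming hourly sampling so that a lag in hours equals a lag in samples. The interpretation that a positive lag moves each value to a later time step follows the description above; filling the vacated positions with None is an assumption of this illustration, as are the function names.

```python
def lag_series(values, lag_hours):
    """Shift an hourly series by `lag_hours`.

    A positive lag shifts the series forward in time (the value observed
    at hour t appears at hour t + lag); a negative lag shifts it backward.
    Positions with no source value are filled with None.
    """
    n = len(values)
    out = [None] * n
    for t in range(n):
        src = t - lag_hours
        if 0 <= src < n:
            out[t] = values[src]
    return out


def build_lagged_inputs(values, lags=(-6, 6, 12, 24)):
    """Return a dict mapping each lag (hours) to the lagged series."""
    return {lag: lag_series(values, lag) for lag in lags}
```

Rows containing None at either end of the record would need to be dropped or otherwise handled before model calibration.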

Historical Wind Models

Two methods were used to simulate a daily wind record over a long period of time. The first method, denoted HIST1, used as a dependent variable the daily mean wind speed at WMR (fig. 1). For HIST1, the daily mean wind direction at the Klamath Falls Airport (KLMT), with a constant rotation, was used as a proxy for the wind direction at WMR. (For this purpose, the daily mean wind direction is defined as the direction of the vector whose orthogonal components are the daily mean of the east-west and north-south wind components.) The rotation of +5 degrees was determined from the distribution of the difference between the daily mean wind direction at WMR and KLMT over the calibration time period (fig. 4). The independent (explanatory) variables for this method were the daily mean wind speed, relative humidity, air temperature, and daily cumulative solar radiation measured at AGKO and KFLO, as well as the daily mean wind speed, sky cover, dew point temperature, air temperature, and altimetric pressure at KLMT.
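The daily mean wind direction defined above (the direction of the vector formed by the daily means of the two components) can be sketched as follows. Two points are assumptions of this illustration and are not stated in the report: directions follow the meteorological convention (the direction the wind blows from, in degrees clockwise from north), and the constant rotation is applied by simple addition. The function name is hypothetical.

```python
import math


def daily_mean_direction(u_values, v_values, rotation_deg=0.0):
    """Direction of the vector of daily mean wind components, in degrees.

    u_values: east-west components (positive toward the east).
    v_values: north-south components (positive toward the north).
    Assumes the meteorological convention: the result is the direction
    the wind blows FROM, measured clockwise from north.
    """
    u_mean = sum(u_values) / len(u_values)
    v_mean = sum(v_values) / len(v_values)
    direction = math.degrees(math.atan2(-u_mean, -v_mean))
    # Apply the constant rotation and wrap the result into [0, 360).
    return (direction + rotation_deg) % 360.0
```

Averaging the components before computing the angle avoids the wrap-around problem of averaging angles directly (for example, the mean of 350 and 10 degrees should be near 0, not 180).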

The second method, denoted HIST2, used as dependent variables the daily mean of the east-west and north-south components of the wind at WMR and simulated each component separately (fig. 5). The independent variables for HIST2 were the daily mean east-west and north-south components of the wind, daily cumulative solar radiation, daily mean relative humidity, and daily mean air temperature measured at AGKO and KFLO, as well as the daily mean east-west and north-south components of the wind, sky cover, dew point temperature, air temperature, and altimetric pressure at KLMT.

Both historical wind models were calibrated using data from 2006 through 2007 and validated using data from 2008 through 2009. Preprocessing for the HIST1 and HIST2 methods consisted of scaling each input variable to [0,1] (eq. 3), as was done for the gap-filling models. The scaled time series were then passed to both MARS and ANN models. Postprocessing consisted of rescaling the output using equation 4.

First posted October 27, 2010

For additional information contact:
Director, Oregon Water Science Center
U.S. Geological Survey
2130 SW 5th Avenue
Portland, Oregon 97201
http://or.water.usgs.gov

U.S. Department of the Interior | U.S. Geological Survey
URL: http://pubsdata.usgs.gov/pubs/sir/2010/5201/section4.html
Page Last Modified: Thursday, 10-Jan-2013 19:19:13 EST