USGS Scientific Investigations Report 2010–5246  - Three-Dimensional Model of the Geologic Framework for the Columbia Plateau Regional Aquifer System, Idaho, Oregon, and Washington 

Geologic Modeling Methodology 

The only previous attempt to build a cohesive and consistent three-dimensional representation of the major geologic units that covers the bulk of the CPRAS was accomplished as part of the Columbia Plateau Regional Aquifer-System Analysis (RASA) (Drost and others, 1990). For this work, more than 2,000 wells in Washington and Oregon were used in conjunction with available geologic interpretations to construct maps of elevation of top and thickness of the Saddle Mountains, Wanapum, and Grande Ronde Basalt units, as well as thickness of the Overburden unit and the Mabton and Vantage interbeds (Drost and others, 1990). Since that work has been completed, several detailed studies of the geology and hydrology of parts of the CPRAS have been conducted (for example, Jones and others, 2006; Leek, 2006; Lindsey and others, 2007; Tolan and others, 2007; and Jones and Vaccaro, 2008) with considerable effort towards refinement of the geologic framework to better understand the hydrogeology. Additional well data from these published data sources plus the USGS National Water Information System (NWIS) results in a total of 13,226 wells with relevant information that may be used to refine the RASA work. The goal of the work described in this report was to rigorously incorporate all of this additional data into an improved three-dimensional representation of the CPRAS. 

For the RASA unit top elevation and thickness maps, a subset of the currently available well data was manually contoured to derive hydrologic unit elevations and thicknesses. Using few data has the advantage that contours may easily match data exactly. The following method is a numerical implementation of this process, except the resulting surfaces are trend surfaces that match all data on average. After construction of trend surfaces, geologic principles were used to define model surfaces and their extents. The trend modeling techniques were selected to aid in the evaluation of uncertainty of model-generated stratigraphic elevations, which is an important consideration when constructing a hydrogeologic framework for a groundwater flow model. 

During development of the geologic model, data density was insufficient to resolve the interbed geometries explicitly. The interbed units are present over much of the study area, but are thin when compared with the thickness of the basalt units, and the interbed units are often discontinuous over a short distance. The interbed units constitute a portion of the thickness between two adjacent basalt unit tops, and the typical thickness of the interbed units may be described as a function of the total distance between these tops. This relation is described in section, “Modeling Major Sedimentary Interbeds,” but these units are not explicitly shown on resulting maps or in cross-sections. 

General Geologic Modeling Workflow 

Geologic principles were used in the interpretation of the compiled data to generate all model unit geometries (fig. 3). The general process from compiling the data through creation of a three-dimensional solids model and assessing model fit is shown in figure 3A. It is called a solids model because each model-generated unit is completely defined at all points in space, filling a volume defined by the unit’s extent, top, and bottom. In this way, the top, bottom, thickness, and volume are all fully consistent, meaning that the bottom of a unit is the top of the underlying unit, and the thickness at a given point is the distance between the two surfaces defining a unit’s top and bottom. This was not true of the RASA interpreted surfaces (Drost and others, 1990) except in an approximate sense, because tops were created by contouring stratigraphic picks and thicknesses were created by contouring thickness points. The interpretation was independent for each top and each thickness, except where model-generated thickness was used to infer top elevation where no data were present. 

In contrast, the current method applies techniques to create model-generated surfaces on the basis of stratigraphic picks for the top of each of the major rock units. Then geologic principles are used, such as the laws of superposition, original horizontality, and lateral continuity, to create a solids model from which thicknesses are derived (fig. 3A). Because the major rock units to be represented are flood basalts, it is assumed that the flood basalts flowed out onto the paleotopography (the underlying geologic unit), filling depressions and flowing laterally away from the source. As a result, whenever model-generated surfaces cross each other, the overlying unit top is truncated where it meets the underlying unit top, and the void between the surfaces is filled with the overlying unit lithology. In the final step, the model is truncated with a surface representing the erosional top of rock across the study area. For much of the study area, CRBG or Pre-Miocene rock is at or very near land surface, so the solids model is truncated by the land-surface digital elevation map (DEM). For areas with thick sequences of sedimentary overburden, the erosional top of rock is corrected as described in section, “Modeling the Overburden Unit.” 

Modeling Top Surfaces of the Major Units 

Various methods are available to create surfaces from data points for use in solids modeling. These methods may be divided into exact interpolators and trend models. Exact interpolation methods create surfaces that pass through each measured value and interpolate values between these known points. The rules of interpolation between known points are highly variable and prescribed by the method selected. In many cases, the surfaces generated by these interpolation methods may look physically unrealistic in a geologic sense, which is often an artifact of data clustering. In areas with a high density of data that have an appreciable amount of variability, the resulting surface is irregular (rugose), with a shape that is defined more by the data spacing and interpolation algorithm artifacts than the true underlying shape of the geologic deposit. Commonly, data are closely spaced in some areas and sparse in others. As a result, surfaces generated using exact interpolators are characterized by high rugosity in areas of high data density and a high degree of smoothness in areas with few data. The resulting surfaces commonly do not resemble the modeler’s conceptualization of the shape of the geologic unit being simulated. However, even though the data may be “noisy,” it may exhibit obvious trends for each of the units. 

As an alternative, interpolation can be viewed as a statistical problem. Trend fitting algorithms tend to smooth the resulting interpolated surface in areas of high data density and provide a good fit to data in sparse data areas. The mismatch between the data and the trend surface (called residuals) is a measure of information that is not represented by the trend. The ideal trend surface is one for which the residuals are random or, more precisely, stationary. Stationary means that there are no trends in the randomness of the residuals across the region being simulated. This stationary trend surface is a best estimate (in a quantifiable sense) given that the data may not support resolution of fine-scale features into geologically relevant shapes. The residuals provide a method of estimating error associated with using the trend surface to estimate geologic surface elevation at an unknown location by using the kriging paradigm. Further, because stationarity of the data to be interpolated is an implicit assumption of most geostatistical algorithms (including kriging) (Deutsch, 2002, p. 180), decomposition of the data into a trend and stationary residual strengthens the assumptions used for error estimation of the resulting surfaces. 

The process of creating smooth trend surfaces is shown for the two-dimensional case in figure 3B. First, data sources are compiled for use in the interpolation (step 1). Compilation may include an assessment of confidence in the data, resulting in a weighting scheme that reflects this confidence. High-density geologic data exhibit variability associated with measurement error, interpretive error, and natural variability of stratigraphic surfaces resulting from depositional and erosional processes. These data are then fit with a smooth curve (step 2). Misfit between data and the trend curve (residuals) are then assessed (step 3). Although the distribution of residuals may appear to be symmetrical in a histogram, implying the residuals are random, the spatial distribution of residuals also must be examined to detect persistent spatial trends. Spatial trends in residuals would indicate that the residuals are not stationary, as desired for the resulting model.  

Following step 3, the fit is evaluated for adequacy. If the fit is determined to be inadequate, then additional information is added, and steps 2 and 3 are repeated. In the example (fig. 3B), a mapped fault is interpreted as the reason for the systematic trends in residuals. Addition of this fault as an interpolation barrier allows the data to be fit piecewise with smooth lines, thus greatly improving the fit as measured by both the histogram (or summary statistics) and the absence of persistent visual trends, indicating that the residuals are stationary. For this project, a piecewise implementation of the S-Plus local regression model called loess (Cleveland and others, 1992) was used. (To prevent confusion of the loess interpolation method with the wind-blown loess deposits within the CPRAS, the loess interpolation method is distinguished in this report by use of italics.) 

The loess algorithm uses a local linear or quadratic function [specified by the degree variable = 1 (first order polynomial) or 2 (second order polynomial), respectively] to make an estimate of elevation at any given location. During the interpolation, variable weights are assigned to data points on the basis of distance from the point being estimated using a tri-cube weight function. The amount of smoothing is prescribed by the span variable, with smoothness increasing as a function of span. A larger span means that more data are used to make the estimate at each location. The loess algorithm has many optional parameters, but the final parameter of note here is family. This parameter may have values of “Gaussian” or “Symmetric,” which helps control the fit objective for the algorithm. The default value of “Gaussian” was used here, which results in a best local linear or quadratic fit (specified by the degree) that attempts to attain normally distributed residuals with a zero mean, resulting in a trend surface with the desired random residuals. 

Loess is considered a data-intensive method because it requires densely sampled datasets for best results, and numerical artifacts may occur in low data density situations. 

In practice, steps 2 and 3 (fig. 3B) proceed iteratively. After the first iteration, a visual analysis of residuals allows rapid assessment of locations where the loess trend surface does a poor job of fitting the data. For example, in heavily faulted areas, such as the Yakima area, elongate groups of adjacent dark red and dark blue colored residuals are indicative of significant offset on faults. These bands are the two-dimensional equivalent of the residuals shown in step 3, with all residuals on one side of a fault trace being persistently positive and the other side being persistently negative. Analysis of all such areas allows compilation of a simplified fault map for the Columbia Plateau that shows the areas of large amounts of offset by major faults or fault zones. As a simplification, all faults are assumed to be vertical, and whenever a unit is assumed to be faulted, all older underlying geologic units also are assumed to be faulted. 

Incorporation of the faults as interpolation barriers (example in fig. 3B) during subsequent interpolations yields a much better fit, but the current version of loess in S-Plus does not support interpolation with barriers. To overcome this restriction, a script was written in the S programming language that allowed loess interpolation groupwise. Group numbers were defined in a geographic information system database by defining polygons that create natural groups that are bounded by geologic structural features. Group numbers were assigned to data residing in each group, and each group was interpolated separately. Although faults commonly provide clear boundaries for groups, faults in the study area are frequently discontinuous, requiring the modeler to select where to separate groups for the groupwise interpolation. This artificial separation of groups in the absence of a fault, however, may result in interpreted surfaces not joining smoothly (fig. 4). To rectify this potential problem, group boundaries were defined as necessary to facilitate loess trend removal, but after this step, loess estimates were retained only at known data points and for selected support points that ensure loess trends are preserved. These loess trend points are then reinterpolated using the default ArcGIS spline with barriers routine (exact interpolator) where the only barriers used are mapped faults that have sufficient offset to be visible in the residuals distribution. This technique results in a final surface that is smoothly connected between interpolation groups except where high-offset mapped faults allow disconnection (compare figs. 3B and 4).  

Because the spline is exact and smooth and the loess points vary smoothly, all properties of the residual are retained and the resulting trend surface is virtually indistinguishable from the loess trend surface in most areas. The areas where the spline surface differs from the loess trend are in the space between interpolation groups where no trend support points are retained. In the absence of loess trend points, the spline will smoothly connect the closest points, which may not be the same as the trend model prediction (fig. 4). This method works well where the offset is small, but if the apparent offset is the result of a large structural fold rather than a fault, then the tightness of the model-generated fold will be a function of the data and loess support points retained for the spline model. 

Many folds within the CPRAS tend to be relatively tight and over a short distance with faults commonly transitioning into folds along the fault trace. This feature of the geology implies that loess trend support points are good estimators of geologic contact elevation until the trend points are close to tight folds or faults. As a modeling assumption, prior to interpolation using spline with barriers, all loess trend support points are removed from the interpolation datasets if the points are within 2 mi of a group barrier that is not mapped as a fault. This assumption means that unless there are data points within the 2-mi buffer to support a tighter fold, the spline surface will smoothly connect the trend points from adjacent groups over a 4-mi distance. If the fold is gentle (such as in fig. 4), then the effect is minimal; but if the fold is tighter than 4 mi across (2 mi on each side of the group line), then the model-generated fold radius may be different from reality unless data are available that define the fold more accurately (recall that only trend support points were removed, but that data were retained if present within the 2-mi buffer). An additional benefit of this modeling strategy is that faults may grade from high offset to zero offset along their trace as faults transition into a fold as is observed in many areas of the CPRAS. 

Another benefit of using groups in the loess script is that it allows the modeler to vary the loess parameters degree and span on a groupwise basis. This ability was used sparingly and only when data density supported this action and reflected the need to capture some feature of the system, such as a tight fold exposed in the surficial geology and supported by well data. For all trend surfaces created, degree was set to one (local linear), and span was varied to create the smoothest fit to the data that removed persistent spatial trends in the residuals. In areas of high data density, it is possible to continue to dissect the data and remove additional small-scale trends, but given the regional modeling objectives, small-scale features were not resolved in all cases. The point at which small-scale trends or data anomalies are considered negligible relative to the modeling objectives is a modeling choice. 

In summary, piecewise loess trend surfaces were created for the bounding surfaces of each of the rock units to be used in the modeling process (fig. 3B). These surfaces are smooth, and this smoothness represents the uncertainty in the predicted surface, which may be quantified using the data residuals. This smoothness will be retained in the final model except where the land surface DEM intersects the unit (fig. 3A). Because the Mabton and Vantage interbeds will be assumed to be present only where overlying CRBG units are present, the trend surfaces created for the CRBG and Older Bedrock units completely define the extent of the solids model except where erosion has created and filled space now occupied by the Overburden unit. 

Modeling the Overburden Unit 

Only sequences of overburden that potentially serve as aquifers connected with the regional flow system were modeled. The thickness of the Overburden unit is modeled as filling the space between the land surface and the top of the uppermost bedrock unit; however, it is a poor modeling assumption that sediment fills the distance between the underlying model trend surface and the land-surface DEM. This assumption works well for thick sedimentary deposits, but performs poorly in areas dominated by bedrock outcrops at land surface. Poor performance in areas with thin deposits occurs because the bedrock has an irregular surface, so using the trend of this surface will result in all bedrock highs being simulated as sedimentary deposits instead of erosion-resistant bedrock (fig. 5).  

Another option that provides some benefits but introduces other problems is to model the overburden as a thickness that will be subtracted from the land-surface elevation. This method improves the representation of sedimentary overburden in areas dominated by bedrock outcrops, but subtracting thickness from a land-surface DEM using sparse data or a model-generated thickness trend map results in translating surface rugosity inherited from the DEM to the model-generated top of bedrock. Modeling the surface in this manner is inconsistent with the previous modeling assumption that buried surfaces should be simulated as smooth to reflect the uncertainty in the model-generated surface. Because the two approaches described above perform well in different situations, a hybrid of these approaches was used to create a top of bedrock surface (fig. 6), thereby defining the geometry of the Overburden unit. 

Sufficient coverage of overburden thickness data (from wells and mapped surficial geology) is available to create a sedimentary overburden thickness map for areas within the CPRAS that form extensive sedimentary aquifers (fig. 6, step 1). To ensure that all streams are underlain by sediments, the model assumes that 25 ft of overburden underlies streams for which no proximal data are available. Subtracting the smooth sedimentary thickness surface from the land-surface DEM gives a top of bedrock map (fig. 6. step 2); however, the resulting map inherits the rugosity of the topography even though the data do not support this resolution. This high rugosity top of bedrock is an estimate of the true top of bedrock. 

The elevation of the top of thick hydraulically important deposits is controlled by depositional processes and is defined by the land-surface elevation of thick sequences of sedimentary overburden and stream elevations (fig. 6, step 3). Intersecting the high-rugosity top of bedrock estimate (step 2) with the depositional level surface (step 3) delineates areas where each surface is higher than the other (step 4). If the top of bedrock surface is higher than the depositional level surface, then erosionally resistant highs are assumed to occur (step 5). If the depositional level surface is higher than the top of rock surface, then sediment is assumed to overlie the rock. These assumptions work well in areas where bedrock is exposed in valley sides, providing a zero thickness of sedimentary overburden, or in steep-sided valleys. 

The resulting map is used to define areas where sufficient sedimentary overburden is present to be included in the model. In areas where sedimentary overburden is assumed to be negligible, the top of bedrock is land surface (step 6). In areas where sedimentary overburden is to be simulated, the two-dimensional loess algorithm was utilized to construct a trend surface of the elevation of the top of rock using data from wells. The final top of bedrock model is constructed by stitching this trend surface to the areas where the bedrock is simulated as exposed. 

The result is a bedrock surface that reflects the undulating land surface where bedrock is exposed, but is smooth where bedrock is overlain by sedimentary overburden (fig. 6, Result). Thick sequences of overburden that are known to contain and transmit groundwater will be represented. This model-generated top of bedrock is now used in place of the land-surface topography DEM for the process illustrated in figure 3A, and overburden is simulated as filling the distance between the model-generated top of bedrock and the land-surface topography. 

Using Thickness Maps for Quality Assurance and Model Revision 

After generation of a geologic model, the results are analyzed numerically and visually for fit to assess that the resulting geology is reasonable (fig. 3A). The previous sections describe generation of all surfaces necessary to create the model given the available data (fig. 1). However, the well data often do not provide data about deeper units because the purpose of most wells is to extract water from the uppermost productive zone permitted by law (fig. 7 “Reality”; productive aquifers are assumed to be present at each unit top). As a result, sampling is biased and, in the presence of geologic structure, trend modeling may result in undesirable artifacts (fig. 7, step 1, “modeled pinchout”). The thickness of the CRBG deposits is assumed to vary smoothly in space, generally thicker near the source of lava, and thinning distally. This pattern may be complicated by deposition onto faulted, folded, or eroded topography; however, anomalous thickening or thinning in the resulting model that is not explained by structure is assumed to be a numerical artifact resulting from trend surface generation. 

If the resulting model is determined to be inadequate (fig. 7, step 2), then new trend surfaces are generated using additional guide points (fig. 7, step 3). In areas with little or no data available for deeper layers, guide points are generated by inference. Geologic contact elevations in wells are used to estimate underlying geologic contact elevations by subtracting estimated thickness of geologic units. Thickness values at each point are extracted from crude thickness trend maps that are developed using loess interpolation of thickness data. These new guide points are then added to the trend modeling dataset to ensure that model-generated thicknesses are consistent with the conceptual model. This process is repeated until the final model fits the conceptual model. 

Post-Processing of Model-Generated Units 

Because the modeling process uses the intersection of trend surfaces, the process may predict that a unit is present in a location where it is not known to be present. Two possibilities exist: (1) Either the unit does exist at the location, but has not been mapped or identified in a borehole, or (2) the unit is a numerical artifact associated with the modeling process. Most commonly, the numerical artifact is created as a result of using trend models that represent the average elevation of a surface, rather than the high points associated with erosion-resistant portions of the geologic deposit. This situation is analogous to the situation that created the isolated deposits on topographic highs shown in figure 5, and, again, the erroneously simulated unit has the same resulting pattern. 

In order to remove the numerical artifacts for the bedrock layers, a post-processing filter was used to clean the final solids model. This filter removed portions of a model-generated layer if the portion of the layer was small, discontinuous, and not known to be present at that location. These isolated model artifacts are called orphans. The filter works by finding all locations where the unit was mapped or found in a well, and if no portion of the orphan is intersected by any of these locations, the orphan thickness is removed from that model-generated layer and added to the underlying layer. Generally, only small orphans are removed because thick portions of the model-generated unit tend to be connected to some location where the unit is known to be present, implying that the resulting geometry is geologically reasonable. Only a few large orphans were removed, consisting of Saddle Mountains Basalt in the northeastern part of the model area. This area was scoured by the catastrophic Missoula Floods (Richmond and others, 1965; Waitt and Thorson, 1983) and is mapped as being covered by loess deposits (fig. 1A). If any of the Saddle Mountains Basalt orphans were removed erroneously, then the orphans would be simulated as Wanapum Basalt (underlying unit), which is not problematic for the purposes of groundwater flow modeling.  

Modeling Major Sedimentary Interbeds 

In the modeling process, only two sedimentary interbeds are considered: the Vantage interbeds and the Mabton interbed. These deposits correspond to the two lengthy periods between depositions of the major CRBG formations. The Vantage interbeds lie between the Wanapum and the Grande Ronde Basalts, and the Mabton interbed lies between the Saddle Mountains and Wanapum Basalts. Two modeling assumptions guide the analysis. The first modeling assumption is that the interbeds are preserved only where the lava overlies it. This assumption is a simplification and a reasonable hydrogeologic assumption, because where the overlying sediments are in contact with the interbed, the geologic and hydrogeologic properties are similar. The second modeling assumption is that sedimentary sequences that have been preserved as interbeds were thickest in low-lying areas. Because the CRBG lavas fill these low areas, thicker interbeds correspond to thicker deposits of the overlying basalt. Using this assumption, the thickness of the interbeds is generated as a function of distance between the overlying and underlying basalt unit tops (the combined thickness of the interbed and the overlying basalt unit). 

Modeling of Faults 

Several assumptions regarding faults were implemented to facilitate modeling. All faults are represented as vertical faults, and closely spaced, parallel faults commonly were simplified to a single fault for the interpolation. Faults that cut younger strata are assumed to cross-cut older strata, but if a fault in an older unit was not required to explain the distribution of younger deposits, then the fault was not used as an interpolation or group barrier for the younger units. For this reason, there are fewer barriers in the upper units during the final interpolation of each layer.