Scientific Investigations Report 2006–5316
U.S. GEOLOGICAL SURVEY
Scientific Investigations Report 2006–5316
DOE/ID-22201
The goal of geostatistical analysis was to understand the distribution of sediment in a stratigraphic, three-dimensional spatial context in order to make defensible decisions during geostatistical modeling. In particular, statistical similarity among geologic units was evaluated to determine whether the sedimentary data were statistically stationary and could be modeled with kriging.
The total thickness of sediment encountered in boreholes around the INL is summarized in figure 4. Most sediment-rich boreholes occur within or near the boundary of the putative sedimentary basin known as the Big Lost Trough (dashed polygon). Note the relative paucity of boreholes that penetrate composite units older than unit 6. The data have been segregated into groups representing wells that fully penetrate each of the composite stratigraphic units listed in table 1. More than 95 percent of all wells that encounter at least 25 ft of sediment in each composite unit lie within or on the margins of the Big Lost Trough and are generally within 10 to 15 mi of the present-day location of the Big Lost River.
In statistically characterizing a spatial data set, the effect of spatial clustering must be considered. Spatially clustered data provide information redundancy in densely sampled areas and can bias calculated statistics toward values that reflect the most densely sampled areas, particularly if redundant sampling occurs in areas where the measured attribute is valued unusually high or low. In such cases, the spatial data must be appropriately weighted (declustered) to minimize this bias. In this study, the impact of such information redundancy is negligible. As shown in figure 5, the sample statistics changed only slightly when appropriate declustering weights were assigned, because boreholes are not preferentially clustered in areas of unusually high or low sediment content. Similar results were obtained when examining subsets of the data, broken down either by composite unit or by model layer.
The analysis of clustered data can be critical, however, in evaluating such characteristics as spatial autocorrelation and statistical stationarity of a population. Comparisons between areas of clustered and nonclustered data can provide evidence of spatial stationarity and help identify small-scale autocorrelation behavior. Furthermore, kriging automatically accounts for information redundancy, thus eliminating the need for declustering corrections.
All subsequent analyses in this study were performed without declustering. In all cases in which raw and declustered data were compared, the effect of declustering was to diminish the apparent differences between the sample populations being compared; in this sense, the raw data provided a worst-case test for comparing statistical similarity among composite units.
Of far greater concern than clustered data was the effect of partial penetration. In a borehole which only partially penetrates a composite unit, the reported thickness of that unit or its sediment content is censored (quantified by an inequality, such as “greater than” an apparent thickness; Helsel and Hirsch, 2002). Bias arises when the censored information is compared with data from other boreholes that fully penetrate the same unit. Unlike other types of environmental information (where censoring occurs consistently in only one sense, such as when imposed by a detection limit, for example), borehole partial penetration bias is bi-directional and random. That is, if a composite unit happens to be sediment rich, then the proportion of sediment sampled by a partially penetrating borehole would represent an upper limit; alternatively, if the borehole penetrates a sediment-poor zone, the measurement would be biased low. In the case of the INL data, partial penetration should not introduce bias because such censoring is expected to be bi-directional and randomly expressed.
Wherever possible, only those sediment thicknesses defined by fully penetrating boreholes were considered. Fortunately, sediment typically represents less than 15 percent of a borehole’s length, so that partial penetration bias is infrequent and has negligible effect. Except in the most infrequently sampled composite units, the attendant reduction in sample size is about 5 percent or less. It was necessary to use all available sedimentary thickness data because of limited sample size only in the aggregated composite unit 8 to 14, where just five deep boreholes fully penetrate.
Only the ground-water model layers (A, B, and C [fig. 3]), corresponding to the uppermost 300 ft of the saturated zone, were considered in this study. An insufficient number of fully penetrating boreholes were available in layers D and deeper to justify a statistical analysis. For comparative purposes, the unsaturated zone (layer U) was also statistically analyzed but not kriged.
The statistical variability of sediment thickness in boreholes that fully penetrate each composite unit is shown in figure 6. The exception is the aggregate package, composite units 8 to 14, whose oldest units are fully penetrated by only five boreholes and which was analyzed without an adjustment for partial penetration.
The box plots in figure 6 clearly show that sediment comprises a significantly larger proportion of the thickness of composite unit 1 than of any of the older units. A similar pattern holds when considering sediment as a percentage of composite unit thickness (fig. 7). The distribution of sediment in composite unit 1 is strikingly different from that in all other composite units.6 Of the older composite units, some of whose sediment thickness distributions are statistically similar, the distribution in composite unit 5 is the most skewed and has the lowest median sediment content. However, it also represents the shortest accumulation period of any of the seven youngest composite units (65 Ka, table 1), raising the possibility that its anomalous character reflects an under-sampled stratigraphic interval.
6 Tests of similarity with other composite unit medians returned a Mann-Whitney p-value of 0.00 in all instances.
To test this hypothesis and to further evaluate the case for statistical similarity over time, different groupings of composite units were combined into longer time spans of similar duration. Selected results for grouped units are shown in figure 8, for sediment percentages and figure 9, for sediment thicknesses. Nonparametric tests of similarity were conducted to identify whether median sediment abundances are more similar than not; statistical comparisons for various groupings that have similar distributions of sediment percentage and of sediment thickness, respectively, are summarized in tables 2 and 3. Overall, the results indicate that the relative sediment contents of composite units older than unit 1 are statistically more similar than they are different. An important caveat is that some of the sediment data representing composite units 8 through 14 originates from partially penetrating boreholes and therefore may be biased. The inference of statistical similarity between these and younger units, therefore, remains tentative.
These results underscore the difficulty in assessing statistical similarity in geologic units over time and space. Substantial variation in sediment thickness or in percentage of sediment in total thickness among units arises because of limited geographic representation or because the geologic time intervals being considered are not representative of long-term average sediment accumulation rates. The results of tables 2 and 3 indicate that time intervals of about 100–200 Ka or larger are required if meaningful comparisons are to be made in this sense.
Other than the relatively high sediment content noted in composite unit 1, sediment distributions appear similar when examined over sufficiently long time spans to average out short-term differences in sediment accumulation rates and sampling-induced noise. The data reveal no apparent trend over time in either accumulated sediment thickness or in its proportion relative to basalt. Composite unit 1’s high sediment content may reflect the period of relative volcanic quiescence over the past 200 Ka (Anderson and others, 1997; Champion and others, 2002), the interval during which most of composite unit 1 accumulated. If so, then sedimentation rates during this period may not have necessarily increased relative to earlier periods during which basalt accumulation rates were higher.
From the preceding analysis, it appears that except for the time span represented by composite unit 1, sediment abundance relative to basalt has not changed substantially over time. For the purposes of spatial modeling, such a conclusion is vital to any approach that aggregates borehole data into still larger groupings (such as ground-water model layers) that crosscut aquifer stratigraphy (fig. 3).
Locations of fully penetrating boreholes in each of the uppermost three ground-water model layers that represent the saturated zone are shown in figure 10. All subsequent statistical analyses were carried out on the basis of sediment percentages within these three, 100-ft thick layers (where percentages are numerically equivalent to thickness in feet). Similarities and differences among the sediment contents of these layers, as well as in the layer that represents the unsaturated zone, are summarized in figure 11. At the 95‑percent confidence level, nonparametric tests of similarity among pairs of model layers show that sediment distributions in layers A, B, and C range from marginally distinguishable to statistically indistinguishable.7 Considering this together with the similarity of median abundances of sediment among composite units, the results indicate that the amount of sediment in the uppermost 300 ft of the aquifer is statistically stationary in the vertical sense.
7 A Kruskal-Wallis nonparametric test of similarity among layers A, B, and C returned a p-value of 0.032 indicating that their sediment distributions are marginally different; layers A and C are the most different (Mann-Whitney p-value = 0.01), whereas layers A and B are statistically similar, as are B and C (Mann-Whitney p-values = 0.28 and 0.11, respectively).
To evaluate statistical stationarity in the geographic sense, nonparametric tests of similarity were applied to sediment data from three different geographic groupings of boreholes in each of the model layers A, B, and C (fig. 10): (1) sediment data from nonclustered boreholes south of the stratigraphic discontinuity, (2) data from clustered boreholes south of the discontinuity, and (3) data from clustered boreholes north of the discontinuity. Histograms of these data groupings, together with the associated p-values for tests of similarity among the geographic groupings within each layer are presented in figure 12. Note that insufficient data precluded a comparison for the southern clustered area in layer C. Within each layer, the three data groupings are statistically indistinguishable at the 95-percent confidence level, indicating that median sediment abundance does not vary in a geographic sense.
Sediment abundances within the composite units appear to be statistically similar over time when sufficiently long time spans (groupings of composite units) are compared. When the data are grouped by model layer, or by geographic area within model layers, median sediment abundances in the uppermost 300 ft of the saturated zone appear to be statistically similar, further corroborating the inferences made on the basis of the composite unit analysis. Dissimilarity is apparent only in the very youngest composite unit (representing most of the shallowest part of the unsaturated zone) and makes the unsaturated zone, as a whole, statistically unique in terms of its sediment content. On this basis, it is concluded that the median sediment contents of rocks below the water table are statistically similar when grouped by either composite unit, by model layer, or by geographic area within a model layer, indicating that the sediment content of the aquifer may be statistically stationary in a three-dimensional spatial sense.
The analysis of autocorrelation behavior involves the computation of sample statistics that characterize the degree of similarity of data values in space (information redundancy), followed by the inference of a model of autocorrelation structure from those sample statistics. In a nonrandom process such as sediment accumulation, the degree of similarity is greatest when pairs of closely spaced measurements are compared; measurements taken farther apart tend to be less similar. Autocorrelation statistics such as the autocovariance or the semivariogram provide a quantitative measure of the rate at which this similarity decreases with separation distance. These statistical measures, collectively referred to as variogram statistics, are the foundation on which all two-point geostatistical inference and modeling is based. They are represented by plots of a variance-like statistical quantity (for example, the semivariogram) as a function of separation distance. The geometric structure of such plots is described in terms of a model of spatial autocorrelation, a curvilinear function characterized by a nugget (uncorrelated variance at zero separation), a range (separation distance over which autocorrelation disappears), and a sill (uncorrelated sample variance among pairs of data values separated by distances greater than the range). The presence of a horizontal sill that corresponds to the sample variance is the strongest indication that the data are second-order stationary, as required for most kriging applications.
The results of autocorrelation analysis of sediment thickness within individual composite units are summarized in figure 13, for composite units 1 through 6 (the spatial density of boreholes within composite units 7 and 8-14 was too low to warrant a similar analysis). A relative variogram estimator (the inverted correlogram) was used to simplify intercomparison of autocorrelation structure among the composite units. Variogram statistics were calculated at two different spatial lags to better assess the structure of the variogram’s transition zone and the magnitude of its nugget. Because of its small sample size (n = 68), the variogram for composite unit 6 is poorly defined, but it appears to fit the same autocorrelation model as other composite units. Similar autocorrelation structure is evident for all composite units shown, with a well-defined sill and clear transition zone, a relative nugget of about 40 percent, and similar autocorrelation ranges.
It is noteworthy that all composite units share a similar autocorrelation structural model. Their well-defined sills, together with the uniformity of their autocorrelation ranges (from 20,000 to 25,000 ft) corroborates the inference of statistical stationarity across time-stratigraphic boundaries. In particular, the similarity of composite unit 1’s autocorrelation structure to other composite units substantiates the conjecture that its relative sediment content differs not because of fundamental changes in sedimentation rate or spatial patterns, but because of a decrease in lava accumulation rate.
The statistical similarity of medians among different geographic groupings of boreholes within each model layer (fig. 12) and the similarity in autocorrelation structure among composite units (fig. 13) indicate that sediment abundance is a geographically stationary population. Although this conclusion must remain tentative in view of the paucity of spatial information at greater borehole depths and in certain parts of the study area, a preponderance of evidence supports the hypothesis of statistical stationarity within at least the uppermost 300 ft of the ground-water flow model domain.
For the purposes of mIK, the cumulative frequency distribution (CFD) of sediment percentage is discretized and approximated by a series of indicator transformations of the original data. By way of example, the borehole data for layer A (expressed as sediment percentages) was transformed around twelve indicator thresholds at 1 percent increments from 1 to 12 percent sediment (few boreholes in this layer have higher sediment contents, making transformation at higher thresholds impractical). At each threshold, the following indicator transformation was applied: for example, at a 7 percent sediment threshold, each data value was transformed using the following relationship:
Ii = 1 if Zi≤ 7 percent
Ii = 0 if Zi > 7 percent,
where
Zi | are the original data values (sediment percentages), and the |
Ii | are their indicator-transformed values. |
The average of the transformed values provides an estimate of the global CFD’s p-quantile (probability value) at that threshold:
p = (1/n)Σ Ii . (1)
The thresholds at which the indicator transformations were carried out for each layer and the resulting indicator-derived probabilities are shown in figure 14. The dashed lines portray the global CFDs that would be plotted using the untransformed data. In effect, multiple indicator transformation allows a distribution’s shape to be estimated from a series of nonlinear transformations of the original data. A different set of thresholds was applied to each model layer because the shapes of their global CFDs and the amount of data in the upper tails of their distributions differed greatly.
Multiple indicator kriging uses the autocorrelation structure of the indicator variograms at each threshold to calculate optimal kriging weights, from which the local CFD’s conditional probabilities are estimated at each of several thresholds. The principal strength of mIK is that different spatial autocorrelation structure can be accommodated across the range of the data being kriged. Figure 15 shows examples of layer A’s indicator variograms and how the autocorrelation structure changes from one threshold to the next, reflecting the degree to which high and low values are spatially autocorrelated. All indicator variograms display clearly defined sills, providing further support for the hypothesis that the sediment abundance data reflect a globally stationary variable (Isaaks and Srivastava, 1989).
Indicator variogram model parameters for the 12 thresholds subsequently used to make mIK estimates for layer A are summarized in figure 16. Because the choice of CFD thresholds is arbitrary, care must be taken to ensure that the variogram model parameters inferred at one threshold are compatible with parameter values inferred at neighboring thresholds (Goovaerts, 1997). A relatively smooth variation of nugget, sill, and range parameters, such as shown in figure 16, ensures that order-relation corrections in the resulting kriging estimates are minimized and that the mIK-estimated local CFDs will accurately reflect local data dispersion (Deutsch and Journel, 1998). Note that the indicator variogram ranges inferred for layer A (5,000 to 50,000 ft) bracket the 20,000-25,000 value deduced for composite units 4 and 5 which comprise the bulk of layer A, as expected for a second-order stationary variable (Goovaerts, 1997). The autocorrelation ranges tend to be higher at higher thresholds, that is, where thicker accumulations of sediment extend over larger geographic areas. These nuggets, sills and ranges were subsequently used in multiple indicator kriging to define local CFDs at each kriging location.
Kriging relies on data within a nearest-neighbor search area to make estimates at unsampled locations on the basis of the autocorrelation structure of the data. Ideally, where sampling density is high, locally-defined variograms may be calculated solely on the basis of the neighboring data (Whelan and others, 2001), but in most environmental and subsurface situations, data density is almost never sufficient to permit such an approach and a global variogram must be inferred from the entire data set. Evidence of statistical stationarity is then used to justify the relevance of the global variogram to all local neighborhoods where kriging estimates are to be made. In this study, evidence for stationarity was based on the similarity of medians within different geographic portions of each model layer as well as on the autocorrelation structure of their indicator variograms and of the composite units’ variograms.
Multiple indicator kriging was performed with the GSLIB™ software package (Deutsch and Journel, 1998). Indicator variogram parameters were defined at sediment thresholds of 1, 3, 5, 7, 9, and 11 percent for all layers, as well as at thresholds of 18 and 25 percent for layer B and 20, 40, and 60 percent for layer C. Linear tail approximations were used to approximate the CFD above the highest indicator threshold considered (fig. 14). Even though the maximum range of the indicator variograms was about 60,000 ft (fig. 16), a kriging search radius of 150,000 ft was used to return estimates in even the most data-poor regions of each model layer. The disadvantage of this approach is that unsubstantiated estimates can arise in data-poor areas (reflecting simple averages of distant data values), but it is the only way to generate estimates over the entire domain. If preferred, such artifacts could be filtered and removed using a relative variance criterion (such as that generated by ordinary kriging; Isaaks and Srivastava, 1989) or assigned a reduced confidence level.
The median sediment contents of each layer (as read from the local CFDs estimated by mIK) are shown in figure 17, in relation to the surface extent of the Big Lost Trough depositional basin (dashed polygon; after Geslin and others, 1999). Estimation variances also can be determined from the local CFDs, but they do not directly reflect the relative uncertainty that arises from the proximity of neighboring data values. Because the latter is a far more important criterion when kriged sediment content is used to constrain hydraulic conductivity estimates, kriging uncertainty was quantified using the ordinary kriging variance rather than the local CFD variance returned by mIK. A global variogram of the untransformed data in each layer (with a nugget of 0 and normalized to a sill of 1.0) was used to generate maps of ordinary kriging variance, (for example, figure 18). The ordinary kriging variance provides a quantitative basis for classifying the modeled sediment abundance according to the uncertainty in data-poor areas relative to data-rich areas.
Figure 17 shows the highest sediment contents in layers A and B generally occur within the boundary of the Big Lost Trough as delineated by Geslin and others (1999). For layer C, substantial thicknesses of sediment are detected outside the boundary, notably to the east of the depositional basin, indicating that the Big Lost Trough’s spatial extent may have changed over time, possibly in response to volcanic construction.
The predicted distribution of sediment within individual layers bears little resemblance to the spatial extents of sediment-rich and sediment-poor areas currently implemented in the USGS subregional flow model. A map (fig. 19) of the subregional flow model’s sediment-rich and sediment-poor areas, (Ackerman and others, 2006), shows the boundary between sediment-rich and sediment-poor areas inferred from borehole data collected from the uppermost 300 ft of the aquifer, regardless of depth. Their classification is based on an 11-percent sediment classification threshold (fig. 5A). Figure 19B shows the combined mIK results for the model layers recalculated as the median sediment content of the uppermost 300 ft of the aquifer. The mIK model provides a refined picture of the spatial distribution of sediment in the upper part of the aquifer, by representing it on a layer-by-layer basis and reflecting greater confidence in areas with the most borehole control.
The USGS modeling team has shown that optimized hydraulic conductivity values assigned during calibration of the subregional ground-water flow model tend to scale proportionately with zones of high- and low-sediment abundance shown in figure 19A (D.J. Ackerman, U.S. Geological Survey, oral commun., 2005). That is, the calibration process tends to assign lower values of hydraulic conductivity where sediment comprises more than 11 percent of the aquifer’s uppermost 300 ft. Similar relations between aquifer hydraulic properties and sediment abundance have been noted for calibrated vertical:horizontal anisotropy ratios and specific yields (D.J. Ackerman, U.S. Geological Survey, oral commun., 2005). However, the coarse sediment zonation currently implemented in the subregional flow model introduces undesirable modeling artifacts, such as streamline refraction across a zonal boundary where hydraulic properties change abruptly (G.W. Rattray, U.S. Geological Survey, written commun., 2005).
The enhanced spatial resolution of the mIK models allows us to consider more sophisticated approaches to the scaling of hydraulic properties that depend on sediment content. Two types of methodologies are proposed: (1) those based solely on a threshold sediment content, such as the median, and (2) those based on a combination of a threshold and the uncertainty associated with the threshold estimate. In what follows, we consider hydraulic conductivity as an example of how scaling can be implemented, although similar lines of reasoning could be developed for porosity, storativity or any property whose values differ sufficiently between pure sediment and pure basalt.
Scaling involves expressing hydraulic conductivity as a function of the aquifer’s sediment content, such as its median value (or some other, operationally defined level like the 11‑percent threshold currently implemented in the USGS model). In a horizontally layered aquifer composed of fractured, high-permeability basalt and fine-grained, low-permeability sediment, the vertical and horizontal components of hydraulic conductivity (or, more precisely, the components of hydraulic conductivity parallel and normal to layering) can be expressed as a function of the relative thicknesses of the two lithologies (Freeze and Cherry, 1979):
KH = [Z50 • KS + (100–Z50) • KB]/100 (2)
and
KV = 100/[Z50/KS + (100–Z50) • KB]/100 , (3)
where
KS | is the hydraulic conductivity of sediment; |
KB | is the hydraulic conductivity of basalt; |
Z50 | is the median (50th percentile) sediment content; and |
KH and KV | are the resulting components of hydraulic conductivity. |
A threshold other than the median could also serve as the scaling variable. For example, at the 11-percent threshold currently implemented in the USGS subregional flow model, a relative bulk hydraulic conductivity can be estimated from the probability, P11, of not exceeding the 11-percent sediment threshold (the threshold currently implemented in the subregional model):
Kbulk = [(1–P11) • KS + P11 • KB] . (4)
Examples of two mIK-derived local CFDs in a sediment-poor and a sediment-rich location of layer A are shown in figure 20, having median sediment contents of Z50-1 = 4 percent and Z50-2 = 36 percent, respectively. For values of KS = 0.2 ft/d and KS = 11,000 ft/d (Garabedian, 1989), equation 2 results in KH-1 = 10,500 ft/d and KH-2 = 7,000 ft/d; corresponding KV values derived from equation 3 are 5.0 and 0.6 ft/d. The resulting anisotropy ratios (KH/KY) range from 2,100:1 to 13,000:1, similar to those obtained in the flow model calibration process (4,000:1 to 15,000:1; D.J. Ackerman, oral commun., 2005).
Using equation 4, with values of P11-1 = 0.92 and P11-2 = 0.32, and the same KS and KB values, the calculated values of Kbulk at sites 1 and 2 are 10,100 and 3,500 ft/d, respectively.
Figure 21 illustrates how such a sediment-based scaling approach could be used to delineate hydraulic conductivity zones for zone-based parameter estimation. Although only three zones are considered in this illustration, any number could be defined. The mIK median sediment contents were classified into three categories (0–5-, 5–20-, 20–50-percent sediment) and hydraulic conductivity values representing the midpoints of those ranges were calculated from equations 2 and 3. The resulting values are displayed as color-classified maps of KH and KH/KV in figure 21A and 21B, respectively. Similarly, local CFD p-quantiles representing the mid-points of the P11 ranges (0.2-0.4), (0.4-0.8), and (0.8-1.0) were used in equation 4 to calculate values of Kbulk (fig. 21C). The resulting zone maps are applied in the same way that figure 19A is used in the parameter estimation process.
During the parameter optimization process, it may be advantageous to allow K values to vary more widely in areas where the zone boundaries are least certain. To achieve this, the zones defined in figure 21 can be further subdivided according to the kriging variance (fig. 18) into zones of low and high uncertainty. An example is shown in figure 22, where layer A’s domain was classified around an ordinary kriging variance threshold of 0.6 on a relative scale of 0 to 1. The hydraulic conductivity zones shown in figure 21C were subsequently reclassified into five new categories representing areas of greater and lesser confidence in the scaled K values. Such an approach would allow K assignments during the calibration process to range more widely in zones of lower confidence.