
Digital Mapping Techniques '02 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 02-370

Techniques for Improved Geologic Modeling

By Donald A. Keefer

Illinois State Geological Survey
615 E. Peabody Drive
Champaign, IL 61820
Telephone: (217) 244-2786
Fax: (217) 244-2785
e-mail: keefer@isgs.uiuc.edu

INTRODUCTION

Creating computer models of two-dimensional (2-D) surfaces and three-dimensional (3-D) geologic sequences can be a difficult process. If mapping projects are not conducted with enough attention to the demands and constraints of the modeling process, excessive expenditures of time and money can result, and the model may not properly address the intended applications.

In this paper I present techniques for addressing some common 2-D and 3-D geologic modeling difficulties that can have a significant effect on the success of the project. I provide some perspectives on the selection of modeling objectives, the use of declustering as a tool for data management, and ways to control surface models so they agree with the interpretations of the mappers. Finally, I touch briefly on some technical issues on the horizon that will further affect how we approach geologic mapping on the computer.

CHOOSING MODELING OBJECTIVES AND DEFINING MODEL PARAMETERS

The first step to avoiding or minimizing difficulties is to define the modeling objectives properly. Following well-defined modeling objectives can help ensure that the final model output is compatible with its intended applications. At a minimum, modeling objectives need to address several considerations; the first four are beyond the scope of this paper. I briefly address the remaining considerations below: the spatial distribution of data, data quality, the area or volume to be modeled, the minimum feature size, the grid resolution, and the software resources to be used.

The spatial distribution of data can dramatically affect the ability of an interpolation algorithm to define a realistic geologic surface, particularly a complex one. Typically, sources of geologic data (such as water-well logs) are not uniformly distributed throughout a map area, but instead are clustered. The location of the clusters may be strongly dependent on the reasons the data were collected, and these reasons may or may not be related to the occurrence of specific subsurface geologic deposits. For example, water wells in some locations may be clustered above channelized sand and gravel deposits. In other areas where ground-water resources are not spatially limited, the clustering of water wells may be more a reflection of urban-suburban development. The presence, density, and location of data clusters and their potential impact on surface models should be characterized in the early planning stages of the project.

Variations in data quality will affect the uncertainty of any resulting model. The sources of uncertainty, the mapping of changes in uncertainty, and the potential impacts of this uncertainty on the model need to be evaluated and documented. Following this evaluation, it is important to determine whether the intended applications can be sufficiently addressed by model results.

The area or volume to be modeled usually is well defined, but the boundaries of the formal mapping area may be irregular. Gridding software packages, however, typically use square or rectangular grids, which may require that a much larger area be included in the overall model. To define the distinction between the grid area and the mapping area, most modeling software provides some method for blanking out grid cells that fall outside the boundary of the desired mapping area. The grid coordinates and spacing and any blanking or map-area boundary need to be firmly established early in the project if more than one surface model is going to be developed. Changing grid coordinates or study-area boundaries after a project has started can create time delays and data losses.
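As an illustration of the blanking step, the short sketch below flags the grid nodes that fall outside an irregular map-area boundary and marks them as blank; the boundary vertices and grid dimensions are hypothetical, and most gridding packages provide a built-in equivalent.

    import numpy as np
    from matplotlib.path import Path

    # Hypothetical map-area boundary as (x, y) vertex pairs.
    boundary = Path([(0, 0), (10, 2), (12, 9), (3, 11), (-1, 6)])

    # Regular grid covering the larger rectangular model extent.
    x = np.arange(-2, 14, 0.5)
    y = np.arange(-1, 12, 0.5)
    xx, yy = np.meshgrid(x, y)

    # Flag nodes inside the mapping area; blank the rest with NaN.
    inside = boundary.contains_points(np.column_stack([xx.ravel(), yy.ravel()]))
    grid = np.random.rand(*xx.shape)           # stand-in for an interpolated surface
    grid[~inside.reshape(xx.shape)] = np.nan   # blanked cells outside the map area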

Determining the minimum feature size that will be included in the model is important because it affects decisions about grid size and optimum data spacing. Geologic surface features are typically asymmetric, and data points cannot be expected to fall directly on the highest and lowest values of each feature. We can consistently describe the general shape of a feature if the data are spaced at approximately one-third of the shortest axis of that feature. If the first point lies on the edge of the feature, a data spacing that is one-third of the shortest axis length will result in four points along the direction of the shortest axis (e.g., point 1 = 0, point 2 = 1/3, point 3 = 2/3, point 4 = 3/3). For example, if the average separation distance between data points is 0.25 mile, then, on average, we will be able to identify features that are 0.75 mile wide in their shortest direction. Although more closely spaced data points will obviously identify smaller surface features, we still need at least four points along each of their principal axes to identify these smaller features. Ideally, the determination of the minimum size of a surface feature should be based on more than just the average data density. This determination should be based on the complexity of the geologic deposits (i.e., the sizes and shapes of anticipated subsurface features), the extent and location of data clusters, the variation in data spacing within these clusters, and the intended applications of the model.
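The rule of thumb above reduces to simple arithmetic. The sketch below restates it both ways; the function names are mine and serve only as a planning aid.

    def min_resolvable_feature(avg_data_spacing):
        """Shortest feature axis that ~4 points can describe (3 x the spacing)."""
        return 3.0 * avg_data_spacing

    def required_spacing(target_feature_width):
        """Data spacing needed to resolve a feature of a given short-axis width."""
        return target_feature_width / 3.0

    print(min_resolvable_feature(0.25))   # 0.75 mile, matching the example above
    print(required_spacing(1.5))          # 0.5-mile spacing to catch 1.5-mile features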

Determining the minimum feature size that will be recognized by the model also is relevant to the selection of the grid spacing. Just as a minimum of four data points is needed in each of the two principal directions to characterize a surface feature, a minimum of four grid cells is needed in each of these same principal directions to adequately model or express each surface feature. It can be helpful to have a grid spacing that is several times smaller than the minimum data spacing; this will create surface models with smoother, more realistic surface morphologies. The interpolation algorithm chosen, however, may place a practical lower limit on the grid spacing. Some algorithms (e.g., splines, minimum curvature) will produce oscillations, or interpolation artifacts, in the surface model if the grid spacing is significantly smaller (about 10 times) than the data spacing. In this situation, the selection of the optimum grid size requires trial and error.

The total number of grid cells can be very large, especially if 3-D models are being developed, and the grid resolution may have to be coarsened for model computations to be completed within a practical time limit. Selection of grid spacing in the vertical direction should be based on the anticipated minimum thickness of the individual deposits and the total thickness of materials to be modeled. Thick sequences of deposits will probably require more generalization than is desired in the vertical direction because of the large number of grid cells that are generated by finer resolutions. For example, a 2-D grid that is 267 x 184 cells has a total of 49,128 cells. This is a manageable grid size for typical desktop computers. If we are modeling a 400-foot sequence of geologic materials and want to delineate units that are 5 feet thick and greater, this would require 80 cells in the vertical direction and produce a total model of 3,930,240 cells. Depending on how the specific software actually constructs the model, this may be too many grid cells for many desktop computers. A model of this size may even exceed the limitations of the modeling software. Large computer resources will definitely be required if significant visualization and slicing of very large models is desired.
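The grid-size arithmetic above is easy to script as a quick planning check. The sketch below reproduces the numbers from the text and adds a rough memory estimate under the assumption of 8-byte floating-point values per cell.

    nx, ny = 267, 184
    cells_2d = nx * ny                # 49,128 cells
    nz = 400 // 5                     # 80 vertical cells: 400-ft sequence at 5-ft resolution
    cells_3d = cells_2d * nz          # 3,930,240 cells

    bytes_per_cell = 8                # assumed 64-bit floating-point values
    print(f"2-D grid: {cells_2d:,} cells")
    print(f"3-D grid: {cells_3d:,} cells (~{cells_3d * bytes_per_cell / 1e6:.0f} MB per stored property)")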

A final consideration in defining the modeling objectives is identifying the software resources that will be used for the various data-management, surface-modeling, grid-manipulation, 3-D model construction, visualization, and final-product-development tasks. Although high-end modeling and visualization software can address all these facets, the cost of these packages is too high for most modelers. Many moderately priced software packages available for desktop computers can accomplish one or more of these tasks. Careful evaluation of the available suite of software packages, their interoperability, and the specific roles that each will play may identify possible software incompatibilities before they cause project delays.

DECLUSTERING AS A DATA-MANAGEMENT TOOL

Understanding Declustering

One data-management practice that can save a lot of time and money in modeling projects is spatial declustering, a process of sorting through or reducing the number of data points to a level that is more efficient and effective for modeling. A data set can be considered clustered if either a number of grid cells have more than one data point or part of the map has data spaced more closely than necessary for identifying the minimum-size feature specified in the modeling objectives. The advantages of declustering and the methods available for it need to be evaluated early in the project to determine whether the process is worthwhile for the specific study area.
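A rough diagnostic along the lines of this definition can be scripted before any detailed data validation begins. The sketch below counts multiply occupied grid cells and points spaced more closely than one-third of the minimum feature size; the function name, cell size, and example values are illustrative assumptions.

    import numpy as np
    from scipy.spatial import cKDTree

    def clustering_report(x, y, cell_size, min_feature_width):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)

        # Points sharing a grid cell: compare integer row/column indices.
        cells = np.column_stack([np.floor(y / cell_size), np.floor(x / cell_size)]).astype(int)
        _, counts = np.unique(cells, axis=0, return_counts=True)

        # Nearest-neighbor spacing for each point (k=2 because the nearest hit is the point itself).
        d, _ = cKDTree(np.column_stack([x, y])).query(np.column_stack([x, y]), k=2)

        return {
            "cells_with_multiple_points": int(np.sum(counts > 1)),
            "points_spaced_too_closely": int(np.sum(d[:, 1] < min_feature_width / 3.0)),
        }

    # Example: two points share a cell and sit closer than one-third of a
    # 0.75-mile minimum feature; the third point is isolated.
    print(clustering_report([0.00, 0.05, 3.00], [0.00, 0.02, 3.00],
                            cell_size=0.25, min_feature_width=0.75))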

There are three major advantages to declustering a data set. The first is the savings in time and money. With any data set, time must be spent validating and correcting the locational information of each data point. This is a critical step in mapping or modeling because locational errors not only place the geologic data in the wrong horizontal location but can also shift unit tops vertically if the elevation of the top of the borehole is significantly in error. In addition, lithologic descriptions, geophysical properties, or other properties encountered in the borehole must be reconciled within the local stratigraphic framework. Unless a data set is declustered prior to undertaking these steps, many more data points may be processed than can be effectively used in the modeling process, causing significant increases in time, effort, and expense. Establishing optimum data spacing prior to data evaluation can help reduce the number of data points that must be validated and correlated. In areas of complex sedimentary successions, some additional data points may be needed to clarify the stratigraphic framework and to increase the reliability of all stratigraphic correlations and the resulting model. However, these additional stratigraphic control points need not be included in the modeling data set unless they provide data quality that is superior to adjacent data points.

The second advantage to declustering relates to the way data are used in most interpolation algorithms. In any grid model, only one value can be assigned to represent each grid cell, even when two or more data points fall within a single grid cell. Unless some method is used to decluster this cell and define a single value, the value assigned to this cell will be some combination of all the clustered values. If the clustered data in the cell have a large range of values, the value assigned to the cell is likely to be skewed toward the most extreme value in the cluster. Thus, clustered data sets are inefficient because of the redundancy of information and inaccurate because of the inability to fit the surface to each observed value.

The third advantage to declustering is that many interpolation algorithms produce surface models that are severely biased by clustered data. This bias is likely to be more severe in areas where the clustered values vary widely and along the margins of data clusters where grid-cell calculations are unduly influenced by the large number of points in one area of the search neighborhood. Some interpolation algorithms handle clustered data better than others. Any algorithm that uses some form of curve fitting between calculated and observed values to determine the cell value should handle clustered data better than those that do not employ such curve fitting. Some algorithms that fall into this category include splines, minimum curvature, kriging, and other radial basis functions. Although these algorithms are not immune to complications from data clustering, they still can be strongly affected if the clustered data have high local variability. Algorithms that use a simple distance weighting function to assign a cell value generally do not work well with clustered data (e.g., algorithms that rely heavily on methods referred to as inverse-distance calculations).
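A toy example can make this bias concrete. The sketch below implements a bare-bones inverse-distance-weighted estimate (not the algorithm of any particular package) and shows a tight cluster of three redundant values pulling an estimate well away from the simple average of the surrounding data.

    import numpy as np

    def idw(x0, y0, xs, ys, vals, power=2.0, eps=1e-12):
        """Bare-bones inverse-distance-weighted estimate at (x0, y0)."""
        d = np.hypot(xs - x0, ys - y0) + eps
        w = 1.0 / d**power
        return np.sum(w * vals) / np.sum(w)

    # One isolated point (value 10) versus a tight cluster of three points (value 50).
    xs   = np.array([0.0, 2.0, 2.1, 1.9])
    ys   = np.array([0.0, 2.0, 2.0, 2.1])
    vals = np.array([10.0, 50.0, 50.0, 50.0])

    # At a location roughly equidistant from the lone point and the cluster,
    # the cluster's three redundant points dominate the weights, so the
    # estimate lands near 40 rather than near the midpoint of 30.
    print(idw(1.0, 1.0, xs, ys, vals))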

In any mapping effort, there is a natural tendency to assume that more data are always better. In fact, there is always a limit at which additional data do not contribute significantly to the map or model. This limit can be determined by considering the minimum feature size desired for the model. Data that are spaced more closely than one-third of the short axis of the minimum-size feature will not contribute much useful information to a surface model. Additional data collected within this spacing limit can be considered to be too much data from a cost-effectiveness perspective. Additionally, if data can be grouped on the basis of probable reliability, the addition of clustered, poorer quality data may only reduce the influence of adjacent higher quality data, thereby reducing the reliability of the model. Where the natural variability of the surface at the local scale (i.e., small surface features) is similar to or smaller than the likely measurement error of the data, real features will be indistinguishable from data artifacts. These insights suggest that, for most purposes, no significant modeling advantage will be gained from the time and resources spent validating, interpreting, and modeling with too much data.

However, if the project is trying to address uncertainty of the modeling results, then it may be worthwhile to keep additional data, even highly clustered data, in the data set. Although clustered data may be helpful for some uncertainty evaluations, they are not effective or efficient for most surface-modeling efforts. The project objectives and uncertainty evaluations can be used to determine when sufficient data have been gathered. This determination can be made before the locational verification and correlation efforts are begun.

Choosing a Declustering Method

There are several good ways to implement declustering. If the data have inconsistent accuracy and information content, then it is likely that some data points provide more value to the mapping effort than others. In this situation, it makes most sense to integrate some type of data valuation into the declustering method. With the sophisticated capabilities of spreadsheet and database software, a customized data valuation can be conducted systematically on the entire data set.

Some data points may provide valuable information for only one or two surfaces, while immediately adjacent points provide better information for other surfaces. For this reason, it may be advantageous to decluster the data set once for each stratigraphic surface being modeled. Logistically, this may be tricky for some model parameters (e.g., lithostratigraphic interpretation) because a data valuation might need to be conducted prior to the correlation of lithologic deposits within the stratigraphic framework. One viable solution is to initially conduct the data valuation for the entire borehole, without worrying about individual surfaces; this will allow the entire data set to be declustered. After declustering, the removed, or unused, data can be kept in a separate file. During the modeling process, if the data control for a specific surface is poor using the declustered set, the unused data set can be queried to identify possible alternative points.

One method of declustering a data set is either to pick one of the clustered data points within each grid cell or to calculate some statistic from the clustered values (e.g., mean, median, root mean square error) and use that single value as the representative datum for the grid cell. Some surface-modeling software programs provide options for this type of declustering and may refer to this type of clustered data as "duplicate data." Many of these packages allow the user to define the distance within which data points will be considered duplicate, or clustered. They allow the user one or more of the above-mentioned methods for assigning a single value to represent these duplicates. Another declustering method is called cell-based declustering. In this method, a grid is overlaid on the data, and grid cells with multiple data values have these values transformed into a single value. This single value is typically the mean of the clustered values. Each software package is likely to differ slightly in the options it provides for treating clustered data. If the software package you are using does not provide any of these options, declustering can be done using either a geographic information system (GIS) or a spreadsheet.

Using a GIS, a grid of user-defined dimensions can be created. The data points then can be overlaid with the grid. Each data point is then assigned the identification number of the grid cell that bounds it. Determining the grid cells that have multiple points identifies data clusters. The user can then apply any criterion to decluster the data.

Using a spreadsheet, a grid can be defined in which each cell corresponds to a row in the spreadsheet. The spatial coordinates of each cell centroid are defined by incremental changes in the x and y coordinate columns. If a data point lies within half a cell width of a cell's centroid, the identification number of that cell is assigned to the data point. As with the GIS example, clustered data will have the same grid-cell identification value, and one value per cell can be selected using any appropriate method.
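The GIS and spreadsheet workflows just described amount to assigning each point a cell identifier and keeping one value per cell. The sketch below does the same thing in Python (pandas and NumPy); the column names, the 0.25-unit cell size, and the choice of the mean as the representative value are illustrative assumptions.

    import numpy as np
    import pandas as pd

    pts = pd.DataFrame({
        "x":        [0.10, 0.12, 0.90, 2.40, 2.45],
        "y":        [0.05, 0.07, 1.10, 0.30, 0.31],
        "top_elev": [612.0, 615.0, 598.0, 571.0, 569.0],
    })

    cell = 0.25  # declustering cell size, in the same units as x and y

    # Integer row/column indices identify the grid cell that bounds each point.
    pts["col"] = np.floor(pts["x"] / cell).astype(int)
    pts["row"] = np.floor(pts["y"] / cell).astype(int)

    # Keep one value per cell -- here the mean of the clustered values; the
    # median, or the single "best" point chosen by a data-quality ranking,
    # could be substituted just as easily.
    declustered = (pts.groupby(["row", "col"], as_index=False)
                      .agg(x=("x", "mean"), y=("y", "mean"),
                           top_elev=("top_elev", "mean"),
                           n_points=("top_elev", "size")))
    print(declustered)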

INTRODUCING GEOLOGIC INSIGHT THROUGH USE OF SYNTHETIC DATA

Why Synthetic Data?

Typically, an initial surface model does not completely represent the geologist's interpretation of the surface. This is generally due to the shape constraints that the algorithm uses, to the presence of clustered or highly variable data, or to the lack of data in certain areas. All these problems have remedies, including the addition of individual synthetic points, coarse to fine gridding, grid editing, and the digitization of hand-drawn contour lines, each discussed below. These methods basically involve the addition of synthetic, or hypothetical, data. Synthetic data are values not obtained from any type of observation or measurement; they are added to a data set by the modeler to help control the shape of a surface model. The values assigned to synthetic points are based on neighboring values and the project team's geologic knowledge and conceptual model of the surface. The synthetic values may be needed because the available data and interpolation algorithms cannot otherwise be used to acceptably express this conceptual model. It is important to identify and document these data to prevent them from being confused with real observations. This will help ensure that the procedure used to create the surface is easy to repeat and able to more readily accommodate any new and significant data that may be collected in the future.

Geologists who are unfamiliar with computer-based surface modeling sometimes question the validity of using synthetic data. In my experience, their concerns arise from a poor understanding of how computer algorithms create surfaces, from a use of concepts and terminology that differs from the modeler's usage, or from misunderstandings about the practice of assigning values to every point in the modeled space instead of only along contour lines. When creating a contour map by hand, the mapper makes assumptions about the continuity and value of the surface at points that have not been sampled. Although a specific value is not assigned to points between contour lines, the limited number of possible surface shapes that can occur between observed data points puts specific bounds on the otherwise unspecified values. This is no different from the use of synthetic points to constrain a surface -- even if the synthetic points are not limited to contour lines.

Methods for Adding Synthetic Values

The addition of individual synthetic points can be a fairly simple procedure. Many software packages allow you either to add points on-screen or to assign the coordinates for a new data point by using the mouse to point to a location on the screen. The geographic coordinates and the desired value can then be recorded and inserted into the data file for re-interpolation, but synthetic values must be clearly distinguished from observed values.

This method of surface control can be effective for controlling the expression of slopes or bluffs along rivers or gullies. It also can be effective for suggesting the presence of a valuable deposit (such as an aquifer) in locations that are unsampled, but where the occurrence of such a deposit seems likely based on other geologic evidence.

The use of individual synthetic data points allows the modeler to add to a surface model detailed features that reflect significant geologic interpretations that are not evident from the real data alone. Synthetic values are relatively easy to update if new observations or interpretations become available. Using synthetic data points also offers the advantage of allowing you to add a range of values, even around intended break points, thereby preventing interpolation artifacts, such as flattening of the surface, that can occur when contour lines are digitized and used.
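One simple way to keep synthetic points documented and separable from observations is to store everything in a single control-point file with a source flag, as in the sketch below; the file name, column layout, and values are hypothetical.

    import pandas as pd

    observed = pd.DataFrame({
        "x": [1200.0, 1450.0, 1800.0],
        "y": [300.0, 520.0, 610.0],
        "top_elev": [612.0, 604.0, 598.0],
        "source": "observed",
    })

    # Synthetic points added by the modeler, e.g., to force a bluff along a river.
    synthetic = pd.DataFrame({
        "x": [1600.0, 1620.0],
        "y": [400.0, 405.0],
        "top_elev": [575.0, 560.0],
        "source": "synthetic",
    })

    combined = pd.concat([observed, synthetic], ignore_index=True)
    combined.to_csv("surface_control_points.csv", index=False)

    # The flag makes it easy to work with only the observed values later on.
    print(combined[combined["source"] == "observed"])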

Coarse to fine gridding is another technique that can be used to constrain surface models in areas of low data density. This technique can be particularly helpful along model boundaries, where the data density is typically low and surface models may be unduly influenced by a few local points with extreme values (Jones and others, 1986). When used to stabilize a surface in more central parts of a map, coarse to fine gridding is most effective for algorithms that rely heavily on simple inverse-distance calculations and appears to have less benefit with curve-fitting algorithms (e.g., minimum curvature). Some algorithms implement both an inverse-distance calculation and a curve-fitting calculation; the effect of coarse to fine gridding with these kinds of algorithms varies and should be tested on a few data sets. To implement coarse to fine gridding, the data are initially gridded using a spacing that is at least five times larger than the desired final grid spacing. The resulting grid should show only the largest features in the modeled surface. The grid is then converted to an ASCII data file and integrated with the data set of observed values. To prevent these new synthetic grid-based points from overprinting a strong regional trend (i.e., generalizing) in areas with good data density, the synthetic points must be deleted if they are within approximately three grid spacings of an observed data point. Once the unneeded synthetic points are deleted, the modified data set is then interpolated using the final grid spacing. This threshold-separation distance between the synthetic points and the original data points is somewhat arbitrary and can be adjusted on the basis of trial and error.
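The sketch below outlines this workflow in Python, using SciPy's general-purpose griddata interpolator as a stand-in for whatever gridding algorithm a project actually uses; the 5x spacing ratio and the three-grid-spacing deletion threshold follow the text, and measuring that threshold against the coarse spacing is my assumption.

    import numpy as np
    from scipy.interpolate import griddata
    from scipy.spatial import cKDTree

    def coarse_to_fine(x, y, z, xmin, xmax, ymin, ymax, fine_dx, ratio=5, thresh=3):
        x, y, z = map(np.asarray, (x, y, z))
        coarse_dx = fine_dx * ratio

        # 1. Grid at a coarse spacing so only the broad, regional shape survives.
        cx, cy = np.meshgrid(np.arange(xmin, xmax, coarse_dx),
                             np.arange(ymin, ymax, coarse_dx))
        cz = griddata((x, y), z, (cx, cy), method="linear")

        # 2. Convert coarse nodes to synthetic points; drop nodes with no estimate.
        ok = ~np.isnan(cz.ravel())
        sx, sy, sz = cx.ravel()[ok], cy.ravel()[ok], cz.ravel()[ok]

        # 3. Delete synthetic points within ~3 grid spacings of an observed point.
        d, _ = cKDTree(np.column_stack([x, y])).query(np.column_stack([sx, sy]))
        far = d > thresh * coarse_dx
        ax = np.concatenate([x, sx[far]])
        ay = np.concatenate([y, sy[far]])
        az = np.concatenate([z, sz[far]])

        # 4. Re-grid the observed plus remaining synthetic points at the fine spacing.
        fx, fy = np.meshgrid(np.arange(xmin, xmax, fine_dx),
                             np.arange(ymin, ymax, fine_dx))
        return fx, fy, griddata((ax, ay), az, (fx, fy), method="cubic")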

The coarse-to-fine-gridding method is reproducible if new data become available, thereby allowing the modeler to easily maintain the priority of observed data values and to create synthetic values that are easy to distinguish from the observed values. The disadvantage of this approach is that it only helps to maintain the regional character of a surface and cannot be used to add smaller surface details.

Grid editing can be a fast approach to constraining a model in situations that either require simple changes (e.g., changes to a small number of cells) or involve simple grids. For example, if a 3-D model shows one unit occurring over a larger extent than is desirable, the isopach grid of that unit can be edited so that areas where the deposit should be absent have a zero value. This situation would need additional grid editing to address the change in sediment occurrence, but it illustrates when grid editing might be feasible. This approach can be problematic for typical surface models because it is difficult to create smooth surfaces through editing of individual grid nodes. Also, grid editing can be a slow procedure, although tools may be available within individual software packages to simplify the editing process. The ability to create an acceptable surface through grid editing will depend on the grid resolution, grid complexity, and overall modeling objectives. Although this technique typically does not automatically record the grid changes, it is possible to save the values of only the changed grid nodes in a separate data file, with documentation to describe how and why they were used. The changed grid nodes can be identified by subtracting the original grid from the modified grid; the original values are then added back to all non-zero grid values. The subtraction of the grids will result in non-zero values only where the grid nodes were changed through the manual grid-editing process. These values will be the difference between the original and changed node. A record of the actual values from the final edited grid can be created by adding this difference to the original grid value for only the nonzero nodes.
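The bookkeeping described above (subtract the grids, keep only the nonzero nodes, and record their final values) is a few lines of array arithmetic. The sketch below assumes the original and edited grids can be read into simple 2-D arrays, which will depend on the software's export format.

    import numpy as np

    original = np.array([[10., 12., 14.],
                         [11., 13., 15.],
                         [12., 14., 16.]])
    edited = original.copy()
    edited[1, 1] = 0.0       # e.g., deposit forced absent at this node
    edited[2, 2] = 18.0      # e.g., node raised during on-screen editing

    diff = edited - original                   # non-zero only where nodes changed
    rows, cols = np.nonzero(diff)
    final_values = original[rows, cols] + diff[rows, cols]   # equals the edited values

    # Save only the changed nodes, with their final values, as documentation.
    record = np.column_stack([rows, cols, final_values])
    np.savetxt("grid_edits.txt", record, header="row col edited_value")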

Digitization of hand-drawn contour lines is another process for controlling surface models. In some desktop software, contour lines can be digitized on the screen using a mouse rather than with a digitizing board or tablet. Typically, these lines must be converted to points, the points must be added to the observed data set, and a new surface model then must be re-interpolated. Some software can use the contour lines directly without converting them to points. Other software packages also allow modelers to drag existing contour lines on-screen; the software automatically modifies the underlying grid model. If the on-screen, line-dragging approach is used and a grid is automatically created, it is important to record the changes to the initially interpolated grid. The procedure outlined in the grid-editing discussion also would work well in this situation. Digitization of contour lines allows the modeler to create any shape in the surface model that seems appropriate. As with all synthetic data techniques, the revised data should be preserved and kept distinguishable from observed values. The time it takes to implement this technique will depend on the software options that are available. Editing a surface model can be accomplished very quickly if you can drag contour lines on-screen and automatically generate a new grid.

With many interpolation algorithms, the use of digitized contour lines or points from these lines creates surface models that have small flat spots, or benches, where the new contour data occur. This is because the use of hand-drawn contour lines typically results in hundreds of new synthetic data points, all with the same value; several of these new points typically fall within the local search neighborhood used to calculate nearby grid cells. This clustering of values with a single elevation has the effect of biasing the grid calculations for all cells adjacent to these clustered points. One way to reduce the expression of this benching artifact is to decluster the number of points used from the digitized contour lines. Ideally, the separation distance used for declustering the original data set would be the most appropriate distance for declustering these values. The time requirements of this technique should be considered for each project. For many projects it may be more efficient to manually add individual synthetic points.
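Declustering the digitized contour points can be as simple as a greedy minimum-separation filter, as in the sketch below; the separation distance would normally match the one used to decluster the original data set, and the filter shown here is an illustrative choice rather than a standard routine.

    import numpy as np
    from scipy.spatial import cKDTree

    def thin_points(x, y, min_sep):
        """Greedily keep points so no two kept points are closer than min_sep."""
        coords = np.column_stack([x, y])
        tree = cKDTree(coords)
        keep = np.ones(len(coords), dtype=bool)
        for i in range(len(coords)):
            if not keep[i]:
                continue
            # Drop any later point that falls within min_sep of a kept point.
            for j in tree.query_ball_point(coords[i], min_sep):
                if j > i:
                    keep[j] = False
        return keep

    # Example: 200 points digitized along a single contour line, then thinned
    # to an (assumed) 0.25-unit separation to match the declustered data set.
    t = np.linspace(0, 2 * np.pi, 200)
    keep = thin_points(5 * np.cos(t), 5 * np.sin(t), min_sep=0.25)
    print(keep.sum(), "of", keep.size, "contour points retained")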

GEOLOGIC MAPPING TECHNOLOGY ON THE HORIZON

The growth in computer modeling of geologic systems is providing geologists with an opportunity to better evaluate and characterize the quality of their data and to present information in ways that are otherwise impossible. This also has the effect of making geologic information much more accessible. Many technical issues will need to be addressed as this technology is embraced more completely. There are two particular technical issues on the horizon of computerized geologic mapping that I think are imminent and worth discussing briefly.

First, advances in surface and volume modeling software are coming at an amazingly fast pace. Desktop software priced less than $1,000 (e.g., Rockworks) provides a fairly robust 3-D geologic mapping and modeling environment; more sophisticated options are available in higher priced software for the Windows and UNIX environments. The availability of many packages makes it increasingly practical to incorporate computer modeling into any mapping effort.

With this growth of modeling, it is important to re-evaluate the focus of mapping projects. Traditional mapping projects focus on a set of map products or perhaps a set of computerized visualizations as the final goal of a project. With computerized modeling supporting the mapping, we can re-frame the goal of these projects to be the development of a set of models that provides a consistent interpretation. A suite of surface and volume models and associated documentation can be created; this suite would include the data and all resulting interpretations for the modeled geologic system. This goal of developing a consistent suite of surface and volume models will have the added benefit of allowing all the graphical products and visualizations to also be consistent. Currently, differences in the location of specific contacts and the geometry of surfaces can readily occur when each surface is created as a separate product. The consistency gained from developing all products from a single model will reduce the total uncertainty of the project's results by reducing the potential for inconsistent display of stratigraphic contacts. This reduction in uncertainty may be a significant benefit to end users who make a range of decisions from a suite of maps.

Second, parallel advancements in development of more sophisticated methods for producing computer-generated maps are leading to the creation of data models and object-based map construction (Hastings and Brodaric, 2001). We need to begin merging the data-model concept with surface and volume models to develop a more complete geologic data model.

REFERENCES

Hastings, J.T., and Brodaric, Boyan, 2001, Evolution of an object-oriented, NADM-based data model prototype for the USGS National Geologic Map Database Project: International Association of Mathematical Geology, Proceedings, Annual Meeting, Cancun, Mexico, September 9-12, 2001.

Jones, T.A., Hamilton, D.E., and Johnson, C.R., 1986, Contouring geologic surfaces with the computer: Van Nostrand Reinhold, Computer Methods in the Geosciences, 315 p.

