USGS OFR 01-223: Map Unit Descriptions and the North American Data Model

Digital Mapping Techniques '01 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 01-223

Map Unit Descriptions and the North American Data Model

By Gerald A. Weisenfluh

Kentucky Geological Survey
228 Mining and Mineral Resources Bldg.
University of Kentucky
Lexington, KY 40506-0107
Telephone: (859) 257-5500
Fax: (859) 257-1147
e-mail: jerryw@kgs.mm.uky.edu

INTRODUCTION

Geologic maps contain a significant amount of descriptive information. Graphics explaining aspects of the map area and its geologic features are compiled in map collars. The North American Data Model (NADM) and its variants store information of this kind and relate it to map objects. The NADM has advantages over simple GIS files that contain spatial data and associated attributes because of its capabilities of storing relationships, hierarchies, and metadata. Whatever the data model, however, map descriptions are complex elements that pose a number of challenges for implementation.

The format of map descriptions as text and graphics is not conducive to easy compartmentalization in data structures. A central problem of converting descriptive prose to database formats is determining precisely what each descriptive element refers to. This requires an analysis of the map's components and how they can be represented in a database. Some descriptions refer to specific spatial objects or groups of objects in a map database (i.e., digitized features), while others apply only to abstract concepts (e.g., unmapped subdivisions of a formation). Characteristics of geologic map units found in other reports or databases also give rise to the possibility of multiple descriptions for the same geologic features. Although this is not a significant difficulty for database storage, multiple descriptions can be confusing for the end user.

The language of map description, especially lithologic terminology, presents interesting challenges for map database design. Current efforts at standardizing scientific language will be useful for future data collection; however, historical data are characterized by a diverse, nonstandardized language. Successful data model designs will have to provide mechanisms for treating such information.

This paper discusses elements of map description that relate to data model development. The ideas evolved from the ongoing task of converting Kentucky's existing geologic maps into database format. These maps contain over 10,000 individual text descriptions for geologic units that were prepared by a wide variety of authors. Converting these descriptions into database format and relating them to geologic map features is challenging due to the high variability of format and grammar of the text. The paragraphs contain a large element of lithologic description and this paper discusses methods of treating this information for both historical and newly collected data. These observations are based on past experience with creating core-logging manuals for sedimentary rocks in the coal fields of the United States.

MAP-UNIT DESCRIPTIONS
Map Components

Map unit descriptions on all of Kentucky's geologic map legends occur as discrete text paragraphs that refer to some part of the map (Fig. 1). These descriptions are almost exclusively for sediments and sedimentary rocks, are arranged vertically on the map according to stratigraphic age, and relate to the map by means of a map color and symbol (explained in the legend).

Figure 1. Example map unit description from the Lancaster 7.5-minute quadrangle, Kentucky.

Inspection of paragraphs on almost any map reveals a number of issues that relate to database development. Not every cartographic map unit has a single description, and some descriptions have no cartographic representation. Figure 2 shows examples of descriptions that have no spatial representation. Subdivisions of geologic units are often described but not mapped, and they may be formal or informal. In Figure 2, the Reba Member has two separate descriptions for its lithologically distinct upper and lower parts (informal units). The remaining formal members each have a single description but were likewise not depicted cartographically on the map. In this example, the map unit (Ashlock Formation) has no unique description except for the composite descriptions of its formal and informal members. Just as map units may have components, each constituent lithology may also have descriptive components. Figure 3A shows the heterolithic character of typical eastern Kentucky geologic units (map-unit components). Figure 3B indicates that some of the lithologies also have distinct components, which is an important design issue for encoding rock composition.

Figure 2. Formal and informal descriptive components of map units.

Figure 3. Map-unit and lithology components for heterolithic units. A. Lithologically heterogeneous map unit. B. Heterogeneous lithology.

Another challenge in designing the database for Kentucky geologic maps relates to map compilations. When maps have been digitally combined from multiple sources to prepare new products, revised descriptions are also compiled for the aggregated map areas. This results in multiple descriptions for the same geologic features in the map database (i.e., one for the original published map and another for the compilation). In most cases descriptive "versions" relate to maps compiled at different scales, but could also result from different authorship at the same scale.

Map-Unit Properties

On Kentucky's maps, much of the descriptive information pertains to components of map units, but a number of properties typically relate to the map unit as a whole. These include

Thickness
Age
Quality of exposure
Geomorphologic form
Diagnostic features
Bedding
Contact relations with adjacent map units.

An important aspect of such map-unit properties is that many are expressed as ranges or multiple values rather than as discrete values.

Implications for the Data Model

The concept of hierarchical components has been integral to the NADM from its inception, and can be designed in terms of parent/child relationships. The design of relationship classes may differ according to implementation (e.g., relational vs. object-oriented technology). The efficiency with which data can be queried may vary significantly among such applications. This will only be known when methods are implemented using a specific software process. The challenge for successful design will be to make complex data relationships appear relatively simple (or at least understandable) to the user. Equally important will be creating intuitive data-entry mechanisms that will allow map unit properties to be cataloged in an efficient way. This requires a means for identifying map components and relating descriptive properties to them. Because of the complexity of map-unit descriptions, it will also be desirable to preserve the full text in the database in order to retain original context.
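The parent/child arrangement described above can be illustrated with a minimal Python sketch. It is not part of the NADM or of any Kentucky implementation; the class name `Unit`, its fields, and the placeholder description strings are assumptions made for illustration. Only the Ashlock Formation and Reba Member names come from the example in Figure 2.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    """A geologic unit in a parent/child hierarchy (illustrative only)."""
    name: str
    level: str                          # e.g. "formation", "member", "informal"
    mapped: bool = False                # True if the unit is drawn on the map
    description: Optional[str] = None   # full original text, kept for context
    children: List["Unit"] = field(default_factory=list)

    def add(self, child: "Unit") -> "Unit":
        """Attach a child unit and return it so calls can be chained."""
        self.children.append(child)
        return child

    def composite_description(self) -> List[str]:
        """Return this unit's own description, or else those of its children."""
        if self.description is not None:
            return [f"{self.name}: {self.description}"]
        collected: List[str] = []
        for child in self.children:
            collected.extend(child.composite_description())
        return collected


# The mapped formation has no description of its own.
ashlock = Unit("Ashlock Formation", "formation", mapped=True)
reba = ashlock.add(Unit("Reba Member", "member"))
reba.add(Unit("Reba Member, upper part", "informal",
              description="<text of upper-part paragraph>"))
reba.add(Unit("Reba Member, lower part", "informal",
              description="<text of lower-part paragraph>"))
# Hypothetical stand-in for the other formal members shown in Figure 2.
ashlock.add(Unit("Other formal member (placeholder)", "member",
                 description="<text of member paragraph>"))

for line in ashlock.composite_description():
    print(line)
```

In this sketch the description of a mapped unit is assembled from its unmapped formal and informal components, which is one way of relating descriptive text to both spatial objects and abstract subdivisions.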

Geologic features that have more than one description (e.g., derived from two versions of a map, or two scales of compilation) present problems for user interfaces. For example, if a selected map unit has more than one lithologic description, the software system should have a rule set for determining which to return in response to a user's request. One means of establishing these rules is to assign a rank to descriptions that would indicate preferred data. Ranking could be based on scale, where attributes derived from larger scales would have a higher rank. Ranking descriptions could also permit the design of methods for returning information appropriate to a user's map extent or selection set. This issue will become increasingly important as seamless databases are constructed from multiple source maps.
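One possible rule set for ranked descriptions is sketched below in Python. The record layout, the function name `preferred`, and the sample records are hypothetical. The second rule (choosing the description whose source scale is closest to the scale of the user's view) is only one assumed interpretation of "appropriate to a user's map extent."

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Description:
    unit_id: str
    source: str
    scale_denominator: int   # 24000 means a 1:24,000-scale source
    text: str


def preferred(descriptions: List[Description],
              view_scale_denominator: Optional[int] = None) -> Optional[Description]:
    """Pick one description for a unit according to a simple ranking rule."""
    if not descriptions:
        return None
    if view_scale_denominator is None:
        # Default rule: larger scale (smaller denominator) ranks higher.
        return min(descriptions, key=lambda d: d.scale_denominator)
    # Assumed rule: closest source scale to the view; ties go to the larger scale.
    return min(
        descriptions,
        key=lambda d: (abs(d.scale_denominator - view_scale_denominator),
                       d.scale_denominator),
    )


# Hypothetical records: an original quadrangle map and a later compilation.
versions = [
    Description("unit-1", "original quadrangle", 24000, "<original text>"),
    Description("unit-1", "regional compilation", 100000, "<revised text>"),
]

print(preferred(versions).source)           # default rule
print(preferred(versions, 250000).source)   # rule based on the user's view
```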

Some of the concepts found in the Kentucky descriptions (and presumably other geologic maps) would require the addition of new data model tables. Examples are exposure conditions and engineering properties. Current implementations of the data model store much of the lithologic information in a single table. The prevalence of range data (e.g., minimum and maximum thickness) and the potentially large number of parameters suggest that individual tables for each property would be more suitable. Data model implementations should provide the flexibility for adding such features.
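A minimal sketch of per-property tables follows, using Python's built-in `sqlite3` module. The table and column names, the units, and the sample values are assumptions for illustration and are not taken from any NADM implementation. The thickness table stores a range, and the exposure table allows several values per unit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE map_unit (
    unit_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- One table per property; thickness is stored as a range.
CREATE TABLE thickness (
    unit_id INTEGER NOT NULL REFERENCES map_unit(unit_id),
    min_m   REAL NOT NULL,
    max_m   REAL NOT NULL,
    CHECK (min_m <= max_m)
);
-- A property that may take multiple values for the same unit.
CREATE TABLE exposure (
    unit_id INTEGER NOT NULL REFERENCES map_unit(unit_id),
    quality TEXT NOT NULL
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO map_unit VALUES (?, ?)",
                 [(1, "Unit A"), (2, "Unit B")])
conn.executemany("INSERT INTO thickness VALUES (?, ?, ?)",
                 [(1, 5.0, 12.0), (2, 20.0, 35.0)])
conn.executemany("INSERT INTO exposure VALUES (?, ?)",
                 [(1, "poor"), (1, "fair"), (2, "good")])


def units_with_thickness_overlapping(lo_m: float, hi_m: float) -> list:
    """Names of units whose thickness range overlaps the interval [lo_m, hi_m]."""
    rows = conn.execute(
        "SELECT m.name FROM map_unit m JOIN thickness t USING (unit_id) "
        "WHERE t.min_m <= ? AND t.max_m >= ? ORDER BY m.name",
        (hi_m, lo_m),
    ).fetchall()
    return [row[0] for row in rows]


print(units_with_thickness_overlapping(30.0, 40.0))
```

Adding a further property, such as an engineering characteristic, would then mean adding another table rather than widening a single lithology table.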

LITHOLOGY DESCRIPTIONS

Lithologic description will be vital to the functionality of any map database. Many of the proposed queries for testing the NADM involve some aspect of finding map objects by specifying lithologic properties. A number of efforts are under way to standardize lithologic terminology and classification of sediments and rocks, and this discussion is not intended to supplant them. Many of the problems currently being discussed were also found during the preparation of field core-logging manuals for the coal industry (Ferm and Weisenfluh, 1981; Ferm and others, in press). Solutions found during the development of the core manuals may also be applicable to the NADM.

Rock Classification

In the early and mid-1970's, energy shortages in the U.S. led to an increased demand for coal and resulted in intensified core-drilling programs in the Appalachian coal fields. The rock core from this exploration drilling was being described by a wide variety of personnel with varied experience and logging systems. Research programs designed to analyze the volumes of information that were being generated quickly identified the problem of inconsistent descriptions. This lack of consistency resulted in an inability to compare results from one drill hole to another and led to an effort to develop a new rock classification and field methods to reduce this problem. The outcome was a number of photographic core-logging manuals; subsequently, computer methods were developed for processing the data that resulted from using the manuals. Several tenets (explained below) became evident during this program that define a common-sense approach to rock classification in general.

Consistent results. Any successful rock classification should facilitate repeatable results, particularly among different users. It may seem obvious to state that every practitioner should be able to look at a rock and derive the same name for it. But this is difficult to achieve because of the complexity of some classifications and difficulty in judging category boundaries. During the development of the core books, repeatability was measured by conducting trials in which a group of people were given criteria for classification categories and asked to place a number of samples in the appropriate category. If agreement was not high, the categories were reevaluated and boundaries adjusted to improve consistency. Two problems relating to definition of rock categories became apparent. First was the universal tendency to define too many categories. This resulted in users having to make fine distinctions of properties, usually without a high degree of success. For a given range of a gradational property, more than three or four subdivisions generally led to low levels of agreement.

The second problem was the placement of a classification boundary at a point in a gradational series where samples naturally occur at high frequency. This could be judged because samples were taken to represent the frequency of different rock types. When arrays of samples were prepared to assess the variability of important properties (Fig. 4), distinct patterns in frequency distributions often became apparent. Placement of class boundaries at low-frequency points reduced error because fewer samples would occur close to that boundary. For arrays in which there were no obvious natural boundaries, the only technique that resulted in high consistency was to keep the number of categories low. For these reasons, rock classifications that use arbitrary class boundaries inevitably result in some classes that are difficult for users to discriminate with consistent results.

Figure 4. Frequency distribution of a hypothetical rock-property based on systematic sampling. Sample array shown for set of core samples.
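The idea of placing class boundaries at low-frequency points can be illustrated with a short Python sketch. In the core-book work the boundaries were judged from sample arrays; the function below is only an assumed, simplified stand-in, and the bin counts are hypothetical. It finds bins that are lower than their left neighbor and no higher than their right neighbor, then keeps the lowest of them so that the number of classes stays small.

```python
from typing import List


def low_frequency_boundaries(counts: List[int], max_classes: int = 4) -> List[int]:
    """Return bin indices suitable as class boundaries.

    A candidate is an interior bin whose count is strictly lower than its
    left neighbor and no higher than its right neighbor. The candidates with
    the lowest counts are kept, giving at most max_classes classes.
    """
    minima = [
        i for i in range(1, len(counts) - 1)
        if counts[i] < counts[i - 1] and counts[i] <= counts[i + 1]
    ]
    chosen = sorted(minima, key=lambda i: counts[i])[: max_classes - 1]
    return sorted(chosen)


# Hypothetical number of samples in each bin of a gradational property.
counts = [2, 9, 14, 6, 1, 7, 15, 8, 3, 10, 12, 4]

print(low_frequency_boundaries(counts))                  # up to four classes
print(low_frequency_boundaries(counts, max_classes=2))   # a single boundary
```

When a distribution has no interior low points, the function returns an empty list, which corresponds to the case described above in which the only reliable option is to keep the number of categories low.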

Standard information. The core-book project was originally undertaken, in part, because previous core logs produced by drilling or coal company personnel lacked important lithologic details. Initial experiments to improve logging used experienced geologists to collect the descriptive information. These efforts were not entirely successful because there was no uniformity about which rock properties should be included in the primary rock term and which should be treated as ancillary comments. This is an important distinction because of the operators' tendency to omit comments after long periods of observation (i.e., descriptions tended to become simpler toward the end of the day). Another tendency resulted from repetitive information; if a particular property was nearly invariant, operators would discontinue recording the information over time. For example, if all sandstones in a core were "lithic" in composition, only the first few occurrences would be described as such and subsequent samples would only be described as "sandstone". Although uniformity of the intervals was understood by the operator, future users of the data would be uncertain about sandstone composition.

The solution to this problem was to build as many of the important properties into the rock term as was possible and to make it simple for the operator to record this information in a field or laboratory setting. A hierarchical system of description was created that recorded the properties in a three-digit (subsequently increased to four-digit) numeric code (Fig. 5). These codes were not arbitrarily assigned; rather, each digit had significance with respect to rock properties. For example, the first digit always recorded the primary rock group (sandstone, shale, etc.), and the last recorded sedimentary structures. The use of the middle digit was dependent on the primary rock group. For example, the code 551 indicates a primary rock group of sandstone (first digit 5), mineral composition of quartzose (second digit 5) and crossbedded structure (third digit 1). The numeric classification was documented by full-scale color photographs that depicted the range of properties for each class. English text was assigned to the codes, based on terminology in common use in the region for which each manual was prepared. The resulting numeric logging system was easy to learn and encouraged the recording of detail, because recording 541 (crossbedded lithic arenite) took no more effort than writing 500 (sandstone). The system did not preclude the use of generalized terms (i.e., 500 was a valid code). Many users did prefer to use a text rock term rather than a number and therefore the problem of consistent terminology could not be completely avoided.

Figure 5. Elements of a numeric rock classification system and related properties.
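A minimal Python sketch of decoding such a code is given below. The lookup tables contain only the digit meanings stated in the text (5 for sandstone; middle digits 4 and 5 for lithic and quartzose; last digit 1 for crossbedded; 0 for a generalized, unspecified property). The function name `decode` and the table layout are assumptions, and the later four-digit form of the code is not handled.

```python
from typing import Dict

# Digit meanings taken from the examples in the text; a full manual would
# define many more entries.
ROCK_GROUP = {"5": "sandstone"}
# The meaning of the middle digit depends on the primary rock group.
MIDDLE = {"5": ("mineral composition", {"4": "lithic", "5": "quartzose"})}
STRUCTURE = {"1": "crossbedded"}


def decode(code: str) -> Dict[str, str]:
    """Translate a three-digit rock code into named properties.

    A digit of "0" means the property was left general, so it is omitted.
    Digits not present in the lookup tables raise KeyError.
    """
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"expected a three-digit code, got {code!r}")
    group_digit, middle_digit, last_digit = code
    properties = {"rock group": ROCK_GROUP[group_digit]}
    if middle_digit != "0":
        property_name, values = MIDDLE[group_digit]
        properties[property_name] = values[middle_digit]
    if last_digit != "0":
        properties["sedimentary structure"] = STRUCTURE[last_digit]
    return properties


print(decode("551"))
print(decode("500"))
```

Because each digit maps to a named property, a generalized code such as 500 and a detailed code such as 541 are handled by the same routine.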

Flexible terminology. Regardless of how much effort is put into standardization of geologic terminology, many users in different locales and with varied training will continue to use variations of rock terms. Moreover, geologic databases must be capable of storing the large amount of historical data that has been collected using various descriptive systems. Nonstandard terminology between regions of the United States (and the world) proved to be a significant problem when the core manuals were prepared. Figure 6 shows an example of a root-penetrated rock that is known by different names in various parts of the country. In Pennsylvania, the term claystone is used for this rock, but that same term has very different usage in other regions. Rather than require users to adopt a single term with which they may be unfamiliar or uncomfortable, the core manuals retained regional terminology while maintaining a consistent numerical classification to unify descriptions. Therefore, the code "137" (or any other) always indicated the same lithology, irrespective of its geographic occurrence; photographs illustrating rock properties helped to reduce ambiguity in nomenclature.

Figure 6. Relating regional terminology through photographs and the numeric code.

Data Model Implementation

All the elements of a hierarchical rock classification are present in current implementations of the NADM; however, none take advantage of numerical coding systems for recording data. Dictionaries for lithologic descriptions will be necessary to address two of the issues discussed above: generalizations and synonyms. An example is shown in Figure 7. The fields of this sample dictionary record the type of term (class), its numeric representation (code), and the English equivalent (text). The second and third records shown in Figure 7 encode generalized versions of the first by use of the digit "0" in the rock code. Each of the three text names is considered the "standard" form of the rock term for this hypothetical dictionary. The last two records are variations of the standard term "quartzarenite" and therefore share the same numeric code. The "class" field is used to indicate a region, classification or dataset to which that name applies. If lithologies are entered in the database in code format, the corresponding text is assigned by selection of a standard or non-standard class. Alternatively, if data are collected and entered as text, the appropriate numeric code can be determined by the same means. While many geologists are averse to using numerical descriptors, most who have used the core books found the method easy to learn and an effective means of communicating with others about lithologic properties. At the same time, the logging system does not require recording of data in the numeric system.


Figure 7. Generalizations (records 2-3) and synonyms (records 4-5) in a lithologic dictionary.
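The dictionary lookups described above can be sketched in Python as follows. The records are a hypothetical reconstruction of the kind of entries described for Figure 7, not its actual contents: the class labels "region X" and "dataset Y" and the two synonym spellings are assumptions, as are the function names.

```python
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    term_class: str   # "standard", or a region, classification, or dataset
    code: str         # numeric rock code
    text: str         # English rock term


DICTIONARY = [
    Entry("standard", "551", "crossbedded quartzarenite"),
    Entry("standard", "550", "quartzarenite"),     # generalized: last digit 0
    Entry("standard", "500", "sandstone"),         # generalized: two digits 0
    Entry("region X", "550", "orthoquartzite"),    # hypothetical synonym
    Entry("dataset Y", "550", "quartz arenite"),   # hypothetical synonym
]


def text_for(code: str, term_class: str = "standard") -> Optional[str]:
    """Return the rock term for a code within the chosen class of terms."""
    for entry in DICTIONARY:
        if entry.code == code and entry.term_class == term_class:
            return entry.text
    return None


def code_for(text: str) -> Optional[str]:
    """Return the numeric code for a standard or nonstandard rock term."""
    wanted = text.strip().lower()
    for entry in DICTIONARY:
        if entry.text.lower() == wanted:
            return entry.code
    return None


def generalizations(code: str) -> List[str]:
    """Codes formed by zeroing trailing digits, most specific first."""
    digits = list(code)
    result = []
    for i in range(len(digits) - 1, 0, -1):
        if digits[i] != "0":
            digits[i] = "0"
            result.append("".join(digits))
    return result


print(code_for("Orthoquartzite"))
print(text_for("550", "region X"))
print(generalizations("551"))
```

Data entered as regional text and data entered as codes therefore resolve to the same numeric key, while the original wording remains available through the class field.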

Use of a numeric code system for lithologic descriptions increases the efficiency of data collection and is well suited for processing in computer systems. Because the digits are directly related to rock properties, they can easily be linked to other tables of property names that have specified definitions and criteria. Code-based descriptions also facilitate searching of large databases for units that contain certain properties. Figure 8 illustrates a data model design that could be used for storing and retrieving lithologic descriptions. A rock description generally has a source document (published or unpublished), which can be stored in the database as a text block with appropriate metadata. Each lithology component in a description can be related to predefined classifications by associating them with an entry in a system dictionary. The dictionary consists of numeric codes with one standard and many nonstandard text names. Each lithology code will have preassigned standard rock properties that will be stored in separate description tables. Lithologies may also have nonstandard properties that could relate directly to a lithologic occurrence. For example, the rock "551" or crossbedded quartzarenite would have standard properties of grain size (sand), mineral composition (quartzose), and sedimentary structure (cross stratified). An occurrence of "551" could have a nonstandard property of carbonate cement or brittle fractures.

Figure 8. A sample data model design for lithologic descriptions.

Data input for an occurrence would consist of picking the appropriate lithology code or term from the dictionary. Because each term would have pre-defined properties, users would not have to reenter that information. Queries to lithologic databases typically relate to individual properties rather than the rock terms assigned to the map unit. For example, users may desire all units with quartz-rich lithologies or those with a particular grain size. Dictionaries and related property tables will allow for efficient query tools to access lithology information in this manner.
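A property-based query of this kind might look like the following sketch, which uses Python's built-in `sqlite3` module. The table and column names and the two map units are hypothetical and do not reproduce the design in Figure 8. The standard properties listed for code 551 follow the example in the text, and those for 541 are inferred from its name (crossbedded lithic arenite).

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE standard_property (code TEXT, property TEXT, value TEXT);
CREATE TABLE occurrence (occ_id INTEGER PRIMARY KEY, map_unit TEXT, code TEXT);
CREATE TABLE occurrence_property (occ_id INTEGER, property TEXT, value TEXT);
""")

# Standard properties are assigned once per code, not per occurrence.
db.executemany("INSERT INTO standard_property VALUES (?, ?, ?)", [
    ("551", "grain size", "sand"),
    ("551", "mineral composition", "quartzose"),
    ("551", "sedimentary structure", "cross stratified"),
    ("541", "grain size", "sand"),
    ("541", "mineral composition", "lithic"),
    ("541", "sedimentary structure", "cross stratified"),
])
# Hypothetical occurrences: data entry only picks a code from the dictionary.
db.executemany("INSERT INTO occurrence VALUES (?, ?, ?)", [
    (1, "Unit A", "551"),
    (2, "Unit B", "541"),
])
# A nonstandard property attached directly to one occurrence.
db.execute("INSERT INTO occurrence_property VALUES (1, 'cement', 'carbonate')")


def units_with(prop: str, value: str) -> list:
    """Map units having the property, whether standard or nonstandard."""
    rows = db.execute(
        """
        SELECT o.map_unit FROM occurrence o
          JOIN standard_property s ON s.code = o.code
         WHERE s.property = ? AND s.value = ?
        UNION
        SELECT o.map_unit FROM occurrence o
          JOIN occurrence_property p ON p.occ_id = o.occ_id
         WHERE p.property = ? AND p.value = ?
        ORDER BY 1
        """,
        (prop, value, prop, value),
    ).fetchall()
    return [row[0] for row in rows]


print(units_with("mineral composition", "quartzose"))
print(units_with("grain size", "sand"))
```

The query never refers to a rock term; it reaches map units through the properties attached to each code, which is the behavior described above for users seeking quartz-rich lithologies or a particular grain size.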

CONCLUSIONS

Important database design issues for storing map-unit descriptions in a data model include:

provide an intuitive means of assigning descriptive elements to the map and lithology components to which they apply
where geologic objects have multiple descriptions, permit ranking to facilitate user access to the data
implement rock classifications as hierarchical, numeric systems
document rock classifications with photographs as well as definitions to reduce ambiguity
use dictionaries to permit nonstandard lithology terms to be related to standard classifications
assign standard properties to dictionary terms to make rock type queries more flexible.

REFERENCES

Ferm, J.C., and Weisenfluh, G.A., 1981, Cored rocks of the Southern Appalachian Coal Fields: Lexington, University of Kentucky, 93 p.

Ferm, J.C., Weisenfluh, G.A., and Smith, G.C., in press, A method for development of a system of identification for Appalachian coal-bearing rocks: International Journal of Coal Geology, Special Publication.


U.S. Department of the Interior, U.S. Geological Survey
URL: https://pubsdata.usgs.gov/pubs/of/2001/of01-223/weisenfluh.html
Maintained by David R. Soller
Last modified: 18:24:46 Wed 07 Dec 2016