Digital Mapping Techniques '99 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 99-386

Development of the Kansas Geologic Names Database: A Link to Implementing the Geologic Map Data Model Standards

By David R. Collins and Kurt Look

Kansas Geological Survey
Campus West, University of Kansas
1930 Constant Avenue
Lawrence, KS 66047
Telephone: (785) 864-3965
Fax: (785) 864-5317
e-mail: david@kgs.ukans.edu
klook@kgs.ukans.edu

INTRODUCTION

The Kansas Geologic Names Database (NDB) is a relational database of stratigraphic nomenclature. It has evolved, with extensive modification, from the text files of the recently published Lexicon of Geologic Names of Kansas (through 1995), edited by Baars and Maples (1998). The NDB development process has proved to be a valuable and practical aid toward implementing the digital geologic map data model (MDM) proposed by Johnson, Brodaric, and Raines (1998) through the National Geologic Map Database (NGMDB) Project. In particular, the nomenclature database facilitates development of the tables for Kansas related to rock unit definitions within the MDM. It is applicable to stratigraphic nomenclature appropriate to geologic maps and other reports on the geology of Kansas published currently or at any time in the past. When dealing with previously published maps, the nomenclature database links rock unit names in use at the time of publication with both prior and subsequent variations in accepted nomenclature, and with the sources of those changes. This paper reviews development of the nomenclature database and its associated data model, the influence of this project on development of a geologic map data model for Kansas, and variations from the data model of the NGMDB Project.

OBJECTIVES AND MOTIVATION

The primary goal of the nomenclature database project is the organization and enhancement of information from the Kansas Lexicon into a form that permits easy access to descriptive information on rock units based on a wide and flexible range of user needs and selection criteria. One geologist may be interested in the current nomenclature of the formations within the Admire Group (Upper Pennsylvanian, Virgilian Series), along with the locations, descriptions, and available images of their type sections. Another geologist's interest may focus on the accepted nomenclature for the Admire Group prior to 1938 (then considered Permian) and its relationship to the previously defined "Admire shales." Someone studying the history of geologic research in Kansas might want a list of all rock units first described in publications having R. C. Moore (a former state geologist) as principal author.

Numerous other goals are tied to flexible access to rock unit information. It is clearly desirable to link descriptive text files for a specific rock unit to corresponding digital map objects, photographs, or document images. On-line edit capabilities permit easy correction of errors found within the nomenclature database, with immediate display of corrections to users. A capability for publication on-demand directly from the database permits prompt, cost-effective publication of enhanced or specialized lexicons as information within the database is improved.

Geologic mapping, broadly defined, is the fundamental data collection and information management activity of the geologist. With the rapid pace of digital geologic map data development in Kansas, the nomenclature database is needed for direct support of mapping and related publication activities. The importance of the nomenclature database for implementation of the NGMDB Project's proposed geologic map data model standards became apparent in the early stages of the project, with recognition that the proposed standards had much to offer toward design of the nomenclature database. Both efforts are viewed as a step toward development of a general model for all geologic data, as suggested by Richard (1998).

Many of these objectives arose in direct response to shortcomings of the Lexicon, which was developed in a word processing environment. The Lexicon's digital text files lack organized database structure and are not publicly accessible. By its nature, a lexicon of stratigraphic nomenclature is characterized by repetitive use of a limited set of information types. Contributions to the Lexicon from numerous stratigraphers, combined with the large number of included names, led to inconsistent style, format, and information content for named units. Use of abstracts from earlier lexicons perpetuated previously published errors. This practice also resulted in frequent occurrences of citations with embedded references to sources described only by author, date, and page, with no further identifying information to be found anywhere in the Lexicon. In its printed form, the Lexicon provides no visualization of type sections or the geographic extent of named units. Direct support of digital mapping activities is not practical with the structure of the Lexicon's text files. In an effort to conserve space and limit the publication to a manageable size, a considerable amount of useful information was excluded from the Lexicon.

THE NDB DEVELOPMENT PROCESS

Given the objectives of the geologic names database project, relational database capabilities clearly offered a practical means of addressing the task at hand. In a critical first step, the original Lexicon text files were reformatted to facilitate parsing into separate fields. Significant revisions and enhancements occurred as information contained in the original files was checked for errors and omissions.

As development proceeded, it became apparent that similar data structures were appropriate for management of digital geologic map data and for management of historical information about the names of geologic rock units. Information on digital geologic map data models available through the NGMDB Project web-site http://ncgmp.usgs.gov/ngmdbproject/ simplified the development process. It provided a clear starting point, identifying critical tables, data fields, and relations.

An iterative process was implemented to achieve a balance between the additional effort of working around more problems in the full text files and the additional gains from further identification of useful text segments prior to parsing and loading into the relational database. Once that balance was reached and parsing routines were thoroughly tested, the information from the enhanced and reformatted Lexicon files was parsed and loaded into the relational database management system. A large portion of the information went into three tables of the new NDB: the SOURCES, CITATIONS, and ROCK_UNITS tables (Figure 1). There are one-to-many links from SOURCES to CITATIONS (each source may have citations relating to many different rock units) and many-to-one links from CITATIONS to ROCK_UNITS (where citations from many sources may define a particular rock unit). Further parsing, as needed, will be done within the relational database management system.

Figure 1. Primary tables in the NDB, reflecting the interface of information sources and defined rock units with specific citations.
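The three primary tables and their links can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names follow the paper, but the column names, the choice of SQLite, and all sample records are assumptions made for illustration; the actual NDB schema carries many more fields and is not reproduced here.

```python
import sqlite3

# Hypothetical, simplified columns; not the actual NDB field definitions.
SCHEMA = """
CREATE TABLE sources (
    source_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
CREATE TABLE rock_units (
    unit_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
-- Each citation belongs to exactly one source and one rock unit,
-- giving the one-to-many and many-to-one links described in the text.
CREATE TABLE citations (
    citation_id INTEGER PRIMARY KEY,
    source_id   INTEGER NOT NULL REFERENCES sources(source_id),
    unit_id     INTEGER NOT NULL REFERENCES rock_units(unit_id),
    location    TEXT
);
"""


def build_demo_db():
    """Create an in-memory database holding invented placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?)",
        [(1, "Example source A"), (2, "Example source B")],
    )
    conn.executemany(
        "INSERT INTO rock_units VALUES (?, ?)",
        [(1, "Example Unit 1"), (2, "Example Unit 2")],
    )
    conn.executemany(
        "INSERT INTO citations VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "p. 1"), (2, 2, 1, "p. 2"), (3, 1, 2, "p. 3")],
    )
    conn.commit()
    return conn


def citations_for_unit(conn, unit_name):
    """Return (source title, location) pairs for every citation of one unit."""
    rows = conn.execute(
        """
        SELECT s.title, c.location
        FROM citations AS c
        JOIN sources AS s ON s.source_id = c.source_id
        JOIN rock_units AS u ON u.unit_id = c.unit_id
        WHERE u.name = ?
        ORDER BY s.title
        """,
        (unit_name,),
    )
    return rows.fetchall()
```

Calling citations_for_unit(build_demo_db(), "Example Unit 1") returns the citations of that unit from both placeholder sources, which is the kind of lookup the CITATIONS table makes possible.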

The SOURCES table contains a separate record for each unique information source. There are 914 sources currently identified in the NDB. Records contain basic bibliographic information, source format (book, journal, note, map, etc.), and (where appropriate for specific geologic reports) information on the geographic extent of the study. A recursive source relationship ("contained in") is built into the SOURCES table. One source may contain many other sources. For example, an issue of a journal may contain many articles. Currently, 186 sources are identified as "contained in" 97 of the other sources.

Records within the ROCK_UNITS table provide the basic identifying information for each recognized geologic rock unit name. This includes name, name origin, lithostratigraphic or chronostratigraphic rank, the names of each unit of higher rank containing the original unit, text statements of geographic extent, and pointers to map objects for visualization of geographic extent. There are 1820 unit names in the database, including about 250 chronostratigraphic names and 1570 lithostratigraphic names. Fewer than 500 of the lithostratigraphic names are currently accepted as formal names in Kansas. A recursive unit relationship ("current_usage") is built into the ROCK_UNITS table to link abandoned unit names to the currently accepted nomenclature for the corresponding unit.

Each record of the CITATIONS table, linking SOURCES and ROCK_UNITS, contains descriptions or comments regarding a specific rock unit, obtained from a specific source, with reference to the specific location of the information within that source. There are 5179 separate citations, an average of 2.8 citations per named rock unit.
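One way to use the recursive "current_usage" relationship is to follow the links from an abandoned name until a name with no further link is reached. The sketch below assumes the links have been read into a plain dictionary; the function name, the dictionary representation, and the placeholder unit names are hypothetical, and the paper does not say how the NDB itself performs this lookup.

```python
def resolve_current_usage(unit_name, current_usage):
    """Follow current_usage links to the currently accepted name.

    current_usage maps an abandoned name to the name that replaced it.
    A name that is absent from the mapping (or maps to None) is treated
    as currently accepted.
    """
    seen = set()
    name = unit_name
    while current_usage.get(name) is not None:
        if name in seen:
            # Guard against bad data in which the links loop back on themselves.
            raise ValueError(f"cycle in current_usage links at {name!r}")
        seen.add(name)
        name = current_usage[name]
    return name


# Invented placeholder names, not records from the NDB.
links = {
    "Abandoned Name A": "Abandoned Name B",
    "Abandoned Name B": "Accepted Name C",
}
accepted = resolve_current_usage("Abandoned Name A", links)
```

The same traversal applies to the "contained in" relationship of the SOURCES table, for example to walk from an article up to the journal issue that contains it.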

Figure 2. Generalized structure of Version 4.3 of the digital geologic map data model (Johnson, Brodaric, Raines, Hastings, and Wahl, 1998).

The overall structure of the digital geologic map data model (MDM), as presented in Figure 2-6 of Version 4.3 (Johnson, Brodaric, Raines, Hastings, and Wahl, 1998, p. 7), is generalized here in Figure 2. The model has four major components. The METADATA section provides detailed information about the information sources (i.e., data about data). The SOURCE table is the primary table within the metadata section of the MDM. In the MDM the sources are typically either published maps, sources containing the published maps, or the documents and databases from which the published maps were derived. The COMPOUND OBJECT ARCHIVE of the MDM provides data structures for information related to all complex geologic features found in the real world, including a rock unit table and related descriptive tables. The SPATIAL OBJECT ARCHIVE section maintains data on map objects used in visualizations of particular rock units. The LEGEND, with associated classification schemes, provides the functional details for specific map visualizations achieved through the combination of spatial object representations of rock units.

Figure 3. Metadata tables in the NDB, describing the origins and nature of available information.

A more detailed view of the metadata portion of the NDB is provided in Figure 3. These tables provide information collectively describing the origins and nature of available geologic information. They correspond to the metadata tables of Version 4.3 of the MDM.

A separate AUTHORS table has been added to facilitate access to work by particular authors in the NDB. An intersection table (X_AUTHORS_SOURCES) links authors to each of their publications, identifies their sequence in a list of contributing authors, and links the author to their employing organization for that publication. Sources are linked to publishing and funding organizations through a separate intersection table (X_SOURCE_SUPPORT). This permits many-to-many relationships between funding agencies and information sources, and between publishers and information sources, to be handled as compound one-to-many relationships. The SOURCES_RELATIONSHIPS table links sources within the SOURCES table through relationships such as "complies with [the specified standard]" or "digitized from [the specified source]" as defined in a data dictionary. The PROJECTIONS table provides the additional information unique to information sources with map formats.
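The role of the author intersection table can be sketched as follows, again with Python's sqlite3 module. The table name X_AUTHORS_SOURCES comes from the paper; the column names, the sample records, and the omission of the employing-organization link are simplifying assumptions.

```python
import sqlite3

# Hypothetical, simplified columns; the organization link is left out.
SCHEMA = """
CREATE TABLE authors (
    author_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE sources (
    source_id INTEGER PRIMARY KEY,
    title     TEXT NOT NULL
);
-- One row per (author, source) pair turns the many-to-many relationship
-- into two one-to-many relationships.
CREATE TABLE x_authors_sources (
    author_id       INTEGER NOT NULL REFERENCES authors(author_id),
    source_id       INTEGER NOT NULL REFERENCES sources(source_id),
    author_sequence INTEGER NOT NULL,
    PRIMARY KEY (author_id, source_id)
);
"""


def build_demo_db():
    """Create an in-memory database holding invented placeholder records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO authors VALUES (?, ?)",
        [(1, "Author A"), (2, "Author B")],
    )
    conn.executemany(
        "INSERT INTO sources VALUES (?, ?)",
        [(10, "Report X"), (11, "Report Y")],
    )
    conn.executemany(
        "INSERT INTO x_authors_sources VALUES (?, ?, ?)",
        [(1, 10, 1), (2, 10, 2), (2, 11, 1), (1, 11, 2)],
    )
    conn.commit()
    return conn


def sources_with_principal_author(conn, author_name):
    """Return titles of sources on which the author is listed first."""
    rows = conn.execute(
        """
        SELECT s.title
        FROM sources AS s
        JOIN x_authors_sources AS x ON x.source_id = s.source_id
        JOIN authors AS a ON a.author_id = x.author_id
        WHERE a.name = ? AND x.author_sequence = 1
        ORDER BY s.title
        """,
        (author_name,),
    ).fetchall()
    return [title for (title,) in rows]
```

Storing the author's position in the intersection table, rather than in either parent table, is what allows a query to single out principal authors.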

Figure 4. Rock unit tables in the NDB, providing descriptions and relationships of rock units.

Details of the portion of the NDB corresponding to a compound objects archive are seen in Figure 4. The "Formal Unit" and "Rock Unit" tables of Version 4.3 of the MDM are merged into a single ROCK_UNITS table in the NDB. Sequences of units appropriate to a particular map, or published in a specific source as "formal" units at a particular time, are listed in the SEQUENCE table. Records in the OCCURRENCES table give specific locations where a rock unit has been described, ranging from the defining holostratotype (original type section) of a rock unit to a local measured section that includes all or part of the unit. The OCCURRENCES_IMAGES table provides pointers to map objects, digital photographs, or scanned records (such as the published type section or a measured section) related to an occurrence of a rock unit. The OCCURRENCES_SECTIONS table is presented here in place of the full range of descriptive tables found in Version 4.3 of the MDM. Searchable data related to lithology, composition, fossil assemblages, thickness, and other classifying characteristics of a rock unit would be found here, separate from general descriptive statements found in the CITATIONS table. The characteristics found under the OCCURRENCES_SECTIONS table and the relationships found in the UNITS_RELATIONSHIPS table are defined in data dictionaries covering the full range of relationships (hierarchy, classification, correspondence, proportion, and disposition) described by Richard (1998).

STATUS

The design of the Kansas Geologic Names Database is consistent with the corresponding elements of the proposed geologic map data model standards, and represents a major step toward full implementation of those standards. Functions defined in tables in the LEGEND portion of Version 4.3 of the MDM (see Figure 2) control production of visualizations of geologic map data. Similar functions are defined for the NDB using commercial report writer software (available either as components of the relational database management system, or as separate systems) to control report generation for a complete and current lexicon of geologic names in Kansas or selected subsets. For example, a lexicon could be extracted from the database for geologic names used by Moore, Jewett, and O'Connor (1951) in their geologic map of Chase County, Kansas.
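The idea of extracting a lexicon for the names used in a single source can be sketched in a few lines of Python. This is not the commercial report writer software mentioned above; the record layout, function names, and sample citations are all hypothetical and serve only to show the select, group, and format steps.

```python
from collections import defaultdict


def lexicon_subset(citations, source_id):
    """Group the comments cited from one source by rock unit name.

    citations is a list of dicts with the hypothetical keys
    "source_id", "unit", and "comment". Returns a dict whose keys
    (unit names) are in alphabetical order.
    """
    entries = defaultdict(list)
    for citation in citations:
        if citation["source_id"] == source_id:
            entries[citation["unit"]].append(citation["comment"])
    return dict(sorted(entries.items()))


def format_lexicon(entries):
    """Render the grouped entries as plain text, one unit per block."""
    lines = []
    for unit, comments in entries.items():
        lines.append(unit)
        lines.extend(f"    {comment}" for comment in comments)
    return "\n".join(lines)


# Invented placeholder citations, not records from the NDB.
demo_citations = [
    {"source_id": 1, "unit": "Unit B", "comment": "Comment one."},
    {"source_id": 2, "unit": "Unit C", "comment": "Comment two."},
    {"source_id": 1, "unit": "Unit A", "comment": "Comment three."},
    {"source_id": 1, "unit": "Unit B", "comment": "Comment four."},
]
report = format_lexicon(lexicon_subset(demo_citations, 1))
```

Because the subset is computed from the database at the time of the request, the same two steps would yield either a complete lexicon or any selected portion of it.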

Universal web access is now under development. A revised lexicon of geologic names in Kansas will be published on-line at the Kansas Geological Survey's web site http://www.kgs.ukans.edu/. The on-line publication will accompany a web conference site of the Kansas Nomenclature Committee for discussion of nomenclature issues, contributions of new information, and reporting of errors within the database. This will be similar to the web conference site used for discussion of the geologic map data model standards by the AASG/USGS Geologic Map Data Model Working Group at http://geology.usgs.gov/dm/.

Merging the NDB with the MDM will result in a geologic data model with sources (for particular nomenclature citations) as attributes of spatial objects used to represent specific rock units within a particular visualization of regional geology. The concept of a geologic names database can be broadened to include the historical development of accepted names for specific occurrences of structures or other geologic features in addition to rock units.

CONCLUSIONS

(1) The high degree of effort required to publish complete lexicons of geologic names by the traditional printing process, accommodating a relatively small proportion of new or revised names, has made such tasks a low priority for most geologists.

(2) Consistent formats throughout large printed volumes (almost impossible to achieve as a manual process and still not easily obtained using word processing software) become feasible in a relational database environment.

(3) Books, including lexicons, are just like published geologic maps -- you always discover important omissions and uncorrected errors after they go to press. The larger the press run, it seems, the more numerous and significant the errors.

(4) On-demand publication and distribution from relational databases provides a more efficient and cost-effective method for geological surveys to maintain formal lexicons and provide access to information on geologic nomenclature.

(5) Universal on-line access provides strong incentives for geologists to contribute new information or identify errors within the database, because it limits their involvement to productive activities and provides rapid incorporation of contributions into the public domain.

REFERENCES

Baars, D. L. and C. G. Maples, eds., 1998, Lexicon of Geologic Names of Kansas (through 1995): Kansas Geological Survey, Bulletin 231, 271 p.

Johnson, B. R., B. Brodaric, and G. L. Raines, 1998, Digital geologic map data model; version 4.2: AASG/USGS Geologic Map Data Model Working Group, http://ncgmp.usgs.gov/ngmdbproject/standards/datamodel/model42.pdf.

Johnson, B. R., B. Brodaric, G. L. Raines, J. T. Hastings, and R. Wahl, 1998, Digital geologic map data model, Addendum to Chapter 2; version 4.3: AASG/USGS Geologic Map Data Model Working Group, http://geology.usgs.gov/dm/model/Chapter2add.pdf.

Moore, R. C., J. M. Jewett, and H. G. O'Connor, 1951, Areal Geology of Chase County, Kansas; in, Geology, Mineral Resources, and Groundwater Resources of Chase County, Kansas: Kansas Geological Survey, Volume 11.

Richard, S. M., 1998, Digital Geologic Database Model: Arizona Geological Survey, http://www.azgs.state.az.us/GeoData_model.pdf.

This site is https://pubs.usgs.gov/openfile/of99-386/collins.html
Maintained by the Eastern Publications Group Web Team
Last revised 11-2-99