Digital Mapping Techniques '01 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 01-223

Will a Standard Data Model Work for Washington, DC Area Geologic Geospatial Data?

By Adam M. Davis

U.S. Geological Survey
e-mail: amdavis@usgs.gov

Department of Geology
Indiana University
1001 E. 10th St
Bloomington, IN 47405

ABSTRACT

The U.S. Geological Survey Eastern Region Earth Surface Processes Team (EESP) assembled a database to accompany Geographic Information System (GIS) geologic layers of the western portion of the Washington, DC Metropolitan Area. Assembling this database involved combining spreadsheets, map unit description text files, and Arc/Info attribute tables into one database. The database design attempts to provide data storage efficiency, ease of data retrieval, compliance with the standard data model effort, and consistency with the language and style of the data as they were collected and originally entered into digital format. At the end of its compilation process, the database was converted to a format compliant with the North American Data Model standard for geologic map data (NADM). Applying the NADM to Washington, DC Area geologic map data raised several problems. One problem was that the NADM was complex and used abstract terminology that differed from the terminology used by EESP. In addition, several data elements did not seem to fit into the NADM format, including minerals for an entire rock unit rather than for just a rock composition; clast and matrix chemistry information for certain types of rocks, such as conglomerates; and information about planar features of the map units, such as bedding and foliation. The data model does provide ways to preserve the individuality and integrity of the geologist's interpretation and data, but it is complicated, causing confusion about where certain data elements fit into the data model.

INTRODUCTION

The U.S. Geological Survey Eastern Region Earth Surface Processes Team (EESP) has created a geologic map database based on bedrock and surficial geology maps of the Washington, DC Area. The team attempted to convert its database into a format that conformed to the North American Data Model for geologic map databases, Version 4.3 (NADM). The NADM is part of an effort to standardize methods and language for representing and storing geologic geospatial data. More information about the NADM can be found at the North American Data Model Steering Committee's web site, http://geology.usgs.gov/dm/.

The EESP wanted to use a data model that would preserve the integrity of its data. The team wanted to ensure that the observations and interpretations of the field geologists would not be distorted by the method of digital data representation and storage. The EESP felt that several issues or concerns must be addressed when creating the database. These include:

In addition, geologists have expressed concern that a standard data model can endanger the individuality of geologist interpretations. This individuality may be lost with the use of standard language and/or the standard format, and efforts must be made to preserve this individuality while pursuing the communication benefits of a data model standard.

This paper first describes an attempt to place the EESP geologic map data into the NADM and then discusses how well the NADM addresses these concerns. The issues and problems encountered during the EESP's application of the NADM are reported here as feedback that can be incorporated into future iterations of the model.

METHODS

The EESP has constructed a geologic map database for the Washington, DC Area. It currently contains geologic data from three 30 X 60 minute quadrangles: Frederick, Washington West, and Fredericksburg. The team has taken attributes from the three quadrangle geologic maps and incorporated them into a single database using a process involving collaboration between field geologists and Geographic Information System (GIS) design personnel.

Initially, the geologic map attribute data recorded by EESP geologists for this area were entered into a spreadsheet format that contained 46 columns for each map unit. The fields were: <MAPUNIT>, <SURF_BED>, <SURFTYPE>, <LITHPRI>, <LITHSEC>, <LITHTER>, <FORM>, <MEMBER>, <GROUP>, <SUPERGROUP>, <ROCKCLASS>, <AGE>, <GEOCHR>, <GEOCHRTECH>, <GEOCHRREF>, <FOSSIL>, <FOSSILTYPE>, <FOSSILREF>, <CORRELEXTR>, <ORIGIN>, <RES>, <RESREF>, <COLOR>, <MINPRI>, <MINSEC>, <MINOTH>, <CLASTPRI>, <CLASTSEC>, <CEMENT>, <THICKAPPRX>, <THICKRANGE>, <BEDTHIN>, <BEDMEDIUM>, <BEDTHICK>, <CONTUP>, <CONTLOW>, <FOLPRI>, <FOLSEC>, <FOLTER>, <CMPM>, <RMPM>, <RMRM>, <DEFORMAGE>, <DEFORMTECH>, <DEFORMREF>, and <COMMENTS>. Table 1 is an example of the first 7 fields of this spreadsheet.

Table 1. First 7 fields of the single table that initially held all of the data for the DC area geologic map database.

These data were then normalized where possible (i.e., repeated groups of fields were moved into separate tables, reducing the number of fields required to store the data) and put into a format that is easier to convert to the NADM. See Davis et al. (2001) to examine the resulting MS Access database (dcdb_eesp.mdb) in detail. Highlights of the format transformation include placing lithological, mineralogical, planar feature, and clast information into their own tables. Some data elements were not readily normalized or split off as reference ("look-up") tables. These fields make up the table [CHARACTER], shown in Table 2.

Table 2. First 8 fields of the CHARACTER table of dcdb_eesp.mdb.
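
The normalization step described above can be illustrated with a minimal Python sketch that uses the standard-library sqlite3 module in place of MS Access. It takes the repeated fields <LITHPRI>, <LITHSEC>, and <LITHTER> from the flat spreadsheet and stores them as one row per map unit and lithology rank. The table name lithology, the column names lith_rank and lithology, and all row values are assumptions made for illustration; they are not taken from dcdb_eesp.mdb.

```python
import sqlite3

# Hypothetical flat rows in the order (MAPUNIT, LITHPRI, LITHSEC, LITHTER).
# The values are illustrative only, not actual EESP data.
flat_rows = [
    ("unit_a", "sandstone", "siltstone", None),
    ("unit_b", "schist", None, None),
]


def normalize_lithology(rows):
    """Return one (mapunit, rank, lithology) tuple per non-empty lithology field."""
    ranks = ("primary", "secondary", "tertiary")
    normalized = []
    for mapunit, *lithologies in rows:
        for rank, lithology in zip(ranks, lithologies):
            if lithology is not None:
                normalized.append((mapunit, rank, lithology))
    return normalized


# Store the normalized rows in their own table (assumed name and columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lithology (mapunit TEXT, lith_rank TEXT, lithology TEXT)"
)
conn.executemany(
    "INSERT INTO lithology VALUES (?, ?, ?)", normalize_lithology(flat_rows)
)
lith_rows = conn.execute(
    "SELECT mapunit, lith_rank, lithology FROM lithology "
    "ORDER BY mapunit, lith_rank"
).fetchall()
```

In this layout a map unit with one lithology occupies one row instead of three mostly empty columns, and a fourth lithology could be recorded without changing the table structure.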

After the initial database design was complete, queries were written to convert to the NADM format and new reference tables were created. The converted database was stored as the file dcdb_dm.mdb (Davis et al., 2001).

Version 1.0 of each of the two databases (the EESP and NADM versions, dcdb_eesp.mdb and dcdb_dm.mdb, respectively) was evaluated by EESP geologists through reviews for Davis et al. (2001) and various informal discussions. These databases were compared and the NADM was evaluated in terms of its success in accommodating the data of the EESP.

RESULTS

In several ways, the NADM was difficult to implement. It is a complex model and does not seem to accommodate all of the data elements that are specific to the EESP geologic maps. As a result, only about 50 percent of the EESP data were placed into the NADM. The following data elements were not placed into the NADM:

These data were not placed in the standard format for a variety of reasons: the place for them within the model was difficult to determine, they were not accounted for by the model, or some combination of these factors applied. In some cases, entire tables would need to be added to the NADM in order to accommodate the EESP data, because appropriate tables do not currently exist. This is certainly an anticipated part of the NADM evolution. In other cases, new fields would need to be added to existing NADM tables. Among the data that did go into the NADM, some did not fit well, and some of the NADM field specifications were modified. Some notes and further description concerning the lack of fit of some of these data elements are:

As the EESP finds more uses for the dcdb_eesp.mdb and its accompanying ArcView shape files containing the spatial data, new methods of classifying the data will be required, resulting in the need to add fields to the database. For example, two applications have required that map units be grouped according to Appalachian Physiographic Province, so this data element was added into the database as the field "Province".
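
As a rough illustration of this kind of schema change, the following Python sketch adds a province field to an existing table using the standard-library sqlite3 module. The table name unit_character, the existing columns, the map unit name, and the assigned province value are all hypothetical; only the field name "Province" comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for an existing attribute table.
conn.execute("CREATE TABLE unit_character (mapunit TEXT PRIMARY KEY, age TEXT)")
conn.execute("INSERT INTO unit_character (mapunit, age) VALUES ('unit_a', NULL)")

# Add the new classification field without rebuilding the table.
conn.execute("ALTER TABLE unit_character ADD COLUMN province TEXT")

# Assign an illustrative province to one map unit.
conn.execute(
    "UPDATE unit_character SET province = ? WHERE mapunit = ?",
    ("Piedmont", "unit_a"),
)

# List the column names after the change.
columns = [row[1] for row in conn.execute("PRAGMA table_info(unit_character)")]
province = conn.execute(
    "SELECT province FROM unit_character WHERE mapunit = 'unit_a'"
).fetchone()[0]
```

Existing rows receive an empty value for the new field until a province is assigned, so the change does not disturb data already in the table.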

DISCUSSION AND CONCLUSIONS

During this process of attempting to fit EESP data into the NADM, the NADM was evaluated with regard to usability characteristics, including:

The NADM provides options for preserving the integrity of geologist observations by allowing the addition of tables and fields where appropriate, but requires placement of data into fields and tables whose names and position in the data model are not intuitive to geologists. One example of an unnecessary complexity that makes the model less intuitive is the fact that lithologic data is separated into "Rock Unit" data and "Rock Composition" data, which are housed in different portions of the data model. Of course, software tools eventually can be built to "insulate" the geologist from the NADM, but these tools are expensive to develop and not yet available.

The NADM does not account for many of the specific needs of the EESP and its clients. Fields and tables had to be added to the NADM to ensure that these needs could be met. For example, clast chemistry in the Leesburg conglomerate is important for landfill siting considerations, yet clast chemistry does not have a place in the NADM. As another example, a field describing the resource potential of map units had to be added, and it is uncertain which NADM table it should belong to.

Extracting data from the database into user-friendly formats is an issue for both of the databases (dcdb_eesp.mdb and dcdb_dm.mdb). In fact, it is a common issue that database designers face. Typically, "queries" (or "views," depending on the database management software) are written to put the data into a format that geologists and others can work with. The amount of query writing required increases with the complexity of the data storage model; it is large for dcdb_dm.mdb (the NADM-compliant version), but not much larger than for dcdb_eesp.mdb. One example of this type of query is one that was written for the dcdb_eesp.mdb database ("output5", see Table 6) to combine province information with lithology and mineralogy information. This query helped facilitate the communication of various geological processes and features through a series of thematic maps. Queries such as these provide custom snapshots of the data that are very important to the usability of the database. Without these custom views of the data, the EESP geologists have had trouble understanding their own data and finding what they want to know about map units.

Table 6. Some of the fields of [output5] - a query used for data retrieval.
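
A retrieval query of the kind described above might look like the following minimal Python sketch, which defines a view joining province, lithology, and mineralogy information back into one row per map unit. It uses the standard-library sqlite3 module rather than MS Access, and it is not the actual definition of [output5]: the table names, column names, view name, and all data values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized tables with illustrative rows.
    CREATE TABLE unit_character (mapunit TEXT PRIMARY KEY, province TEXT);
    CREATE TABLE lithology (mapunit TEXT, lith_rank TEXT, lithology TEXT);
    CREATE TABLE mineral (mapunit TEXT, mineral_rank TEXT, mineral TEXT);

    INSERT INTO unit_character VALUES ('unit_a', 'Piedmont');
    INSERT INTO unit_character VALUES ('unit_b', 'Blue Ridge');
    INSERT INTO lithology VALUES ('unit_a', 'primary', 'schist');
    INSERT INTO lithology VALUES ('unit_b', 'primary', 'granite');
    INSERT INTO mineral VALUES ('unit_a', 'primary', 'muscovite');

    -- The view flattens the tables into one row per map unit.
    -- LEFT JOIN keeps units that have no recorded mineral.
    CREATE VIEW unit_summary AS
    SELECT c.mapunit, c.province, l.lithology, m.mineral
    FROM unit_character AS c
    LEFT JOIN lithology AS l
        ON l.mapunit = c.mapunit AND l.lith_rank = 'primary'
    LEFT JOIN mineral AS m
        ON m.mapunit = c.mapunit AND m.mineral_rank = 'primary';
    """
)

summary = conn.execute(
    "SELECT mapunit, province, lithology, mineral "
    "FROM unit_summary ORDER BY mapunit"
).fetchall()
```

Because the view is stored in the database, a user can select from it like a single table without knowing how the underlying tables are related.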

The NADM is not easy to implement. Careful scrutiny coupled with trial and error is required to figure out how data should be placed into this model. Removing the legend and symbolization portions and dissolving the compound object-singular object divide would help simplify this standard, but might limit its potential.

The NADM is a very important standard that will help geologists to communicate with other professionals in a uniform way, but should be made easier to implement and be further tested by potential users before it is adopted. Despite its problems, the NADM is a tangible example of a standard geologic map data model that can be improved upon, and Version 4.3 of the NADM has been very useful as a stimulus to discuss and explore data storage and representation issues in the Geologic Mapping community.

REFERENCE

Davis, A.M., Southworth, C.S., Reddy, J., Schindler, J.S., Mixon, R.B., and Lyttle, P., 2001, Geologic Map Database of the Washington DC Area featuring data from three 30 X 60 minute quadrangles: Frederick, Washington West, and Fredericksburg: U.S. Geological Survey Open-File Report 01-227 (CD-ROM).


U.S. Department of the Interior, U.S. Geological Survey
URL: https://pubsdata.usgs.gov/pubs/of/2001/of01-223/davis.html