U.S. Geological Survey
e-mail: amdavis@usgs.gov
Department of Geology
Indiana University
1001 E. 10th St
Bloomington, IN 47405
The EESP wanted to use a data model that would preserve the integrity of its data. The team wanted to ensure that the observations and interpretations of the field geologists would not be distorted by the method of digital data representation and storage. The EESP felt that several issues or concerns must be addressed when creating the database. These include:
In addition, geologists have expressed concern that a standard data model can endanger the individuality of geologist interpretations. This individuality may be lost with the use of standard language and/or the standard format, and efforts must be made to preserve this individuality while pursuing the communication benefits of a data model standard.
This paper discusses how well the NADM addresses these concerns, after discussing some specifics about an attempt to place the EESP geologic map data into the NADM. Conveying some of the issues and problems that were encountered during the EESP's application of the NADM will hopefully provide some useful feedback that can be incorporated into future iterations of the model.
Initially, the geologic map attribute data recorded by EESP geologists for this area were entered into a spreadsheet format that contained 46 columns for each map unit. The fields were: <MAPUNIT>, <SURF_BED>, <SURFTYPE>, <LITHPRI>, <LITHSEC>, <LITHTER>, <FORM>, <MEMBER>, <GROUP>, <SUPERGROUP>, <ROCKCLASS>, <AGE>, <GEOCHR>, <GEOCHRTECH>, <GEOCHRREF>, <FOSSIL>, <FOSSILTYPE>, <FOSSILREF>, <CORRELEXTR>, <ORIGIN>, <RES>, <RESREF>, <COLOR>, <MINPRI>, <MINSEC>, <MINOTH>, <CLASTPRI>, <CLASTSEC>, <CEMENT>, <THICKAPPRX>, <THICKRANGE>, <BEDTHIN>, <BEDMEDIUM>, <BEDTHICK>, <CONTUP>, <CONTLOW>, <FOLPRI>, <FOLSEC>, <FOLTER>, <CMPM>, <RMPM>, <RMRM>, <DEFORMAGE>, <DEFORMTECH>, <DEFORMREF>, and <COMMENTS>. Table 1 is an example of the first 7 fields of this spreadsheet.
Table 1. First 7 fields of the single table that initially held all of the data for the DC area geologic map database.
These data were then normalized where possible (i.e., the number of fields required to store the data was reduced) and put into a format that is easier to convert to the NADM. See Davis et al. (2001) to examine the resulting MS Access database (dcdb_eesp.mdb) in detail. Highlights of the format transformation include placing lithological, mineralogical, planar feature, and clast information into their own tables. Some data elements could not readily be normalized or split off as reference ("look-up") tables. These fields make up the table [CHARACTER], shown in Table 2.
Table 2. First 8 fields of the CHARACTER table of dcdb_eesp.mdb.
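The kind of restructuring described above can be sketched as follows. This is only an illustration, not the procedure used to build dcdb_eesp.mdb: the field names <MAPUNIT>, <MINPRI>, <MINSEC>, and <MINOTH> come from the original spreadsheet, but the map unit identifiers, the mineral values, the rank labels, and the RANK and MINERAL output fields are assumptions made for the example.

```python
# Hypothetical wide rows: field names follow the original spreadsheet,
# but the map unit identifiers and mineral values are invented.
WIDE_ROWS = [
    {"MAPUNIT": "U1", "MINPRI": "quartz", "MINSEC": "feldspar", "MINOTH": "mica"},
    {"MAPUNIT": "U2", "MINPRI": "calcite", "MINSEC": "", "MINOTH": ""},
]

# Each repeating spreadsheet column becomes a rank label (labels are assumptions).
MINERAL_COLUMNS = {"MINPRI": "primary", "MINSEC": "secondary", "MINOTH": "other"}


def split_minerals(wide_rows):
    """Turn the repeating mineral columns into one row per (unit, rank, mineral)."""
    narrow = []
    for row in wide_rows:
        for column, rank in MINERAL_COLUMNS.items():
            mineral = (row.get(column) or "").strip()
            if mineral:  # skip empty cells instead of storing blank rows
                narrow.append(
                    {"MAPUNIT": row["MAPUNIT"], "RANK": rank, "MINERAL": mineral}
                )
    return narrow


minerals = split_minerals(WIDE_ROWS)
```

Here the three mineral columns of the wide table collapse into a single narrow table keyed on the map unit, so a unit with one recorded mineral stores one row rather than three mostly empty fields.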
After the initial database design was complete, queries were written to convert to the NADM format and new reference tables were created. The converted database was stored as the file dcdb_dm.mdb (Davis et al., 2001).
Version 1.0 of each of the two databases (the EESP and the NADM versions, dcdb_eesp.mdb and dcdb_dm.mdb, respectively) was evaluated by EESP geologists through reviews for Davis et al. (2001) and various informal discussions. The databases were compared, and the NADM was evaluated in terms of its success in accommodating the data of the EESP.
These data were not placed in the standard format for a variety of reasons: the place for them within the model was difficult to determine, they were not accounted for by the model, or some combination of these factors applied. In some cases, entire tables would need to be added to the NADM in order to accommodate the EESP data, because appropriate tables do not currently exist. This is certainly an anticipated part of the NADM evolution. In other cases, new fields would need to be added to existing NADM tables. Among the data that did go into the NADM, some did not fit well, and some of the NADM field specifications were modified. Some notes and further description concerning the lack of fit of some of these data elements are:
Table 3. The CLASTS table of dcdb_eesp.mdb.
Table 4. Some of the possible fields of the Rock_Composition table of the North American Data Model standard for geologic map data, version 4.3
Table 5. The MINERALS table of dcdb_eesp.mdb
As the EESP finds more uses for the dcdb_eesp.mdb and its accompanying ArcView shape files containing the spatial data, new methods of classifying the data will be required, resulting in the need to add fields to the database. For example, two applications have required that map units be grouped according to Appalachian Physiographic Province, so this data element was added into the database as the field "Province".
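Adding such a classification field might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MS Access, so the syntax is not what was run against dcdb_eesp.mdb. The table name map_units, the function add_province, the map unit identifiers, and the age and province assignments are all assumptions for illustration; only the field name "Province" comes from the database described here.

```python
import sqlite3


def add_province(conn, assignments):
    """Add a Province column and fill it from a {map unit: province} mapping."""
    conn.execute("ALTER TABLE map_units ADD COLUMN Province TEXT")
    conn.executemany(
        "UPDATE map_units SET Province = ? WHERE MAPUNIT = ?",
        [(province, unit) for unit, province in assignments.items()],
    )
    conn.commit()


# Hypothetical starting table; units and ages are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE map_units (MAPUNIT TEXT PRIMARY KEY, AGE TEXT)")
conn.executemany(
    "INSERT INTO map_units VALUES (?, ?)",
    [("U1", "Triassic"), ("U2", "Cambrian")],
)

add_province(conn, {"U1": "Piedmont", "U2": "Blue Ridge"})
rows = conn.execute(
    "SELECT MAPUNIT, Province FROM map_units ORDER BY MAPUNIT"
).fetchall()
```

Because the new field is attached to the existing map unit table, earlier queries keep working unchanged, and only queries that need the grouping have to reference it.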
The NADM provides options for preserving the integrity of geologist observations by allowing the addition of tables and fields where appropriate, but requires placement of data into fields and tables whose names and position in the data model are not intuitive to geologists. One example of an unnecessary complexity that makes the model less intuitive is the fact that lithologic data is separated into "Rock Unit" data and "Rock Composition" data, which are housed in different portions of the data model. Of course, software tools eventually can be built to "insulate" the geologist from the NADM, but these tools are expensive to develop and not yet available.
The NADM does not account for many of the specific needs of the EESP and its clients. Fields and tables had to be added to the NADM in order to ensure that these needs can be met. For example, clast chemistry in the Leesburg conglomerate is important for landfill siting considerations and clast chemistry doesn't have a place in the NADM. For another example, a field needed to be added that described the resource potential of map units, and the NADM table to which it should belong is uncertain.
Extracting data from the database into user-friendly formats is an issue for both of the databases (dcdb_eesp.mdb and dcdb_dm.mdb). In fact, it is a common issue that database designers face. Typically, "queries" (or "views", depending on the database management software) are written to put the data into a format that geologists and others can work with. The amount of query writing required increases with the complexity of the data storage model, and is large for dcdb_dm.mdb (the NADM-compliant version), but not much larger than for dcdb_eesp.mdb. One example of this type of query is one that was written for the dcdb_eesp.mdb database ("output5", see Table 6) to combine province information with lithology and mineralogy information. This query helped facilitate the communication of various geological processes and features through a series of thematic maps. Queries such as these provide custom snapshots of the data that are very important to the usability of the database. Without these custom views of the data, the EESP geologists have had trouble understanding their own data and finding what they want to know about map units.
Table 6. Some of the fields of [output5] - a query used for data retrieval.
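A retrieval query of this general kind can be sketched as a view that joins province, lithology, and mineralogy into one flat row per map unit. The sketch is not the actual definition of "output5": it uses Python's built-in sqlite3 module in place of MS Access, and every table name, field name other than MAPUNIT and Province, and data value in it is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical normalized tables; all names and values are invented.
    CREATE TABLE units (MAPUNIT TEXT PRIMARY KEY, Province TEXT);
    CREATE TABLE lithology (MAPUNIT TEXT, LITHRANK TEXT, LITH TEXT);
    CREATE TABLE minerals (MAPUNIT TEXT, MINRANK TEXT, MINERAL TEXT);

    INSERT INTO units VALUES ('U1', 'Piedmont'), ('U2', 'Blue Ridge');
    INSERT INTO lithology VALUES
        ('U1', 'primary', 'conglomerate'),
        ('U2', 'primary', 'quartzite');
    INSERT INTO minerals VALUES
        ('U1', 'primary', 'calcite'),
        ('U1', 'secondary', 'quartz');

    -- One flat row per map unit; LEFT JOIN keeps units with missing data.
    CREATE VIEW unit_summary AS
    SELECT u.MAPUNIT,
           u.Province,
           l.LITH AS primary_lith,
           m.MINERAL AS primary_mineral
    FROM units AS u
    LEFT JOIN lithology AS l
           ON l.MAPUNIT = u.MAPUNIT AND l.LITHRANK = 'primary'
    LEFT JOIN minerals AS m
           ON m.MAPUNIT = u.MAPUNIT AND m.MINRANK = 'primary';
    """
)

summary = conn.execute("SELECT * FROM unit_summary ORDER BY MAPUNIT").fetchall()
```

The left joins are a deliberate choice in this sketch: a map unit with no recorded mineralogy still appears in the output with an empty mineral field, so it is not silently dropped from a thematic map built on the view.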
The NADM is not easy to implement. Careful scrutiny coupled with trial and error is required to figure out how data should be placed into this model. Removing the legend and symbolization portions and dissolving the compound object-singular object divide would help simplify this standard, but might limit its potential.
The NADM is a very important standard that will help geologists to communicate with other professionals in a uniform way, but should be made easier to implement and be further tested by potential users before it is adopted. Despite its problems, the NADM is a tangible example of a standard geologic map data model that can be improved upon, and Version 4.3 of the NADM has been very useful as a stimulus to discuss and explore data storage and representation issues in the Geologic Mapping community.