Digital Mapping Techniques '00 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 00-325
Prototype Implementation of the NADMSC Draft Standard Data Model, Greater Yellowstone Area
By Ronald R. Wahl (1), David R. Soller (2), and Steven Yeldell (3)
(1) U.S. Geological Survey
Box 25046, Denver Federal Center, MS 913
Denver, CO 80225
Telephone: (303) 236-1320
Fax: (303) 236-0214
(2) U.S. Geological Survey
908 National Center
Reston, VA 20192
Telephone: (703) 648-6907
Fax: (703) 648-6937
(3) Technigraphics Systems, Inc.
Fort Collins, CO
Telephone: (970) 224-4996
Work by the National Geologic Map Database Project shows that object-oriented modeling implemented in an object-relational software system has the set of characteristics that may best support a usable national geologic map database. The North American Data Model Steering Committee's draft standard data model, with some modification, would be easy to implement in this technology. In addition, the object-relational technology studied has built-in version control and input data verification, and it supports access by many people for data retrieval and edit/update functions better than the other technologies investigated. This technology has been tested in a proof-of-concept database for the Greater Yellowstone Area of Idaho, Montana, and Wyoming.
The National Geologic Map Database Project (NGMDB) is conducted by the U.S. Geological Survey (USGS), in cooperation with the Association of American State Geologists (AASG). A more complete description of the entire geologic map database process is in Soller and Berg (this volume).
The charge by the Congress to the USGS is expressed in the following quote from the Geologic Mapping Act of 1992 and its reauthorizations: "The purpose of ...[this Act]... is to expedite the production of a geologic-map data base for the Nation, to be located within the United States Geological Survey, which can be applied to land-use management, assessment, and utilization, conservation of natural resources, groundwater management, and environmental protection...The Survey shall establish a national geologic-map data base. Such data base shall be a national archive that includes all maps developed pursuant to sections of this Act, the data bases developed pursuant to the investigations under [the appropriate] sections [of U.S. law]..., and other maps and data as the Survey deems appropriate." The full text of the Geologic Mapping Act of 1997 can be found at http://ncgmp.usgs.gov/ngmact97.html.
NATIONAL GEOLOGIC MAP DATABASE
The USGS and AASG, through the NGMDB, responded with a plan that would build the database in three phases. They are:
- Phase 1 -- build a searchable map catalog containing limited metadata for all published paper and digital maps,
- Phase 2 -- develop a suite of digital geologic map standards, and link from the map catalog to existing geologic map data sets that are built according to the evolving set of standards, and
- Phase 3 -- create a standardized, online national digital geologic map database, concentrating efforts at least initially on intermediate-scale (1:100,000) maps and smaller-scale (e.g., 1:1,000,000) maps of national coverage.
The work described in this paper focuses on the 1:100,000-scale geologic map series, because it was originally proposed as the candidate data set for the database when the Geologic Mapping Act was enacted.
A USABLE MAP DATABASE
A national geologic map database must be usable by a broad customer base. Experience with building databases, especially those made available on the web, shows that they must allow easy data entry and editing as well as straightforward search and data retrieval. In addition, a national geologic map database should have at least the following characteristics: interaction with other geoscience databases, seamlessness, current data, data content and retrieval standards, and availability over the Internet.
Interaction With Other Geoscience Databases
The technology used to implement the geologic map database should allow existing and future geoscience databases to interact easily with it. We recognize that three general classes of such related databases are important. They are standards data, complementary geoscience data, and non-geoscience data. Examples of standards data are geologic symbol standards (now in preparation) and geologic names. Standard symbols accessible through the map database will aid in uniform annotation and decoration particularly for lines and points. Use of the USGS geologic names database, called the Geologic Names Lexicon (GEOLEX, see Stamm and others, this volume), will enable access to formal unit names and type section data for rock units in the map database.
Complementary geoscience databases encompass gravity, aeromagnetic, geochemical, paleontological, and geochronologic databases. These databases can contain information essential to the understanding of the geology of an area. Databases containing topography, hydrography, surficial geology, and soil characteristics provide information about the nature of rock properties, control of the topography and hydrography by geologic structures, and kinds of weathering and erosion, and soil formation that have taken place in a region under study.
The third class of related databases encompasses non-geoscience information. Examples are data about culture, vegetation, the habitats and ranges of large herbivores and predators, and pollution. The geologic map database must supply information in a form that is easily integrated with these other databases, because experience shows close connections between geology and ecological environments, land use problems, water quality, and the analysis of water and land pollution.
Importance of Seamlessness
Three methods of organization of geologic map data into a database suggest themselves from available technology. These methods range from a data server holding data in a directory but with no additional information to show relationships that exist among data sets, through a tiled system with the information that would tie the individual data sets together, to a seamless database with all of the data stored as a coherent whole.
The first two styles of database would store map data as data tiles, normally based on geographic coordinates or political boundaries. This arrangement would certainly allow easy retrieval of information by quadrangles or counties. If, however, data were needed for a drainage basin, a national or state forest, or some other irregular area, one would need to retrieve the various map tiles that cover the area of interest and then assemble the data into a coherent whole. From experience, this is a time-consuming process.
Putting the data in a seamless database is a better approach. More time and effort would be needed when editing or adding to the database, and data retrievals by, for example, quadrangles may be slower than when the data are stored in quadrangle tiles, but the problems related to data retrieval from irregular areas are mostly eliminated. This storage type would require "edge" matching of the data, both for geometry and for non-geometry attributes, as they are merged into the database. However, seamless data storage would benefit research efforts greatly when geologic map data are needed for a project.
Data That is Current
All users of spatial data need the latest and best data when performing analyses of GIS data sets to aid in fundamental geoscience research and in the resolution of land use problems. However, most GIS data are out of date. People, money and time are usually not available to provide timely and important data updates. This problem affects many categories of GIS data sets. For example, all users of information from topographic map data sets must deal with the fact that most of the data are out of date. The USGS 7.5-minute quadrangle series data may, on the average, be twenty years old. This problem has arisen mostly because of the high costs of, and limited resources available for, topographic map revision (see Moore, this volume).
Geologic data are not as voluminous as topographic data, but updates of geologic maps are currently just as slow. New and updated geologic map data must be added easily into the map database in a way that will eliminate some of the delays that normally occur when publishing new geologic map data or amending prior map data. Printing maps on demand by clipping the data on the fly and then using standard map collar information and organization should allow more effort to be put into updating the map database.
However, old geologic map data should not just be discarded or ignored. Historic geologic interpretations, and especially those that record geologic conditions prior to changes such as landslides, riverbed changes, floods, and volcanic eruptions, are invaluable to retain while conducting modern geoscience investigations (e.g., Chirico and Epstein, 2000). In addition, adding, updating, and in general revising geologic map data would be easier and more accurate with earlier data available in the map database while the revision process proceeds.
Use of a Geologic Map Data Model Standard
Geologic map data must be available in a standard set of formats with a standard minimal attribute list and organization. The use of a data model standard would eliminate most of the problems related to data attribute content, especially where standardized lists can simplify analysis of these data, and would remove most problems of attribute names. The lack of these standards is a great hindrance to integrating currently released digital geologic data sets.
Web-Access From a Browser
For a database to be useful to the public, it should have at least three provisions. First, a potential user of the database should require minimal applications software to interact with the data. This means that few, if any, "plug-ins" for web browsers would be needed. The user should be able to query the database for basic information about a particular feature simply by pointing to it.
Second, the user should be able to view the data with a number of automatic features that could be disabled as needed. Two kinds of views that are of immediate interest are scale-dependent generalization of the current view of the database, and selection of a viewing area by arbitrary geospatial coordinates, political boundaries (e.g., by county), or ecological boundaries (e.g., by drainage basin). Custom views would be quite useful, especially those generated from digitized boundaries created either interactively on screen or offline and sent to the database interface from a standard file format.
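Scale-dependent generalization of linework can be illustrated with line simplification. The sketch below uses the Douglas-Peucker algorithm, which is chosen here only for illustration and is not named in the work described in this paper. The function names, the sample contact line, and the tolerance values are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the infinite line through a and b."""
    if a == b:
        return math.dist(p, a)
    (ax, ay), (bx, by), (px, py) = a, b, p
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.dist(a, b)


def simplify_line(points: List[Point], tolerance: float) -> List[Point]:
    """Douglas-Peucker simplification: drop vertices closer than
    `tolerance` to the chord joining the retained neighbors."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior vertex farthest from the chord first-last.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify each side recursively.
    left = simplify_line(points[: index + 1], tolerance)
    right = simplify_line(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical geologic contact digitized in map units.
contact = [(0.0, 0.0), (1.0, 0.05), (2.0, 0.0), (3.0, 0.05), (4.0, 0.0)]
overview = simplify_line(contact, tolerance=0.1)   # small-scale view
detail = simplify_line(contact, tolerance=0.0)     # large-scale view
```

In a viewer, the tolerance would be tied to the display scale, so that a zoomed-out view draws fewer vertices than a detailed view of the same stored line.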
Finally, a data user should have the capability to retrieve data clipped by the area they specified -- the data attributes would be stored in a standard data model and the data delivered in a format such as "shape" files. This step is key in completing a transaction with a data user.
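As a minimal sketch of clipping data to a user-specified area, the following assumes that the area of interest is a rectangle in map coordinates and uses Sutherland-Hodgman polygon clipping, a technique chosen for illustration rather than one named in this work. The function names are hypothetical. A production system would also need to handle irregular clip boundaries, polygons with holes, and concave polygons that split into several pieces.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def clip_polygon_to_box(polygon: List[Point], xmin: float, ymin: float,
                        xmax: float, ymax: float) -> List[Point]:
    """Sutherland-Hodgman clipping of a polygon ring against a rectangle."""

    def clip_edge(points: List[Point],
                  inside: Callable[[Point], bool],
                  intersect: Callable[[Point, Point], Point]) -> List[Point]:
        result = []
        for i, current in enumerate(points):
            previous = points[i - 1]  # wraps to the last vertex when i == 0
            if inside(current):
                if not inside(previous):
                    result.append(intersect(previous, current))
                result.append(current)
            elif inside(previous):
                result.append(intersect(previous, current))
        return result

    def cross_x(x: float) -> Callable[[Point, Point], Point]:
        # Only called for segments that straddle x, so q[0] != p[0].
        def intersect(p: Point, q: Point) -> Point:
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return intersect

    def cross_y(y: float) -> Callable[[Point, Point], Point]:
        # Only called for segments that straddle y, so q[1] != p[1].
        def intersect(p: Point, q: Point) -> Point:
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return intersect

    points = list(polygon)
    for inside, intersect in [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]:
        if not points:
            break
        points = clip_edge(points, inside, intersect)
    return points


def polygon_area(ring: List[Point]) -> float:
    """Unsigned area of a ring by the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x0, y0 = ring[i - 1]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0
```

The clipped ring keeps whatever attributes were attached to the original map unit, so only the geometry changes before the data are written to the delivery format.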
A GEOLOGIC MAP DATA MODEL
A data model for a database consists of two parts that resemble the description of a language: a vocabulary which includes word lists and types of words in the list (i.e., a data dictionary), and a grammar (i.e., the set of relationships among the components of the data dictionary). A standard data model for geologic map data would then be an agreed-upon vocabulary and grammar that would place the map data into a form that would require essentially no translation to become useful to the user community. The data model needs to be robust, that is, it must have within it the capability to handle every possible type of geologic information, or better and more practical, it must allow extensions to the model that will in no way compromise the basic model.
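The vocabulary-and-grammar analogy can be made concrete with a small sketch. Everything below is hypothetical and greatly simplified: the entity names, attribute names, word lists, and relationships are invented for illustration and are not taken from the NADMSC data model.

```python
from typing import Dict, List, Set, Tuple

# Vocabulary (data dictionary): entity types and the attributes each may carry.
VOCABULARY: Dict[str, Set[str]] = {
    "MapUnit": {"unit_name", "rock_type", "age"},
    "Fault": {"fault_name", "fault_type"},
}

# Controlled word lists for attributes whose values must come from a list.
WORD_LISTS: Dict[str, Set[str]] = {
    "rock_type": {"andesite", "rhyolite", "basalt", "limestone"},
    "fault_type": {"normal", "reverse", "strike-slip"},
}

# Grammar: the relationships allowed among entity types.
GRAMMAR: Set[Tuple[str, str, str]] = {
    ("Fault", "cuts", "MapUnit"),
    ("MapUnit", "overlies", "MapUnit"),
}


def check_record(entity: str, record: Dict[str, str]) -> List[str]:
    """Return a list of problems found when checking a record against the vocabulary."""
    if entity not in VOCABULARY:
        return [f"unknown entity type: {entity}"]
    problems = []
    for attribute, value in record.items():
        if attribute not in VOCABULARY[entity]:
            problems.append(f"unknown attribute: {attribute}")
        elif attribute in WORD_LISTS and value not in WORD_LISTS[attribute]:
            problems.append(f"term not in word list for {attribute}: {value}")
    return problems


def check_relationship(subject: str, verb: str, obj: str) -> bool:
    """True if the grammar allows this relationship between entity types."""
    return (subject, verb, obj) in GRAMMAR
```

An extension in the sense described above would add new entries to these tables, for example a new entity type with its own attributes, without altering the entries that already exist.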
The USGS, the AASG, and the Geological Survey of Canada have been working formally on a data model standard since 1996 (Raines and others, 1997). The current data model of the North American Data Model Steering Committee (NADMSC) results from this cooperation. The present version is 4.3 and is available for review and comment at http://geology.usgs.gov/dm/.
Uses of a Geologic Map Data Model
Currently, geologic map data occur in many forms, and the data content, organization, and file format differ in significant ways. A data model standard will aid data exchange, integration, and analysis. The use of a data model standard for the attribution of geologic map features offers a number of advantages. They include:
- Map Creation will be more efficient if a core set of attribute data is collected for each map regardless of the intended use or purpose of the original map. Standard ways for representing spatial information need to be developed and used for all maps to smooth the progress of retrieval, integration, and analysis.
- Compilation of regional maps from detailed map data will be far less time-consuming if source data are contained in a standard data model, thereby organizing the map data for more efficient manipulation and analysis. Then, the compiler could concentrate on the geologic questions that arise from the change in map scale and the generalization.
- Map publication will be more efficient because those responsible for the publication process would receive data in only one format.
- Geologic map data could be exchanged easily among different organizations because the recipients of such data will know in advance the form of the data content. Spatial analysis of geologic data will be easier because the analyst need not be concerned about various incompatible data attribution and formats.
- With a sufficiently robust data model, generalization and reclassification of geologic data would be much simpler because the analyst will have no need to build data structures to perform these functions.
- Integration of disparate data sets from different disciplines requires a data model standard that is robust and easy to use.
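As a rough illustration of the reclassification advantage listed above, the sketch below assumes that every map unit record carries a standard attribute (here called rock_type) and generalizes the records through a single lookup table. The attribute name, the lookup table, and the sample records are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical lookup from detailed rock types to generalized classes.
GENERALIZATION: Dict[str, str] = {
    "andesite": "volcanic",
    "rhyolite": "volcanic",
    "basalt": "volcanic",
    "limestone": "sedimentary",
    "sandstone": "sedimentary",
    "gneiss": "metamorphic",
}


def reclassify(units: List[dict], lookup: Dict[str, str],
               default: str = "unclassified") -> List[dict]:
    """Return copies of the unit records with a general_class attribute added."""
    return [
        {**unit, "general_class": lookup.get(unit["rock_type"], default)}
        for unit in units
    ]


# Sample records as they might arrive from two different source maps.
units = [
    {"unit_id": 1, "rock_type": "andesite"},
    {"unit_id": 2, "rock_type": "rhyolite"},
    {"unit_id": 3, "rock_type": "limestone"},
    {"unit_id": 4, "rock_type": "tuff"},
]
generalized = reclassify(units, GENERALIZATION)
class_counts = Counter(unit["general_class"] for unit in generalized)
```

Because the attribute name and its word list are the same for every source map, the same lookup table serves all of them. Without a standard, a separate translation step would be needed for each data set.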
Types of Data Models
The current NADMSC data model deals with data attribution only and places the spatial data into a few boxes in the model. This data model is designed as a relational database model, because the concepts and the terminology of relational database technology are well developed and well understood. Also, such a model is relatively easy to understand and communicate to others.
In contrast, object-oriented modeling is relatively new and holds great promise, but it uses rather confusing terminology and suffers from a lack of standard (agreed-upon) concepts. Because it requires a totally different way to view a digital map, object-oriented modeling can be difficult to accept as a valid way either to model complex systems or to store data. The Unified Modeling Language (UML) has recently emerged as the apparent standard for expressing the object-oriented approach to analyzing and building new software and data systems. Because users of this technology have yet to agree on object-oriented database concepts and terminology, a hybrid system (an object-relational database design) has been proposed and has found great acceptance in database software systems. This technique allows modeling to be done in an object-oriented manner while the actual data are stored in a relational database. In addition, this technology makes inheritance, encapsulation (with data hiding), polymorphism, and other object-oriented capabilities available with the stability of a relational database. See Muller (1999) for a good description of OO terminology as it now appears in most of the literature.
Object-Oriented Data Model for GIS
There are two fundamentally different ways to represent spatial objects in a Geographic Information System (GIS). The most common is a geometry-based system, in which one must choose the geometrical type (polygon, line, or point) to represent the object and then attach attributes to the geometry (figure 1a). This kind of system is well known, well defined, and widely used, which gives the user of such a system confidence about the data stored. However, a persistent problem with the geometry-centered system is that users may begin thinking of the spatial objects contained in the system by their geometry types rather than as the objects they actually are. One may hear geologists referring to geologic map objects in terms of polygons, lines, or points instead of rock outcrops, faults, and strike-and-dip measurements.
Figure 1. a. Geometry-centered geospatial data system. b. Object-oriented geospatial data system.
In contrast, a better way to represent spatial objects is object-oriented (OO) modeling, which allows the user to think in terms of real-world objects. Real-world objects such as cars, trees, or grizzly bears are described in terms of the attributes necessary for the data structure to be useful in a particular context. That is, the attribute list for an object in an OO system is not necessarily exhaustive. For an OO GIS, geometry is clearly necessary to represent the object on a map, but it is not the defining attribute for the object. So, on a geologic map stored in an OO GIS, an object named "rock outcrop" may use any one of the three planar geometries mentioned above to represent the geometric attribute of an instance of that object.
Object-oriented data models are simpler and easier to build than geometry-centered models. The modeling process is done in terms of real world objects and many of the abstract concepts used in building relational models to support geometry-centered models become irrelevant. Other features are:
- OO models are less dependent on an initial data model for future applications; therefore, the data model can evolve. In most cases, changes to the data model do not require a total reload of the database.
- In OO models, pieces of program code called methods are "attached" to objects rather than existing in an external program. This makes OO systems more flexible for meeting application needs, and applications are simpler and faster to develop.
- Representations of geologic relationships that involve interactions among geometries and other non-geometric attributes can be built directly into an OO model in terms of methods that describe these relationships as well as attributes. This is possible because geometry is one of the attributes attached to an object in OO GIS systems (figure 1b). Some complex relationships that could be easier to implement in OO models are: the presence of 3-dimensional relations, age relationships modified by other attribute values, and interactions among geometries of a number of objects.
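The contrast with a geometry-centered system can be sketched in a few classes. In this hypothetical example, which does not reproduce the object model of any particular GIS product, geometry is only one attribute of a geologic object, methods are attached to the objects, and a rock outcrop can be represented by a point or a polygon without changing what it is.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Coordinate = Tuple[float, float]
Extent = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def extent_of(vertices: Tuple[Coordinate, ...]) -> Extent:
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (min(xs), min(ys), max(xs), max(ys))


@dataclass(frozen=True)
class PointGeometry:
    location: Coordinate

    def extent(self) -> Extent:
        return extent_of((self.location,))


@dataclass(frozen=True)
class LineGeometry:
    vertices: Tuple[Coordinate, ...]

    def extent(self) -> Extent:
        return extent_of(self.vertices)


@dataclass(frozen=True)
class PolygonGeometry:
    ring: Tuple[Coordinate, ...]

    def extent(self) -> Extent:
        return extent_of(self.ring)


Geometry = Union[PointGeometry, LineGeometry, PolygonGeometry]


@dataclass
class GeologicFeature:
    name: str
    geometry: Geometry  # one attribute among several, not the defining one

    def in_view(self, view: Extent) -> bool:
        """True if the feature's bounding extent overlaps the viewing extent."""
        xmin, ymin, xmax, ymax = self.geometry.extent()
        vxmin, vymin, vxmax, vymax = view
        return xmin <= vxmax and xmax >= vxmin and ymin <= vymax and ymax >= vymin

    def describe(self) -> str:
        return f"{type(self).__name__}: {self.name}"


@dataclass
class RockOutcrop(GeologicFeature):
    rock_type: str = "unknown"

    def describe(self) -> str:
        return f"{super().describe()} ({self.rock_type})"


@dataclass
class Fault(GeologicFeature):
    fault_type: str = "unknown"


# The same kind of object, represented by two different geometries.
knob = RockOutcrop("andesite knob", PointGeometry((5.0, 5.0)), rock_type="andesite")
ridge = RockOutcrop(
    "andesite ridge",
    PolygonGeometry(((0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0))),
    rock_type="andesite",
)
```

Code that works with these objects asks for outcrops or faults rather than for polygons or lines, and a method such as in_view behaves the same way whichever geometry an instance happens to use.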
Storage of Geospatial Data
There are two methods used to store completed geospatial data sets. The more common approach stores geometry-centered GIS data as sheets or tiles. This method then uses external software to index and manage the multiple data tiles. With careful design, attribute data for the tiled data sets can be stored in just one database. Mapping by tiles or quadrangles is the traditional way to collect geologic map data, principally because it allows each map product to be linked to the geologist-author (thereby maintaining credit for the work) and because it gives the project organizer an easy-to-manage way of tracking progress. However, any object that is mapped on several maps or quadrangles will be split into as many pieces as there are tiles. For example, geologic data requested for drainage basins, for counties, or for national parks from data stored in a tile-based library must be assembled from the appropriate tiles whenever a user requests a data extraction (figure 2a). A time-consuming evaluation must be made to ensure that the reconstruction of the data has produced an uncorrupted data set.
Figure 2. a. Sheet or tile-based storage system. b. Object-oriented storage system.
However, geospatial data including geologic map data are more logically stored in a seamless database using an OO data model. Objects like fault blocks, moraines, and lava flows retain their real-world descriptions when viewed as connected objects, and requests for basic as well as derivative map data within complex boundaries based on objects are therefore easier to retrieve in an OO system (figure 2b).
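The reassembly burden of tile-based storage can be sketched as follows. This hypothetical example assumes that each piece stored in a tile carries the identifier of the real-world object it belongs to; the tile names, identifiers, and unit labels are invented. It gathers the pieces of each object from the requested tiles and flags objects whose attributes disagree across tile boundaries, which is the kind of check that edge matching would have resolved before the data entered a seamless database.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical tile library: each quadrangle holds pieces of map objects.
tile_library: Dict[str, List[dict]] = {
    "quad_A": [
        {"feature_id": "flow-1", "unit_label": "Qb"},
        {"feature_id": "moraine-7", "unit_label": "Qm"},
    ],
    "quad_B": [
        {"feature_id": "flow-1", "unit_label": "Qbf"},  # same flow, different label
        {"feature_id": "moraine-7", "unit_label": "Qm"},
    ],
}


def assemble_from_tiles(library: Dict[str, List[dict]],
                        tile_ids: List[str]) -> Dict[str, List[dict]]:
    """Group the pieces found in the requested tiles by the object they belong to."""
    pieces = defaultdict(list)
    for tile_id in tile_ids:
        for piece in library[tile_id]:
            pieces[piece["feature_id"]].append(piece)
    return dict(pieces)


def find_attribute_mismatches(assembled: Dict[str, List[dict]],
                              attribute: str) -> Dict[str, List[str]]:
    """Report objects whose pieces carry conflicting values for an attribute."""
    mismatches = {}
    for feature_id, pieces in assembled.items():
        values = {piece[attribute] for piece in pieces}
        if len(values) > 1:
            mismatches[feature_id] = sorted(values)
    return mismatches


assembled = assemble_from_tiles(tile_library, ["quad_A", "quad_B"])
conflicts = find_attribute_mismatches(assembled, "unit_label")
```

In a seamless store each object would be held once, so neither the grouping step nor the conflict check would be needed at retrieval time.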
A GEOLOGIC MAP DATABASE FOR THE GREATER YELLOWSTONE AREA
To test the feasibility of these ideas, the NGMDB performed a proof-of-concept experiment using a mature object-relational GIS software system (Smallworld software) and geologic data from the Greater Yellowstone Area, or GYA (figure 3). This experiment serves two purposes. First, the idea of such a database as pictured above could be tested; and second, the database could supply data for use with other GIS software for the purposes of edit and update, and analysis. A geologic database for the GYA is desirable for a number of reasons:
- Data are needed by the GYA community to investigate man's impact on the landscape, and geology is an essential element.
- A geologic database would support the GYA basic science goal of analyzing the factors that influence the habitat and the interactions of the major mammal species.
- A comprehensive geologic database is vital to several interdisciplinary studies in the earth and biological sciences in the GYA. Locke (1998) wrote: "The primary reason for most of the western national parks is geological (and yet)... geological research needs in the parks are almost entirely driven by the curiosity of outside scientists rather than by national needs... We ignore (geologic research there) at our peril."
Figure 3. The Greater Yellowstone Area.
A geologic map database as described above has a number of possible uses including:
- Surficial and groundwater-flow analysis. The regional geologic setting of the GYA affects water volume and water quality outside Yellowstone National Park (YNP). The USGS Water Resources staff is studying these relationships as a part of the National Water Quality Assessment Program (NAWQA).
- Relations between vegetation abundance and diversity, and soil and rock properties. Clear correlations exist between vegetation types and bedrock geology in the GYA. For example, conifer tree types emerging after the 1988 fire in YNP show a definite preference by species to grow on soils from specific volcanic and non-volcanic rock types (Don DeSpain, pers. comm., 1999).
- Soil and rock properties and wildlife presence. One of the tree species that grows on andesite within YNP is white bark pine. Grizzly bears feed on the nuts of that tree for a month (usually September), which implies that if andesite supports white bark pine, grizzlies will be present on the andesite outcrops with white bark pine for that month.
- Analyze subsurface volcanic phenomena. In combination with complementary geophysical data sets, a clear understanding of past volcanic activity in the GYA recorded in the database might offer clues about future volcanic activity there.
- Analyze landslide hazards. In combination with DEMs, slope maps, and vegetation maps, geologic data would help to analyze landslide conditions in the GYA. Historically, landslides have been quite destructive in the GYA.
- Trace minerals in water, plants, and animals. The NGMDB project's work in the GYA has funded analyses of plant and animal samples that show great differences in natural trace element concentrations in various parts of the GYA.
Taken together, the above-mentioned uses of geologic data make possible a better understanding of the natural setting of the GYA. For example, building roads and other access and support facilities in places where they would least interfere with the ecosystem and would not exacerbate local geologic hazards would minimize the impact of man on the wilderness.
Proof of Concept Database
Figure 4. Three-tiered data delivery, with their Applications Programming Interfaces (APIs).
The NGMDB project used a mature object-relational GIS system to make some preliminary tests to answer the following questions:
- Can the NADMSC (v.4.3) data model be implemented in such a system? In our test, it was implemented in a limited fashion, addressing only the attribute tables. Advantages of the OO system were, therefore, not exploited.
- Can such a database be seamless? Yes, it can. In addition, alternative versions or "alternatives" of the data can exist in the database while edits and updates for an area are being done. In fact, these "alternatives" can be used to store prior versions of the geologic interpretation of an area, for comparison with the current version.
- Will such a system allow for easy editing and updates? This functionality was not fully examined in this proof of concept. Editing in this GIS would have a learning curve not unlike ARCEDIT. If a system like this is used for the national 1:100,000-scale database, most editing for the near term will still be done in currently owned GIS software.
- Can data be extracted and delivered in well-known data formats using version 4.3 of the data model? Delivery in "shape" files with attribute data in "DBF" or other database formats has been done and was relatively easy to implement.
- Is the database web-accessible without custom software or plug-ins to commercial software? The OO map databases implemented in the GIS technology under study are accessible using basic browsers (Internet Explorer or Netscape) without plugins. This approach is in keeping with the OpenGIS Consortium's three-tiered approach to data distribution (figure 4). For more information on the OGC see http://www.opengis.org/.
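The idea of "alternatives" described in the list above can be sketched with a small versioned store. This is a simplified illustration under assumed behavior and is not the interface of the GIS software used in the test: the class, its method names, and the sample unit are hypothetical. Each alternative records only its own changes and falls back to its parent version for everything else.

```python
from typing import Dict, Optional


class VersionedStore:
    """Toy store with a top version and named alternatives that hold edits."""

    def __init__(self) -> None:
        self._changes: Dict[str, Dict[str, dict]] = {"top": {}}
        self._parent: Dict[str, Optional[str]] = {"top": None}

    def create_alternative(self, name: str, parent: str = "top") -> None:
        if name in self._changes:
            raise ValueError(f"version already exists: {name}")
        if parent not in self._changes:
            raise KeyError(parent)
        self._changes[name] = {}
        self._parent[name] = parent

    def put(self, version: str, feature_id: str, attributes: dict) -> None:
        """Record an edit in one version only."""
        self._changes[version][feature_id] = dict(attributes)

    def get(self, version: str, feature_id: str) -> dict:
        """Look in the version first, then walk up through its parents."""
        current: Optional[str] = version
        while current is not None:
            if feature_id in self._changes[current]:
                return self._changes[current][feature_id]
            current = self._parent[current]
        raise KeyError(feature_id)

    def post(self, name: str) -> None:
        """Merge an alternative's edits into its parent version."""
        parent = self._parent[name]
        if parent is None:
            raise ValueError("the top version has no parent")
        self._changes[parent].update(self._changes[name])
        self._changes[name].clear()


store = VersionedStore()
store.put("top", "unit-12", {"rock_type": "rhyolite"})
store.create_alternative("revision_2000")
store.put("revision_2000", "unit-12", {"rock_type": "andesite"})
```

Until the alternative is posted, users of the top version continue to see the earlier interpretation, while an editor working in the alternative sees the revised one. Keeping an alternative unposted is one way that a prior or competing interpretation could be retained for comparison.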
The proof of concept is a contribution to the experimentation that is necessary for implementing the National Geologic Map Database. Data from the GYA were converted from ArcInfo coverages to "shape" and "DBF" files, and imported into the object-relational software with supplied code. No difficulties were encountered in the process. Preliminary work with third-party translation software shows that direct conversion from ArcInfo coverages to this system would be possible, but more testing needs to be done. Use of web browsers as an interface to the online database has passed an initial test. Zooming into an area, selecting geologic features by pointing with a mouse, and subsequently displaying the attributes of the selected feature all work well, even over a phone line connection.
Work to Be Done
The geologic mapping community needs to find new ways to share digital geologic map data more efficiently with an audience that is broader than our traditional one. Standards are needed for data organization, geologic word lists, and geologic data file content as well as format. More comprehensive interaction with the online database should be designed, to provide for query and Internet delivery of user-selected data in a usable form. In other words, the geologic mapping community needs a standard data model implemented in a "useful" database. More study and discussion are necessary before building such a map database on a national scale.
In addition, the geologic mapping community needs to agree upon a definition of the term "geologic map." Varnes (1974) offered the following definition nearly three decades ago, well before the use of geologic map data in digital applications was widely conceived. In particular, his warning about inappropriate uses of geologic map data reminds us of the inherent limits to what one may obtain from a digital geologic map database.
"A geologic map is a synthesis; it is not information in its most fundamental and versatile form. It is a generalization..., a geologist's interpretation of the geology for a particular purpose. Its lines, units, and descriptions may not be sufficiently defined for another synthesis intended for another purpose. If a geologic map does not contains the proper information... it logically cannot, and therefore should not, be interpreted for special purposes; if it does, it can. Facts cannot be generated by inference."
REFERENCES
Chirico, P.G., and Epstein, J.B., 2000, Geographic information systems analysis of topographic change in Philadelphia, Pennsylvania, during the last century: U.S. Geological Survey Open-File Report 00-224, One CD-ROM, Map scale 1:24,000.
Locke, William, 1998, Another voice for science and interpretation: Yellowstone Science, v. 6, no. 3.
Muller, R.J., 1999, Database design for smarties: Academic Press, 442 p.
Raines, G.L., Brodaric, Boyan, and Johnson, B.R., 1997, Progress report -- digital geologic map data model, in D.R. Soller, ed., Proceedings of a workshop on digital mapping techniques: methods for geologic map data capture, management, and publication: U.S. Geological Survey Open-File Report 97-269, p. 43-46, http://ncgmp.usgs.gov/pubs/of97-269/raines.html.
Varnes, David J., 1974, The logic of geological maps, with reference to their interpretation and use for engineering purposes: U.S. Geological Survey Professional Paper 837, 48 p.