Digital Mapping Techniques '99 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 99-386
Using the Proposed U.S. National Digital Geologic Map Data Model as the Basis for a Web-Based
Geoscience Library Prototype
By Boyan Brodaric1, J.M. Journeay2, and S.K. Talwar2
1Geological Survey of Canada
615 Booth Street
Ottawa, Ontario K1A 0E9
Telephone: (613) 992-3562
Fax: (613) 995-9273
e-mail: brodaric@gsc.nrcan.gc.ca
2Geological Survey of Canada
101-605 Robson Street
Vancouver, B.C. V6B 5J3
ABSTRACT
We have developed an internet-based digital library prototype (CordLink) to seamlessly integrate digital maps, images and text on Canadian Cordilleran geology into a comprehensive information resource for the geoscientist and non-geoscientist. The library structure addresses several factors that make geological knowledge challenging to use in an internet environment: (1) geological data is multidisciplinary and diverse (e.g. bedrock mapping, geophysics, geochronology, mineral resources, etc.); (2) it is expressed in multiple forms such as map, text, and image; (3) it is extremely inter-related with no piece of knowledge captured in any one fragment or form; (4) it spans geographic space and geologic time; (5) it is expanding and evolving; and (6) users tend to peruse library holdings within a specific geographic, geologic, or level of expertise context. These factors require an approach to data management that, while it is distributed in nature, is more structured and context-sensitive than the simple linking of a myriad of web pages.
The CordLink holdings are held in a relational database system arranged according to an extended version of the proposed US National Digital Geologic Map Data Model design (http://ncgmp.usgs.gov/ngmdbproject/). The database is coupled to the Internet-based Autodesk MapGuide Geographic Information System (GIS). Users can enter the holdings from the Map, Document, Image, Education or Research perspectives and can then navigate the library according to subject while preserving their initial focus. Derivative thematic maps and reports can be constructed and generated by the user on-the-fly for superior browsing capability. Various library holdings can be downloaded and users can contribute elements to the library. A catalog of on-going research projects is provided as well as an on-line scientific discussion forum. It is anticipated that this wide-ranging, tightly inter-linked resource, delivered in a visual environment, will benefit researchers, students, instructors and the general public.
INTRODUCTION
The Internet possesses enormous potential to enhance the distribution of geological information, and to broaden the scope of its usage. The convenience of the Internet ensures that geological information will be accessed more often and more diversely than ever before, and it will no doubt attract more non-traditional visitors to geological information than, for example, did the libraries or sales offices of geological surveys. In this, however, it could serve to alienate as many people as it attracts, for, like many professions, geology is a field that the uninitiated find difficult to comprehend. This is disturbing in light of the attention that geological agencies are now directing to outreach and societal benefit. Note, for instance, that the strategic directions of both the Canadian (GSC) and US (USGS) geological surveys emphasize this societal theme:
"Its (GSC's) goals are threefold:
- to create centres of excellence in Canada for scientific research and the advancement and dissemination of knowledge,
- to better apply our scientific knowledge in achieving Canada's economic and social goals, and
- to contribute to a better quality of life for Canadians." (GSC, 1996).
"The challenge for the USGS is to stay focused on a horizon of some ten years out, while realizing that there will be near-term shifts... Beyond these already compelling factors are the public's perception of its investment in science as a means of solving societal problems and society's concept of the 'public good' of science." (USGS, 1997).
Therefore, the challenge of the Internet extends beyond making geological information available, for that is readily done; it must instead make this knowledge more useful to geologists and more relevant to other parts of society.
It seems clear, at least to many geologists, that fundamental geological investigation can play an important role in understanding and managing the relationship between humans and the natural environment. This role, however, is not readily evident in traditional geological products such as maps or databases, but must be teased out of them by geological experts. This effectively undermines the broader use of the information and diminishes its value to the non-geologist. Combining modern GIS tools with the Internet yields technology capable of overcoming this situation; however, how to structure, present and manipulate geological knowledge, in its fullest sense, to other parts of society, is not clear.
What does seem clear is this: if the Internet is to serve as the gateway to the public, it is incumbent on geological data providers to ensure that geological knowledge is represented to its fullest degree, and that it is accessible and usable by professional geologists and casual visitors. This requires two major interacting components, one representational and the other functional: firstly, a data model for geological knowledge is required to meaningfully structure geological information, and secondly, a suite of operations tuned to both geologists and non-geologists is necessary to leverage such a structure. The remainder of this paper describes these two elements. The requirements for a comprehensive geologic site are discussed first, followed by the data representation methods required to make it work. Finally, an Internet site that endeavors to meet these criteria, by serving the geology of the Canadian Cordillera, is described.
REQUIREMENTS
Data, Information and Knowledge
What we know is often categorized as data, information and knowledge. In a geological sense we can see this as an increasing gradient in our understanding of the interaction of earth processes and materials: data are measurements, facts and observations; information is data in context, such as the interpretations on a geologic map; and knowledge represents the complete understanding that we possess in our minds of some thing, such as a geological history of an area. Thus the computer representation of real knowledge is very difficult, and arguably impossible, as we can't capture human thought (at least not yet). The most we can do is mimic human thought via knowledge representation strategies and reasoning mechanisms that create the impression of real thinking -- some would argue that this indeed constitutes machine intelligence, if we can't distinguish between machine and human behavior (Turing, 1950). In terms of geological computing, this implies two things: firstly, that we must maximize our knowledge representation efforts, and secondly, that we must provide superior functionality so that the geological content that is represented is used intelligibly and will appear to be, in fact, knowledge.
Knowledge Representation
Knowledge representation for computing purposes requires the objects and their relations in the real world to be translated into objects and relations in the computing world (Luger and Stubblefield, 1998, p. 293). In effect this states that knowledge can be approximated in a computer by carefully cataloging and indexing information (and then using it). We must therefore carefully identify the geologic objects to be represented and maximize the relationships between them. This is analogous to placing the information in as many contexts as possible, and is particularly relevant to geological mapping where information can be viewed quite differently according to a geologist's expertise, education or bias. In non-digital environments the expression of a viewpoint is usually a multi-media affair, often requiring maps, reports, charts, diagrams and oral presentations to convey a message. How can computing mechanisms cope with such diversity?
At a gross level, for computer representation purposes, we can also attempt to categorize the objects of geological knowledge as maps, documents, images, projects, and references. Though it is worthwhile to view a map alongside its accompanying reports, diagrams, bibliography, and general project description, it is much more useful to actually interconnect the sub-components of these elements to permit investigation at a more detailed scale. For instance, selecting a polygon on a map and viewing its unit description, embedded in reports or articles, as well as displaying related images such as age correlation charts or cross-sections, or obtaining references specific to that map unit, would be much more beneficial. Narrowing our scale of focus to field observations would be of even greater benefit, though the task would be much more difficult as the relationships between field observations and their interpretations are numerous and complex. This suggests the basic resolution for the represented objects could be quite fine, and their relationships abundant and intricate. Designers of geological software must select an appropriate scale of representation based on their purposes, and on the quantity and quality of the information content. A medium scale of object resolution could include:
- Map: legend, map unit, occurrence.
- Geological Description: lithostratigraphic age, geochronologic age, lithology, and others.
- Document: volume, chapter, subsection, figure.
- Image: caption and image.
- Project: originating project details.
- Reference: citation and source details.
The ideal system would permit relationships between any of these elements to be represented. For instance, one might relate a lithostratigraphic or geochronologic age to a body of text, to a figure, to a map unit or to a map. Likewise a map unit might be associated with one or more geological descriptions, document segments, figures, projects or references. The ability to establish such relationships elevates the context of any single information fragment and thus increases its knowledge content. Furthermore, this design is scalable, as geological descriptions could be expanded to contain the more complex field observations.
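As a minimal sketch of such any-to-any relationships (an illustration, not the CordLink implementation), the medium-scale objects can be treated as typed nodes joined by symmetric links. The class names, object kinds and sample entries below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryObject:
    """One medium-scale object; kind and name are illustrative labels."""
    kind: str   # e.g. "map_unit", "age", "document_section", "figure"
    name: str


class RelationIndex:
    """Symmetric many-to-many links between any two library objects."""

    def __init__(self):
        self._links = defaultdict(set)

    def relate(self, a, b):
        # Store the link in both directions so either object can be the entry point.
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, obj, kind=None):
        # Return linked objects, optionally restricted to one kind, in a stable order.
        hits = self._links.get(obj, set())
        return sorted(
            (o for o in hits if kind is None or o.kind == kind),
            key=lambda o: (o.kind, o.name),
        )


# Hypothetical holdings, for illustration only.
unit = LibraryObject("map_unit", "Unit A")
age = LibraryObject("age", "Devonian")
section = LibraryObject("document_section", "Chapter 2, subsection 1")
figure = LibraryObject("figure", "Figure 2.1")

index = RelationIndex()
index.relate(unit, age)
index.relate(unit, section)
index.relate(section, figure)
index.relate(age, section)

print(index.related(unit))                # the age and the text linked to the unit
print(index.related(section, "figure"))   # figures linked to that text
```

Because every link is stored in both directions, a user entering from the text can reach the map unit just as easily as a user entering from the map can reach the text.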
Knowledge Usage
A fitting model for how geologic objects are to be used, rather than what they are, is the library: libraries are valued for their archival role as well as for their ability to stimulate new thought. Library contents possess the potential to foster new insight and thus to regenerate themselves with new contributions. This implies that information requires discourse to achieve the rank of knowledge. Applying this to digital environments suggests digital libraries must not only represent original author intention but must promote scientific discourse by enabling the representation of different viewpoints. They should be inclusive of diversely formatted and conceived documents, and also encourage various methods of discussion. Today these ideas are perhaps best exemplified by the Internet, its interconnection of computers known as the world wide web, and its loose arrangement of information fragments. Many researchers suggest that this arrangement is not optimal (e.g. Schatz, 1995), and contend that the web requires mechanisms to add context to its information fragments, which essentially means increasing the resolution of its objects and relations. Why is this so?
Implementation
The world wide web is a form of knowledge management system at the document level. It primarily maintains links between documents (consisting of images and text), and permits searching for text strings within the documents. A first order approach to placing geological information on the web would be to reduce the geological information to a set of inter-linked text and image documents, and simply use the web's navigation capabilities to traverse them. There are several problems with this approach:
- Spatial representation: Maps are treated as diagrams rather than spatial constructs. Smart graphic formats (e.g. CGM) will allow individual elements to retain linkages to other document components, but how adjoining maps fit together, for instance, is beyond the basic web.
- Spatial operation: searching for information inside a map unit or close to a fault is impossible.
- Contextual searching: searches are limited to the inclusion or exclusion of words within a document, and contextual parameters cannot be applied. For example, searching for map units of a certain age will return any geological or non-geological document that contains the specified text.
- Information management: there is no provision for the management of information. For instance, no referential integrity is implemented -- this means, for example, that the spelling of terminology cannot be standardized, which causes searches to return incomplete results.
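The last two problems can be illustrated with a small sketch using Python's built-in sqlite3 module. It contrasts a web-style string search with a search on a structured age attribute, and shows a controlled vocabulary rejecting a misspelled term. The table names, column names and sample rows are hypothetical and are not drawn from any of the systems discussed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces references only when asked
conn.executescript("""
    CREATE TABLE age_term (term TEXT PRIMARY KEY);   -- controlled vocabulary
    CREATE TABLE map_unit (
        name TEXT PRIMARY KEY,
        age  TEXT NOT NULL REFERENCES age_term(term)
    );
    CREATE TABLE document (title TEXT, body TEXT);
""")

conn.execute("INSERT INTO age_term VALUES ('Devonian')")
conn.execute("INSERT INTO map_unit VALUES ('Unit A', 'Devonian')")
conn.executemany(
    "INSERT INTO document VALUES (?, ?)",
    [
        ("Unit A description", "Devonian shale and siltstone."),
        ("Visitor guide", "Opening hours for the Devonian Gardens."),
    ],
)

# Web-style string search: matches every document containing the word.
text_hits = [r[0] for r in conn.execute(
    "SELECT title FROM document WHERE body LIKE '%Devonian%' ORDER BY title")]

# Contextual search: only map units whose age attribute is Devonian.
unit_hits = [r[0] for r in conn.execute(
    "SELECT name FROM map_unit WHERE age = 'Devonian'")]

# Referential integrity: a misspelled term is rejected instead of stored.
try:
    conn.execute("INSERT INTO map_unit VALUES ('Unit B', 'Devonain')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(text_hits, unit_hits, rejected)
```

The string search returns the non-geological document along with the geological one, while the attribute search returns only the map unit; the misspelled age never enters the database, so later searches are not silently incomplete.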
Thus, we can view the web as a very valuable document server, but not as an analytic tool. Missing is the geological context that is crucial to its effective use. Industry and academia are spending significant resources to address these problems, through various digital library initiatives (see National Coordination Office for Computing, Information and Communications, 1998). Although considerable progress has been made, the contextual web is not yet a reality. This suggests that practical applications must, in the short term, implement traditional technologies to manage the objects and relations of interest to us, and this implies that database systems and GIS must be harnessed to the Internet to serve as our knowledge engines. Of the two technologies, only the database system permits conceptual extensions, as GIS are fixed in one spatial model or another. For example, vector-based GIS typically offer points, lines and polygons, and the topological relations between them, as their spatial primitives. The user-defined meaning of these spatial objects and their relations is typically contained within a database.
From this it is apparent that it is increasingly important to develop and adopt a database design that contains relevant geological objects and encourages effective usage of them. The medium scale objects noted above (maps, descriptions, documents, images, projects and references) provide a good starting point for a database design that is knowledge-oriented. How well do our current data modeling efforts meet these criteria?
GEOLOGICAL MAP DATA MODEL
The geological map has evolved into a highly concise articulation of the geological history of a specific geographic region. Though its graphic nature makes it eminently suitable for web-based knowledge delivery, its complexity and diversity challenge simplistic conceptions of what a digital geologic map is, and how it functions. The U.S. National Digital Geologic Map Data Model effort addresses the first of these issues, by defining the components of a geologic map. The prototype data model 4.3 (http://ncgmp.usgs.gov/ngmdbproject/) is designed to represent the most common elements of geologic map data, and is thus well suited to representing Map, Geological Description and Reference objects and certain relations amongst them:
- Reference sources, including maps, as well as legends, legend items (e.g. rock units) and their relations, are defined thematically and cartographically. This permits geological map information to be organized coherently, in databases, while preserving their map origins.
- Geological descriptions consisting of lithologies and ages (geochronological and lithostratigraphic) are associated with legend items or individual map occurrences (e.g. with one rock unit polygon, or one thrust fault line).
- Geologic vocabularies for rock types, time scales and structural features can be defined, extended and standardized. The use of hierarchies enables the development of terminological standards that require consensus at higher levels but that may be modified according to user need at lower levels.
- Geologic information is separated from spatial representation, permitting a geological feature to have different spatial representations (point, line, polygon, volume, etc.) by existing on different maps, at various scales, or because of diverse interpretations.
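The hierarchical vocabularies mentioned above can be sketched as a simple parent lookup, in which a search on a broad term also finds units described with narrower terms. This is an illustration only: the term list, unit names and function names are hypothetical and do not reproduce the vocabulary of the v4.3 model.

```python
# Hypothetical rock-type hierarchy: each term maps to its broader parent term.
PARENT = {
    "sedimentary rock": None,
    "clastic sedimentary rock": "sedimentary rock",
    "shale": "clastic sedimentary rock",
    "siltstone": "clastic sedimentary rock",
    "igneous rock": None,
    "volcanic rock": "igneous rock",
    "basalt": "volcanic rock",
}


def broader_terms(term):
    """Return the chain of broader terms, nearest first."""
    chain = []
    parent = PARENT[term]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def matches(term, query):
    """True if term equals the query or is a narrower term of it."""
    return term == query or query in broader_terms(term)


# Hypothetical legend items and their dominant rock types.
UNIT_LITHOLOGY = {"Unit A": "shale", "Unit B": "basalt", "Unit C": "siltstone"}


def units_of_type(query):
    """List units whose rock type falls under the query term."""
    return sorted(u for u, lith in UNIT_LITHOLOGY.items() if matches(lith, query))


print(units_of_type("sedimentary rock"))
print(units_of_type("basalt"))
```

In such an arrangement the upper levels ("sedimentary rock", "igneous rock") are the part that requires consensus, while individual users could add narrower terms beneath them without breaking broad searches.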
However, in keeping with the stated intention of storing knowledge by maximizing geological context through enhanced representation (objects and relations) and function (digital library operations), several aspects of this design require modification:
1. Objects of representation -- missing are document, image, project and dataset objects:
   a. Document: a repository for bodies of text comprising reports, journal articles, theses, etc.
   b. Image: a repository of images including georeferenced remote-sensed maps (e.g. geophysics), as well as diagrams from a map's surround, or figures from a body of text.
   c. Project: a record of recent, current and future geological activities, their purposes, goals and locations.
   d. Dataset: a description of the physical layers comprising a map, including supplemental information about a dataset such as its name, location, and layer type (e.g. line, point, polygon).
2. Relations between objects:
   a. Description independence: this requires establishing a description archive where geological descriptions, such as age ranges, rock compositions, and other description types (e.g., images, text, projects, mines, wells, boreholes, etc.), exist as independent entities that may optionally be related to one or more rock units or to other descriptions. In v4.3 this is not the case, as age ranges and rock compositions must be related only to a specific rock unit.
   b. Description relations: denotes the complex relationships between geological descriptions. It would, in effect, permit any description to be related to any other description. This is of specific importance to medium scale geologic knowledge representation as it permits, for example, text fragments to be associated with ages, images, projects, field sites, measurements, etc., thereby cross-indexing the geological content according to different perspectives.
3. Subject indexing: a generic, hierarchical, subject listing that can be used to index digital library holdings and thus provide novice as well as expert search capability.
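Description independence and description relations can be sketched relationally with two link tables: one joining rock units to descriptions, and one joining descriptions to each other. The schema below is a hypothetical illustration written for Python's built-in sqlite3 module; its table and column names are assumptions and are not those of the v4.3 prototype or of the CordLink data model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rock_unit   (unit_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE description (desc_id INTEGER PRIMARY KEY, kind TEXT, label TEXT);
    -- Description independence: a description may belong to zero or more units.
    CREATE TABLE unit_description (
        unit_id INTEGER REFERENCES rock_unit(unit_id),
        desc_id INTEGER REFERENCES description(desc_id),
        PRIMARY KEY (unit_id, desc_id)
    );
    -- Description relations: any description may be linked to any other.
    CREATE TABLE description_relation (
        from_id INTEGER REFERENCES description(desc_id),
        to_id   INTEGER REFERENCES description(desc_id),
        PRIMARY KEY (from_id, to_id)
    );
""")
conn.executemany("INSERT INTO rock_unit VALUES (?, ?)",
                 [(1, "Unit A"), (2, "Unit B")])
conn.executemany("INSERT INTO description VALUES (?, ?, ?)", [
    (10, "age", "Devonian"),
    (11, "text", "Report section 3.2"),
    (12, "image", "Cross-section figure"),
])
# One age description shared by two units, rather than copied into each.
conn.executemany("INSERT INTO unit_description VALUES (?, ?)", [(1, 10), (2, 10)])
# A text fragment cross-indexed to an age and to an image.
conn.executemany("INSERT INTO description_relation VALUES (?, ?)",
                 [(11, 10), (11, 12)])

units_sharing_age = [r[0] for r in conn.execute("""
    SELECT u.name FROM rock_unit u
    JOIN unit_description ud ON ud.unit_id = u.unit_id
    WHERE ud.desc_id = 10 ORDER BY u.name""")]

# Text reachable from Unit A by way of its age description.
text_for_unit = [r[0] for r in conn.execute("""
    SELECT t.label FROM unit_description ud
    JOIN description_relation dr ON dr.to_id = ud.desc_id
    JOIN description t ON t.desc_id = dr.from_id
    WHERE ud.unit_id = 1 AND t.kind = 'text'""")]

print(units_sharing_age, text_for_unit)
```

The second query shows the cross-indexing effect: the text fragment was never attached to Unit A directly, yet it is found through the age description that the two have in common.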
Modifications 2a and 2b have been discussed within the former U.S. national geologic map data model working group (http://ncgmp.usgs.gov/ngmdbproject/) and have been adopted by the CordLink library. The remaining modifications result from implementing a digital library approach to geologic information within the Canadian Cordillera (Brodaric and others, 1999).
A DIGITAL LIBRARY IMPLEMENTATION FOR THE CANADIAN CORDILLERA
A large database of geologic maps for the Canadian Cordillera was constructed according to a data model design (Figure 1) modified from the proposed U.S. national v4.3 prototype. The database represents a digital library that contains five main geologic map series ranging in scale from 1:2,000,000 to 1:32,000,000 and covering a large geographic area including all of British Columbia, the Yukon, and parts of Alberta. These maps contain in excess of 210,000 individual map features, as well as about 3500 legend items, mostly rock units. Each rock unit is described according to absolute age range (e.g. 300-400 million years), geologic age range (e.g. Devonian-Silurian), and rock type composition (shale, siltstone, etc.), and is also linked to a cartographic symbol within a legend. The Cordilleran volume from the definitive Decade of North American Geology (DNAG) series (Gabrielse and Yorath, 1992) was converted to digital format and compartmentalized, including more than 800 pages of text and 500 images. Text components were indexed according to their geologic age and subject, and to relevant figures and images, and to related rock units. Users are thus able to retrieve authoritative descriptions, including text and images, when viewing any feature instance on a map; users are also able to enter the archive from the text and image perspectives, with all interrelationships maintained. The description and location of research projects can be viewed on-line, and users can enter their own research project information. A discussion room is being provided to encourage discourse on geoscience topics of current interest. Some map data may be downloaded and an education component maintains links to education sites within the geosciences.
Figure 1. A diagram depicting the enhanced digital geologic data model utilized by the CordLink project
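Since each rock unit carries an absolute age range, one natural query is to find the units whose ranges overlap a range of interest. The sketch below shows one way this could be done; the unit names, age values and function names are hypothetical, and the paper does not state how CordLink evaluates such queries.

```python
from typing import NamedTuple


class AgeRange(NamedTuple):
    """Absolute age range in millions of years (Ma)."""
    youngest_ma: float
    oldest_ma: float


def overlaps(a, b):
    """Two closed ranges overlap when each begins before the other ends."""
    return a.youngest_ma <= b.oldest_ma and b.youngest_ma <= a.oldest_ma


# Hypothetical rock units with illustrative age ranges.
UNITS = {
    "Unit A": AgeRange(300, 400),
    "Unit B": AgeRange(150, 200),
    "Unit C": AgeRange(380, 450),
}


def units_in_range(query):
    """List units whose age range overlaps the query range."""
    return sorted(name for name, rng in UNITS.items() if overlaps(rng, query))


print(units_in_range(AgeRange(350, 420)))
print(units_in_range(AgeRange(200, 250)))
```

The ranges are treated as closed, so a unit whose oldest bound equals the query's youngest bound is counted as overlapping.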
The library is constructed to operate in an Internet environment, by utilizing the Autodesk MapGuide software (http://www.mapguide.com/) to display and contain its spatial aspects; the remaining non-spatial aspects of the database are hosted within the MS SQL Server relational database environment. These two components were integrated into an attractive and functional user interface through extensive Java programming, performed by an external contractor.
The library can be viewed using MS Internet Explorer v4.0 or Netscape 4.5 once the appropriate plug-in is installed. Although the web site's definitive address is still unknown, the CordLink digital library is accessible via http://www.rgsc.nrcan.gc.ca/ (follow the CordLink link). It is anticipated the site will enhance geological research and geological education through the digital interconnection of the various map, text and image components.
The web site is arranged according to 8 main modules: Maps, Documents, Images, Research, References, Data Download, Education, Search.
Home
The Home page (Figure 2) provides an entry point to the main modules, as well as an overview of the site including information about its intent, content, system requirements, its development team and other partners.
Figure 2. The Home page for CordLink -- a digital library prototype for the Canadian Cordillera.
Maps
The Maps module allows maps and ancillary data to be viewed and queried at various scales. The geologic maps and ancillary data available are tailored to the viewing scale: regional views depict general maps and data, and as the user zooms into a detailed area, more relevant, detailed information is provided. Users are able to peruse the holdings according to a subject index or geospatially, through a scale-sensitive legend. Table 1 provides a partial, representative, listing of the available maps and datasets.
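The scale-sensitive behaviour described above can be sketched by giving each layer a range of viewing scales within which it is displayed. The layer titles are taken from Table 1, but the scale ranges, the names and the selection rule are assumptions made for illustration; they do not describe the actual CordLink or MapGuide configuration.

```python
from typing import NamedTuple


class Layer(NamedTuple):
    title: str
    min_denominator: int  # most zoomed-in view scale at which the layer is shown
    max_denominator: int  # view scale at which the layer stops being shown


# Hypothetical display ranges, expressed as scale denominators (1:N).
LAYERS = [
    Layer("Generalized Geology of the World - 32M", 10_000_000, 100_000_000),
    Layer("Geological Map of Canada - 5M", 3_000_000, 10_000_000),
    Layer("Tectonic Assemblages 2M", 500_000, 3_000_000),
    Layer("Coast Belt Geoscience Library - 250K", 50_000, 500_000),
]


def visible_layers(view_denominator):
    """Return the titles of layers whose display range contains the view scale."""
    return [
        layer.title
        for layer in LAYERS
        if layer.min_denominator <= view_denominator < layer.max_denominator
    ]


print(visible_layers(20_000_000))  # regional view
print(visible_layers(2_000_000))   # intermediate view
print(visible_layers(250_000))     # detailed view
```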
Table 1. A partial, representative, listing of geoscience data found in the CordLink digital library.
| Author | Date | Title |
| H. Gabrielse and C.J. Yorath | 12/31/91 | The Geology of the Cordilleran Orogen in Canada |
| Wheeler J.O., Hoffman, Card, K., Davidson, T., Sanford, Okulitch, and Roest | 3/31/97 | Geological Map of Canada - 5M |
| Kirkham, R.V., Chorlton L.B. and Carriere J.J. | 1/30/95 | Generalized Geology of the World - 32M |
| Fulton, R.J. | 1/30/97 | Surficial Materials of Canada - 5M |
| Journeay J.M. and Williams S.P. | 3/30/95 | GIS Map Library: A Window on Cordilleran Geology - Tectonic Assemblages 2M |
| Journeay J.M. and Williams S.P. | 3/30/95 | GIS Map Library: A Window on Cordilleran Geology - Terranes 2M |
| Journeay J.M. and Monger, J.W.H.; B.C. Geological Survey | 3/30/98 | Coast Belt Geoscience Library - 250K Map holdings, Mineral Inventory, Assessment Reports |
| B.C. Ministry of Environment | 12/31/94 | TRIM topographic base data |
| Various | 12/31/94 | topographic data |
| Various | 4/1/99 | Shaded Relief - DEM |
| Various | 4/1/99 | Satellite Image |
| Various | 4/1/99 | Bouguer Gravity |
| Various | 4/1/99 | Aeromagnetic |
| Various | 4/1/99 | Earthquake epicentres |
| C.J. Hickson and B. Edwards | | Volcanoes |
| G.J. Woodsworth | | Geothermal resources |
The Maps module also enables map features to be thematically filtered and inspected according to their legend description (e.g. rock unit, age or lithology) and related document fragments and images. For example, Figure 3a depicts a portion of the 1:2,000,000 tectonic assemblage map of the Canadian Cordillera (Journeay and Williams, 1995) in which the map features containing Basalt, at least in part, have been identified (outlined with dashed lines). Figure 3b illustrates how a specific polygon in this set is described in terms of rock unit (i.e. Cache Creek), DNAG (Gabrielse and Yorath, 1992) text and image.
Figure 3a. The Map module displaying 1:2M tectonic units comprised, at least in part, of Basalt; the pointer has selected one specific polygon for further interrogation.
Figure 3b. The library contents of the polygon selected in Figure 3a are displayed as database table, DNAG text and figure.
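The thematic filter illustrated in Figures 3a and 3b can be sketched as a two-step lookup: find the legend items whose description includes the rock type, then select the polygons assigned to those items. The unit names, polygon identifiers and function names below are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical legend: each legend item lists the rock types it contains.
LEGEND = {
    "Unit A": {"basalt", "chert"},
    "Unit B": {"shale", "siltstone"},
    "Unit C": {"basalt"},
}

# Hypothetical map features: polygon identifier -> legend item.
POLYGONS = {101: "Unit A", 102: "Unit B", 103: "Unit C", 104: "Unit A"}


def polygons_containing(rock_type):
    """Select polygons whose legend item contains the rock type, at least in part."""
    wanted = {unit for unit, rocks in LEGEND.items() if rock_type in rocks}
    return sorted(pid for pid, unit in POLYGONS.items() if unit in wanted)


def describe(polygon_id):
    """Return the legend description behind one selected polygon."""
    unit = POLYGONS[polygon_id]
    return {"unit": unit, "rock_types": sorted(LEGEND[unit])}


print(polygons_containing("basalt"))
print(describe(101))
```

Because the rock types are attached to the legend item rather than to each polygon, a single change to a unit's description is reflected in every polygon that belongs to it.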
Documents, Images and Research
The Documents, Images and Research modules are analogous to the Maps module: their contents are indexed according to subject, and provision is made for selecting and viewing module contents according to related map, image, text or project information. Of course, the primary viewing window is focused on the module's media type: a map is displayed in the Maps module, text is displayed in the Documents module, an image is displayed in the Images module, and a project location map is displayed in the Research module. Smaller windows then present the remaining media type information. The Images and Research modules additionally permit users to insert their own contributions to the library via an on-line interface, whose content is reviewed by a digital librarian. Currently, image contributions must be transferred to the CordLink site, though it is expected they could remain remotely located in the future. The Research module also maintains an on-line discussion platform that encourages scientific discourse.
References, Data Download, Education, and Search
The References, Data Download, Education and Search modules provide several important functions:
- References: provides access to references within the CordLink site, and links to other Canadian geoscience reference libraries.
- Data Download: currently permits a select subset of the available maps to be copied by the user to their local computer, in a variety of common geospatial formats.
- Education: provides access to on-line education resources related to Canadian geoscience.
- Search: a generic mechanism to locate document level information (i.e. metadata) within the library. Document level information refers to title, author, subject, etc., descriptions of maps, reports, images and other documents; it does not refer to their internal contents.
FUTURE DIRECTIONS
The CordLink project is one component of the ResSources GSC digital geoscience initiative within the Geological Survey of Canada (http://www.rgsc.nrcan.gc.ca/). The goal of this initiative is to link significant GSC data holdings across the Internet, in order to facilitate their dissemination, integration and general interoperability. The ResSources GSC initiative is itself part of a larger Canadian geoscience program, the Canadian Geoscience Knowledge Network (CGKN), which has similar goals and includes all the major Canadian federal, provincial and other geoscience data providers. CGKN is in turn a part of the Canadian Geospatial Data Infrastructure (CGDI), a national effort to make all geospatial data accessible and usable on the Internet. As CordLink is contributing significant regional Cordilleran geology holdings to these efforts, it is expected that the site will continue to evolve. In particular, it is expected that its current prototype status will be upgraded to a fully operational site in the next year. Longer term concerns include:
- expanding the distributed nature of the library, and
- enhancing its interoperability with other libraries,
without sacrificing the gains made in representing and using greater geological context. Advances in digital library research will resolve some of these issues and the CordLink library should be well positioned to adapt the solutions to its holdings and infrastructure.
CONCLUSIONS
A digital library approach to representing and delivering geological knowledge was implemented for Canadian Cordilleran geology. The digital library utilized the proposed U.S. national geological data model to augment the geological context of the data holdings in terms of database representation and end-user functionality. The proposed U.S. national digital geologic map data model was expanded to accommodate more geological data types and richer relations between the new, and previously defined, data model components. A prototype Internet-based interface was developed to exploit the geological data types and their rich relations. The results have been very satisfactory, in terms of proof of concept, and in terms of prototype implementation, and we anticipate developing the site further. It is hoped the digital library approach will promote the use of geological information, and knowledge, by using technology to enhance scientific investigation and thus to better meet the broader societal demands facing us today.
REFERENCES
Brodaric, B., Journeay, J.M., Talwar, S.K., and Boisvert, E., 1999, CordLink Digital Library Geologic Map Data Model Version 5.2, (http://www.rgsc.nrcan.gc.ca/: CordLink I, About CordLink), 29 p.
Gabrielse, H., and Yorath, C.J., eds., 1992, Geology of the Cordilleran Orogen in Canada, vol. G-2, Decade of North American Geology: Canada Communications Group, Ottawa, Canada, 844 p.
Geological Survey of Canada, 1996, A Summary of the Geological Survey of Canada's Strategic Plan for Geoscience 1996-2001, http://www.NRCan.gc.ca:80/gsc/strplan_e.html.
Journeay, J.M., and Williams, S.P., 1995, GIS Map Library: A Window on Cordilleran Geology - Tectonic Assemblages: Geological Survey of Canada Open File 2948, Ottawa, ON.
Luger, G.F., and Stubblefield, W.A., 1998, Artificial Intelligence; Structures and Strategies for Complex Problem Solving: Addison-Wesley, Reading, MA, 824 p.
National Coordination Office for Computing, Information and Communications, 1998, Technologies for the 21st Century; The Digital Libraries Initiative, http://www.ccic.gov/pubs/blue98/dig_libraries.html.
Schatz, B.R., 1995, Information Analysis in the Net: The Interspace of the Twenty-First Century. White Paper for America in the Age of Information: A Forum on Federal Information and Communications R & D, July 6-7, National Library of Medicine, http://www.canis.uiuc.edu/papers/america21.html.
Turing, A., 1950, Computing machinery and intelligence: Mind, 59, 433-460.
U.S. Geological Survey, 1997, USGS Strategic Plan 1997-2005, http://www.usgs.gov/strategic/.