Digital Mapping Techniques '98 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 98-487

Formal Metadata in the National Geologic Map Database

By Peter N. Schweitzer

U.S. Geological Survey
906 National Center
Reston, VA 20192
Telephone: (703) 648-6533
Fax: (703) 648-6560
e-mail: pschweitzer@usgs.gov

Throughout the 1980s and early 1990s, the improving capability of desktop computers to carry out complex analyses increased the popularity of geographic information systems (GIS). As they became familiar with GIS technology, people at all levels of government, in industry, and in academia called for better access to publicly available geospatial information and for more general use of standard terms of reference and standard formats for the exchange of geospatial data and information. Answering this need is the goal of the National Spatial Data Infrastructure (NSDI), a government-wide coordination effort initiated at the Federal level through Executive Order 12906, which was signed by President Clinton in April of 1994.

A key component of NSDI is the development of a National Geospatial Data Clearinghouse, a general source of information about geospatial data that are available to the public. With the Clearinghouse a user can determine whether geospatial data on a region of interest exist and are appropriate for solving the problem at hand. The Clearinghouse is a distributed network of internet sites that all provide metadata (information about geospatial data) to users in the same way. Its success depends on the overall consistency of the metadata that are made available, because users are expected to evaluate metadata from numerous sources in order to determine which data meet their needs.

To promote consistency in metadata, the Federal Geographic Data Committee (FGDC), an interagency council charged with coordinating the Federal implementation of NSDI, has produced the Content Standards for Digital Geospatial Metadata (CSDGM). That document provides standard terms describing elements common to most geospatial data, and encourages people who document geospatial data sets to use these terms. The CSDGM not only describes the terms of reference but also specifies the relationships among those terms. The relationships, many of which are hierarchical, are complex and a formal syntax is provided to specify them.

Because the syntax of the standard is complex and the number of descriptive elements is fairly large (335), creating metadata that conform to the standard is not an easy task. In addition to the problem of assembling the information needed to properly describe the subject data sets, data producers must arrange that information using the terms given in the standard and arrange the terms using the syntactical rules given in the standard. The resulting metadata are formally structured and use standard terms of reference, hence the term "formal metadata" in the title of this report.

The chief advantages of formal metadata are (1) the ability of computer software to process the information meaningfully and (2) the ability of users to locate and recognize within a record the topical components of the information. For these purposes it is important to be able to say with confidence that metadata conform to the structure of the standard. Human review is still required--no software can determine whether metadata are accurate--but human review of the content is easier to do if the syntactical structure is predictable and in accord with the standard.
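As a rough illustration of the first advantage, the sketch below checks a metadata record against a toy schema and reports which required elements are missing. The schema, the element names and their nesting, the record, and the function name missing_elements are all simplified assumptions made for this example; they are not the production rules of the CSDGM, and a real check would follow the standard's full syntax.

```python
# Toy schema: each compound element maps to the child elements it requires.
# ASSUMPTION: names and nesting are simplified for illustration and are
# not the actual rules of the CSDGM.
REQUIRED_CHILDREN = {
    "Metadata": ["Identification_Information", "Metadata_Reference_Information"],
    "Identification_Information": ["Title", "Abstract", "Purpose"],
}


def missing_elements(name, record, path=""):
    """Return the paths of required elements absent from a nested dict."""
    here = f"{path}/{name}" if path else name
    problems = []
    for child in REQUIRED_CHILDREN.get(name, []):
        if child not in record:
            problems.append(f"{here}/{child}")
        elif isinstance(record[child], dict):
            # Compound element: check its own required children as well.
            problems.extend(missing_elements(child, record[child], here))
    return problems


# Hypothetical record with two structural gaps.
record = {
    "Identification_Information": {
        "Title": "Geologic map of an example quadrangle",
        "Abstract": "Digital compilation of bedrock geology.",
    },
}

for problem in missing_elements("Metadata", record):
    print("missing:", problem)
```

A check of this kind tests structure only. Whether the title or abstract is accurate remains a matter for human review.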

Our Nation's digital geologic map data form a fundamental part of its geoscience data infrastructure; making these data more widely known and used is clearly a worthwhile goal for both the national and state geological surveys. Recognizing the importance of consistent metadata for digital geologic map data, the National Geologic Map Database (NGMDB), a joint project of the USGS and the Association of American State Geologists (AASG), formed a Metadata working group to study the implementation of metadata for digital geologic maps. Members of the working group are Peter Schweitzer (USGS, chair), Dan Nelson (Illinois), Greg Herman (New Jersey), Kate Barrett (Wisconsin), and Ron Wahl (USGS). The working group was asked to: (1) look at the Content Standards for Digital Geospatial Metadata for adequacy; (2) examine implementing metadata in a standard format for geologic maps; (3) establish guidelines as to what the metadata elements mean to a geologist; (4) determine a process for facilitating input from state geological surveys not represented at this (1995) meeting; and (5) format a specific set of fields that must be filled out for the NGMDB map catalog.

The working group's report is online at http://ncgmp.usgs.gov/ngmdbproject/standards/metadata/metaWG.html. Briefly, the working group found: (1) The CSDGM works with a highly diverse range of thematic data; geologic maps fit naturally into this range. Additional metadata elements may be helpful, especially for geologic ages. (2) Technology, training, and work-flow strategies have been developed through discussions within the larger geospatial data community; these apply as well to geologic maps as to any other form of geospatial data. (3) The meaning of metadata to a geologist rarely differs from its meaning to anyone else. Terminology used in the standard is not in every case the same as research geologists use, but the concepts apply directly. (4) The geologic mapping community is invited and encouraged to collaborate, communicate, and participate with other geospatial data producers in the NSDI. This Nation needs the wisdom of the geologic mapping community at least as much as it needs that of other scientific and technical disciplines. (5) The catalog schema, as already defined by the NGMDB, is acceptable. Records of the National Geologic Map Catalog are a brief subset of metadata because the emphasis of the Catalog is on all published maps, most of which are printed and not available in digital form. For digital map products, metadata must be more detailed because these products are more ready to be used in digital spatial analysis.

The National Geospatial Data Clearinghouse has come a long way since its inception in January of 1995. At that time the Clearinghouse consisted of a disparate set of web sites, not searchable by a single protocol or through a single gateway, providing metadata that varied substantially in structure, format, quality, and appearance. Since then the community, aided in no small way by the financial support and coordination of the FGDC, has developed software tools, training materials, and a more comprehensive understanding of the work-flow issues involved. As a result, the Clearinghouse is now a centrally searchable source of mostly high-quality metadata consistent in structure and format. Much work remains to be done to enhance the usability of the Clearinghouse, but it is evident that investments made in creating formal metadata are beneficial now and will retain their value well into the future.

Organizations contemplating the task of producing metadata should be aware that many of the questions they ponder have been considered by other organizations, both similar to and different from them. It is not a painless process by anyone's measure, but much information and informed opinion is available on the internet. From a business perspective, it makes sense to devote time and energy where the value gained is greatest. The value of metadata depends on: (1) the value of the data to the producers (cost to make and support the product, as well as the benefits, if any, gained by other organizations' use of it); (2) the transience of the workforce, meaning the potential cost to the producing organization if the people who understand how and why the data were produced leave; (3) the goals of the organization overall and its purpose in making the data available; and (4) the quality of the metadata themselves.

METADATA IN PLAIN LANGUAGE

One of the difficulties that hinder implementation of metadata among those who are new to the process is the technical jargon within the CSDGM. The jargon tends to focus the attention of both metadata producers and reviewers on details. Details matter, of course, but it is crucial to ensure that the metadata answer satisfactorily and clearly the broadest questions about a data set that one might have.

With this perspective in mind, I have rephrased most of the CSDGM as a series of plain-language questions arranged in a hierarchy. My intent is to provide managers, novice metadata producers, and metadata reviewers with a general framework within which they can judge fairly the information requested or provided by a metadata record. The hierarchy extends, at the finest level of detail, to the element names and structure by which the answers are encoded in a record. That level of detail is not presented here; it is best provided in a hypertext medium. The presentation here is an attempt to specify the information contained in a metadata record in a manner independent of the precise form in which the information will be stored. In the hypertext version these questions lead to specific instructions for encoding the answers in a metadata record. The hypertext version is online at http://geology.usgs.gov/tools/metadata/.

  1. What does the data set describe?
    1. What is the title of the data set?
    2. What geographic area does the data set cover?
    3. Does the data set describe conditions during a particular time period?
    4. Is this a digital map or remote-sensing image, or something different like tabular data?
    5. How does the data set represent geographic features?
      1. How are geographic features stored in the data set?
      2. What coordinate system is used to represent geographic features?
    6. How does the data set describe geographic features?
      1. What are the types of features present?
      2. For each type of feature, what attributes are described?
      3. What sort of values does each attribute hold?
      4. For measured attributes, what are the units of measure, resolution of the measurements, frequency of the measurements in time, and estimated accuracy of the measurements?

  2. Who produced the data set?
    1. Who created the data set?
      1. Formal authors of the published work
      2. Compilers and editors who converted the work to digital form
      3. Technical specialists who did some of the processing but aren't listed as formal authors
      4. Cooperators, collaborators, funding agencies, and other contributors who deserve mention
    2. To whom should users address questions?

  3. Why was the data set created?
    1. What were the objectives of the research that resulted in this data set?
    2. What objectives are served by presenting the data in digital form?
    3. How do you recommend that the data be used?
    4. Are you concerned that nonspecialists might misinterpret the data? If so, of what aspects of the data set should they be especially wary?

  4. How was the data set created?
    1. Where did the data come from?
      1. Are the source data original observations made by the authors and their cooperators?
      2. Were parts of the data previously packaged in a publication or distributed informally?
        1. Were the source data published?
        2. Were the source data compiled at a particular scale?
        3. What time period do the source data represent?
        4. What information was obtained from each data source?
    2. How were the source data modified?
      1. How were the data collected, handled, or processed?
      2. For this activity did you use data from some other source?
      3. Did this activity generate an intermediate data product that stands on its own?
      4. When did this processing occur?
      5. Did someone other than the formal authors do the data processing?

  5. How reliable are the data; what problems remain in the data set?
    1. What can you say about the accuracy of the observations?
    2. How accurately are the geographic locations known?
    3. If data vary in depth or height, how accurately is vertical position known?
    4. Where are the gaps in the data? What is missing there?
    5. Do the observations mean the same thing throughout the data set?

  6. How can someone get a copy of the data set?
    1. Are there legal restrictions on access or use of the data?
    2. Who distributes the data?
    3. What is the distributor's name or number for this data set?
    4. As a distributor, what legal disclaimers do you want users to read?
    5. How can people download or order the data?
      1. In what formats are the data available?
      2. Can users download the data from the network?
      3. Can users get the data on disk or tape?
      4. Is there a fee to get the data?
      5. How long will it take to get the data?
    6. What hardware or software do people need in order to use the data set?
    7. Will these data be available for only a limited time?

  7. Who wrote the metadata?
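
One way to use the outline above as a review checklist is to store it as a nested structure and list the questions a draft record leaves unanswered. The sketch below reproduces only a fragment of the outline, with the sub-levels under question 2.1 omitted. The data layout, the function names leaves and unanswered, and the sample answers are assumptions made for illustration, not the form used by the hypertext version.

```python
# Fragment of the question outline as (question, sub-questions) pairs.
# ASSUMPTION: this layout is illustrative; deeper levels are omitted.
OUTLINE = [
    ("What does the data set describe?", [
        ("What is the title of the data set?", []),
        ("What geographic area does the data set cover?", []),
    ]),
    ("Who produced the data set?", [
        ("Who created the data set?", []),
        ("To whom should users address questions?", []),
    ]),
]


def leaves(outline, prefix=""):
    """Yield (number, question) for each question with no sub-questions."""
    for i, (question, children) in enumerate(outline, start=1):
        number = f"{prefix}{i}"
        if children:
            yield from leaves(children, number + ".")
        else:
            yield number, question


def unanswered(outline, answers):
    """Return the numbers of leaf questions with a missing or blank answer."""
    return [n for n, _ in leaves(outline) if not answers.get(n, "").strip()]


# Hypothetical draft: one real answer and one blank entry.
answers = {"1.1": "Geologic map of an example quadrangle", "2.2": "  "}

for number in unanswered(OUTLINE, answers):
    print("still unanswered:", number)
```

Keying answers by outline number keeps the check independent of the element names in which the answers are eventually encoded.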

For further information, please consult the information resources available at the web site of the National Geologic Map Database Project, http://ncgmp.usgs.gov/ngmdbproject/.

U.S. Department of the Interior, U.S. Geological Survey
<https://pubs.usgs.gov/openfile/of98-487/schweitz.html>
Maintained by Dave Soller
Last updated 10.06.98