Digital Mapping Techniques '99 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 99-386

Plain-Language Resources for Metadata Creators and Reviewers

By Peter N. Schweitzer

U.S. Geological Survey
918 National Center
Reston, VA 20192
Telephone: (703) 648-6533
Fax: (703) 648-6560
e-mail: pschweitzer@usgs.gov

Mention metadata to people and you'll likely see them cringe. They cringe because metadata lies at the interface between the people who create spatial data and the people who would use those data, and it is hard to communicate to a prospective data user all of the wisdom needed to make proper and effective use of the data. Metadata focuses our attention on difficult educational, cultural, and technological issues inextricably linked with the common desire of scientists to see their work valued by other people.

Figure 1 shows how metadata are transferred from data producers to data users. In small, active research groups it is possible to communicate using mostly technical terminology, both because the data producers and data users share a common understanding of the terminology and the circumstances under which the data were produced and because the data users are able to return to the data producers frequently to ask questions. The situation changes drastically when data are made available through the internet to people who are not well known by the data producers. Data users may have different backgrounds and needs, and there are many sources of data available. Consequently data users need metadata that are readily searchable and can be presented in a variety of different formats.

Figure 1. Diagram depicting the flow of metadata from data producers to data users. This information transfer cannot occur using plain language alone because unstructured information cannot be reliably indexed by topic, place, time, and other characteristics, nor can it be reliably reexpressed in a variety of formats suiting the needs of diverse users. Consequently the metadata must be expressed using some technical terminology in a standardized structure and parseable format; these allow computer software to index it appropriately and reexpress the metadata in plain language as well as technical terminology.

Metadata are readily searchable only if they conform to a standardized structure; they can be presented in a variety of different formats only if they are parseable by computer software. These needs require the metadata to be described using technical terminology that is standardized across many scientific and technical disciplines. But that technical terminology is often unfamiliar to specialists in any one discipline. Hence metadata are not only hard for producers to write, they are also hard for reviewers and end-users to read. Plain-language approaches provide solutions directed at easing specific troubles encountered in writing and reading metadata.

PLAIN-LANGUAGE APPROACHES TO CREATING, REVIEWING, AND READING METADATA

1. Understanding Metadata and Learning How to Write It

Metadata in Plain Language: A guide for metadata creators and reviewers:

http://geology.usgs.gov/tools/metadata/tools/doc/ctc/

The goals of these pages are to put the metadata standard's many elements into perspective and to show how to create a metadata record using a heuristic procedure. They present the FGDC metadata standard as a series of plain-language questions; for each question they show which elements of the metadata standard should hold the answers, and how to express the answers correctly.

2. Expressing Metadata as Answers to Standard Questions

The USGS metadata parser mp reads, checks, and re-expresses metadata:

http://geology.usgs.gov/tools/metadata/tools/doc/mp.html

MP checks the structure of metadata against the FGDC standard, indicating where and how the metadata record doesn't conform. MP then re-expresses the metadata in a variety of formats that are likely to be useful in different ways. MP can create SGML, XML, parseable text (its input formats), as well as HTML and DIF, a form used by the NASA Global Change Master Directory. Of particular interest is the new FAQ-style HTML, which presents the metadata as answers to the same plain-language questions given in Metadata in Plain Language. The abbreviation FAQ as commonly found on the internet refers to frequently-asked questions; since metadata are produced before a data set is used extensively by the public, FAQ here refers to frequently-anticipated questions. This form is likely to be more readable by people who are unfamiliar with the FGDC standard.

How to generate FAQ-style HTML using mp:

The simplest method is to invoke mp with the -f option:

mp info.met -f info.html

This will place into the file info.html the FAQ-style HTML output.

An alternative method is to specify the form of the file name in a config file. Under output:html the name of the FAQ-style HTML output is specified using the element faq as follows:

output:
  html:
    file: %s.html
    faq: %s.faq.html

Here the file element contains a template showing how mp should compose the name of the outline-style HTML file; in the template, the string %s is replaced with the name of the input file (with its extension removed). Likewise, the faq element contains a template showing how mp should compose the name of the FAQ-style HTML file.
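The naming rule just described can be illustrated with a short Python sketch. This is not mp's own code: the function name expand_template is hypothetical, and the sketch assumes that "extension removed" means dropping the final dot-suffix of the input file name.

```python
import os


def expand_template(template: str, input_name: str) -> str:
    """Replace %s in a config-file template with the input name minus its extension.

    Hypothetical helper for illustration only; mp's own behavior may differ
    in edge cases (for example, names with no extension or with several dots).
    """
    stem, _ext = os.path.splitext(input_name)
    return template.replace("%s", stem)


# Using the two templates from the config entries above and an input file info.met:
outline_name = expand_template("%s.html", "info.met")
faq_name = expand_template("%s.faq.html", "info.met")
print(outline_name)  # info.html
print(faq_name)      # info.faq.html
```

Under these assumptions, running mp on info.met with that config would produce an outline-style file and a FAQ-style file side by side, distinguished only by the .faq part of the name.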

Note that the Clearinghouse server software currently in use (May 1999) assumes that if the selected SGML metadata document is info.sgml (for example), then the HTML document to be returned to the user is info.html. That strategy will normally return the outline form of HTML to the user. If you wish to return the FAQ-style HTML first, change the values shown above like this:

output:
  html:
    file: %s.out.html
    faq: %s.html
    

Using these config file elements, the FAQ-style output will be the one returned by the server through a Z39.50 PRESENT request.
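The effect of swapping the two templates can be checked with a small Python sketch. It models only the behavior described in the text (the server pairs info.sgml with info.html); the function names and the dictionary representation of the config entries are hypothetical and are not part of mp or the Clearinghouse software.

```python
import os


def server_html_name(sgml_name: str) -> str:
    """Name the server is described as assuming: same stem, .html extension."""
    stem, _ext = os.path.splitext(sgml_name)
    return stem + ".html"


def style_returned(config: dict, sgml_name: str) -> str:
    """Report which output:html element produces the file name the server assumes."""
    stem, _ext = os.path.splitext(sgml_name)
    wanted = server_html_name(sgml_name)
    for element, template in config.items():
        if template.replace("%s", stem) == wanted:
            return element
    return "none"


# The two sets of output:html values shown in the text.
default_config = {"file": "%s.html", "faq": "%s.faq.html"}
faq_first_config = {"file": "%s.out.html", "faq": "%s.html"}

print(style_returned(default_config, "info.sgml"))    # file (outline style)
print(style_returned(faq_first_config, "info.sgml"))  # faq
```

With the first set of values the name the server looks for matches the outline-style output; with the second set it matches the FAQ-style output, which is the point of the change.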

NOTE: mp now provides in its HTML output a link to each of the other output formats that you requested when running mp. These links are relative to the current directory by default, and will work correctly when someone retrieves a metadata record directly through a web server. However, HTML metadata records retrieved through the Clearinghouse gateway interface come tagged with the URL of the gateway; consequently, these links will not work by default with HTML records found through the gateway interface. To make these links work regardless of the retrieval method, place a BASE tag into the HEAD element of the output HTML code. As you might guess, mp can do this for you, but it needs to know the URL where your metadata will be available as web pages. It gets this information from a config file entry as follows:

output:
  html:
    base: URL
    
So if your web site has a URL like

<http://www.our-data.org/metadata/>
    
that will contain your metadata records, put this into your config file:

output:
  html:
    base: http://www.our-data.org/metadata/
    

For mp to read these entries, you must run it with the -c config_file command-line option, substituting for config_file the name of the actual config file you'll be using.
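To see why the base entry matters, the following Python sketch resolves a relative link the way a browser would, first against a gateway address and then against the site URL from the example above. The gateway URL and the link name info.faq.html are assumptions made up for illustration; only the site URL comes from the text.

```python
from urllib.parse import urljoin

# The example site URL from the text, plus a hypothetical gateway address
# standing in for the Clearinghouse gateway interface.
site_base = "http://www.our-data.org/metadata/"
gateway_url = "http://gateway.example.org/cgi-bin/search"  # hypothetical
relative_link = "info.faq.html"  # assumed name of a relative link in mp's HTML

# Without a BASE tag, the link is resolved against the URL the page came from.
without_base = urljoin(gateway_url, relative_link)

# With a BASE tag naming the site URL, the link is resolved against that URL.
with_base = urljoin(site_base, relative_link)

print(without_base)  # http://gateway.example.org/cgi-bin/info.faq.html
print(with_base)     # http://www.our-data.org/metadata/info.faq.html
```

The first result points at the gateway, where no such file exists; the second points at the web site that actually holds the metadata records.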

How mp writes FGDC metadata elements in plain language:

<http://geology.usgs.gov/tools/metadata/tools/doc/plain.faq.html>
    

This is a specially constructed record showing how mp composes plain-language output from FGDC elements. Generated entirely by mp, it shows where the elements of the metadata standard appear in the output as answers to questions. This file should not be regarded as authoritative, since there are many choices that must be made when creating metadata; however, it may help people to understand how mp composes answers to the plain-language questions using the standard FGDC elements.

3. Examples Showing Plain-Language and Technical Terminology

Geology of the onshore part of San Mateo County, California: A digital database

Questions & Answers (FAQ-style HTML)

<http://geo-nsdi.er.usgs.gov/metadata/ofr98137.faq.html>
    

Outline (original-recipe HTML)

<http://geo-nsdi.er.usgs.gov/metadata/ofr98137.html>
    

Parseable text (not HTML)

<http://geo-nsdi.er.usgs.gov/metadata/ofr98137.metl>

Additional examples can be found on the U.S. Geological Survey Geoscience Data node of the National Geospatial Data Clearinghouse, at http://geo-nsdi.er.usgs.gov/.



This site is https://pubs.usgs.gov/openfile/of99-386/schweitzer.html
Maintained by the Eastern Publications Group Web Team
Last revised 11-2-99