
Digital Mapping Techniques '00 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 00-325

Metadata Tips and Tricks

By Peter N. Schweitzer

U.S. Geological Survey
Mail Stop 918 National Center
Reston VA 20192
Telephone: (703) 648-6533
Fax: (703) 648-6560
e-mail: pschweitzer@usgs.gov

Metadata is the means by which we communicate to users of geologic map data what they need to know in order to understand and apply the data properly. A variety of resources are available to assist geologists in creating metadata that are consistent with the documentation provided for other scientific and ancillary data. This paper summarizes recent developments in those resources and encapsulates some useful guidelines by which the process of creating metadata can be managed effectively.

NEWS

Commercial offerings by RTSe-USA and ESRI now complement the freely available tools developed by the user community. The Spatial Metadata Management System (SMMS) has recently been enhanced (http://www.rtseusa.com/), and the introduction of significant metadata management capabilities in ArcCatalog (http://www.esri.com/) promises to make the process more readily available to GIS data producers. Recent changes in "Tkme" (http://geology.usgs.gov/tools/metadata/) support its use either alongside a GIS or as a separate processing step.

New utilities for processing metadata address specific problems more efficiently. To help in creating Enumerated_Domain sections, which explain the meanings of abbreviated data values, the program "enum_dom" converts a plain-text table of values and their definitions into metadata that can be pasted directly into Tkme. A similar program, "src_info", does the same for Source_Information sections: it parses simple bibliographic references into their component parts and writes the corresponding metadata elements, which can then be inserted into the metadata document.

Web form version of the Enumerated_Domain helper:
http://geology.usgs.gov/tools/metadata/tools/doc/ctc/edom.shtml

Web form version of the Source_Information helper:
http://geology.usgs.gov/tools/metadata/tools/doc/ctc/srcinfo.shtml
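The kind of conversion enum_dom performs can be sketched in a few lines of Python. This is not enum_dom's own code: it assumes the input table has one value per line with a tab separating the value from its definition, and the function name table_to_enumerated_domains is hypothetical. The output is indented-text Enumerated_Domain metadata of the sort mp and Tkme read.

```python
def table_to_enumerated_domains(table_text, indent="", source=None):
    """Turn 'value<TAB>definition' lines into Enumerated_Domain sections.

    Hypothetical helper illustrating the idea behind enum_dom; the input
    layout (one tab-separated pair per line) is an assumption.
    """
    out = []
    for line in table_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines in the table
        value, _, definition = line.partition("\t")
        out.append(f"{indent}Enumerated_Domain:")
        out.append(f"{indent}  Enumerated_Domain_Value: {value.strip()}")
        out.append(f"{indent}  Enumerated_Domain_Value_Definition: {definition.strip()}")
        # The source may be left out when it would only say "this report".
        if source:
            out.append(f"{indent}  Enumerated_Domain_Value_Definition_Source: {source}")
    return "\n".join(out)


if __name__ == "__main__":
    table = "Qal\tAlluvium\nTg\tGranite of Tertiary age"
    print(table_to_enumerated_domains(table, indent="      "))
```

The indent argument lets the output line up beneath the Attribute_Domain_Values element it will be pasted into.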

For batch processing of metadata, the new facility "mq", an extension of Tcl/Tk, enables programs written in the Tool Command Language (Tcl) to read, modify, and rewrite metadata. Programming is still required, but Tcl is simpler to use than C, the language in which mp and xtme are written.

Documentation for mq:
http://geology.usgs.gov/tools/metadata/tools/doc/mq.html
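mq is programmed in Tcl, but the read, modify, and rewrite cycle it supports can be illustrated in Python. The sketch below is not mq's interface; the Element class and the parse, find_all, and write functions are hypothetical, and the parser assumes every element sits on one line as "Element_Name: value", nested by consistent indentation, with no multi-line text values.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    value: str = ""
    children: list = field(default_factory=list)


def parse(text):
    """Build a tree from indented 'Name: value' lines (single-line values only)."""
    root = Element("(root)")
    stack = [(-1, root)]  # pairs of (indentation, element), innermost last
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        name, _, value = line.strip().partition(":")
        node = Element(name.strip(), value.strip())
        # Close any open elements indented as deeply as, or deeper than, this one.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root


def find_all(node, name):
    """Yield every descendant element with the given name."""
    for child in node.children:
        if child.name == name:
            yield child
        yield from find_all(child, name)


def write(root):
    """Serialize the tree back to text, indenting two spaces per level."""
    lines = []

    def visit(node, depth):
        for child in node.children:
            line = "  " * depth + child.name + ":"
            if child.value:
                line += " " + child.value
            lines.append(line)
            visit(child, depth + 1)

    visit(root, 0)
    return "\n".join(lines)


if __name__ == "__main__":
    record = (
        "Metadata:\n"
        "  Identification_Information:\n"
        "    Citation:\n"
        "      Citation_Information:\n"
        "        Originator: U.S. Geological Survey\n"
        "        Publication_Date: 2000\n"
    )
    tree = parse(record)
    for element in find_all(tree, "Publication_Date"):
        element.value = "2001"  # an example batch edit
    print(write(tree))
```

Because write() always indents by two spaces per level, a record that used a different indentation comes back re-indented but structurally unchanged.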

Recent improvements in mp, Tkme, and related programs promise enhanced usability. mp can now generate HTML output as a series of "frequently-anticipated questions" (FAQ), using your metadata as the source of the answers. This format helps producers review their metadata and helps users understand it more easily.

Tkme now divides its window into two horizontally arranged panes, with a small grip on the midline that can be dragged to adjust the relative sizes of the panes.

The program "err2html", whose purpose is to make the error reports generated by mp easier to understand, has changed significantly. Its output is now tabular: it ranks errors by severity, color-codes them, and suppresses duplicate messages. New users may find this presentation easier to interpret, and it lets them fix problems in order of importance.

These programs and their documentation can be obtained from http://geology.usgs.gov/tools/metadata/

ADVICE

1. Ask for help
Data producers at Federal, state, and local levels of government and in many non-governmental organizations have discovered that in the metadata creation process they have much in common. Numerous online resources are available to those who are experiencing difficulties, and it is important for new users to realize that other people are often happy to provide advice and assistance. See http://geology.usgs.gov/tools/metadata/ for links to people and resources that can help.

2. Don't use DOCUMENT
The old DOCUMENT AML was originally developed by EPA and USGS and subsequently supplied to ESRI for inclusion in ArcInfo version 7. It handles so much of the metadata so poorly that its output must be almost completely rewritten to be usable in the Clearinghouse. I have developed a web page to help people convert the output of DOCUMENT into more appropriate metadata. Please do not encourage anyone to use DOCUMENT, and do encourage those who have used it to seek help in rewriting the resulting metadata.

How to fix the output of DOCUMENT:
http://geology.usgs.gov/tools/metadata/tools/doc/document/

3. Don't make too many files
It often seems logical to metadata creators to generate a full record for each and every distributable data file they might make available. While this approach is initially appealing, its eventual result is to wear down the person doing the documentation, with a concomitant loss of quality in the work. Concentrate instead on the data files that contain original scientific contributions, and on files that have undergone significant processing involving careful judgments that other people would not necessarily make. Document ancillary data files as Source_Information in the Lineage, identifying each one with a Source_Produced_Citation_Abbreviation in a Process_Step.
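As an illustration of the last point, a Process_Step that records ancillary files by abbreviation could be generated as below. The helper name process_step and the file abbreviations are hypothetical. The element order inside Process_Step (Process_Description, Source_Used_Citation_Abbreviation, Process_Date, Source_Produced_Citation_Abbreviation) follows the FGDC Content Standard for Digital Geospatial Metadata, and each abbreviation is assumed to match the Source_Citation_Abbreviation of a Source_Information entry elsewhere in the Lineage.

```python
def process_step(description, date, used=(), produced=(), indent=" " * 6):
    """Return an indented-text Process_Step block (hypothetical helper).

    The default indent assumes two spaces per level, with Process_Step at
    depth 3 (Metadata > Data_Quality_Information > Lineage > Process_Step).
    Each abbreviation should match a Source_Citation_Abbreviation given in
    a Source_Information section of the same Lineage.
    """
    inner = indent + "  "
    lines = [f"{indent}Process_Step:",
             f"{inner}Process_Description: {description}"]
    lines += [f"{inner}Source_Used_Citation_Abbreviation: {a}" for a in used]
    lines.append(f"{inner}Process_Date: {date}")
    lines += [f"{inner}Source_Produced_Citation_Abbreviation: {a}" for a in produced]
    return "\n".join(lines)


if __name__ == "__main__":
    print(process_step(
        "Exported the fault and dike arcs as separate distributable files.",
        "2000",
        used=["geology coverage"],
        produced=["faults.e00", "dikes.e00"],
    ))
```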

4. Don't make too few files
It also does no good to try to describe too much information in a single record; the result is a record of undue complexity that is even more difficult for users to read than for the originators to maintain. Wherever sources, processing, or projection information vary among items in a data set, consider describing the components using separate metadata records.

5. Don't document ArcInfo attributes
Some data attributes exist only as a consequence of the GIS or other software used to create the data. Where these have not been infused with scientific information, they can safely be left undocumented, because their meanings and values can be inferred from knowledge of the software. So for ArcInfo data sets, don't document AREA, PERIMETER, LENGTH, FNODE#, TNODE#, LPOLY#, RPOLY#, cover#, or cover-ID (where cover is the name of the coverage) unless you have taken the inadvisable step of storing important scientific information in one of these fields. This rule of thumb simplifies the creation and maintenance of the metadata as well as its presentation to the end user.
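When a data set has many attribute tables, this rule can be applied mechanically before writing the Entity_and_Attribute_Information. A minimal Python sketch, assuming the item names have already been listed from the coverage's attribute table; the function and parameter names are hypothetical.

```python
# Items that ArcInfo itself maintains in feature attribute tables.
SOFTWARE_ITEMS = {"AREA", "PERIMETER", "LENGTH",
                  "FNODE#", "TNODE#", "LPOLY#", "RPOLY#"}


def attributes_to_document(labels, cover, keep=()):
    """Return the attribute labels that still need Attribute sections.

    `cover` is the coverage name, so COVER# and COVER-ID are skipped too.
    `keep` names any software item that was (inadvisably) reused to hold
    scientific information and therefore must be documented after all.
    """
    cover = cover.upper()
    skip = SOFTWARE_ITEMS | {f"{cover}#", f"{cover}-ID"}
    skip -= {k.upper() for k in keep}
    return [label for label in labels if label.upper() not in skip]


if __name__ == "__main__":
    pat_items = ["AREA", "PERIMETER", "GEOLOGY#", "GEOLOGY-ID", "PTYPE", "UNIT"]
    print(attributes_to_document(pat_items, "geology"))
```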

6. Errors are not equally important
mp is a useful tool for checking the structure and format of metadata, but it is not good to put too much faith in it. Human review is what really matters; mp can help, but it is not the sole arbiter of what is and what is not good metadata. Prioritize errors as follows, from most serious (fix) to least serious (understand and let go):

  1. Indentation problems
  2. Unrecognized elements
  3. Misplaced elements
  4. Too many of some element
  5. Missing elements
  6. Empty elements
  7. Improper element values
  8. Warnings and upgrades
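The ordering above can also be applied to a list of error messages, much as err2html ranks them and suppresses duplicates. The Python sketch below is hypothetical: it is not err2html, and the keyword used to recognize each category is an assumption about how mp words its messages, so the keywords would need adjusting to mp's actual output.

```python
# One keyword per category, most serious first, following the list above.
# These keywords are assumptions about mp's message wording.
RANKED_KEYWORDS = ["indentation", "unrecognized", "misplaced",
                   "too many", "missing", "empty", "value"]


def rank(message):
    """Return the priority of a message: 0 is most serious."""
    text = message.lower()
    for position, keyword in enumerate(RANKED_KEYWORDS):
        if keyword in text:
            return position
    return len(RANKED_KEYWORDS)  # warnings, upgrades, and anything unmatched


def prioritize(messages):
    """Drop exact duplicates, then sort by rank (stable within a rank)."""
    unique = list(dict.fromkeys(m.strip() for m in messages if m.strip()))
    return sorted(unique, key=rank)


if __name__ == "__main__":
    # Illustrative messages, not actual mp output.
    report = [
        "Warning: upgrade recommended",
        "Error: Missing Title",
        "Error: Unrecognized element Foo",
        "Error: Missing Title",
        "Error: Indentation problem",
    ]
    for line in prioritize(report):
        print(line)
```

Because matching is by substring, a message lands in the first category whose keyword it contains.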

7. Leave some specific elements out if they cause trouble
Some metadata elements are difficult to fill out and are so inconsistently understood in the community at large that it does not make sense to agonize over their values. Fill them in if you have appropriate information, but simply leave them out if not:

Latitude_Resolution
Longitude_Resolution
Abscissa_Resolution
Ordinate_Resolution

Source elements, if left out, should be assumed to refer to the data set, publication, or report that is the subject of the metadata. These elements can be safely omitted if their values would be "this report":

Entity_Type_Definition_Source
Attribute_Definition_Source
Enumerated_Domain_Value_Definition_Source
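Where a record already contains such elements, they can be removed in bulk. A minimal Python sketch, assuming each of these elements and its value occupy a single line of indented-text metadata; the function name is hypothetical, and only values that read exactly "this report" (ignoring case) are dropped. mp is likely to report the omitted elements as missing, which the ranking in tip 6 places among the less serious errors.

```python
# Definition-source elements that default to the data set itself.
OMITTABLE = {"Entity_Type_Definition_Source",
             "Attribute_Definition_Source",
             "Enumerated_Domain_Value_Definition_Source"}


def drop_self_references(text, phrase="this report"):
    """Remove omittable *_Definition_Source lines whose value is `phrase`."""
    kept = []
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name in OMITTABLE and value.strip().lower() == phrase.lower():
            continue  # the default meaning already applies
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    attribute = (
        "Attribute:\n"
        "  Attribute_Label: UNIT\n"
        "  Attribute_Definition: Map unit symbol\n"
        "  Attribute_Definition_Source: This report\n"
    )
    print(drop_self_references(attribute))
```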

8. Review using FAQ-style output
mp can now generate HTML in the form of a list of frequently-anticipated questions (FAQ), a format likely to be more familiar to many readers. This form of the metadata can make human review easier, especially for reviewers who are not conversant with the metadata standard itself.
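The FAQ view rests on a mapping from metadata elements to plain-language questions. The toy Python sketch below shows that mapping only; it is not how mp builds its FAQ output, the question wording is illustrative, and the input is assumed to be a simple dictionary of element names and values.

```python
from html import escape

# Illustrative questions keyed by CSDGM element name; the wording is an
# assumption, not copied from mp's FAQ output.
QUESTIONS = {
    "Title": "What does this data set describe?",
    "Originator": "Who produced the data set?",
    "Purpose": "Why was the data set created?",
    "Process_Description": "How was the data set created?",
}


def faq_html(values):
    """Render the available answers as an HTML definition list."""
    parts = ["<dl>"]
    for element, question in QUESTIONS.items():
        if element in values:
            parts.append(f"  <dt>{escape(question)}</dt>")
            parts.append(f"  <dd>{escape(values[element])}</dd>")
    parts.append("</dl>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(faq_html({"Title": "Geologic map of A & B", "Purpose": "Mapping"}))
```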

9. Use controlled keywords
With the proliferation of information available on the Internet, it is becoming increasingly important to provide keywords that come from widely recognized thesauri such as GeoRef, from the American Geological Institute (AGI). Such keywords can be discovered by web search engines and provide better conceptual associations among related data and reports than uncontrolled keywords. If controlled keywords are chosen, alternative user interfaces, such as those based on pick-lists, can be developed. I believe that interfaces more sophisticated than free-text search will become necessary in the future in order to find information effectively.
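Checking candidate keywords against a controlled vocabulary is easy to automate. In the Python sketch below, the tiny thesaurus is a hypothetical stand-in for a real one such as GeoRef, which would have to be loaded from its own source. Matching ignores case and reports the thesaurus's own spelling, and the controlled terms are written as a Theme section that names the thesaurus in Theme_Keyword_Thesaurus. Both function names are hypothetical.

```python
def split_keywords(keywords, thesaurus):
    """Separate keywords found in `thesaurus` from those that are not."""
    preferred = {term.casefold(): term for term in thesaurus}
    controlled, uncontrolled = [], []
    for keyword in keywords:
        term = preferred.get(keyword.strip().casefold())
        if term is not None:
            controlled.append(term)  # use the thesaurus spelling
        else:
            uncontrolled.append(keyword.strip())
    return controlled, uncontrolled


def theme_section(thesaurus_name, terms, indent=" " * 6):
    """Write controlled terms as an indented-text Theme section.

    The default indent assumes two spaces per level, with Theme at depth 3
    (Metadata > Identification_Information > Keywords > Theme).
    """
    lines = [f"{indent}Theme:",
             f"{indent}  Theme_Keyword_Thesaurus: {thesaurus_name}"]
    lines += [f"{indent}  Theme_Keyword: {term}" for term in terms]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical mini-thesaurus standing in for GeoRef.
    thesaurus = {"Geologic maps", "Faults", "Stratigraphy"}
    found, other = split_keywords(["geologic maps", "FAULTS", "cool rocks"], thesaurus)
    print(theme_section("GeoRef Thesaurus", found))
    print("Not in thesaurus:", other)
```

Keywords that fail the check need not be discarded; they can still be listed under a separate Theme section whose thesaurus is given as "None".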


U.S. Department of the Interior, U.S. Geological Survey
<https://pubs.usgs.gov/openfile/of00-325/schweitzer.html>
Maintained by Dave Soller
Last updated 11.01.00