
Digital Mapping Techniques '01 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 01-223

Improving Access to Metadata Using Keywords from Controlled Vocabularies

By Peter N. Schweitzer

U.S. Geological Survey
Mail Stop 918 National Center
Reston VA 20192
Telephone: (703) 648-6533
Fax: (703) 648-6560
e-mail: pschweitzer@usgs.gov

Since the introduction of the National Spatial Data Infrastructure in 1995, the development of well-structured metadata has held the promise that geospatial data could be better organized by the people who maintain them and better presented to the public by the people who provide access to them.

One aspect of the metadata standard that facilitates this is keywords. Where keywords are stored in FGDC metadata, the source of those keywords must be indicated as well. This does not require that terms be chosen from published lists of keywords, but it allows the advantages of such controlled vocabularies to accrue to the user of the metadata. A controlled vocabulary is a formally defined list of terms, usually hierarchical, that are preferred for use in specific ways. In a controlled vocabulary the scope of meaning of each term can be specified, along with its relationships to other terms (broader, narrower, related, or preferential). A controlled vocabulary is maintained by an authority (a person or group) that ensures that the terms are all defined consistently and have well-defined relationships.

This paper describes the use of a controlled vocabulary of place names to categorize geoscientific spatial metadata. It focuses on the technology used for choosing appropriate terms and storing them in each metadata record.

A CONTROLLED VOCABULARY FOR PLACE NAMES

For place names I have chosen to use two Federal Information Processing Standards (FIPS), 6-4 and 10-4. FIPS 6-4 specifies numerical codes for states and counties (or equivalent entities) in the US and its territories. Each state is identified using a two-digit number, and each county within the state is identified using a three-digit number. Thus a county can be unambiguously identified using a five-digit code consisting of its state code and its county code. Unique codes are needed for these place names because many states have counties with the same name (for example Washington, Jefferson, Franklin, Lincoln, and Jackson counties all occur in 24 or more states).
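The way a five-digit county code is assembled can be illustrated with a short Python sketch. The function names are hypothetical and are not part of any FIPS publication or USGS tool; the example values (state 01 for Alabama, county 001 for Autauga) are taken from the placekey.txt excerpt shown later in this paper.

```python
def county_fips(state_code, county_code):
    """Join a two-digit state code and a three-digit county code."""
    if not 0 <= state_code <= 99:
        raise ValueError("state code must fit in two digits")
    if not 0 <= county_code <= 999:
        raise ValueError("county code must fit in three digits")
    # Zero-pad both parts so the result is always five characters long.
    return f"{state_code:02d}{county_code:03d}"


def split_county_fips(code):
    """Split a five-digit county code into (state, county) strings."""
    if len(code) != 5 or not code.isdigit():
        raise ValueError("expected five digits, got %r" % code)
    return code[:2], code[2:]


# Alabama is state 01 and Autauga is county 001 within it.
print(county_fips(1, 1))
print(split_county_fips("01017"))
```

Keeping the codes as zero-padded strings rather than integers preserves the leading zero, which matters for states such as Alabama.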

FIPS 10-4 specifies alphanumeric codes for countries of the world and their first-order subdivisions. Of the first-order subdivisions I have used only states in the United States and Mexico and provinces in Canada. This decision reflects the distribution of data that I wish to categorize by place.

I have augmented these standard place names with names of major oceanic regions and names of continental regions. These groupings allow me to build a pick-list interface with a relatively narrow and deep hierarchy, so that users don't have too many choices at the highest level, where they begin to choose places.

Internet resources

Place keywords and unique identification codes are found at http://geo-nsdi.er.usgs.gov/metadata/placekey.txt. These place keywords are arranged hierarchically; the hierarchical relationships are shown by indentation. On each line the unique identification code of the place is given, followed by a colon and then the place name. A short section of this file is listed below.
	US: United States
		US01: Alabama
			01001: Autauga
			01003: Baldwin
			01005: Barbour
			01007: Bibb
			01009: Blount
			01011: Bullock
			01013: Butler
			01015: Calhoun
			01017: Chambers
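
A file in this form can be read with very little code. The following Python sketch is one possible reader, not the code used by the tools described in this paper; it assumes that each level of the hierarchy is indented more deeply than its parent and that the code and name are separated by the first colon on the line. The function names and the sample text are illustrative.

```python
def parse_place_keywords(text):
    """Return a list of (code, name, parent_code) tuples."""
    entries = []
    stack = []  # (indent, code) pairs for the ancestors of the current line
    for raw in text.splitlines():
        if not raw.strip():
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        code, _, name = raw.strip().partition(":")
        # Drop entries that are not ancestors of this line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1] if stack else None
        entries.append((code.strip(), name.strip(), parent))
        stack.append((indent, code.strip()))
    return entries


def children_of(entries, parent_code):
    """List the (code, name) pairs directly below parent_code."""
    return [(code, name) for code, name, parent in entries
            if parent == parent_code]


SAMPLE = (
    "US: United States\n"
    "\tUS01: Alabama\n"
    "\t\t01001: Autauga\n"
    "\t\t01003: Baldwin\n"
)

entries = parse_place_keywords(SAMPLE)
print(children_of(entries, "US01"))
```

Comparing indentation depths, rather than counting tabs from zero, lets the same reader handle an excerpt that is itself indented, as the listing above is.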
			
A web interface that uses these place keywords is found at http://geo-nsdi.er.usgs.gov/cgi-bin/place/. This utility operates as a Common Gateway Interface (CGI) process attached to a web server. It presents to the user a set of simple text links that traverse the hierarchy of the place keywords. At each point in the hierarchy, it lists as links any metadata records that have been assigned the chosen place name. An example screen is shown in figure 1.

Figure 1. Place keyword search at USGS Geoscience Data node of the National Geospatial Data Clearinghouse.

PLACE KEYWORD ASSISTANT: A TOOL TO SELECT PLACE NAMES FOR METADATA

Place names by themselves don't help much; the key is to associate each metadata record with the corresponding place names from the controlled vocabulary. You can do this manually, of course, using your favorite text editor or Tkme (http://geology.usgs.gov/tools/metadata/tools/doc/tkme.html). Just add lines like this:
	Keywords:
		Place:
			Place_Keyword_Thesaurus: Augmented FIPS 10-4 and FIPS 6-4, version 1.0
			Place_Keyword: US56 = Wyoming
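
If you prefer to generate these lines rather than type them, a few lines of Python are enough. This is only a sketch: the function name and the depth parameter are hypothetical, and the output simply imitates the layout of the example above (one tab per level, and "code = name" for each keyword). It does not use the mq extension or any other USGS software.

```python
THESAURUS = "Augmented FIPS 10-4 and FIPS 6-4, version 1.0"


def place_keyword_lines(places, depth=0):
    """Build the indented Keywords/Place lines for (code, name) pairs."""
    tab = "\t"
    lines = [
        tab * depth + "Keywords:",
        tab * (depth + 1) + "Place:",
        tab * (depth + 2) + "Place_Keyword_Thesaurus: " + THESAURUS,
    ]
    for code, name in places:
        # One Place_Keyword line per selected place.
        lines.append(tab * (depth + 2) + f"Place_Keyword: {code} = {name}")
    return lines


print("\n".join(place_keyword_lines([("US56", "Wyoming")])))
```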
			
But when you're dealing with a large number of records, it helps to use a specialized tool for this purpose. The tool I've developed is called the Place Keyword Assistant. It is written in Tcl/Tk, so to use it you'll need to install Tcl/Tk on your system, along with the mq extension that enables Tcl/Tk scripts to read, modify, and write FGDC metadata. The Place Keyword Assistant has the following major functions:
  1. Reads metadata records. Metadata records may be
    1. named on the command line
    2. listed in a file that is named on the command line, or
    3. found recursively in the current directory and its subdirectories.
  2. Displays each record as it is selected. The text is shown in a simple scrollable window.
  3. Presents hierarchical place keywords for the user to choose, and keeps track of keywords that have been chosen.
  4. Saves the selected place keywords in the metadata record.
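
The first function, gathering the list of records, can be sketched in Python as follows. This is an illustration of the idea only, not a translation of the Tcl/Tk tool: the function names are hypothetical, the list file is assumed to hold one file name per line, and the .met extension is the one mentioned in the installation instructions below.

```python
import os
import tempfile


def records_from_list_file(list_path):
    """Read one metadata file name per line, skipping blank lines."""
    with open(list_path) as handle:
        return [line.strip() for line in handle if line.strip()]


def records_under(root, extension=".met"):
    """Find metadata files in root and all of its subdirectories."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extension):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for relative in ("a.met", os.path.join("sub", "b.met"), "notes.txt"):
        open(os.path.join(root, relative), "w").close()
    for path in records_under(root):
        print(os.path.relpath(path, root))
```
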

The Place Keyword Assistant creates three windows.

The first window lists the metadata records, by file name, that you can select. The program builds this list by traversing all of the directories below the one from which it is run. Choose a metadata record from this list. Entries shown in green have some place keywords assigned using this software; those shown in red might have place keywords, but not keywords chosen from this vocabulary.

The second window simply shows the text of each record as it is selected. This can be used to make decisions about which places to assign to the record.

The third window, the keyword chooser, shows you the place names that you can assign to the metadata record. It is shown in figure 2 and some of its functions are described in table 1. The keyword chooser consists of five list windows, each of whose contents is determined by the selection in the window to its left. In this example the user chose Land from among Oceans and Land, then North America from the list of continents, then United States from the list of countries in North America, then the state of Arizona, and from its counties the one named Graham. The list in the lower right corner contains those places whose names have been selected for inclusion in the metadata record. Its background is blue to distinguish it from the others visually, and its entries include the unique FIPS code associated with each area.
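
The left-to-right dependency among the list windows can be sketched in Python with a nested dictionary. This is a simplified illustration, not the Tcl/Tk code of the tool; the tree below contains only a few entries chosen to match the example in the text, and the function name is hypothetical.

```python
# A tiny, incomplete hierarchy used only for illustration.
TREE = {
    "Oceans": {},
    "Land": {
        "North America": {
            "United States": {
                "Arizona": {"Graham": {}, "Greenlee": {}},
                "New Mexico": {},
            },
            "Canada": {},
            "Mexico": {},
        },
    },
}


def list_contents(tree, selections):
    """Return the entries of each list, left to right.

    The first list shows the top level; every later list shows the
    children of the item selected in the list to its left.
    """
    columns = [list(tree)]
    node = tree
    for choice in selections:
        node = node[choice]  # raises KeyError for an unknown choice
        columns.append(list(node))
    return columns


chosen = ["Land", "North America", "United States", "Arizona"]
for column in list_contents(TREE, chosen):
    print(column)
```

Four selections fill five lists, which matches the layout described above: the fifth list holds the counties of the chosen state.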

Figure 2. The keyword selection window.

Table 1. Keyword selection buttons, keystroke equivalents, and what they do.


DETAILED INSTALLATION INSTRUCTIONS FOR USE ON MS-WINDOWS

  1. Install Tcl/Tk
    1. Download tcl832.exe from http://dev.scriptics.com/ftp/tcl8_3/tcl832.exe.
    2. Run tcl832.exe. Choose default install in C:\Program Files\Tcl.
    3. Restart. This makes the system recognize file names ending with .tcl as Tcl scripts.
  2. Install MQ
    1. Download the complete package of metadata tools for MS-Windows from http://geology.usgs.gov/tools/metadata/all_win.exe.
    2. Run all_win.exe. Allow the installer to store the files in C:\USGS.
    3. Copy C:\USGS\tools\bin\mq25.dll into C:\Program Files\Tcl\lib.
    4. Create directory C:\Program Files\Tcl\lib\mq.
    5. Copy C:\USGS\tools\bin\pkgIndex.tcl into C:\Program Files\Tcl\lib\mq.
    6. Test by running Wish
      1. Choose Wish from the Start menu, following Programs > Tcl > Wish.
      2. Two windows appear. One is labeled "Console" and contains a prompt (percent sign). Click this window.
      3. At the % prompt, type package require mq then press Enter.
      4. The interpreter should respond with the version number of mq. At this writing this value is 2.5.11. If you get an error message instead, something wasn't installed correctly.
  3. Install Place Keyword Assistant
    1. Choose a directory above those where you have stored your metadata. Its subdirectories may contain other files, but the tool works out of the box if your metadata files all have the extension .met. For this example, suppose this directory is D:\data.
    2. Download placekey.txt from http://geo-nsdi.er.usgs.gov/metadata/placekey.txt and save it in D:\data.
    3. Download placer.tcl from http://geo-nsdi.er.usgs.gov/metadata/placer.tcl and save it in D:\data. This file should have a "Tk" icon.
    4. Double-click placer.tcl. The windows should appear.

USING ARCEXPLORER 3 WITH THE PLACE KEYWORD ASSISTANT

ESRI's ArcExplorer 3 can be used to display U.S. counties (here focusing on the Southwest) with scientific data overlying the county boundaries. Because the counties are shown as polygons, they can be selected when their layer is made active (figure 3). After selecting the counties that overlap the scientific data, the user clicks the Attributes button in the ArcExplorer toolbar to bring up the table of attributes of the selected counties. This table is divided into two panes by a vertical bar. The left pane shows the names of the selected counties. The right pane contains the attributes of the county selected last.

Figure 3. ArcExplorer 3 map showing counties in the southwestern U.S. with some data overlying them and eight counties of New Mexico selected.

Note that what ArcExplorer shows in the left side of the attribute window is the first item of the layer's DBF file that is not an intrinsic attribute of ArcInfo. The counties layer I have used here was downloaded from the National Atlas of the U.S. I modified the DBF file by deleting the ArcInfo intrinsic attributes and swapping the column positions of the state name and county name attributes, so that the county name comes first.
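
The column changes described above can be illustrated with a short Python sketch that works on a table already held in memory. Reading and writing the DBF file itself is left out, since that requires a third-party library. The field names and the sample row are hypothetical and do not come from the National Atlas layer; the function name is also an assumption.

```python
def drop_and_swap(fields, rows, drop, first, second):
    """Remove the fields named in drop, then swap two remaining columns."""
    keep = [i for i, name in enumerate(fields) if name not in drop]
    new_fields = [fields[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    a, b = new_fields.index(first), new_fields.index(second)
    new_fields[a], new_fields[b] = new_fields[b], new_fields[a]
    for row in new_rows:
        row[a], row[b] = row[b], row[a]
    return new_fields, new_rows


# Hypothetical layout: two intrinsic attributes, then state, county, code.
fields = ["AREA", "PERIMETER", "STATE", "COUNTY", "FIPS"]
rows = [[0.1, 1.5, "Arizona", "Graham", "04009"]]

new_fields, new_rows = drop_and_swap(
    fields, rows, drop={"AREA", "PERIMETER"}, first="STATE", second="COUNTY")
print(new_fields)
print(new_rows[0])
```

After this change the county name is the first remaining column, which is the one ArcExplorer would show in the left pane.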


U.S. Department of the Interior, U.S. Geological Survey
URL: https://pubsdata.usgs.gov/pubs/of/2001/of01-223/schweitzer.html
Maintained by David R. Soller
Last modified: 18:24:42 Wed 07 Dec 2016