Digital Mapping Techniques '03
— Workshop Proceedings
U.S. Geological Survey Open-File Report 03–471
The Alaska Division of Geological & Geophysical Surveys’ Metadata Policy Development and Implementation
¹Alaska Division of Geological & Geophysical Surveys, 3354 College Road, Fairbanks, AK 99709-3707.
Telephone (907) 451-5031, (907) 451-5027; fax: (907) 451-5050; e-mail carrie_browne@dnr.state.ak.us, larry_freeman@dnr.state.ak.us
²Gina R.C. Graham, 2575 Cresbard Ct., Fairbanks, AK 99709
Telephone (907) 474-3647; e-mail graben@gci.net
ABSTRACT
In December 2001, the Alaska Division of Geological & Geophysical Surveys (DGGS) identified an emerging problem with the documentation of our geospatial datasets. Of the 20 publications DGGS released in 2001, 17 required metadata, yet only 1 actually included it. DGGS has published a large number of geospatial datasets in recent years, and the number of publications containing geospatial datasets increases every year, yet until recently there was no incentive, penalty, or process for documenting those data. To remedy this, DGGS launched a division-wide metadata policy stipulating that all future DGGS publications containing geospatial data must have Federal Geographic Data Committee (FGDC)-compliant metadata before they are published.
In January 2002, the DGGS director assembled a metadata committee to devise an efficient approach to generating metadata for our geospatial datasets. The committee proposed an interim “transition period” of 6 months, January through June 2002, during which a minimum metadata standard would have to be met before publication. After June 2002, complete FGDC-compliant metadata would be required for all geospatial datasets before publication. Throughout the transition period, the metadata committee met semimonthly to become familiar with the details of the FGDC Content Standard for Digital Geospatial Metadata Workbook version 2.0 and to discuss how the standard should be applied at DGGS. The committee’s first priority was to determine which FGDC metadata elements apply to the various types of geospatial datasets published by DGGS. Another task during the transition period was to research existing metadata generation tools and select the best tool for DGGS. The staff’s diverse preferences in GIS software and geospatial data formats led the metadata committee to recommend that DGGS create its own metadata tool designed specifically for the needs of its staff.
Metadata is now an integral part of the publication process at DGGS: in 2002, all 64 DGGS publications that required metadata included it. We are continuing to improve our methods and have established an initial DGGS-specific metadata text template. We expect to have a user-friendly Web-based application available to DGGS personnel by October 2004.
INTRODUCTION
Since April 11, 1994, when President Clinton signed Executive Order 12906, the issues of data archiving and documentation have been brought to the attention of most local, state, and federal government agencies. One consequence of this Executive Order is that agencies must develop plans and allocate resources to generate metadata for geospatial datasets. When an organization builds a metadata policy into its geospatial dataset creation process, managing its geologists’ datasets becomes less cumbersome, because the organization knows what data it has, how the data were generated, when, and by whom. Consider a geologist who leaves a state survey after many field seasons. If the survey required metadata for all its geospatial datasets, it would be able to protect and manage the data the geologist left behind: the background information on how each dataset was generated would be preserved, and the data would be less likely to be discredited through misuse. Of course, an organization’s members may not react enthusiastically on learning that they are now responsible for generating metadata. Fortunately, this reaction is generally short-lived, because no one wants to spend valuable time compiling a geospatial dataset only to have users discredit it for lack of metadata documentation.
ESTABLISHING A METADATA POLICY
The Alaska Division of Geological & Geophysical Surveys (DGGS) has published geospatial datasets for more than 10 years and, like most agencies, has experienced challenges with its staff regarding the need to create metadata for all publishable geospatial datasets. Although DGGS staff members know why metadata is necessary and how to generate it for their geospatial datasets, very few of the older published geospatial datasets included metadata documentation. There were no real consequences to the staff for this lack of metadata, so geospatial datasets commonly were published without adequate documentation. When a digital dataset was requested, metadata had to be generated, sometimes years after the dataset was created and by individuals who had not produced the data.
The DGGS metadata policy emerged in December 2001 after numerous public requests for digital geospatial datasets from DGGS publications that lacked metadata documentation. We initiated an investigation into our geospatial dataset documentation process to determine why none of the requested geospatial datasets included metadata. Our findings were staggering. During the 2001 publication year, only 1 of 17 publications containing geospatial datasets included proper metadata. Moreover, fewer than 5 percent of the geospatial datasets we published over the previous 10 years included FGDC-compliant metadata.
We reviewed how DGGS had been handling metadata creation and found that the agency was creating metadata for requested digital geospatial datasets on an ad hoc basis. Generating metadata this way was rapidly becoming overwhelming as the volume of geospatial data without metadata grew. We also realized that the originators of the geospatial data had either left the agency or were committed to other projects, and so were not available to help recover the information needed to write the metadata.
The lack of proper geospatial dataset documentation and the increasing pressure of public requests for our metadata were brought to the attention of the director of DGGS. In response, the director established a formal metadata policy stating that, as of January 2002, all new geospatial datasets must include FGDC-compliant metadata before DGGS will publish them. This policy was met with opposition and frustration within the survey. In addition, the initial policy was flawed because it did not include an implementation plan. To address the resistance, the director assembled a metadata committee from his staff to establish an effective work plan for generating FGDC-compliant metadata at DGGS. Staff discussions on metadata made it clear that one goal of the committee would be to produce instructions and a methodology for generating metadata that could be clearly communicated to project teams at DGGS.
METADATA TIMELINE FOR DGGS
The metadata committee consisted of DGGS managers and various staff members. The committee agreed that a strategy was needed to manage the transition to systematically generating metadata for our geospatial datasets. The recommended timeline was distributed at a survey-wide staff meeting in January 2002 and is presented in figure 1.
Figure 1. DGGS timeline for implementing metadata policy.
METADATA COMMITTEE MEETINGS
Once the survey staff agreed on the timeline, a second metadata committee was assembled. This committee was also staff-based and consisted of at least two members from each DGGS section (Minerals, Energy, Engineering Geology, and Geologic Communications). The committee met semimonthly for 6 months, during the timeline-defined “transition period,” to become familiar with the details of the FGDC Content Standard for Digital Geospatial Metadata Workbook version 2.0 (Federal Geographic Data Committee, 2000) and to determine how to apply the FGDC standard at DGGS. Many of the committee members were not familiar with the FGDC workbook or its contents and had never completed a full FGDC-compliant metadata file for a geospatial dataset. To compensate for our limited knowledge of metadata, we used examples of metadata from USGS geologic mapping products and from other state surveys, and we corresponded by e-mail with Peter Schweitzer of USGS about metadata formatting and development questions as they arose.
The first lesson for the committee members was learning how to read the FGDC workbook and interpret its contents. Once everyone on the committee was familiar with the workbook, we determined that our main concern was to select which FGDC metadata elements applied to the specific types of geospatial datasets published by DGGS.
As the committee evaluated the metadata elements in the FGDC workbook, we began interpreting and clarifying the relevant element definitions so that they related specifically to DGGS geospatial datasets. Our idea was that this would help data originators (DGGS geologists, staff, and other data contributors) generate FGDC-compliant metadata for their unique geospatial datasets by showing when and why a particular element applies to their dataset. The committee also supplemented element domains with DGGS “boilerplate” text where appropriate, or clarified element definitions so that data originators could clearly understand what constitutes an appropriate entry for each element.
The committee thoroughly researched available metadata generation tools to determine the best tool for DGGS. We concluded that DGGS should create its own metadata tool designed for its specific needs, because the metadata generation tools then available were each tied to particular software and geospatial dataset formats, which limited their flexibility. The metadata tool DGGS created accommodates our staff’s diverse preferences in GIS software (ESRI ArcGIS, MapInfo, ERMapper, AutoCAD) and geospatial dataset formats (geologic maps, data tables, databases).
The DGGS-specific metadata generation tool is being produced in two phases. The first phase, completed in July 2002, consisted of constructing a text-formatted file, or text template, that includes all FGDC workbook elements, with definitions tailored to DGGS datasets (fig. 2). DGGS also provides training and support for those using this metadata template. The second, ongoing phase entails producing a user-friendly interface application that (1) is not software specific and (2) will output an FGDC-compliant metadata file when the data originator has finished entering information into the working metadata file. This application is tentatively scheduled for completion by October 2004.
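The DGGS template itself appears only as figure 2. As a rough sketch of what a text-formatted template can look like, the Python below emits a skeleton in the indented `Element_Name: value` layout commonly used for formal metadata text files read by mp, assuming that layout. The element subset, the parenthesized entry hints, the two-space indentation, and the nested-tuple representation are illustrative assumptions, not the contents of the DGGS template; the publisher and place defaults show the kind of DGGS “boilerplate” the committee describes.

```python
# Sketch only: a small, hypothetical subset of CSDGM elements. The DGGS template
# (fig. 2) lists every workbook element with DGGS-specific definitions.
# Each node is (element_name, default_value, children); a default of None marks
# a compound element, and parenthesized strings are hints for the data originator.
TEMPLATE = [
    ("Identification_Information", None, [
        ("Citation", None, [
            ("Citation_Information", None, [
                ("Originator", "(each person or agency that produced the data)", []),
                ("Publication_Date", "(YYYYMMDD)", []),
                ("Title", "(publication title)", []),
                ("Publication_Information", None, [
                    # Boilerplate defaults that rarely change for DGGS publications.
                    ("Publication_Place", "Fairbanks, Alaska", []),
                    ("Publisher", "Alaska Division of Geological & Geophysical Surveys", []),
                ]),
            ]),
        ]),
    ]),
    ("Metadata_Reference_Information", None, [
        ("Metadata_Date", "(YYYYMMDD)", []),
        ("Metadata_Standard_Name", "FGDC Content Standard for Digital Geospatial Metadata", []),
        ("Metadata_Standard_Version", "FGDC-STD-001-1998", []),
    ]),
]


def render(nodes, depth: int = 0) -> list[str]:
    """Flatten the node tree into indented 'Element_Name: value' lines."""
    lines = []
    for name, value, children in nodes:
        indent = "  " * depth  # two spaces per nesting level (assumed convention)
        if value is None:
            lines.append(f"{indent}{name}:")
        else:
            lines.append(f"{indent}{name}: {value}")
        lines.extend(render(children, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(TEMPLATE)))
```

Keeping the template as data rather than as a fixed text file makes it easy to swap in per-section boilerplate without retyping the element hierarchy.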
DEFINING A DATA UNIT
As the committee gained more knowledge about producing FGDC-compliant metadata, the question arose of how to divide a geospatial dataset so that the resulting metadata would be most useful. Deciding how many metadata files were needed for a large and diverse geospatial dataset was difficult; no one wanted to generate too many or too few. A DGGS geologic publication consists of multiple layers or themes, each made up of a coverage or shapefile. If each coverage or layer of a published dataset were documented with its own metadata file, each publication would have numerous metadata files containing redundant information. On the other hand, if a single metadata file documented the dataset as a whole, there was the legitimate concern that important metadata specific to a particular layer or theme could be lost. The metadata committee discussed three main options to address this problem.
Option 1: Publication Set = Data Unit
We proposed generating one metadata file for each published set of geospatial data and came to the following conclusions:
Option 2: Thematic Layer = Data Unit
Next we suggested generating one metadata file for each thematic layer composing a geospatial dataset. We concluded:
Option 3: Distribution Determination = Data Unit
The last option that we considered was generating our metadata files as a function of how a digital geospatial dataset will be distributed. The following are our conclusions:
The metadata committee concluded that Option 3 was the best fit for DGGS. This means that the number of data units that make up a dataset (and thus the number of metadata files required for it) is determined by how DGGS will distribute the digital geospatial dataset. For example, if a published geologic map will be distributed only as a stand-alone map, the data unit is the entire map and only one metadata file is needed for the publication. If, instead, a published geologic map will be distributed by thematic layer on request, each thematic layer is assigned its own data unit.
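Option 3 reduces to a simple rule: the distribution mode decides how many metadata files a publication needs. The Python sketch below expresses that rule; the `Distribution` and `Publication` names, the layer labels, and the two-way split between stand-alone and per-layer distribution are hypothetical simplifications, not part of any DGGS system.

```python
from dataclasses import dataclass
from enum import Enum


class Distribution(Enum):
    """Hypothetical distribution modes for a published dataset."""
    STAND_ALONE = "stand-alone"      # distributed only as the complete map
    BY_LAYER = "by thematic layer"   # individual layers supplied on request


@dataclass
class Publication:
    title: str
    layers: list[str]          # thematic layers (coverages or shapefiles)
    distribution: Distribution


def data_units(pub: Publication) -> list[str]:
    """Return one label per data unit, i.e., per metadata file needed."""
    if pub.distribution is Distribution.STAND_ALONE:
        # The whole map is one data unit, so one metadata file suffices.
        return [pub.title]
    # Each thematic layer is its own data unit with its own metadata file.
    return [f"{pub.title}: {layer}" for layer in pub.layers]


if __name__ == "__main__":
    layered = Publication("Hypothetical geologic map",
                          ["bedrock units", "faults"], Distribution.BY_LAYER)
    for unit in data_units(layered):
        print("metadata file needed for:", unit)
```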
METADATA GENERATION PROCESS
Once the text-formatted template containing all FGDC workbook elements and definitions (fig. 2) became available in July 2002, DGGS began generating full FGDC-compliant metadata for its geospatial datasets. The metadata generation process is illustrated in figures 3 and 4.0–4.2.
Figure 2. Metadata text-formatted template, including all FGDC sections/elements and their definitions, to be used when generating metadata for a geospatial dataset.
Figure 3. Metadata process flow chart providing an overview of DGGS metadata generation process.
Figure 4.0. Metadata generation flow chart detailing all steps to be used when generating FGDC-compliant metadata for a geospatial dataset.
Figure 4.1. Geospatial data cleanup process detailing the checks and balances of our geospatial data files prior to storage in the dataset library.
Figure 4.2. Metadata editing process detailing the steps a metadata file goes through at DGGS before it is considered FGDC-compliant metadata.
DGGS has an assigned Metadata Coordinator who checks all metadata files for errors using mp (metadata parser), a tool created by Peter Schweitzer of USGS (Schweitzer, 1995). Centralizing this step saves time and effort, because not everyone has to be trained to interpret the error report that mp produces. The mp program is a compiler for formal metadata: it checks the syntax against the FGDC Content Standard for Digital Geospatial Metadata and generates output suitable for viewing with a Web browser or text editor. It runs on UNIX systems and on PCs running Windows 95, 98, or NT. The mp tool generates a textual report indicating errors in the metadata, primarily in the structure but also in the values of some scalar elements (i.e., those whose values are restricted by the standard). We also have an on-site publications editor who reads each metadata file thoroughly for content and grammar to help eliminate errors.
Once the metadata files for a geospatial dataset have been generated, they are sent to the Alaska State Geospatial Data Clearinghouse (ASGDC) for posting on its website (http://www.asgdc.state.ak.us/).
WHAT WE HAVE LEARNED
DGGS was not in a position in January 2002 to begin generating proper FGDC-compliant metadata for its geospatial datasets immediately. Even though the director made it a priority and a policy that FGDC-compliant metadata must accompany all geospatial datasets published by DGGS beginning January 2002, the survey needed a strategy specifying how the policy would be implemented. The need for such a strategy is borne out by the fact that metadata has been generated more frequently and more precisely since the metadata committee’s strategy was introduced to the survey.
Clearly, there is a need for a user-friendly, non-software-specific application to help generate FGDC-compliant metadata. In their research, members of the second metadata committee were astonished to find that, despite a federal metadata mandate, few if any available applications help users generate metadata files independently of specific software.
The “FGDC Content Standard for Digital Geospatial Metadata Workbook version 2.0” is an invaluable resource for generating metadata for geospatial datasets. It is also a great resource for training personnel in what is expected of them when producing a metadata file. At DGGS, we translated the workbook into language that our staff could understand and relate to, in the form of a text-formatted metadata template.
The more documentation that is completed during the geospatial dataset production phase, the easier it is to generate metadata before publication. Once people understand what specific information metadata files require, they become more organized in gathering data and documenting their projects, which makes metadata generation more efficient.
CONCLUSIONS
The DGGS metadata policy established in January 2002 sparked rapid development of FGDC-compliant documentation for all geospatial datasets ready for publication by DGGS. In publication year 2002, DGGS released 64 publications that contain or were produced from geospatial datasets, each documented with metadata. The division-wide metadata committee appointed by the director succeeded in learning what is required to complete FGDC-compliant metadata. DGGS created a specialized metadata tool to aid in completing FGDC-compliant metadata and is now producing a user-friendly, non-software-specific application scheduled to be ready for use in October 2004. We have proposed a project, and are applying for funding, to generate metadata files for the legacy datasets from DGGS publications released before January 2002. If the project is funded, our entire geospatial dataset collection will have associated metadata files and will be properly documented, archived, and protected for future use.
REFERENCES
Federal Geographic Data Committee, 2000, Content Standard for Digital Geospatial Metadata Workbook (version 2.0): Washington, D.C., Federal Geographic Data Committee, http://www.fgdc.gov/metadata/meta_workbook.html.
Schweitzer, Peter N., 1995, MP: A compiler for formal metadata (2.7.6 ed.): Reston, Virginia, U.S. Geological Survey, http://geology.usgs.gov/tools/metadata/.