Content Metadata Standards for Marine Science: A Case Study, USGS Open-File Report 2004-1002
MRIB Case Study
The Evolution of Metadata Standards Relevant to the Marine Sciences
A Brief History of Metadata
Traditionally, library cataloguers and archivists have used the term metadata to refer to descriptive information used to index, arrange, file, and improve access to a library or museum's resources (Gilliland-Swetland, 1998). This use of metadata follows from the Greek etymology of the prefix meta, which literally translates as "with, among, after, or behind." Thus "metadata" suggests something "accompanying" data but not essential to it. Recent advances in information technology and the rapid emergence of the digital library have somewhat altered the perception of the term metadata among information managers; metadata are no longer auxiliary definitions or descriptions of some library resource, but a fundamental dimension of said resource.
Because the digital library field is young, its terminology and concepts are often defined vaguely, or in contradictory ways by different authors. The word metadata has met such a fate, and many definitions of it have been invented, refined, and circulated. Numerous examples could be cited here, but what is important is that, whether in its traditional context or in a digital library context, the key purpose of metadata remains the same: to facilitate and improve the retrieval of information.
An early use of metadata in the digital world occurred in the 1960s, with the advent of the international Machine-Readable Cataloguing (MARC) standards and the Library of Congress Subject Headings (LCSH). These standards were used to develop automated retrieval systems such as Online Public Access Catalogs (OPACs).
Metadata Standards Relevant to the Marine Sciences
A digital library for marine science, indeed for any Earth science, has one primary need beyond those of more general libraries: to describe the spatial and temporal coordinates associated with information. Although the MRIB is intended to enable both browsing and searching, it would certainly be possible to build a marine science digital library that relied on more standard bibliographic data and a more traditional library catalogue, oriented toward searching rather than browsing of categories. Such a library would trade browseability for a shorter development time. Following is a discussion of some common metadata standards that are especially relevant to cataloguing marine resources, along with their advantages and their shortfalls in relation to the MRIB goals specified above.
o MARC21, Machine-Readable Cataloguing
The Library of Congress developed the Machine-Readable Cataloguing (MARC) format in the 1960s to aid librarians in computerizing their catalogues and sharing records with one another (Furrie, 2000). MARC, presently issued as MARC 21 (the name refers to the twenty-first century rather than to a count of revisions), uses character codes (numeric tags, indicators, and subfield codes) to name bibliographic data fields (such as "100 1# $a" for its "Author" field). It was responsible for the computerization of library catalogues over the past several decades. Although efforts have been made to adapt MARC to electronic materials (Library of Congress, 2002), it still has notable disadvantages, including its human-unfriendly field names, its inability to describe computer formats precisely, and its age. These limits, combined with MARC's inability to handle numerical spatial data, make it inappropriate for an MRIB-style digital library, especially one that encourages authors to create their own bibliographic records. Some members of the traditional library world have also begun to reject MARC in favor of XML bibliographic records, which are expected to ease the integration of paper and electronic resources (Miller, 2000).
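The point about human-unfriendly field names can be illustrated with a minimal Python sketch. It pairs a few MARC 21 bibliographic tags with plain-language labels. The table name MARC_LABELS, the function describe, and the sample record are hypothetical and invented for illustration; they are not part of any MARC software or of the MRIB.

```python
# Hypothetical lookup table: a few MARC 21 bibliographic tags
# paired with plain-language labels a non-specialist could read.
MARC_LABELS = {
    "100": "Author (main entry, personal name)",
    "245": "Title statement",
    "260": "Publication information",
    "650": "Subject (topical term)",
}


def describe(record):
    """Return (label, value) pairs for a list of (tag, value) pairs."""
    return [
        (MARC_LABELS.get(tag, "Unknown tag " + tag), value)
        for tag, value in record
    ]


# Invented sample record, for illustration only.
sample = [
    ("100", "Doe, Jane"),
    ("245", "Sediment transport on the continental shelf"),
    ("999", "local data"),
]

for label, value in describe(sample):
    print(f"{label}: {value}")
```

Without such a table, a record author must memorize what "100" or "245" means, which is the barrier the paragraph above describes.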
o Federal Geographic Data Committee Content Standard for Geospatial Metadata (FGDC-CSGM)
The FGDC began drafting its Content Standard for Geospatial Metadata in 1992 (Federal Geographic Data Committee, 2000). According to the FGDC-CSGM Workbook, this standard is intended to facilitate three uses of data:
FGDC-CSGM is used by many clearinghouses of data because of its thoroughness and its ability to describe data in very precise terms. However, the FGDC-CSGM is so specific that it becomes unwieldy to apply, and thus is undesirable for a catalogue of Web-based materials that are not necessarily raw data and for which much of the information required by the FGDC-CSGM may not be readily available or desirable for searching and browsing. Moreover, to use FGDC-CSGM metadata, one practically needs to be a specialist in the standard. This is not ideal for a library of Web content such as the MRIB that encourages Web document authors to compose the metadata profiles for their own documents. Nor do many of the FGDC fields possess controlled vocabularies, whose absence makes FGDC records less interoperable for searching. For instance, there is no FGDC authority list of author names, so there is no certainty that a search of a collection of FGDC records for an author name will find all the records linked to a particular author (who might sometimes use initials rather than a full given name, or who might change his or her surname). For these reasons the MRIB team chose to develop a simpler, more focused (but still very detailed) metadata standard that would facilitate record interoperability, rather than one designed with the meet-all-possible-needs approach of the FGDC standard.
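The author-name problem described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MRIB implementation: the AUTHORITY table, the sample records, and all function names are hypothetical.

```python
# Hypothetical authority list: each known name variant maps to one
# canonical heading. Real authority files are far larger.
AUTHORITY = {
    "j. smith": "Smith, Jane",
    "jane smith": "Smith, Jane",
    "jane smith-jones": "Smith, Jane",
}

# Invented records; "author" is free text, as in an uncontrolled field.
RECORDS = [
    {"title": "Seafloor mapping", "author": "Jane Smith"},
    {"title": "Coastal erosion", "author": "J. Smith"},
    {"title": "Estuary sediments", "author": "Jane Smith-Jones"},
    {"title": "Tidal models", "author": "R. Lee"},
]


def naive_search(records, name):
    """Case-insensitive exact match on the free-text author field."""
    return [r["title"] for r in records if r["author"].lower() == name.lower()]


def canonical(name):
    """Return the canonical heading, or the name unchanged if unknown."""
    return AUTHORITY.get(name.lower(), name)


def authority_search(records, name):
    """Match records whose author resolves to the same canonical heading."""
    target = canonical(name)
    return [r["title"] for r in records if canonical(r["author"]) == target]


print(naive_search(RECORDS, "Jane Smith"))      # finds one record
print(authority_search(RECORDS, "Jane Smith"))  # finds all three variants
```

The naive search misses the records filed under initials or a changed surname, while the lookup through a controlled list retrieves all of them.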
o Dublin Core Metadata Initiative (DCMI)
The DCMI emerged from a 1995 workshop during which participants discussed essential categories by which Web resources could be catalogued (http://dublincore.org/about/history/). The present-day DCMI provides a set of standard field names with the aim of "making it easier to find information," the slogan on the project's Web site (http://www.dublincore.org). DCMI specifies a syntactical structure for the various elements of each field (such as the contents of the field, the controlled vocabulary from which the contents were derived, etc.). With one exception, DCMI does not provide controlled vocabularies for metadata fields; instead, it registers such controlled vocabularies and allows metadata cataloguers to use (and to specify in their metadata) a relevant vocabulary developed elsewhere (Dublin Core Metadata Initiative, 2003a; DCMI). The exception is a rudimentary, flat controlled vocabulary for the DCMI "Resource Types" field (including such terms as "Image," "Event," and "Sound") (DCMI, 2003b). The field names in Dublin Core are human-language terms like "Publisher," rather than MARC-style machine codes. Because DCMI was developed specifically for electronic resources, it dispenses with bibliographic fields that are irrelevant to electronic resources. Moreover, because it does not focus on describing "data" in the rawest sense, DCMI is simpler than FGDC and more broadly applicable. The DCMI provides fields to specify time ("Period") and location ("Points"), both of which are crucial to describe information resources about the Earth.
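As a rough illustration of how time and location values of this kind might be handled, the following Python sketch parses "name=value; name=value" strings in the style of the DCMI Period and DCMI Point encodings. It is a simplified sketch under stated assumptions: the parser ignores the escaping rules of the full syntax, the record, its field labels, and its coordinates are invented, and nothing here is taken from the MRIB itself.

```python
def parse_dcsv(text):
    """Parse 'key=value; key=value' into a dict of strings.

    Simplified: does not handle escaped ';' or '=' characters.
    """
    fields = {}
    for part in text.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty pieces, e.g. after a trailing ';'
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def point_in_box(point, west, south, east, north):
    """True if a parsed point lies inside a longitude/latitude box."""
    x = float(point["east"])
    y = float(point["north"])
    return west <= x <= east and south <= y <= north


# Invented record; the field labels and values are hypothetical.
record = {
    "Title": "Hypothetical survey of a coastal embayment",
    "Publisher": "Example Institute",
    "Coverage.spatial": "east=-70.66; north=41.52; name=Example Harbor",
    "Coverage.temporal": "start=1998-06-01; end=1998-08-31; scheme=W3C-DTF",
}

point = parse_dcsv(record["Coverage.spatial"])
period = parse_dcsv(record["Coverage.temporal"])

print(point["name"], point_in_box(point, -71.0, 41.0, -70.0, 42.0))
print(period["start"], "to", period["end"])
```

Human-readable labels such as "Title" and "Publisher" sit alongside machine-parsable coverage values, which is the combination that makes this style of record usable both by document authors and by a spatial or temporal search.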
o ADEPT Metadata Standard
The ADEPT metadata standard results from collaboration among NASA's JOINed Digital Library, the Digital Library for Earth System Education (DLESE), and the Alexandria Digital Earth Prototype (ADEPT). ADEPT, which is still in development, promises great versatility in dealing with Earth sciences resources in general. In particular, because the projects involved include a library of mostly raw data (Alexandria) and a project to organize kindergarten- through college-level educational resources (DLESE), ADEPT will need to find effective ways to sort information by technical level. The ADEPT standards, being specialized for the Earth sciences, include fields capable of describing space and time in several ways. Although not strictly part of the ADEPT metadata, the Alexandria project has also developed an extensive polygon-based gazetteer which, in conjunction with geospatial metadata specified by the ADEPT standard, may provide very accurate location searches. That said, because ADEPT is adapted for the broader Earth sciences, it has some limitations in the scope of the marine sciences. From those sections of the ADEPT standard that are publicly available, it is difficult to judge how well ADEPT will describe information outside the disciplines of geology, geography, and education. The current standards propose a section tailored to the metadata needs of specific disciplines, but no details about the fields in that section or the breadth of disciplines covered are yet public.