Content Metadata Standards for Marine Science: A Case Study, USGS Open-File Report 2004-1002


As scientific information has been increasingly made available on the Web, specialized means to organize and integrate such information have become necessary. Some attempts to meet this need are very general in purpose, while others, like the MRIB standard, have been refined for specific audiences, purposes, and scopes. Because this standard is intended for a broad audience and has a narrow but heterogeneous scope (information from the marine, lake, and coastal sciences, which draw from a variety of disciplines), its development provides useful lessons for the creation of other distributed libraries.

The development of an organic categorization scheme directly from the catalogued documents, and using terms from the documents themselves, was an expedient way to produce detailed controlled vocabularies. Mis-steps along the way largely involved ambiguous terms and overlapping facets. By striving to clarify terms and promote homogeneity of terms within facets, some of these problems have been removed.

The MRIB metadata development was also a lesson in how an ontology should, and should not, be constructed. The "indexing-to-discover" process proved very useful in expanding the term list, but the failure of the MRIB developers to respond early to apparent structural problems (such as in the example of the Class, Audience, and Format facets) has required time-consuming revisions of the metadata records, revisions of a sort that will not be possible once the MRIB categorization scheme begins to be used widely. Moreover, it seems that the early emphasis on geology left the preliminary categorization scheme lacking fields to describe important geographical and biological concepts. Perhaps earlier collaboration with non-geologist marine scientists would have made the need for the Physiographic Features and Biota fields more evident from the beginning (and would have provided more insight into how Biota might be best structured). In short, a process of indexing to discover terms, combined with a willingness to make structural revisions at the early stages, seems to be an ideal approach to developing a categorization scheme of this nature.

At this stage, the MRIB scheme has been used to index thousands of electronic resources. Further development of the scheme will likely focus on a supportive front end that ensures that terms are clear to most end-users. On this point, describing the challenge of anticipating users' search strategies, Bates (1998) stated:

...The better developed the typical system, the more arcane its fine distinctions and rules are likely to be, and the less likely to match the unconsidered, inchoate attempts of the average user to find material of interest. [The] question should not be: 'How can we produce the most elegant, rigorous, complete system of indexing or classification?,' but rather, 'How can we produce a system whose front-end feels natural to and compatible with the searcher, and which, by whatever infinitely clever internal means we devise, helps the searcher find his or her way to the desired information?'

With a solid EIC creation system now available to its would-be cataloguers, the MRIB faces the challenge of building support structures that will guide the user, through a variety of means, to useful information. This support structure will include the availability of definitions of terms, and will likely include searchable indices of related words linked to the MRIB's controlled terms. Other useful infrastructure is already available to the MRIB; the visibly faceted categorization scheme allows the user to choose a facet, then to see how one item from that facet intersects with the matches for a term from another facet. This capacity for guided wandering allows the user to both zoom in and pull back at his or her choosing.
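The facet intersection described above can be illustrated with a minimal sketch. It assumes each catalogued record stores a set of controlled terms per facet. The facet names Biota and Physiographic Features come from the scheme, but the record layout, the example terms, and the function names are hypothetical and are not taken from the MRIB implementation.

```python
from collections import Counter

# Hypothetical records: each maps a facet name to the set of controlled
# terms assigned to the resource. Titles and terms are invented.
RECORDS = [
    {"title": "Doc A", "Biota": {"corals"}, "Physiographic Features": {"reefs"}},
    {"title": "Doc B", "Biota": {"corals", "fish"}, "Physiographic Features": {"estuaries"}},
    {"title": "Doc C", "Biota": {"fish"}, "Physiographic Features": {"reefs"}},
]


def match(records, facet, term):
    """Return the records that carry `term` in the given facet."""
    return [r for r in records if term in r.get(facet, set())]


def intersect(records, facet, term, other_facet):
    """Count how the matches for `term` are distributed over `other_facet`."""
    counts = Counter()
    for record in match(records, facet, term):
        counts.update(record.get(other_facet, set()))
    return dict(counts)


# "Zoom in" on corals, then see which physiographic features they intersect.
corals_by_feature = intersect(RECORDS, "Biota", "corals", "Physiographic Features")
print(corals_by_feature)
```

Choosing a different starting facet, or dropping the term filter, corresponds to "pulling back" to a broader view of the same records.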

Although the MRIB metadata were intended to work with the MRIB Web interface, the standard is open, and full information about its application is available on the MRIB Web site. This openness encourages the use of MRIB metadata fields, terms, and even metadata records by other applications. It is possible, for instance, to envision an implementation of the MRIB metadata that would regularly "spider" the Web seeking MRIB metadata stored within XML files or HTML tags, allowing authors to more readily update metadata for their documents. It would also be possible to adapt the MRIB metadata to terrestrial Earth science information, with the addition of methodological terms and "hot topics" more applicable to the continental realms. The flexible, hierarchical structure of the MRIB categorization scheme permits the development of MRIB-based systems that look radically different from one another. An interface to the categorization scheme might display only a subset of the available facets and terms, or truncate them at any point, to provide a simpler interface. Or the interface might have its own set of terms, each of them linked (invisibly to the end-user) to one or more of the terms in the underlying ontology (perhaps in a different language, dialect, or technical level). An adventurous system might add its own, deeper levels to the terms. These deeper levels could be used by its specialized interface, and ignored by other systems whose audiences had no need for such levels (or adopted by other systems which did need the specialized levels).
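The interface variations described above (truncating the hierarchy, linking interface-specific terms to ontology terms, and adding deeper levels) can be sketched as follows. This is only an illustration under stated assumptions: hierarchical terms are modelled as tuples running from facet to most specific term, and every term, alias, and function name below is hypothetical rather than drawn from the MRIB scheme itself.

```python
# Hypothetical hierarchical terms, written as paths from the facet downward.
ONTOLOGY = {
    ("Biota",),
    ("Biota", "Invertebrates"),
    ("Biota", "Invertebrates", "Corals"),
    ("Biota", "Fish"),
}

# Hypothetical interface vocabulary: each plain-language term links,
# invisibly to the end-user, to one or more ontology terms.
ALIASES = {
    "sea life": [("Biota",)],
    "coral": [("Biota", "Invertebrates", "Corals")],
}


def truncate(path, max_depth):
    """Cut a term path off at `max_depth` levels."""
    return path[:max_depth]


def visible_terms(ontology, max_depth):
    """Terms a simplified interface would show when truncating the hierarchy."""
    return sorted({truncate(path, max_depth) for path in ontology})


def resolve(alias):
    """Map an interface term to its underlying ontology terms, if any."""
    return ALIASES.get(alias.lower(), [])


def falls_under(record_path, query_path):
    """True if a record's term equals the query term or lies beneath it."""
    return record_path[:len(query_path)] == query_path


# A specialized system adds its own deeper level below an existing term.
deeper = ("Biota", "Invertebrates", "Corals", "Acropora")

print(visible_terms(ONTOLOGY, 2))
print(resolve("Coral"))
print(falls_under(deeper, ("Biota", "Invertebrates", "Corals")))
print(truncate(deeper, 3))
```

Under this representation, a system with no need for the extra level can truncate the deeper term back to a path it already knows, while a system that wants the extra level can use it unchanged.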

Ultimately, the MRIB is an ongoing project, and it will doubtless evolve considerably over the next few years. However, because the MRIB metadata represent a stable means for the classification of information resources about Earth's water bodies, they may serve as a foundation on which a variety of user interfaces and metadata records can be built.

