Indiana Geological Survey
Indiana University
611 North Walnut Grove
Bloomington, IN 47405-2208
Telephone: (812) 855-7636
Fax: (812) 855-2862
e-mail: jsteinm@indiana.edu, hill2@indiana.edu, sowderk@indiana.edu
Virtually all data collected or processed by the IGS are geospatially oriented. Hence, the application of metadata to their cataloging within an organization is easy to envision. Constructing metadata incrementally throughout the life of a project helps ensure that adequate documentation is captured when products are published and before an investigator departs.
Beyond the cost of collection, the value of scientific data lies in their use. Whether used by the original investigator or years later by someone else, scientific data retain value only if they are used. To be of utility, they must be accessible; to be accessible, they must be discoverable. Many organizations rely heavily on institutional memory -- the collective knowledge and history of an organization held by employees, especially those who have been there for a number of years (National Research Council, 2002) -- to aid in the discovery and accessibility of data. All too often when a long-term and productive employee retires or leaves an organization, an institutional memory of inestimable value is lost. With him or her commonly goes such simple information as the physical whereabouts of a data set. Once lost within the organization -- namely, once its location is no longer known within the building -- the data set is essentially useless.
Finding (discovery of) data within an organization involves identifying the existence and location of desired data sets and collections. Ancillary considerations include ascertaining data availability, quality, and format. Within the petroleum industry, it has been estimated that between 60 and 80 percent of a geoscientist's time is spent searching for data; the balance is spent organizing and analyzing them (S. Natali, Barrett Resources, personal commun., 2001). One internal goal of most public or private organizations is to shorten the discovery time so the investigator can invest more valuable time in using the data. In many instances, however, potential users gain knowledge of, and access to, data by traditional means: through personal acquaintances, letters, on-site visits, or by telephone, fax, or e-mail. Too often, knowledge of the mere existence of geoscience data depends on personal relationships, that is, on institutional memory.
Discovery of data from outside an organization requires a certain degree of public relations by the organization that is archiving data. The IGS, like many state and federal Earth science institutions, promotes its holdings not only via e-mail and the Internet, but also through mass mailings, professional meetings, posters, and CD-ROMs. Digital data catalogs and access to them over the Internet are increasingly common, but many investigators are surprised to learn that digital access to a catalog's collection is not yet available. In many instances, funds to build an electronic catalog and provide Internet access are available only when garnered from existing operational funds; new money for these efforts is rarely provided.
Adequate cataloging of data may seem time consuming and not terribly exciting, yet the costs involved in data acquisition generally far outweigh all other costs combined; reacquisition of data, if even possible, is more costly than initial acquisition and retention. In the current economy, organizations simply cannot afford to lose the usefulness of valuable data by neglecting documentation procedures. The IGS has undertaken the task of inventorying, cataloging, and creating metadata for new and historical data sets. To promote this initiative, the IGS has formed the Data at the Indiana Geological Survey Committee (DIGS Committee). The chair of the committee is the head of the Technology Transfer Section, and the members include staff across all IGS disciplines. They are charged with examining the factors involved in conducting a Survey-wide inventory of files and records, samples, archives, and publications. The committee's immediate goals are to capture IGS data, to develop the means of data retrieval, to develop a database to organize and access records of all IGS data, and to provide Internet access to an inventory of selected records. Some considerations the committee is taking into account are the lumping and splitting of items into various categories, design of inventory forms, necessary resources (personnel, equipment, supplies), metadata and quality assurance, public vs. proprietary data, barcoding, prioritization of data capture, efficiency and ease of the inventory process, staff training, and Internet deliverability.
Metadata are descriptive information about data and information resources. Typically, metadata describe, point to, or otherwise complement the information content of the data to which they are related. Metadata provide a concise aid in locating desired information and help make such information easily accessible. They are particularly useful for geospatial information because federal standards for such metadata have been written.
On April 11, 1994, President Clinton signed Executive Order 12906. This order, among other things, established "a coordinated National Spatial Data Infrastructure (NSDI) to support public and private sector applications of geospatial data in such areas as transportation, community development, agriculture, emergency response, environmental management, and information technology." Additionally, Executive Order 12906 mandated "the Standardized Documentation of Data . . . each agency shall document all new geospatial data it collects or produces, either directly or indirectly, using the standard under development by the FGDC [Federal Geographic Data Committee], and make that standardized documentation electronically accessible to the Clearinghouse network."
The FGDC standard describes what information is to be provided by the metadata and in what format that information should be provided. For example, the FGDC standard directs that the producer of a data set must describe the data's quality. Metadata help ensure that data remain usable in perpetuity. Moreover, metadata provide assurance that the data are of sufficient quality and validity, and help remove one of the greatest barriers to the use of scientific data: the difficulty of discovery.
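To make the idea of standardized content concrete, the following is a minimal sketch of how a skeletal record might be assembled in Python. It assumes the short element names commonly used in XML encodings of the FGDC Content Standard for Digital Geospatial Metadata (for example, idinfo, citeinfo, and dataqual). The function name, the sample title, the keywords, and the approximate bounding coordinates are hypothetical and are used only for illustration. The record is far from complete and has not been validated against the standard.

```python
import xml.etree.ElementTree as ET


def build_minimal_record(title, originator, pubdate, abstract, keywords, bbox):
    """Build a skeletal CSDGM-style record as an XML element tree.

    bbox is (west, east, north, south) in decimal degrees.
    This is an illustrative subset, not a complete or validated record.
    """
    metadata = ET.Element("metadata")

    # Identification information: who produced the data, what, and when.
    idinfo = ET.SubElement(metadata, "idinfo")
    citeinfo = ET.SubElement(ET.SubElement(idinfo, "citation"), "citeinfo")
    ET.SubElement(citeinfo, "origin").text = originator
    ET.SubElement(citeinfo, "pubdate").text = pubdate
    ET.SubElement(citeinfo, "title").text = title
    ET.SubElement(ET.SubElement(idinfo, "descript"), "abstract").text = abstract

    # Spatial domain: bounding coordinates that support geographic discovery.
    bounding = ET.SubElement(ET.SubElement(idinfo, "spdom"), "bounding")
    for tag, value in zip(("westbc", "eastbc", "northbc", "southbc"), bbox):
        ET.SubElement(bounding, tag).text = str(value)

    # Theme keywords that support keyword searches.
    theme = ET.SubElement(ET.SubElement(idinfo, "keywords"), "theme")
    ET.SubElement(theme, "themekt").text = "None"
    for word in keywords:
        ET.SubElement(theme, "themekey").text = word

    # Data quality: the producer is expected to describe it; placeholder here.
    dataqual = ET.SubElement(metadata, "dataqual")
    ET.SubElement(dataqual, "logic").text = "Not yet assessed (placeholder)."

    # Metadata reference information: which standard the record follows.
    metainfo = ET.SubElement(metadata, "metainfo")
    ET.SubElement(metainfo, "metstdn").text = (
        "FGDC Content Standard for Digital Geospatial Metadata"
    )
    return metadata


# Hypothetical example values, for illustration only.
record = build_minimal_record(
    title="Example bedrock geology data set",
    originator="Indiana Geological Survey",
    pubdate="2002",
    abstract="Placeholder abstract for an illustrative record.",
    keywords=["bedrock geology", "Indiana"],
    bbox=(-88.1, -84.8, 41.8, 37.8),
)
print(ET.tostring(record, encoding="unicode"))
```

Building the record incrementally in this way mirrors the practice of adding metadata throughout the life of a project: sections such as data quality can hold placeholders until the investigator is able to complete them.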
Training of IGS staff in metadata guidelines as specified by the FGDC Content Standard for Digital Geospatial Metadata has been provided through a series of in-house workshops. A policy has been prepared by the IGS administration (see Appendix) to require metadata creation for all new data sets and the creation of project metadata for final products. The ultimate goal is to increase the value of already valuable data and make them easier to access and retrieve. The IGS Intranet Web site currently gives staff easy access to metadata keyword and category searches, and the IGS Internet site will ultimately provide users with data that can be downloaded.
Figure 1. The short-term cost and value of data, either gathered in the field or generated in the laboratory, differ from their long-term cost and value. While the initial cost to acquire data may be quite high (left), the annual and ongoing costs for retention can be low. The costs of reacquiring the data at some time in the future (to the right of the jagged lines), if reacquisition is even possible, are typically much higher than the original acquisition costs. (Figure modified from National Research Council, 2002.)
The American Association of Petroleum Geologists (AAPG) has been promoting geoscience preservation and access for over 50 years. It has had a standing committee for core and sample preservation since 1948, and supports the American Geological Institute (AGI) proposal to create a centralized repository, the National Geoscience Data Repository System (NGDRS) -- in effect, a Library of Congress for samples in the public domain (American Geological Institute, 1994, 1997; Montgomery, 1999). To initiate the formation of the NGDRS, AGI secured support from the U.S. Department of Energy and some petroleum companies, developed a repository data model, facilitated the transfer of some data (cores, cuttings, paleontological samples, seismic data, logs, and scout tickets) from the private to the public sector, and implemented and currently operates GeoTrek, a software data catalog and access system available on the Internet (http://www.agiweb.org/NGDRS/). The location of the centralized facility has not been determined, and petroleum industry support is mixed. Companies are not willing to donate materials until a repository is located (Montgomery, 1999), and many individuals feel that a network of distributed repositories at key locations in the country would foster a greater degree of use. Finally, many state geological surveys would not contribute their materials, since they already have a statutory obligation to archive state-derived data.
Once in digital form, data are not guaranteed immortality. Data loss can result from physical degradation of the magnetic medium (particularly tape, which should be re-written about every 5 years), obsolete formats (and obsolete equipment to access them), the migration from one format to another, or the lack of complete auxiliary data (such as header information, recording parameters, calibration data, metadata).
CD-ROM storage currently is one of the more popular forms of digital data storage. Benefits include a simple and low-cost replication process, the ability to store multiple types of data (e.g., text, images, video, and audio), and random access to the information. CD-ROMs are also projected to have a shelf life exceeding 25 years under standard office conditions.
Digital data also require periodic refreshing. Accessibility and retrievability can be guaranteed only if data are migrated to protect against media deterioration and technology evolution.
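As a rough illustration of how periodic refreshing might be tracked, the sketch below flags holdings whose media have reached an assumed refresh interval. The 5-year tape interval follows the guidance mentioned above; treating the projected 25-year CD-ROM shelf life as a refresh interval is an assumption made only for this example. The function name, the record layout, and the sample holdings are likewise hypothetical.

```python
from datetime import date

# Assumed refresh intervals in years. The tape value follows the
# "about every 5 years" guidance; the CD-ROM value reuses the projected
# 25-year shelf life as an upper bound, which is an assumption.
REFRESH_YEARS = {"tape": 5, "cd-rom": 25}


def media_due_for_refresh(holdings, today):
    """Return the IDs of holdings at or past their assumed refresh interval."""
    due = []
    for item in holdings:
        interval = REFRESH_YEARS.get(item["medium"])
        if interval is None:
            # Unknown medium: skipped here; in practice it needs human review.
            continue
        age_years = (today - item["written"]).days / 365.25
        if age_years >= interval:
            due.append(item["id"])
    return due


# Hypothetical holdings, for illustration only.
holdings = [
    {"id": "seismic-001", "medium": "tape", "written": date(1996, 3, 1)},
    {"id": "logs-014", "medium": "tape", "written": date(2000, 6, 1)},
    {"id": "maps-007", "medium": "cd-rom", "written": date(1995, 1, 1)},
]
print(media_due_for_refresh(holdings, today=date(2002, 1, 1)))
```

A check of this kind addresses only media age. Format obsolescence and missing auxiliary data still need to be reviewed separately.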
The application of informatics may be an important goal in geoscience data discovery and access. In such a scenario, all data would be in digital form and accessible over the Internet. Each sample could be located by its spatial coordinates, and attendant metadata would record the circumstances under which the sample was collected and would provide quality control. Such a system requires standardized formats for data archiving, software support, data mining tools, and a knowledgeable end-user community (see, for example, the Kansas Geological Survey's Geoinformatics efforts at http://www.kgs.ukans.edu/Geoinfo2/).
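The idea of locating samples by their spatial coordinates can be sketched as a simple bounding-box query. The example below is a minimal sketch, not a description of any existing system: the Sample fields, the function name, and the sample records with their approximate coordinates are all hypothetical. A production system would more likely rely on a spatial database and a defined coordinate reference system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """A hypothetical sample record with location and collection metadata."""
    sample_id: str
    lon: float  # decimal degrees, negative west of Greenwich
    lat: float  # decimal degrees, positive north of the equator
    collected_by: str
    method: str


def samples_in_box(samples, west, south, east, north):
    """Return the samples whose coordinates fall inside the bounding box."""
    return [
        s for s in samples
        if west <= s.lon <= east and south <= s.lat <= north
    ]


# Hypothetical records with approximate coordinates, for illustration only.
catalog = [
    Sample("S-001", -86.53, 39.17, "field crew A", "core"),
    Sample("S-002", -86.16, 39.77, "field crew B", "cuttings"),
    Sample("S-003", -87.57, 37.97, "field crew A", "outcrop hand sample"),
]

for sample in samples_in_box(catalog, west=-87.0, south=39.0, east=-86.0, north=40.0):
    print(sample.sample_id, sample.collected_by, sample.method)
```

Keeping the collection circumstances in the same record as the coordinates means that a spatial search returns the information a user needs to judge the quality of a sample, not just its location.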
The Smithsonian's National Museum of Natural History (NMNH) is creating a "Research and Collections Information System" that approaches an informatics-based system. The intention is to accomplish three main goals: (1) better collections management to track the disposition of specimens acquired, loaned, borrowed, or disposed of, and their locations; (2) online access to all digital specimen data for the benefit of museum research, collections, and public programs staff, scientists worldwide, and the general public; and (3) participation in national and international informatics initiatives. Using a suite of software applications that are used internationally, NMNH staff have slowly begun to implement the system in a number of science departments. The software was chosen for its stability, ability to scale, flexibility for diverse NMNH disciplines, and capacity for customization. Museum officials estimate that between 40 and 50 million records will adequately represent NMNH specimens at a cost of $55 to $75 million. Presently, there are no funds for data entry, and the collections care and informatics initiatives are stalled for lack of funds.
American Geological Institute (AGI), 1997, National directory of geoscience data repositories: Alexandria, Virginia, American Geological Institute, 91 p.
Montgomery, S.L., 1999, Core values: the growing need for repositories: Oil & Gas Journal, v. 97, no. 46, November 15, 1999, p. 84-87.
National Research Council (NRC), Committee on the Preservation of Geoscience Data and Collections, Committee on Earth Resources, 2002, Geoscience data and collections: National resources in peril: Washington, D.C., National Academy Press, 205 p.
As stewards of public information, the IGS has an obligation to provide high-quality, well-documented data sets and information through readily searchable and easily accessible means. This policy is established and designed to ensure and facilitate full and open access to quality data for research and education.
Metadata serve as a means to efficiently collect, preserve, manage, access, and disseminate these data and information.
The intentions of establishing a Metadata Policy are to:
After the 6-month implementation period, comments about the metadata system will be solicited. The Metadata Working Group will review all comments and determine what, if any, changes need to be made to the system.
Handling of proprietary data