Open-File Report 2009–1170
Importance of Data Management
The Strategic Plan for the U.S. Geological Survey Biological Informatics Program (2005–2009) recognizes the need for effective data management: Though the Federal government invests more than $600 million per year in biological data collection, it is difficult to address these issues because of limited accessibility and lack of standards for data and information...variable quality, sources, methods, and formats (for example observations in the field, museum specimens, and satellite images) present additional challenges. This is further complicated by the fast-moving target of emerging and changing technologies such as GPS and GIS. Even though these technologies offer new solutions, they also create new informatics challenges (Ruggiero and others, 2005).
The USGS National Biological Information Infrastructure program, hereafter referred to as NBII, is charged with the mission to improve the way data and information are gathered, documented, stored, and accessed. The central objective of this project is a direct reflection of the purpose of NBII as described by John Mosesso, Program Manager of the U.S. Geological Survey-Biological Informatics Program-GAP Analysis: At the outset, the reason for bringing about NBII was that there were significant amounts of data and information scattered all over the U.S., not accessible, in incompatible formats, and that NBII was tasked with addressing this problem…. NBII’s focus is to pull data together that truly matters to someone or communities. Essentially, the core questions are: 1) what are the issues, 2) where is the data, and 3) how can we make it usable and accessible (John Mosesso, U.S. Geological Survey, oral commun., 2006).
Redundancy in data collection can be a major issue when multiple stakeholders are involved in a common effort. In 2001, the U.S. General Accounting Office (USGAO) estimated that about 50 percent of the Federal government’s geospatial data at the time was redundant. In addition, approximately 80 percent of the cost of a spatial information system is associated with spatial data collection and management (U.S. General Accounting Office, 2003). These figures indicate that the resources (time, personnel, money) of many agencies and organizations could be used more efficiently and effectively. Dedicated and conscientious coordination and documentation of data management are critical for reducing such redundancy. Substantial cost savings and increased efficiency are direct results of a proactive data management approach. In addition, details of projects, as well as data and information, are frequently lost as a result of real-world occurrences such as the passing of time, job turnover, and equipment changes and failure. A standardized, well-documented database allows resource managers to identify issues, analyze options, and ultimately make better decisions in the context of adaptive management (National Land and Water Resources Audit and the Australia New Zealand Land Information Council on behalf of the Australian National Government, 2003).
Many environmentally focused, scientific, or natural resource management organizations collect and create both spatial and non-spatial data in some form. Data management appropriate for those data will be contingent upon the project goal(s) and objectives and thus will vary on a case-by-case basis. This project and the resulting Data Management Toolkit, hereafter referred to as the Toolkit, is therefore not intended to be comprehensive in terms of addressing all of the data management needs of all projects that contain biological, geospatial, and other types of data. The Toolkit emphasizes the idea of connecting a project’s data and the related management needs to the defined project goals and objectives from the outset. In that context, the Toolkit presents and describes the fundamental components of sound data and information management that are common to projects involving biological, geospatial, and other related data. These components include project planning, standards, data stewardship, data modeling, quality assurance/quality control (QA/QC), metadata, geospatial data acquisition, critical elements of data, and free tools and resources. Also, where possible, it provides guidelines for addressing those various components based on industry, Federal, and international best practices and standards.
The effectiveness of planning and decision making is closely related to the quality and completeness of available information. Quality information can only be derived from quality data. Global positioning systems (GPS) and geographic information systems (GIS) greatly contribute to improved resource management and decision making. However, efficiency and effectiveness in natural resource management and decision making are not direct results of using tools such as GPS and GIS. These tools must be paired with standard procedures and methodologies for data management and data documentation that are consistently followed. Otherwise, management goals and objectives may never be fully realized, and the effects of implemented decisions may never be fully quantifiable. The data management concepts presented herein are geared towards facilitating multi-agency efforts related to the adaptive management of Roan Mountain. Because these concepts are applicable to any project, however, they are presented in such a way that the Toolkit can be applied and adapted to other scenarios. The ultimate goal is to help those engaged in a project become more aware of fundamental data management issues that may be outside of their respective areas of expertise.
The greatest challenge of this project was helping natural resource managers, agency biologists and scientists, the nongovernmental organization (NGO) community, and the academic community to realize the importance of data and information management. Approaching this issue in a holistic and collaborative way will greatly enhance the value and utilization of data and information over the long term.
First posted August 27, 2009
Part or all of this report is presented in Portable Document Format (PDF); the latest version of Adobe Reader or similar software is required to view it.
Burley, T.E., and Peine, J.D., 2009, NBII-SAIN Data Management Toolkit, U.S. Geological Survey Open-File Report 2009–1170, 96 p.
Data Management Toolkit Part A (Sections 1–5)—Project Policy, Approach, and Planning Framework Overview
1 Data Management Toolkit Introduction: Background and Purpose
2 Planning: Project Management Fundamentals
3 Planning: The Benefits of Well-Defined Standards
4 Planning: Strategic Data Management Principles and Guidelines
5 Planning: Data Stewardship for Ensuring Data Longevity
Data Management Toolkit Part B (Section 6)—Elements of Data Management Overview
6 Planning: Data Management Considerations for Meeting Goals and Objectives
Data Management Toolkit Part C (Sections 7–12)—Example Approaches to Specific Elements of Data Management
7 Guidelines for Data Modeling and Design
8 Guidelines for Project Quality Assurance, Development of a QA Plan, and Quality Control
9 Tools, Guidelines, and Workflows for Creation of Federal Geographic Data Committee-Compliant Metadata
10 Geospatial Data Acquisition Guidelines for Quality
11 Documentation Tool # 1—FGDC Bio-Profile Metadata Questions
12 Documentation Tool # 2 – Dublin Core Metadata Questions
Appendix A-FGDC Bio-Profile Crosswalk
Appendix B-Quality Assurance Plan Template
Appendix C-The Top Ten Most Common Metadata Errors