Contaminated Sediments Database for the Gulf of Maine, OFR 02-403
Database Construction

The methods used to construct the database included: locating data and references, defining data types, screening and entry of data, editing and validation of data, placement of data into a geographic and physiographic context, and transfer of information to users of the data. A summary of the database structure and content is given in the section "Content Overview." Please read the following text for details.

Collaboration

This database of existing data on chemical contaminant concentrations in sediment for the Gulf of Maine region was compiled with the collaboration and cooperation of many scientists, agencies, and institutions. The participation of the research and regulatory community in defining communal goals, in determining what measurements were important to record, and in assessing how to judge the quality of the rescued data results in products that meet the needs of the Gulf of Maine community. A listing of parameters to include in the database was agreed on, and training in data screening and entry was provided to the participants. Principal collaborators, and their assistants or students, were responsible for locating references and entering data within their geographic or topic area. The compiled entries were reviewed as a batch by USGS staff for completeness and quality using iterative validation and screening methods (Manheim and Hathaway, 1991; Manheim et al., 1998). Entries that were identified by the validation process as questionable, data that needed repair, and samples with sparse documentation of quality criteria were reviewed again, and appropriate comments were made in the database about these samples. Each collaborator was familiar with the content and structure of the database and could serve as a resource for others in the region on how to utilize searches, graphical displays, and comments to select and use data for specific needs.

Data and references
Data contained in the database originated from many
sources (Table
1). The
USGS completed searches of existing bibliographies and electronic searches
of the American Geological Institute's Geoscience Database (GEOREF),
the Aquatic Sciences and Fisheries Abstracts (ASFA), and the National
Technical Information Service (NTIS) listings. The ASFA and GEOREF searches
identified most of the papers in the peer-reviewed literature that contained
significant amounts of data. The NTIS search identified many governmental
agency documents that have limited distribution. Such in-house and consultant
reports are commonly referred to as "gray literature". Keywords
used for the searches included major locations, elements or compounds,
and likely general terms. Records held in existing bibliographies,
funding agencies, institutions, libraries, and individual contact with
scientists and regulators working in marine sciences throughout the
Gulf of Maine were used to identify additional documents likely to contain
data on contaminants in sediments. Bibliographies reviewed for documents include Regional Association for Research on the Gulf of Maine (RARGOM, 1997), Massachusetts Bays (Massachusetts Institute of Technology (MIT) Sea Grant for Coastal Resources, n.d.), Great Bay (Ward and Pope, 1994), and Bay of Fundy collections (Conservation Council of New Brunswick, 1993). When an existing compilation of historical
data was available (e.g., Metcalf & Eddy,
1984; Cahill and Imbalzano, 1991), the data was transferred electronically
and verified with the original data source when possible. Data held
in agency databases was also transferred electronically, and associated
information about data quality was acquired from published documents
and discussions with scientists at these agencies. Agency databases
that were utilized include the NOAA
Status and Trends Program (NOAA, 1988),
the Massachusetts
Water Resources Authority's Monitoring Program, and the US
Army Corps of Engineers' permit and dredging programs (New England District,
Concord, MA; Buchholtz
ten Brink and others, 1992). Documents
containing data included in the database were cited in each data table
(under "Source of Information or Reference") and full bibliographic
references are given in the References. The
database contains linked information to aid users in locating original
data sources and paper copies are archived at the U.S. Geological Survey
in Woods Hole, MA. The compiled bibliographic information also includes
related references that did not contain original data on contaminants
in sediments.

Measurements of major elements, trace elements, metals, or organic contaminant compounds on whole sediments within the Gulf of Maine were compiled. Measurements on sediment fractions, waters, pore waters, or biota were not. The geographic area for sample inclusion is the marine region bounded on the south by Cape Cod, MA, on the east by Georges Bank, on the north by Nova Scotia, and on the west by coastal New England. Some references containing samples in contiguous wetlands, river estuaries, Georges Bank, and the Bay of Fundy were collected; however, not all samples from these peripheral areas were entered in this edition of the database, nor was the literature scrutinized for data from these areas.

The Database of Contaminated Sediments for the Gulf of Maine (Vol. 1) attempts to comprehensively retrieve analytical data for sediment samples collected from 1950 through 1995; some omissions are inevitable. Data sets for more recent samples (some through 1998) that could be transferred electronically are included; however, newer documents that require hand-entry into the database are not in the current compilation. We maintain a listing of potential data sources, and we ask that omissions, mistakes, supplementary information, and new data be brought to our attention.

Ancillary data

In addition to discrete contaminant measurements,
the database includes documentation about sample collection, analytical
methods, and other information that is required to assess the quality
of the reported data. The heterogeneity of the data sources has resulted
in a wide range of accuracy and precision for the data that is compiled.
Scientific editing of the data (see Data Validation section, below)
has identified some clerical or omission problems and permitted many
of them to be repaired. Commentary and qualifier information is provided
throughout the database to assist users in deciding which data are appropriate
for their specific application.

Database Structure

The Contaminated Sediment Database has a flat-file (spreadsheet) structure, with samples in the vertical dimension and properties in the horizontal dimension. The database is subdivided into six data tables in order to accommodate more than 800 fields without exceeding spreadsheet limitations. Each sample in the database occupies a record (row). Each sample record is linked across the tables by a unique identification number (Sample ID) that is assigned when the data is entered, and by a citation to the original source.

This structure is flexible. It allowed unlimited addition of fields as new data types were encountered. It also provided a single structure for data entry, for data processing, and for data output in a format suited for immediate data plotting and evaluation using widely accessible commercial software. Requirements for special database management skills were minimized. The flat-file structure maximizes flexibility and transportability at the expense of compactness and structured query capabilities. Since software and data manipulation capabilities are changing rapidly, the database in its present structure can be imported into the database management software of the user's choice.

Data Dictionary and Database Tables

Data Dictionary

The Data Dictionary defines the parameters that are in each data field included in the six data tables (Table 2). These tables contain information about the sample location and collection, measurements in sediments of inorganic chemicals, general organic compounds, polychlorinated biphenyls (PCBs) and pesticides, polyaromatic hydrocarbons (PAHs), and grain size. These linked tables are supplemented by separate glossary and reference tables (see Content Overview). The glossary includes abbreviations, methods and devices, and other lists compiled during the construction of the database.
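The Sample ID linkage described under Database Structure can be illustrated with a minimal sketch. It assumes two of the tables have been exported as CSV text; the column names (`sample_id`, `cu_ppm`, `pb_ppm`) and all values are hypothetical placeholders, not the field names or contents of the actual database, and `read_table` and `link_tables` are illustrative helpers.

```python
import csv
import io

# Hypothetical excerpts of two data tables; column names and values are
# illustrative only and are not taken from the database.
STATION_CSV = """sample_id,source,latitude,longitude
1001,Example Report A,42.35,-70.95
1002,Example Report B,43.65,-70.20
"""

INORGANIC_CSV = """sample_id,source,cu_ppm,pb_ppm
1001,Example Report A,25.0,40.0
1002,Example Report B,12.5,
"""


def read_table(text):
    """Read CSV text into a dict of rows keyed by the sample ID."""
    return {row["sample_id"]: row for row in csv.DictReader(io.StringIO(text))}


def link_tables(station, analytical):
    """Merge analytical fields into the matching station record."""
    linked = []
    for sample_id, station_row in station.items():
        merged = dict(station_row)
        # A sample with no analytical row keeps only its station fields.
        for field, value in analytical.get(sample_id, {}).items():
            merged.setdefault(field, value)
        linked.append(merged)
    return linked


records = link_tables(read_table(STATION_CSV), read_table(INORGANIC_CSV))
for record in records:
    print(record["sample_id"], record["latitude"], record["cu_ppm"], repr(record["pb_ppm"]))
```

Because every table carries the same identifier, the join needs nothing more than a dictionary lookup, which is consistent with the stated goal of minimizing special database management skills.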
The full Data Dictionary, in vertical format, provides three columns for each parameter: a short field name (10 characters), a medium field name (25 characters), and a definition of the field. This choice of format accommodates restrictions that may be imposed by the variety of software types used in the community. The fields within each table, and their full definitions in the Data Dictionary, are organized by subcategory and then alphabetically within subcategories. The Data Dictionary is a working and evolving document that provides detailed definitions of parameter fields, codes, and abbreviations. It is suggested that the user print these files and keep them handy while inspecting or extracting data.

Table 2. Organization of the Data Tables in the Contaminated Sediments Database
Information preservation

This compilation aims to preserve the information
that is reported in the original references yet make it homogeneous enough
to compile and manipulate. Most text fields in the database accept unrestricted
entry (except for text length) and there are numerous fields throughout
the tables for qualifiers and comments about the data and the sample.
The Working Dictionary and the
Glossary (the alphabetized Working
Dictionary) are metadata for the Data Dictionary. They were used
to record abbreviations, types of methods or devices used, new parameters,
data-entry logs, codes, and similar tables about the descriptive information
entered into the database during compilation. Entries were assigned
for a limited number of interpretive and coded fields in order to aid
in comparing heterogeneous data. For example, "collection depth"
separates "surface samples", which are defined as having more
than 80% of their length above 6 cm in depth, from subsurface samples
and samples with unknown depth. All available information was used to
assign coded fields: geographic location (Area Code), depth in sediment
(Depth Code), sampling device (Core or Grab), type of analysis recorded
in the Database (Metals & Other Inorganics, Organic Contaminants,
Grain Sizes) and availability of related data (Bioassay Data, Other
Analysis, Other references). The "row number" field, which
is present at the beginning of each table, is used for organization
and sorting and can be changed by the user.

The contents of most fields in the Database are suggested by their names, and all fields are fully defined in the Data Dictionary. The following comments focus on selected fields in the tables that are especially important or need explanation.

Station data: sample identity, location, and documented source

Analytical data: common features

The data tables (Inorganic, General organics, PCB and pesticides, PAHs, and Texture) follow the Station Table and have a common format. The "Unique ID#" and "Source of Information or Reference" fields are at the beginning of each table of analytical data. Next follows specific laboratory and analytical method information that pertains to all or many of the chemical entities reported in the table for a given source. Both instruments and procedures are noted, and quality data for groups of compounds may be consolidated here. Last are the analytical data reported for each sample and each parameter's qualifier fields.

Chemical fields usually have a field for concentration values and specific units, a field for the detection limit for the method and component, and a qualifier field that may contain quality or other annotations. Qualifiers include notes on measurements that fell below detection limits, reported detection limits, duplicate measurements, corrected measurements, original reported units, questionable values, editorial or data quality notes, and explanatory comments. Associating quality-control data with analytical values decreases the likelihood that information about data quality will be lost or ignored during data retrieval. Measurements that were made but could not be quantified (values were below the limit of detection) were entered as zero. Cells were left blank where no data was available.
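The zero and blank conventions above matter when data are read programmatically, because a zero is not a measured concentration of zero. The following is a minimal sketch of one way to keep the three cases apart; the function name, the status labels, and the choice to return the detection limit for below-detection entries are assumptions made for illustration, not part of the database.

```python
def interpret_cell(value_text, detection_limit_text=""):
    """Classify one concentration cell as missing, below detection, or measured.

    Returns a (status, value) pair. For a below-detection entry the value is
    the detection limit when one is recorded, otherwise None.
    """
    text = value_text.strip()
    if text == "":
        # Blank cell: no data was available for this sample and parameter.
        return ("missing", None)
    value = float(text)
    if value == 0.0:
        # Zero: the analysis was made but the result could not be quantified.
        limit = detection_limit_text.strip()
        return ("below_detection", float(limit) if limit else None)
    return ("measured", value)


print(interpret_cell(""))
print(interpret_cell("0", "0.05"))
print(interpret_cell("12.3"))
```

Keeping the status separate from the number lets a user decide later how to treat below-detection entries (for example, excluding them or substituting a fraction of the detection limit) rather than silently averaging them as zeros.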
Inorganic data: major and trace elements, and other inorganic properties

Some parameters listed in the Data Dictionary have no entries in the Database; e.g., surface area, resistivity, pH, acid volatile sulfides, and radiochemical and isotopic data. These properties can affect the fate and transport of contaminants in marine sediments, but such data were not identified in the compiled references. Such supporting analyses may have been measured as part of a project but reported in a different reference that was not available at the time of data entry.

Organic data: changing methods, bulk organic properties, and organic contaminants

Improvements in analytical methods for organic contaminants over time have resulted in a decrease of broad-scope measurements like "total PCBs" and an increase in analysis for specific organic compounds. The names of organic compounds, such as those reported in the tables of polyaromatic hydrocarbons (PAHs) and of polychlorinated biphenyls (PCBs) and pesticides, are those cited in the original data and are arranged categorically and alphabetically. Microbial contaminants and organotins are also recorded in this table, but total and organic carbon are recorded in the inorganic data table. Many organic contaminants are known and reported by more than one name; however, the Chemical Abstracts Service Registry Number (CAS #) is also given for compounds whenever possible. Naming protocols may be confusing: specific organic compounds may be reported as totals, as sums of certain groups, or with names that differ slightly from those listed here. For example, "Fluorene", "C1-Fluorene", "C2-Fluorene", and "Fluorenes" are different measurements. In this database, results are separated where there is ambiguity about their equivalence. Data users should carefully consider information recorded in the methodology and qualifier sections, consult original sources if necessary, and use caution when comparing organic contaminant data from differing sources and years.
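The naming caution above has a practical consequence for searches: selecting by partial name can merge measurements that the database deliberately keeps apart. A small sketch follows, using the four names quoted in the text with made-up concentration values; the helper names are hypothetical.

```python
# Reported compound names paired with illustrative (invented) values.
REPORTED = [
    ("Fluorene", 12.0),
    ("C1-Fluorene", 30.0),
    ("C2-Fluorene", 18.0),
    ("Fluorenes", 60.0),
    ("Fluorene", 8.0),
]


def select_exact(measurements, name):
    """Keep only values whose reported name matches exactly."""
    return [value for reported, value in measurements if reported == name]


def select_substring(measurements, name):
    """Loose match on part of the name; shown only to illustrate the pitfall."""
    return [value for reported, value in measurements
            if name.lower() in reported.lower()]


exact = select_exact(REPORTED, "Fluorene")
loose = select_substring(REPORTED, "Fluorene")
print(len(exact), "exact matches;", len(loose), "loose matches")
```

The loose search returns all five entries, mixing the parent compound with alkylated homologs and a group total, whereas the exact match returns only the two parent-compound values.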
Texture data: sediment grain size and lithology

Information in the texture table can be used to better understand the geologic context in which contaminants are found and the impact which they might have in situ. Sediment grain size (texture) data were originally generated by a variety of methods (Poppe et al., 2000) that can result in non-equivalent units for grain-size measurements. The percentages of sediment in gravel, sand, silt, or clay-size classifications were calculated from sieve-size information according to standard geological boundaries (if data allowed) when the breakdowns were not reported in the source documents. Straightforward conversions between geological grain-size norms and those used for many engineering applications are not possible. Users should consider information recorded in methodology and qualifier sections for samples prior to use of data, consult original sources if necessary, and use caution when comparing grain size data from differing sources.

References for the Contaminated Sediments Database

Reference Tables provide full bibliographic citations for: 1) sources of compiled data; 2) other references reviewed for data content; 3) documents and bibliographies pertinent to Contaminants in Gulf of Maine Sediments; and 4) references cited in this publication. The Gulf of Maine Database Bibliography lists documents from which data was compiled. The tabular (Excel) file contains both the full citation and the short citation, which is given in the data tables under "Source of Information or Reference". The List of Additional References Reviewed for Data (download below) lists additional references that do not contain samples entered in the database but were reviewed for measurements of contaminants for whole sediments from the Gulf of Maine.
These include documents that contained: measurements of related parameters but had no contaminant data; measurements of contaminants in biota, waters, or fractionated sediments; samples outside the study area; synthesized data that was previously reported elsewhere; and new reports. Extensive documentation about sub-areas in the Gulf of Maine is available from a number of libraries in the region. Documents that are referred to in the text of this publication, "Contaminated Sediments Database for the Gulf of Maine", are given in the List of Citations in this Publication. The paper-trail information that is in the station table, reference tables, and data tables may be useful for selecting and evaluating data and for locating the original sources.

List of Additional References Reviewed for Data: references pertinent to contaminants in the Gulf of Maine that do not contain data entered in the database but were reviewed for measurements of contaminants.

Data compilation

Completeness of reporting and missing critical information

Data was compiled from references that were originally created for a variety of purposes. Consequently, there was a wide range in the amount of detail that accompanied the contaminant data. Latitude and longitude (in some form) were reported for 96% of the samples (see Statistics About Database Content), as was sampling year, whereas only 83% of the samples had information about the depth of sediment that was sampled. The percentage of samples having sampling or analytical methods reported was significantly lower. Attempts were made to contact originating laboratories and principal investigators and to identify companion publications in order to locate critical information about the methods and accuracy of the sample collection and analysis. The absence of such data precludes use of the contaminant measurements for many applications, since differing methodologies (e.g., acid leach vs. total sediment digestion) can generate data that may not be directly comparable, or for which the accuracy differs significantly (e.g., older vs. recent measurements of organic contaminants). Text entries that were made in the methods fields and the parameter qualifier (or comment) fields document what information was given in the reference, note information found elsewhere, and indicate where seriously compromised data occur.

Identification and verification of questionable data

A batch validation technique (Manheim et al., 1998) was used to identify data that may have been erroneously recorded or not measured correctly. The compiled data was systematically sorted and plotted to aid in identification of outliers. Histograms, ratio plots, and area maps were used to define "normal" sample distributions from the compiled Gulf of Maine samples and also from the NOAA Status and Trends national dataset.
Data falling outside the criteria (Table 3) were flagged for further inspection. Reasonable explanations for the data were found in some cases, such as extremely high contaminant concentrations found in proximity to a contaminant source, or values with very low detection limits originating in a specialized research laboratory. Sometimes no explanation or further reason to suspect the data could be found, but more often a source of error could be identified. In many cases, such as for typographical or conversion errors, the data could be repaired.

Repairing data and documentation of data qualifiers

Qualifiers given in the references, such as detection limits or descriptions of collection and analysis, were recorded in the database. Repaired data included samples which had missing information that was subsequently located, samples reported as measured values that were verified to be detection limit entries, samples with unit or format conversion mistakes, and typographical errors. The repaired values were generally placed in the parameter field and the reported value placed in the qualifier field with an explanation. Data confirmed to be of exceedingly poor quality were also placed in the qualifier field. Editorial comments were entered for samples or analyses that triggered criteria for questionable data that could not be resolved or repaired. Representative qualifier comments are shown in Table 4. The presence of these comments does not mean that the data cannot be utilized; rather, it indicates that the user should make individual decisions as to whether the sample was collected, analyzed, and reported with an accuracy that is appropriate for the desired application.

We have tried to be comprehensive and thorough in identifying data sources, compiling the data, and validating the heterogeneous data contained in the database. Some omissions or errors are inevitable, though, so we ask that you bring these to our attention.
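The screening and repair steps described above can be sketched in a few lines. This is an illustration only: the screening ranges below are hypothetical stand-ins for the criteria in Table 3, the field names are invented, and the `_qualifier` suffix is an assumed naming scheme rather than the database's own.

```python
# Hypothetical screening ranges; the criteria actually used are in Table 3.
CRITERIA = {"pb_ppm": (0.0, 1000.0), "cu_ppm": (0.0, 500.0)}


def flag_outliers(samples, criteria):
    """Return (sample_id, field, value) for each value outside its range."""
    flagged = []
    for sample in samples:
        for field, (low, high) in criteria.items():
            value = sample.get(field)
            if value is not None and not (low <= value <= high):
                flagged.append((sample["sample_id"], field, value))
    return flagged


def repair(sample, field, corrected, reason):
    """Store the corrected value and keep the reported value in a qualifier."""
    reported = sample[field]
    sample[field] = corrected
    sample[field + "_qualifier"] = f"reported value {reported}; {reason}"
    return sample


samples = [
    {"sample_id": "1001", "pb_ppm": 40.0, "cu_ppm": 25.0},
    {"sample_id": "1002", "pb_ppm": 45000.0, "cu_ppm": 30.0},
]

flagged = flag_outliers(samples, CRITERIA)
print(flagged)

# Suppose inspection of the source shows a unit conversion mistake.
repair(samples[1], "pb_ppm", 45.0, "unit conversion error (ppb entered as ppm)")
print(samples[1])
print(flag_outliers(samples, CRITERIA))
```

Flagging only marks a value for inspection; as the text notes, a flagged value may turn out to be valid (for example, a sample near a contaminant source), so the repair step is applied only after the source of error has been identified.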