usSEABED: East Coast Offshore Surficial Sediment Data Release, usSEABED Data


U.S. Geological Survey Data Series 118, usSEABED: Atlantic Coast Offshore Surficial Sediment Data Release, version 1.0

usSEABED Data

The usSEABED database differs from a traditional relational database (RDB) because the data are processed and extended to maximize density and usability, making them more comprehensive for mapping and analysis. A traditional RDB often creates simplistic and sparse data summary coverages with thinly populated and unwieldy tables. The usSEABED database not only treats the usual forms of numerical data but also contains a vast store of data about the seafloor in word-based descriptions that can be rich in information but difficult to quantify, map, plot, or use in comparative analyses or models. The usSEABED database provides numeric values for typical seabed characteristics that are based on these descriptive data as well as numeric analytical data.

The usSEABED database also differs from other marine databases in that it incorporates a wide variety of seafloor information: sediment texture, composition, color, biota, and rocks; seafloor characteristics such as hardness or sediment ripples; acoustic properties; and geochemical and geotechnical analyses. The usSEABED output files are comma-delimited text, so they can be loaded directly into GIS, RDB, and many other software applications.

How usSEABED is Built

The usSEABED database is built using the dbSEABED processing software created at the University of Sydney (Australia) and the University of Colorado. It has companion databases built along similar lines: auSEABED for Australia, balticSEABED, and a global database, goSEABED. Each of these databases relies on pre-existing data from a variety of sources to mine and extrapolate useful information about the seabed.

The dbSEABED program compiles these source data sets in a standardized format and integrates information across a series of data themes (table A) and across sampling methods: physical sampling equipment (grabs, cores, or probes) or remote sensing (photographs, videos, geophysics, soundings). The data may be numeric lab- or instrument-based textural, acoustic, geochemical, and geophysical data; verbal (linguistic) descriptions of grabs, cores, or photographs; or any combination of these.

In the usSEABED database, most data held in the source reports are queried for additional information, which increases the data density over the seabed and gives more complete coverage. Few source reports contain all data reportable in usSEABED; fields without reliable information are given null values.

Sources of Data

usSEABED relies on previously existing and newly collected data, both published and unpublished, from Federal, State, regional, and local agencies and consortiums, as well as research institutions. For the Atlantic coast, many of the data are from the USGS, including the previously published continental margins data from the early 1960s and more recent data from the 1990s and 2000s.

Data gathered by the National Ocean Service (NOS) during its many sounding surveys from the 1960s to the 1990s are included, as archived by the Smithsonian Institution and provided by the National Geophysical Data Center (NGDC). Also included are U.S. Army Corps of Engineers (USACE) reports, including several from the USACE Coastal Hydraulics Laboratory (formerly CERC), Environmental Protection Agency (EPA) data sets, reports and theses from educational institutions, State geological survey reports, and NOAA reports. Two large data compilations archived at NGDC are also included: the '073' data compilation, which is primarily from U.S. Navy reports, and the Deck41 data set, which draws on a wide variety of data sources. A complete reference list (atl_sources.htm) of information for the Atlantic coast is included.

Data duplication may exist within the input Atlantic data sets, because data from the same cruise or site may be published in more than one report or data compilation. Efforts have been made to reduce data duplication in usSEABED. In other instances, data from different sources for a given site may be included if they add significant information. (For example, one source may report only grain size for a particular site, while another source reports geophysical properties for the same site or sample.)
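The rule described above (drop a repeat of the same site, but keep a second record when it adds new kinds of data) can be sketched as follows. This is a minimal illustration, not the usSEABED procedure: it assumes records are dictionaries with hypothetical `lat`/`lon` keys, treats two records as the same site when their coordinates agree to a chosen number of decimal places, and keeps a later record only if it reports at least one field the earlier ones did not.

```python
from typing import Dict, List, Set, Tuple

Record = Dict[str, object]


def dedupe(records: List[Record], ndigits: int = 4) -> List[Record]:
    """Drop repeated site records unless they contribute new fields.

    Hypothetical rule for illustration: sites match when lat/lon agree after
    rounding to `ndigits` decimal places.
    """
    seen: Dict[Tuple[float, float], Set[str]] = {}
    kept: List[Record] = []
    for rec in records:
        key = (round(rec["lat"], ndigits), round(rec["lon"], ndigits))
        # Non-null data fields carried by this record (coordinates excluded).
        fields = {k for k, v in rec.items() if k not in ("lat", "lon") and v is not None}
        if key not in seen:
            seen[key] = set(fields)
            kept.append(rec)
        elif fields - seen[key]:
            # Same site, but this source adds data the kept records lack.
            seen[key] |= fields
            kept.append(rec)
    return kept
```

For example, a grain-size record followed by an identical copy and then a velocity-only record for the same site keeps the first and third records.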

Data Themes and Output Files

Seabed data come in a variety of forms, each reporting different parameters. For example, textural analyses may report percentages of gravel, sand, and mud (silt and clay), often accompanied by statistical measures such as mean, median, sorting, skewness, and kurtosis. Acoustical measurements include various velocities and derived densities. Benthic habitat studies may include a short description of the seafloor sediment type and a numerical survey of animals and plants on the seafloor, or evidence of them.

The original seabed information in usSEABED is entered into different data themes, which are listed in table A. The thematic basis of the values in the outputs is recorded in field 11 ("DataType") (table B) of the extracted (_EXT), parsed (_PRS), and calculated (_CLC) output files. Information on the contribution of each source report is in the accompanying metadata files.
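Because the outputs are comma-delimited text, the theme of each row can be read straight from field 11. The sketch below uses only the Python standard library and assumes that each file starts with a header row and that the DataType field holds theme codes such as `TXR` from table A; both are assumptions to check against table B and the actual files.

```python
import csv
import io
from typing import List

# Field 11 ("DataType") is the 11th column, i.e. index 10 when counting from 0.
DATATYPE_INDEX = 10


def rows_for_theme(csv_text: str, theme: str) -> List[List[str]]:
    """Return the data rows of an _EXT/_PRS/_CLC file whose DataType equals `theme`.

    Assumes (hypothetically) a single header row and theme codes like "TXR".
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the assumed header row
    return [
        row
        for row in reader
        if len(row) > DATATYPE_INDEX and row[DATATYPE_INDEX].strip() == theme
    ]
```

In practice `csv_text` would come from reading one of the downloaded output files.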

Output files

This publication provides six usSEABED output data files for the Atlantic Coast.

usSEABED Output Files
_EXT	Extracted (numeric, lab-based)
_PRS	Parsed (word-based)
_CLC	Calculated (calculated variables)
_CMP	Components (content and features)
_FAC	Facies (components only)
_SRC	Source

These files are downloadable from the Data Catalog. An additional output file type, although unpublished, provides quality control for the data and was used extensively prior to publication to debug and test the data. Field parameters for the data files are listed in table B.

Relational keys
The usSEABED data file types are linked relationally by the foreign keys DataSetKey (for individual data sets), SiteKey (for individual sites), and SampleKey (for individual analyses). The DataSetKey field relates the data to the original source. The tables can be loaded into an RDB, where relationships can be defined and the tables joined on these keys.
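As a minimal sketch of such a join, the example below loads two tiny in-memory tables into SQLite (via Python's built-in `sqlite3` module) and joins extracted records to their source on DataSetKey. The table names `ext` and `src` and the columns `Gravel` and `Title` are hypothetical stand-ins; only the three key names come from usSEABED.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy database; table and non-key column names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src (DataSetKey INTEGER PRIMARY KEY, Title TEXT)")
    conn.execute(
        "CREATE TABLE ext (DataSetKey INTEGER, SiteKey INTEGER, SampleKey INTEGER, Gravel REAL)"
    )
    conn.executemany("INSERT INTO src VALUES (?, ?)", [(1, "Survey A"), (2, "Survey B")])
    conn.executemany(
        "INSERT INTO ext VALUES (?, ?, ?, ?)", [(1, 10, 100, 2.5), (2, 20, 200, 0.0)]
    )
    return conn


def join_ext_with_source(conn: sqlite3.Connection) -> list:
    """Attach the source data set to every extracted record via DataSetKey."""
    query = """
        SELECT e.SiteKey, e.SampleKey, e.Gravel, s.Title
        FROM ext AS e
        JOIN src AS s ON e.DataSetKey = s.DataSetKey
        ORDER BY e.SampleKey
    """
    return conn.execute(query).fetchall()
```

The same pattern joins _PRS, _CLC, _CMP, or _FAC tables on SiteKey and SampleKey.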

Source data (_SRC)
Information about the original data is in the source (_SRC) file, including links to metadata about the original data. Each output data file described below is linked to the _SRC file by the DataSetKey field.

Textural and other basic information (_EXT, _PRS, _CLC)
Textural, statistical, geochemical, geophysical, dominant component, and color information is held in three separate but similar data files, based on the type of data: _EXT, _PRS, and _CLC. The three file types have the same fields (table B) and can be combined for more extensive coverage of the seafloor. Users should understand the inherent limitations of each file type in order to choose the data file, or combination of data files, best suited to a particular use. Other dbSEABED programs can combine the three files in a variety of ways, by concatenation or by telescoping, before they are mapped or used in other analyses. For access to these combined files, please see the contact page.
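The two combination strategies can be contrasted in a short sketch. Concatenation simply stacks every record, so a sample may appear once per file. For telescoping we assume, without confirmation from this page, that each sample gets one record in which every field is taken from the most reliable file that has a value, in the reliability order described below (_EXT, then _PRS, then _CLC); the actual dbSEABED programs may differ. Records are keyed by a (SiteKey, SampleKey) pair and field names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int]  # (SiteKey, SampleKey)
Table = Dict[Key, Dict[str, Optional[float]]]


def concatenate(*tables: Table) -> List[Tuple[Key, Dict[str, Optional[float]]]]:
    """Stack all records from all files; a sample can appear more than once."""
    return [(key, rec) for table in tables for key, rec in table.items()]


def telescope(ext: Table, prs: Table, clc: Table) -> Dict[Key, Dict[str, float]]:
    """One record per sample; each field comes from the first file with a value.

    Assumed priority: extracted, then parsed, then calculated. Fields that are
    null in every file are omitted.
    """
    merged: Dict[Key, Dict[str, float]] = {}
    for key in set(ext) | set(prs) | set(clc):
        out: Dict[str, float] = {}
        for table in (ext, prs, clc):  # most to least reliable
            for field, value in table.get(key, {}).items():
                if value is not None and field not in out:
                    out[field] = value
        merged[key] = out
    return merged
```

Note that a telescoped record can mix a measured value with a parsed or calculated one, which is one reason to understand the limitations of each file type before combining them.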

Extracted data (_EXT)
The data file with the _EXT tag holds the extracted data: data from strictly performed, lab-based, numeric analyses. Most data in this file are listed as reported by the source data report; only minor unit changes are performed, or assumptions are made about the thickness of the sediment analyzed based on the sampler type. Typical data themes include textural classes and statistics (TXR: gravel, sand, silt, clay, mud, and various statistics), phi grain-size classes (GRZ), chemical composition (CMP), acoustic measurements (ACU), color (COL), and geotechnical parameters (GTC). The _EXT file is based on rigorous lab-determined values and forms the most reliable data sets. Limitations exist, however, because it is not always clear what portion of the sample was tested. For example, were the analyses performed on whole samples, or only on the matrix, possibly ignoring larger particles?

Parsed data (_PRS)
Numeric data derived from verbal logs of core descriptions, shipboard notes, and (or) photographic descriptions are held in the parsed data set (_PRS). The input data keep the terms used by the original researchers and are coded using phonetically sensible terms for easier processing by dbSEABED. Longer descriptions may be divided by theme (table A). The descriptions often include information on associated biota, seafloor features, and structure. Typical data themes for the parsed data set are lithologic descriptions (LTH), biology (BIO), color (COL), and (or) seafloor type (SFT, descriptions from photos or videos). The values in the parsed data file are calculated by the dbSEABED parser, which assigns field values based on the form and content of a description. See the section on dbSEABED processing and fuzzy set theory for a more complete explanation.

The parsing process has been tested and calibrated by comparing the outputs against analytical results for the same samples. Due to the nature of visual descriptions by observers and the use of fuzzy set theory in the parser, the output data variously show the degree of representation in the sample, or percent abundance values. An assumption in the process is that the output degrees of representation reflect absolute abundances to some degree of accuracy. The calibrations provide information on that accuracy. Although at first sight the descriptive results in the parsed file may seem less accurate than measured values in the extracted file, they are frequently more representative of the sample and seabed as a whole, as they include description of objects such as shells, stones, algae, and other objects (table C) that are a textural component of the seabed and which are often left out of laboratory analyses, particularly when a machine analysis is employed.
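To make the idea of turning words into numbers concrete, here is a toy parser. It is not the dbSEABED parser: the vocabulary, the 0.3 weight given to an adjectival (minor) component, and the final normalization to percentages are all invented for illustration, whereas the real parser uses its own thesaurus and calibrated fuzzy memberships. Like the quality-control rule described later, it returns no values (`None`) when any term is unknown.

```python
from typing import Dict, Optional

# Hypothetical membership (0-100) of each noun in the gravel/sand/mud sets.
NOUNS: Dict[str, Dict[str, float]] = {
    "gravel": {"gravel": 100.0, "sand": 0.0, "mud": 0.0},
    "sand": {"gravel": 0.0, "sand": 100.0, "mud": 0.0},
    "mud": {"gravel": 0.0, "sand": 0.0, "mud": 100.0},
}
# Hypothetical adjectival forms and the weight given to a minor component.
ADJECTIVES = {"gravelly": "gravel", "sandy": "sand", "muddy": "mud"}
MINOR_WEIGHT = 0.3


def parse(description: str) -> Optional[Dict[str, float]]:
    """Turn a phrase like 'sandy mud' into toy gravel/sand/mud percentages."""
    scores = {"gravel": 0.0, "sand": 0.0, "mud": 0.0}
    for word in description.lower().split():
        if word in NOUNS:  # a noun names the dominant class
            for cls, membership in NOUNS[word].items():
                scores[cls] += membership
        elif word in ADJECTIVES:  # an adjective adds a minor component
            scores[ADJECTIVES[word]] += 100.0 * MINOR_WEIGHT
        else:
            return None  # unknown term: give null values to all fields
    total = sum(scores.values())
    if total == 0:
        return None
    return {cls: round(100.0 * s / total, 1) for cls, s in scores.items()}
```

With these invented settings, "sandy mud" yields mostly mud with a smaller sand share and no gravel.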

Calculated data (_CLC)
For the extracted and parsed data, some values are not reported by the original source but can be calculated directly or estimated with standard derivative equations, using assumptions (see Frequently Asked Questions) about the conditions or variables. These values are reported in the calculated (_CLC) data files. Although the calculated (_CLC) data can be combined with the extracted and parsed data (table B), they are the least reliable of the three data file types and should be used with caution.
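Two standard relationships illustrate the kind of derivation involved: mud is silt plus clay, and the Krumbein phi scale relates grain diameter d in millimetres to phi by phi = -log2(d / 1 mm). These are shown only as examples of derivable quantities; the equations and assumptions actually used for the _CLC files are those documented in the Frequently Asked Questions.

```python
import math
from typing import Optional


def mud_from_silt_clay(silt: Optional[float], clay: Optional[float]) -> Optional[float]:
    """Mud (%) = silt (%) + clay (%); null if either part is missing."""
    if silt is None or clay is None:
        return None
    return silt + clay


def phi_to_mm(phi: float) -> float:
    """Krumbein phi scale: d = 2 ** -phi millimetres."""
    return 2.0 ** (-phi)


def mm_to_phi(d_mm: float) -> float:
    """Inverse conversion: phi = -log2(d / 1 mm)."""
    return -math.log2(d_mm)
```

For example, phi 2 corresponds to 0.25 mm (fine sand) and phi -1 to 2 mm, the conventional sand/gravel boundary.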

Component/feature and facies data (_CMP, _FAC)
Two usSEABED data files contain information about the presence of certain seafloor features, compositional content, biota, and sediment structure. These files use major synonyms defined by the thesaurus in the dbSEABED parsing software, which clusters comparable descriptive terms together (for example, granite represents granite, aplite, granodiorite, and pegmatite, while laminated represents laminated, laminations, and lamina). Individual components and features (terms like feldspar, phosphorite, bivalves, seagrass, and wood) are held in the _CMP data file (table D). Appropriately combined components are held in the facies (_FAC) data files (table E). As with the parsed data files, the values in the _CMP and _FAC files are the results of filters based on fuzzy set membership to chosen sets and represent a measure of truth about the attribute, not percentages or defined values. These files indicate only presence, not absence, of material, because reports rarely state "no bivalves" or "no phosphorite."
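The thesaurus clustering can be pictured as a lookup from each synonym to its major term. The sketch below builds that lookup from the two example groups given above; the dictionary layout is our own choice, and the real dbSEABED thesaurus is far larger.

```python
from typing import Dict, List, Optional

# Major term -> synonyms it represents (the two groups quoted in the text).
THESAURUS: Dict[str, List[str]] = {
    "granite": ["granite", "aplite", "granodiorite", "pegmatite"],
    "laminated": ["laminated", "laminations", "lamina"],
}

# Invert once so each synonym maps directly to its major term.
SYNONYM_TO_MAJOR: Dict[str, str] = {
    synonym: major for major, synonyms in THESAURUS.items() for synonym in synonyms
}


def major_term(word: str) -> Optional[str]:
    """Return the major synonym for `word`, or None if it is not in the thesaurus."""
    return SYNONYM_TO_MAJOR.get(word.lower())
```

Inverting the table once keeps each lookup to a single dictionary access.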

The _CMP file contains information about compositional content (individual minerals, rocks), genesis (terrigenous, carbonate), and certain biota. These components are evaluated internally, and the value for each attribute is based solely on the relationships of attributes within the original description. The flora and fauna included in the compositional components are those that may affect textural determinations in the _PRS data file, such as halimeda, bivalves, or foraminifera (table C). The values in these attribute fields range from 0 (no membership, probably due to no information) to 100 (complete membership; for example, shell hash has a membership of 100 in the shell debris set).

The _CMP file also includes information on seafloor features such as bedforms, fissures, internal structure (bedding, bioturbation), and other flora and fauna. Unlike the compositional content information, which is construed as an abundance within the sample, these attributes express an intensity of development or density of occurrence relative to the scales of development or density observed elsewhere. The flora and fauna in the feature category are soft-bodied, that is, organisms that do not affect the textural determination in the _PRS data files, such as kelp, ophiuroids, or annelids. Values in these attribute fields range from 0 (no membership, possibly due to no information) to 100% (maximum development). In contrast to component abundances, the sum of feature intensities in a sample is allowed to exceed 100%.

The 100 most common components (a number limited by the dbSEABED processing software) in the U.S. EEZ are given in the _CMP file, and attributes ending in "_F" denote features. Table D lists the components and gives the basic forms of descriptive terms that may trigger membership for each. Included in this file are 27 components that are included in the facies (_FAC) file only. The dbSEABED thesaurus used for usSEABED is also used for the sister data compilations (auSEABED, balticSEABED, goSEABED), so the list of trigger terms may include some that are not known in U.S. waters.
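The different rules for components and features suggest a simple consistency check on a _CMP record. The sketch below reads the contrast drawn above as implying that component abundances should not sum past 100, while feature intensities (attributes ending in "_F") may; that reading, the record layout (attribute name to value), and the attribute names in the usage example are assumptions for illustration.

```python
from typing import Dict, List


def check_cmp_record(record: Dict[str, float]) -> List[str]:
    """Return problems found in one _CMP record (attribute name -> value 0-100)."""
    problems = []
    for name, value in record.items():
        if not 0.0 <= value <= 100.0:
            problems.append(f"{name} out of range: {value}")
    # Components (no "_F" suffix) are treated as abundances within the sample.
    component_total = sum(v for k, v in record.items() if not k.endswith("_F"))
    if component_total > 100.0:
        problems.append(f"component abundances sum to {component_total}")
    # Feature intensities ("_F") are deliberately not summed: they may exceed 100.
    return problems
```

For instance, a record with hypothetical attributes `ripples_F` at 80 and `bioturbation_F` at 70 passes, because feature intensities are not totaled.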

The second file, the facies file (_FAC), is created from components only, similar to the _CMP file. This file groups multiple components into appropriate facies, such as igneous, metamorphic, ooze, foraminifera, and others. The dbSEABED processing software is restricted to a maximum of six components per facies. Table E lists the facies types and the components that make up each facies group.

Again, these files indicate only presence, not absence, of material, because reports rarely state "no bivalves" or "no phosphorite." The values in this attribute field range from 0 (no membership, probably due to no information) to 100 (complete membership; for example, schist has a membership of 100 in the metamorphic set).
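One plausible way to derive a facies value from component memberships is the standard fuzzy union, which takes the maximum membership. This page does not say how dbSEABED combines components, so treat the sketch as an assumption; only the six-component limit and the schist example come from the text, and the other members of the hypothetical metamorphic list are illustrative (table E has the real groupings).

```python
from typing import Dict, List

MAX_COMPONENTS_PER_FACIES = 6  # limit stated for the dbSEABED processing software

# Hypothetical facies definition; see table E for the actual components.
METAMORPHIC = ["schist", "gneiss", "marble", "quartzite", "slate"]


def facies_membership(components: Dict[str, float], facies_members: List[str]) -> float:
    """Fuzzy union (maximum) of a sample's memberships in the facies' components.

    Returns 0.0 (no membership) when none of the components was reported.
    """
    if len(facies_members) > MAX_COMPONENTS_PER_FACIES:
        raise ValueError("a facies may combine at most six components")
    values = [components[c] for c in facies_members if c in components]
    return max(values, default=0.0)
```

Taking the maximum means a sample described simply as schist belongs fully to the metamorphic facies even if no other metamorphic component is mentioned.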

Relationship between the _PRS and _CMP outputs
The dbSEABED processing software recognizes that many skeletonized biota, such as halimeda, rhodoliths, and shells (broken and unbroken), often make up part of a sediment sample. Such biological terms are included in the parsing of the textural values. For the selected biota with textural implications, see table C. When using the parsed data, it may be important to cross-check with the component file using the relational foreign keys (SiteKey, SampleKey) to determine whether biota are included in the textural outputs.

Within the _PRS file, the "seabed class" and "class membership" fields indicate the dominant compositional class and the fuzzy set membership of a sample to that class. Other components and mined information may also be listed for that sample in the _CMP file, linked by the relational keys.
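The cross-check between the two files can be sketched as a lookup on the shared keys. Here each file is assumed to be loaded as a dictionary keyed by (SiteKey, SampleKey); the _CMP attribute names in `TEXTURAL_BIOTA` and the 50-point threshold are hypothetical choices, and table C gives the actual biota with textural implications.

```python
from typing import Dict, List, Tuple

Key = Tuple[int, int]  # (SiteKey, SampleKey)

# Hypothetical _CMP attribute names for biota with textural implications.
TEXTURAL_BIOTA = ("shell_debris", "halimeda", "rhodoliths")


def samples_with_textural_biota(
    prs: Dict[Key, dict],
    cmp: Dict[Key, Dict[str, float]],
    threshold: float = 50.0,
) -> List[Key]:
    """Parsed samples whose _CMP record shows strong membership in a textural biota set."""
    flagged = []
    for key in sorted(prs):
        memberships = cmp.get(key, {})  # no _CMP record: nothing to flag
        if any(memberships.get(name, 0.0) >= threshold for name in TEXTURAL_BIOTA):
            flagged.append(key)
    return flagged
```

Flagged samples are those whose parsed gravel or sand values may partly reflect shells or other skeletal material rather than mineral grains.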

Quality Control

Quality control over the data is an iterative process carried out in the following steps. First, graphical plots of site locations and parameter values are used to detect outliers and edit them appropriately. Each data set is viewed in a GIS to ensure that data locations are reasonable relative to survey extents; sites with unresolvable location issues or known incomplete analyses are deactivated and are not included in the usSEABED output files. (Note: usSEABED does contain a small number of onshore samples.) This step may be optional depending on the data set: older sets may require more scrutiny, whereas newer or well-exercised data sets require less.
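The location check was done visually in a GIS, but its core test, whether a site falls within the extent of its survey, can be expressed as a simple bounding-box comparison. The site tuples, IDs, and extent values below are hypothetical; the example flags a longitude with a dropped minus sign and a latitude with a misplaced decimal point, two common errors in older records.

```python
from typing import Iterable, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def outside_extent(sites: Iterable[Tuple[str, float, float]], extent: BBox) -> List[str]:
    """Return IDs of sites (id, lon, lat) that fall outside a survey's bounding box."""
    min_lon, min_lat, max_lon, max_lat = extent
    return [
        site_id
        for site_id, lon, lat in sites
        if not (min_lon <= lon <= max_lon and min_lat <= lat <= max_lat)
    ]
```

Sites returned by such a test would then be reviewed and either corrected or deactivated.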

Second, built-in filters in the dbSEABED processing software detect implausible values in numeric fields, unknown verbal terms, incomplete analyses (for example, a Gravel-Sand-Silt-Clay (or mud) total, GSSC(m), greater than 100% or less than 95%), and incorrect field types (string or number). The software also detects samples that appear to belong to a core even though they are described as independent samples. For the parsing of verbal descriptions, all terms must be known to the dbSEABED data processing program, with values assigned; analyses that fail this test are given null values in all appropriate fields. Edits are made to the data (that is, to the usSEABED input data files), and metadata explaining the changes are entered. The edits (or deactivations) are then taken into account in the next dbSEABED program run.
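Two of these filters, the field-type check and the GSSC completeness check, can be sketched for a single record of raw text fields. The 95 to 100% bounds come from the text above; the field names, the choice to use silt plus clay when both are present and mud otherwise, and the message wording are assumptions, and the real filters live inside the dbSEABED software.

```python
from typing import Dict, List, Optional

FIELDS = ("gravel", "sand", "silt", "clay", "mud")  # hypothetical field names


def is_number(text: str) -> bool:
    """True if `text` parses as a number."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def gssc_total(gravel: Optional[float], sand: Optional[float], silt: Optional[float] = None,
               clay: Optional[float] = None, mud: Optional[float] = None) -> Optional[float]:
    """Gravel + sand + fines (silt + clay if both given, else mud); None if incomplete."""
    if gravel is None or sand is None:
        return None
    if silt is not None and clay is not None:
        fines = silt + clay
    elif mud is not None:
        fines = mud
    else:
        return None
    return gravel + sand + fines


def check_textural(record: Dict[str, str], low: float = 95.0, high: float = 100.0) -> List[str]:
    """Return field-type and GSSC problems for one record of raw text values."""
    problems: List[str] = []
    values: Dict[str, Optional[float]] = {}
    for name in FIELDS:
        raw = record.get(name, "").strip()
        if raw == "":
            values[name] = None  # missing value
        elif is_number(raw):
            values[name] = float(raw)
        else:
            problems.append(f"{name}: expected a number, got {raw!r}")
            values[name] = None
    total = gssc_total(**values)
    if total is not None and not (low <= total <= high):
        problems.append(f"GSSC total {total} outside {low}-{high}%")
    return problems
```

A record such as gravel 5, sand 60, mud 20 totals 85% and is flagged as an incomplete analysis.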

Finally, output data are analyzed in a GIS to test whether the data outputs "make sense" for a given geographic area. Users of the output data should, however, note the limitations imposed by the source data sets as to navigational precision, sampler type, and analytical technique.

See the dbSEABED section and the Frequently Asked Questions for details about the usSEABED data mining program and the application of fuzzy set theory.

Spatial and Temporal Uncertainties

Metadata are available for each source report and are linked through the ATL_SRC.txt file, which gives information about navigational, sampling, and processing techniques where known. Where no original metadata were available, metadata were created from the information accompanying the data. Of particular importance, site locations are as given in the original sources; uncertainties due to navigational techniques and datums are ignored in the usSEABED compilation. Because many reports are decades old, users of usSEABED should apply their own criteria to determine whether data from each source report are appropriate for their particular purpose and scale of interest.

As a caution in using the usSEABED database to depict seabed sedimentary character or to create seafloor geologic maps, users should be aware that all seafloor regions are by nature dynamic environments, subject to physical processes such as erosion, winnowing, reworking, and sedimentation or accretion that vary over different spatial and temporal scales. In addition, as with any such database, usSEABED is composed of samples collected, described, and analyzed by many different organizations and individuals over many years, which introduces inherent uncertainties between data points. Plotting the data can also introduce uncertainties that are largely unknown at this time.

There are also uncertainties in data quality associated with both the extracted data (numeric, analytical results) and the parsed data (word-based descriptions). The authors are aware that grain-size analyses are occasionally done solely on the sand fraction, excluding coarse fractions such as shell fragments and gravel, while word descriptions of sediment samples can emphasize or de-emphasize the proportion of the fine or coarse fraction, or disregard other important textural or biological components. The authors have done their best to select the best-quality data for inclusion in usSEABED and encourage users to consult the provided metadata files for information about each source's limitations, date of collection, and other pertinent information.

Users are encouraged to view the entire document before downloading the data files in the Data Catalog.
