usSEABED (Data)
The usSEABED database differs from a traditional relational database (RDB) because the data are processed and extended to maximize density and usability, making them more comprehensive for mapping and analysis. A traditional RDB often creates simplistic and sparse data summary coverages with thinly populated and unwieldy tables. The usSEABED database not only treats the usual forms of numerical data but also contains a vast store of data about the sea floor in word-based descriptions that can be rich in information but difficult to quantify, map, plot, or use in comparative analyses or models. The usSEABED database provides numeric values for typical seabed characteristics that are based on these descriptive data as well as numeric analytical data.
The usSEABED database also differs from other marine databases in that it incorporates a wide variety of information about sea-floor sediment texture, composition, color, biota, and rocks, and sea-floor characteristics such as hardness or sediment ripples, acoustic properties, and geochemical and geotechnical analyses. The usSEABED output files are produced in comma-delimited text for ease of use and are ready for inclusion into many different GIS, RDB, and other software applications.
How usSEABED is Built
The usSEABED database is built using the dbSEABED processing software created at the University of Sydney (Australia) and the University of Colorado. It has companion databases built along similar lines: auSEABED for Australia, balticSEABED, and a global database, goSEABED. Each of these databases relies on preexisting data from a variety of sources to mine and extrapolate useful information about the seabed.
The dbSEABED program allows these source data sets to be compiled in a standardized format and integrates information across a series of data themes (table 1), whether collected with physical sampling equipment (grabs, cores, or probes) or by remotely sensed sampling (descriptions from photographs and videos, geophysics, soundings). These data may be numeric lab- or instrument-based textural, acoustic, geochemical, and geophysical data and (or) verbal (linguistic) descriptions of grabs, cores, or photographs, or a combination of any of these.
In the usSEABED database, most data held in these reports are mined for additional information that increases the data density over the seabed, allowing for more complete information. Few source reports contain all data reportable in usSEABED; null values are given in those fields without data.
Sources of Data
usSEABED relies on previously existing and newly collected data, both published and unpublished, from Federal, State, regional, and local agencies and consortiums, as well as research institutions. For the Pacific coast, many of the data are from the USGS, including published and unpublished data from the 1980s to 2000s.
Data gathered by the NOAA National Ocean Service (NOS) during their many sounding surveys in the 1960s to 1990s are included, as archived by the Smithsonian Institution and provided by the National Geophysical Data Center (NGDC). Theses and dissertations from many universities, reports from the University of California (Berkeley) Hydraulic Engineering Lab, the University of Washington's data publications of the 1960s and 1970s, U.S. Army Corps of Engineers (USACE) reports, and local harbor and U.S. Navy (USN) reports are also included. Deck 41, a large compilation archived at NGDC that draws on a wide variety of data sources, is included as well. A complete data sources list for usSEABED along the Pacific coast is included.
Although efforts have been made to reduce duplicate data within usSEABED, duplicates may remain within the input Pacific data sets because data from the same cruise or site may be published in more than one report or data compilation. For example, NGDC's Deck 41 compilation contains information for several west coast sources; sites are decommissioned in our Deck 41 dataset where these sources are included within usSEABED under the original sources. In other instances, data from different sources for a given site may both be retained if the second source adds significant data. For example, one source may report only grain size for a particular site, but another source may include geophysical properties for the same site or sample.
Data Themes and Output Files
Seabed data come in a variety of forms, which have information in different parameters. For example, textural analyses may have information for percentages of gravel, sand, and mud (silt and clay); statistical measurements such as mean, median, sorting, skewness, and kurtosis often accompany the textural information. Acoustical measurements include various velocities and derived densities. Benthic habitat studies may include a short description of the sea-floor sediment type and a numerical survey of animals and plants on the sea floor or evidence of them.
The original seabed information in usSEABED is entered into different data themes. A list of data themes is given in table 1. The thematic basis of the values found in the outputs can be found in the DataType field of the extracted (EXT), parsed (PRS), and calculated (CLC) output files. Information on the contribution of each source report is in the accompanying source metadata files.
Output Files
This publication provides six usSEABED output data files for the Pacific Coast.
Table 2. usSEABED Output files
Data File | Contents
EXT | Extracted (numeric, lab-based)
PRS | Parsed (word-based)
CLC | Calculated (calculated variables)
CMP | Components (content and features)
FAC | Facies (components only)
SRC | Source
These files are downloadable from the Data Catalog. An additional output file type, although unpublished, provides quality control for the data and was used extensively prior to publication to debug and test the data. Field parameters for the data files are listed in table 3, table 4, and table 5.
Relational keys
The usSEABED data file types are linked relationally by three foreign keys: DataSetKey (for individual data sets), SiteKey (for individual sites), and SampleKey (for individual analyses). The DataSetKey field gives the relationship of the data to the original source. The tables can be loaded into an RDB, where relationships can be constructed and the tables joined using the keys.
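As a minimal sketch of such a join, the following loads two tiny, made-up tables into an in-memory SQLite database and links an EXT-style table to an SRC-style table through DataSetKey. Only the three key names come from this document; the table names, the Title and Sand columns, and all values are hypothetical stand-ins for the real output files.

```python
import sqlite3

# Hypothetical miniature tables; real usSEABED files carry many more fields.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src (DataSetKey INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE ext (DataSetKey INTEGER, SiteKey INTEGER, SampleKey INTEGER, Sand REAL);
INSERT INTO src VALUES (1, 'Example harbor survey'), (2, 'Example shelf cruise');
INSERT INTO ext VALUES (1, 10, 100, 82.5), (1, 11, 101, 40.0), (2, 20, 200, 12.0);
""")

# Attach the source title to every analysis through the DataSetKey foreign key.
rows = conn.execute("""
SELECT ext.SampleKey, ext.Sand, src.Title
FROM ext JOIN src ON ext.DataSetKey = src.DataSetKey
ORDER BY ext.SampleKey
""").fetchall()

for row in rows:
    print(row)
```

The same pattern should apply to the other output files, using SiteKey and SampleKey where the link is between analyses rather than back to a source.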
Source data (SRC)
Information about the original data is in the source (SRC) file, including links to metadata about the original data. Each of the output data files discussed below is relationally linked to the data source (SRC) file by the DataSetKey field. Information on data sources is also provided in a more traditional bibliographic format.
Textural and other basic information (EXT, PRS, CLC)
Textural, statistical, geochemical, geophysical, dominant component, and color information are held in three separate, but similar, data files, based on the type of data: EXT, PRS, CLC. The three data file types have the same fields (table 3) and can be combined for more extensive coverage of the sea floor.
It is important for users to understand the inherent limitations of each type of file in order to choose the best data file, or combination of data files, appropriate for a particular use. Other dbSEABED programs can combine the three files in a variety of ways, by concatenation or by telescoping, before they are mapped or used for other types of analysis. For access to these files, please see contact page.
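The two ways of combining the files can be illustrated roughly as follows. This is not the dbSEABED implementation: the document does not define telescoping, so the sketch assumes it means building one record per sample and filling each field from the most reliable file that has a value (EXT first, then PRS, then CLC). The records, the Sand and Color field names, and the use of None for a null value are all hypothetical.

```python
# Hypothetical records keyed by SampleKey; None stands in for a null field.
ext = {100: {"Sand": 82.5, "Color": None}}
prs = {100: {"Sand": 80.0, "Color": "olive gray"}, 101: {"Sand": 35.0, "Color": None}}
clc = {101: {"Sand": 30.0, "Color": None}, 102: {"Sand": 5.0, "Color": None}}

# Listed from most to least reliable, following the descriptions in this section.
tables = [("EXT", ext), ("PRS", prs), ("CLC", clc)]

def concatenate(tables):
    """Keep every record from every file, tagged with the file it came from."""
    return [
        {"SampleKey": key, "File": label, **fields}
        for label, table in tables
        for key, fields in table.items()
    ]

def telescope(tables):
    """One record per sample; each field comes from the first file with a value."""
    merged = {}
    for _label, table in tables:
        for key, fields in table.items():
            target = merged.setdefault(key, {})
            for name, value in fields.items():
                if target.get(name) is None:
                    target[name] = value
    return merged

stacked = concatenate(tables)
merged = telescope(tables)
print(len(stacked), merged[100])
```

Concatenation keeps the provenance of every value but leaves several records per sample; the telescoped form is more compact but mixes values of differing reliability in one record.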
Extracted data (EXT)
The data file with the EXT tag is the "extracted" data: those data from strictly performed, lab-based, numeric analyses. Most data in this file are listed as reported by the source data report; only minor unit changes have been performed. In some cases, assumptions may be made about the thickness of the sediment analyzed based on the sampler type. Typical data themes include textural classes and statistics (TXR: gravel, sand, silt, clay, mud, and various statistics), phi grain-size classes (GRZ), chemical composition (CMP), acoustic measurements (ACU), color (COL), and geotechnical parameters (GTC). The EXT file is based on rigorous lab-determined values and forms the most reliable data set. Limitations, however, exist due to the uncertainty of the sample tested; for example, were the analyses performed on whole samples or only on the matrix, possibly with larger particles ignored?
Parsed data (PRS)
Numeric data mined from verbal logs, core or grab descriptions, shipboard notes, and (or) photographic descriptions are held in the parsed data set (PRS). The input data are maintained using the terms employed by the original researchers and are coded using phonetically sensible terms for easier processing by dbSEABED. Longer descriptions may have the data divided by theme (table 1). The descriptions often include information on associated biota, sea-floor features, and structure. Typical data themes for the parsed data set are lithologic descriptions (LTH), biology (BIO), color (COL), and (or) sea floor type (SFT, descriptions from photos or videos). The values in the parsed data file are calculated using the dbSEABED parser that assigns field values based on the form and content of a description. See the section on dbSEABED processing and fuzzy set theory for a more complete explanation.
The parsing process has been tested and calibrated by comparing the outputs against analytical results for the same samples. Due to the nature of visual descriptions by observers and the use of fuzzy set theory in the parser, the output data variously show the degree of representation in the sample or percent abundance values. An assumption in the process is that the output degrees of representation reflect absolute abundances to some degree of accuracy. The calibrations provide information on that accuracy. Although at first sight the descriptive results in the parsed file may seem less accurate than measured values in the extracted file, they are frequently more representative of the sample and seabed as a whole, as they include description of objects such as shells, stones, algae, and other objects (table 6) that are a textural component of the seabed and are often left out of laboratory analyses, particularly when a machine analysis is employed.
Calculated data (CLC)
For the extracted and parsed data, some values are not reported by the original source but can be calculated directly or estimated by standard derivative equations using assumptions (see Frequently Asked Questions) about the conditions or variables. These values are reported in the calculated (CLC) data files. Although the CLC data can be combined with the extracted and the parsed data (table 3), they are the least reliable of the three data-file types and should be used with caution.
Component/feature and facies data (CMP, FAC)
Two usSEABED data files contain information about the presence of certain sea-floor features, compositional content, biota, and sediment structure. These use senior synonyms defined by the thesaurus in the dbSEABED parsing software, which clusters comparable descriptive terms together (granite represents granite, aplite, granodiorite, pegmatite, whereas laminated represents laminated, laminations, or lamina). Individual components and features (terms like feldspar, phosphorite, bivalves, seagrass, and wood) are held in the CMP data file (table 4). Appropriately combined components are held in the facies (FAC) data files (table 5). As with the parsed data files, the values held within the CMP and FAC files are the results of filters based on fuzzy-set membership to chosen sets and represent a measure of truth about the attribute, not percentages or defined values. These files only indicate presence, not absence, of material; it is rare that a report might state, "no bivalves" or "no phosphorite."
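The clustering of descriptive terms under a senior synonym can be pictured with a small lookup, sketched below. The two synonym groups are the ones named above; the function name, the sample description, and the simple word-splitting are assumptions made for illustration, and the actual dbSEABED thesaurus and parser are far more extensive and also assign membership values, which this sketch does not attempt.

```python
# Senior synonym -> terms it represents (the two groups named in the text).
SENIOR_SYNONYMS = {
    "granite": ["granite", "aplite", "granodiorite", "pegmatite"],
    "laminated": ["laminated", "laminations", "lamina"],
}

# Invert the mapping so each descriptive term points at its senior synonym.
TERM_TO_SENIOR = {
    term: senior for senior, terms in SENIOR_SYNONYMS.items() for term in terms
}

def senior_terms(description):
    """Return senior synonyms triggered by a description, in order of first appearance."""
    found = []
    for word in description.lower().replace(",", " ").split():
        senior = TERM_TO_SENIOR.get(word)
        if senior and senior not in found:
            found.append(senior)
    return found

print(senior_terms("Pegmatite cobbles, faint laminations, aplite"))
```

Consistent with the presence-only nature of the CMP and FAC files, a term that never appears in the description simply produces no entry rather than an explicit absence.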
The CMP file contains information about compositional content (individual minerals, rocks), genesis (terrigenous, carbonate), and certain biota. These components are internally evaluated and the value for each attribute is based solely on the relationships of attributes within the original description. The flora and fauna included in the compositional components are those that may have an effect on textural determinations in the PRS data file, such as halimeda, bivalves, or foraminifera (table 6). The values within these attribute fields range from 0 (no membership, possibly due to no information) to 100 (complete membership; for example, shell hash = 100 to the shell debris set).
The CMP file also includes information on sea-floor features, such as bedforms, fissures, internal structure (bedding, bioturbation), and other flora and fauna. Unlike the compositional content information, which is construed as an abundance within the sample, these attributes are an intensity of development or density of occurrence relative to scales of development or density of occurrence observed elsewhere. The flora and fauna included in the feature category are soft-bodied, that is, those that do not affect the textural determination within the PRS data files, such as kelp, ophiuroids, or annelids. Values within the attribute fields range from 0 (no membership, possibly due to no information) up to 100 (maximum development). In contrast to the situation with component abundances, the sum of feature intensities in a sample is allowed to exceed 100.
The 100 most common components (number limited by dbSEABED processing software) in the U.S. EEZ are given in the CMP file, and those attributes with "_F" denote features. Table 4 lists the components and gives basic forms of descriptive terms that may trigger membership for each. Included in this file are 27 components that are included in the facies (FAC) file only. The dbSEABED thesaurus used for usSEABED is also used for the sister data compilations (auSEABED, balticSEABED, goSEABED), and the list of trigger terms may include some that are not known in U.S. waters.
The second file, the facies file (FAC), is created from components only, similar to the CMP file. This file configures multiple components into appropriate groups or facies, such as igneous, metamorphic, ooze, foraminifera, and others. The dbSEABED processing software is restricted to a maximum of six components per facies. Table 5 lists the facies type and the components that comprise each facies group.
Again, these files only indicate presence, not absence, of material; it is rare that a report might state, "no bivalves" or "no phosphorite." The values within this attribute field range from 0 (no membership, possibly due to no information) to 100 (complete membership, for example, schist = 100 to the metamorphic set).
Relations between the PRS and CMP outputs
The dbSEABED processing software recognizes that many skeletonized biota, such as halimeda, rhodoliths, shells (broken and unbroken), and others often constitute a sediment sample. Such biological terms are included in the parsing of the textural values. The selected biota with textural implications are listed in table 6. When using the parsed data, it may be important to crosscheck with the component file using the relational foreign keys (SiteKey, SampleKey) to determine if biota are to be included in the textural outputs.
Within the PRS file, the 'seabed class' and 'class membership' fields indicate the dominant compositional class and the fuzzy-set membership of a sample to that class. Other components and mined information may also be listed for that sample in the CMP file, linked by the relational keys.
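A crosscheck of this kind might look like the following sketch, which reads two small comma-delimited excerpts and, for each parsed sample, lists the textural biota that have nonzero membership in the component file. SiteKey and SampleKey are the keys named above; the other column names, the values, and the two-item biota list are hypothetical and stand in for the real fields in tables 3, 4, and 6.

```python
import csv
import io

# Hypothetical excerpts; columns other than SiteKey and SampleKey are illustrative.
PRS_TEXT = """SiteKey,SampleKey,Gravel,Sand,Mud
10,100,30,60,10
11,101,0,20,80
"""
CMP_TEXT = """SiteKey,SampleKey,Bivalves,Halimeda
10,100,40,0
11,101,0,0
"""

def keyed(text):
    """Index rows of a comma-delimited table by (SiteKey, SampleKey)."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["SiteKey"], row["SampleKey"]): row for row in reader}

prs = keyed(PRS_TEXT)
cmp_rows = keyed(CMP_TEXT)

# Assumed subset of the biota with textural implications.
TEXTURAL_BIOTA = ["Bivalves", "Halimeda"]

def biota_in_texture(key):
    """List textural biota with nonzero membership for one sample."""
    row = cmp_rows.get(key)
    if row is None:
        return []
    return [name for name in TEXTURAL_BIOTA if float(row[name]) > 0]

for key in prs:
    print(key, biota_in_texture(key))
```

A non-empty list for a sample suggests that its parsed textural values include a skeletal biota contribution, which may matter when comparing them with lab-based values.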
Quality Control
Quality control over the data is an iterative process implemented using criteria in the following steps. First, graphical plots of site locations and parameter values are used to detect outliers and edit them appropriately. Each data set is viewed in a GIS to ensure that data locations are reasonable relative to survey extents; those sites with unresolvable location issues or known incomplete analyses are deactivated and are not included in the usSEABED output files. (Note: usSEABED does contain a small number of onshore samples.) This step may be optional depending on the data set. Older sets may require more scrutiny at this step, whereas newer or well-exercised data sets require less.
Second, built-in filters in the dbSEABED processing software detect implausible values for numeric fields, unknown verbal terms, incomplete analyses (for example, a Gravel-Sand-Silt-Clay (mud) (GSSC(m)) total greater than 105% or less than 95%), and incorrect field types (string or number). The software also detects samples that seem to belong to a core though they are described as independent samples. For the parsing of verbal descriptions, all terms must be known to the dbSEABED data processing program, with values assigned; those analyses that fail this test have null values given to all appropriate fields. Edits are made to the data at the level of the usSEABED input data files and metadata are entered explaining the changes. The edits (or deactivations) are then taken into account in the next dbSEABED program run.
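The completeness test on textural totals can be sketched as below. The 95% and 105% limits are the ones stated above; the function names and the rule of using mud when silt and clay are not both reported are assumptions for illustration and are not taken from the dbSEABED software.

```python
def texture_total(gravel, sand, silt=None, clay=None, mud=None):
    """Sum the fractions (percent), using mud when silt and clay are not both given."""
    if silt is not None and clay is not None:
        fines = silt + clay
    elif mud is not None:
        fines = mud
    else:
        raise ValueError("need silt and clay, or mud")
    return gravel + sand + fines

def is_incomplete(total, low=95.0, high=105.0):
    """Flag totals below 95% or above 105% as incomplete analyses."""
    return total < low or total > high

full = texture_total(5, 60, silt=20, clay=15)   # 100
short = texture_total(10, 50, mud=30)           # 90
print(full, is_incomplete(full))
print(short, is_incomplete(short))
```

Totals exactly at 95% or 105% pass in this sketch, since the text flags only values beyond those limits.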
Finally, output data are analyzed in a GIS to test whether the data outputs "make sense" for a given geographic area. Users of the output data should, however, note the limitations imposed by the source data sets as to navigational precision, sampler type, and analytical technique.
As issues with the data or the data processing are discovered, errata will be posted on the usSEABED website. Corrections will be included in the next version of the publication.
See the dbSEABED section and the Frequently Asked Questions for details about the usSEABED data mining program and the application of fuzzy set theory.
Spatial and Temporal Uncertainties
Users of usSEABED data are reminded that many sea-floor regions are, by their nature, dynamic environments subject to a variety of physical processes, such as erosion, winnowing, reworking, and sedimentation or accretion that vary on different spatial and temporal scales, and sea-floor samples may represent only a moment in time. Because usSEABED comprises samples collected, described, and analyzed by many different organizations and individuals over a span of years, metadata are provided for each source report, linked both through the bibliographic list of data sources and the relational link DataSetKey in the output files. In cases where original metadata are not available from the data source, metadata were created based on available information accompanying the data. Of particular importance, site locations are as given in the original sources, with uncertainties due to navigational techniques and datums ignored in the usSEABED compilation. As many reports are decades old, users of usSEABED should use their own criteria to determine the appropriateness of data from each source report for their particular purpose and scale of interest.
In addition, there are uncertainties in data quality associated with both the extracted data (from lab-based analyses) and parsed data (word-based descriptions). Grain-size analyses may be done solely on the sand fraction, excluding coarser material such as shell fragments and gravel, while word descriptions of sediment samples may emphasize the proportion of one sediment fraction over another and may disregard other important textural or biological components. Detailed information about issues such as these is noted in the source metadata files, and known incomplete data are decommissioned in usSEABED.
Users are encouraged to view the entire document before downloading the data files in the Data Catalog and should refer to the provided metadata files for information about individual sources' limitations, date of collection, and other pertinent information.