dbSEABED
Introduction
The dbSEABED processing system was developed by Jenkins (1997, 2002, 2003) in collaboration with the USGS and others over the past decade. Currently, there is no open-source code for the program. This explanation of the system is intended to give information and guidance about how the data are compiled, integrated, and processed in the usSEABED database.
The primary objective of the dbSEABED system is to produce a unified database, built from the multitude of data sets dealing with the seabed, that can be mapped, analyzed, and visualized. Data sets include both legacy and modern collections, involving data from samplings and visual inspections. Filtering routines within dbSEABED unify marine geologic data that originally may be disparate in purpose, function, style, and collection or analytical techniques. The system works on data files that hold the source data in their original values (except for minor unit changes and phonetically sensible word codes) and provides standardized output data. It is important that users of the PRS and CLC output files understand the parsing process, the meanings of field values, and the limitations of the usSEABED output.
More information about the dbSEABED software can be found on the dbSEABED Web site or in the Frequently Asked Questions section of this publication.
Data Import Methods
Incoming data files number in the hundreds and are diverse in content and format. The process of import begins with manual re-formatting, which usually involves rearranging the data in columns specific to each parameter, such as color, percent sand, seabed description, or multisensor core logger acoustic velocities. These columns are arranged according to a template specific to dbSEABED. Most data are pre-arranged by columns, but in some cases sections of prose may need to be cut into their constituent parts.
New parameters are sometimes encountered as a data set is imported. These new parameters are added to the template at the ends of the appropriate data theme, and the dbSEABED processing software is modified to take the new parameter(s) into account if possible. For future reference and to help editing, the original data are often held as commentary metadata alongside the active data. Some data that are not useful to dbSEABED are held only as commentary metadata.
After import, the data are held in a type of written log arranged according to the nested sequence: data set / site / subbottom depth / subsample. Sites are specified by each new sampling operation. The written log structure is unusual for a database, having more in common with XML-format structure than relational databases. It has distinct advantages for dealing with sea floor sampling data sets, such as:
(i) an algorithm can perform highly useful calculations on the data for each sample, which has made it possible to meet user demands in a timely way despite the complexity and size of the data holdings
(ii) the data are human readable, especially if metadata are interspersed
(iii) it conforms with data structures that are generated by core-loggers at sea and in the laboratory
(iv) it is efficient to import
(v) it is able to cope with variable data quality and incompleteness
(vi) it is low maintenance, nonproprietary, and programs that address it have low cost of entry and are highly adaptable.
The disadvantage of the written log structure is that specialized programs such as those in dbSEABED are needed for conversion to the flat-file formats that most users require. These flat-file formats are provided in this publication.
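The conversion from a nested log to flat records can be pictured with a minimal Python sketch. The dbSEABED code is not open source, so everything here is an assumption made for illustration: the `log` layout, the field names (`sand_pct`, `description`, `depth_m`), and the `flatten` function are hypothetical and do not reproduce the dbSEABED template or file syntax.

```python
# Hypothetical nested log: data set -> site -> subbottom depth -> subsample.
# Field names and values are illustrative only, not the dbSEABED template.
log = {
    "dataset_A": {
        "site_001": {
            0.0: {"sub_1": {"sand_pct": 80.0, "description": "brown fine sand"}},
            0.5: {
                "sub_1": {"sand_pct": 35.0},
                "sub_2": {"description": "grey mud"},
            },
        },
    },
}


def flatten(nested_log):
    """Yield one flat record per subsample, carrying its nesting keys along."""
    for dataset, sites in nested_log.items():
        for site, depths in sites.items():
            for depth, subsamples in depths.items():
                for subsample, values in subsamples.items():
                    row = {
                        "dataset": dataset,
                        "site": site,
                        "depth_m": depth,
                        "subsample": subsample,
                    }
                    # Missing parameters are simply absent, which is how the
                    # sketch copes with incomplete data.
                    row.update(values)
                    yield row


rows = list(flatten(log))
for row in rows:
    print(row)
```

Each output row repeats the data set, site, and depth keys, which is the redundancy a flat file trades for ease of use in mapping software.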
The Numeric Data Type
A primary function of the processing programs is to read, quality-check, and then report numerical data that have been obtained from laboratory analyses of grain size, composition, color, shear strength, and other parameters. Although we describe these data as "numeric," they also include coded data such as Munsell color codes. In many cases, the numeric data can be echoed unchanged to outputs (in _EXT files), for instance, in percent sand, average grain sizes, carbonate, and porosities. Checks are performed, however, on whether a value is properly numeric or string, and whether it is within plausible ranges. Problems are reported to a diagnostics file that is a basis for quality and completeness checks, with possible corrective edits to the data file (along with explanatory metadata). Data items are often deactivated if they are suspected of being incorrect.
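The type and range checks described above might look something like the following Python sketch. It is only an illustration under stated assumptions: the `PLAUSIBLE` limits, the parameter names, and the `check_value` function are hypothetical, since the actual limits and diagnostics format used by dbSEABED are not given here.

```python
# Hypothetical plausible ranges; the real dbSEABED limits are not documented here.
PLAUSIBLE = {"sand_pct": (0.0, 100.0), "porosity_pct": (0.0, 100.0)}


def check_value(name, raw):
    """Return (value, problem); value is None when the item should be deactivated."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return None, f"{name}: {raw!r} is not numeric"
    low, high = PLAUSIBLE[name]
    if not (low <= value <= high):
        return None, f"{name}: {value} outside plausible range {low}-{high}"
    return value, None


diagnostics = []  # stands in for the diagnostics file
clean = {}        # values that pass are echoed unchanged
for name, raw in [("sand_pct", "82.5"), ("sand_pct", "abundant"), ("porosity_pct", "140")]:
    value, problem = check_value(name, raw)
    if problem:
        diagnostics.append(problem)
    else:
        clean.setdefault(name, []).append(value)

print(clean)
print(diagnostics)
```

In this sketch a failed item is dropped from the output and logged, leaving the decision about a corrective edit to a person reading the diagnostics.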
The numeric data output to EXT files have had minimal manipulation. The data in grain size analyses (held at their original phi intervals) are summed into gravel, sand, silt, and clay percentages, and the median, average, and standard deviation are calculated. If grain density is available, bulk densities and water contents are converted to porosities, with the porosity parameter adopted by dbSEABED. Many parameters that are available in the data are not reported to the EXT files, for instance skewness and kurtosis. They may, however, be obtained from RDB renderings of the data (not available in this publication). The dbSEABED output of Central Grain size is a composite of median (preferred), moment average, and graphical averages. Currently, only the second moment (standard deviation) is directly transferred to the sorting field.
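The manipulations just described can be sketched in Python as follows. This is a hedged illustration, not the dbSEABED code: the function names are hypothetical, the class boundaries are the standard Wentworth limits (gravel coarser than -1 phi, sand from -1 to 4 phi, silt from 4 to 8 phi, clay finer than 8 phi), the fallback order after the median is an assumption, and the porosity formula is the standard relation for a water-saturated sediment with an assumed seawater density of 1.025 g/cm3.

```python
def texture_fractions(phi_bins):
    """Sum weight percents held at phi intervals into four texture classes.

    phi_bins maps the coarse-end (lower) phi bound of each interval to its
    weight percent; intervals are assumed not to straddle a class boundary.
    """
    totals = {"gravel": 0.0, "sand": 0.0, "silt": 0.0, "clay": 0.0}
    for phi, pct in phi_bins.items():
        if phi < -1:
            totals["gravel"] += pct
        elif phi < 4:
            totals["sand"] += pct
        elif phi < 8:
            totals["silt"] += pct
        else:
            totals["clay"] += pct
    return totals


def central_grain_size(median=None, moment_mean=None, graphic_mean=None):
    """Return the first available value, with the median preferred.

    The order after the median is an assumption made for this sketch.
    """
    for value in (median, moment_mean, graphic_mean):
        if value is not None:
            return value
    return None


def porosity_from_bulk_density(bulk, grain, water=1.025):
    """Fractional porosity of a saturated sediment (densities in g/cm3)."""
    return (grain - bulk) / (grain - water)


bins = {-2: 5.0, 0: 20.0, 2: 45.0, 4: 20.0, 8: 10.0}
print(texture_fractions(bins))
print(central_grain_size(None, 2.3, 2.1))
print(porosity_from_bulk_density(1.8375, 2.65))
```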
The reporting of consistent mappable values for geotechnical and acoustic parameters is not an easy task. The results of physical property tests are very dependent on experimental setup, such as strain rates, sample preparation, equipment dimensions, and detection of behavioral thresholds for the materials. The shear strength reported from dbSEABED is a composite of penetrometer and vane shear values (undrained, unconfined) in the un-remolded states (that is, for initial failure). Also included for the sake of maximizing mappability are the cohesions from shear box and low-pressure triaxial experiments. P-wave acoustic velocities are reported without regard to the frequencies of measurement. In both cases investigators wanting more specific information on the analyses can refer to the original data and metadata.
The extracted outputs based on numeric and coded data are output separately from the parsed and calculated results of dbSEABED. It is recognized that some investigators will choose one over the other, or may wish to combine them in different ways. It must also be recognized that a sensible coverage of the seabed can rarely be obtained from the extracted data alone, as they are too sparse.
The Linguistic Data Type
A feature of dbSEABED is its ability to parse word-based descriptive data such as "brown fine sand with abundant shells; seagrass and some pebbles; whiff of h2s". These types of data are held using their original terms although some abbreviation and coding is necessary. Thus dbSEABED is not a natural language parser even for the noun phrase constructions, such as the above description. The ability to handle word-based data greatly extends the power of the system to map the seabed, because on a global average, approximately 85% or more of data characterizing the seabed are word based. Calibrations are performed to validate this process relative to analytical data on the same sediments. A simplified description of the parsing functions is included in this publication.
The dbSEABED program applies fuzzy-membership concepts to geological descriptions, using
- a parser that divides the descriptions into arithmetic equations
- a thesaurus that attaches meanings and memberships to the quantifiers, modifiers, and objects
- a linear weighted assembly of the numerical totals
In the dbSEABED program, word memberships can be defined across many parameters, not just grain size. Fuzzy memberships are best thought of as a measure of truth or possibility (note: not probability).
The outputs are fuzzy memberships of parameters such as mud, grain sizes, carbonate, organic carbon, grain types, sedimentary features, rock and weed coverages, and engineering strengths.
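The three components listed above (parser, thesaurus, linear weighted assembly) can be illustrated with a toy Python sketch. It is not the dbSEABED algorithm, whose code is not open source: the `QUANTIFIERS` and `OBJECTS` tables, their numeric values, and the `parse` and `assemble` functions are all hypothetical and chosen only to make the idea concrete.

```python
import re

# Toy thesaurus. All entries and weights are invented for illustration.
QUANTIFIERS = {"abundant": 0.75, "some": 0.25, "trace": 0.05}
OBJECTS = {
    "sand": {"sand": 1.0},
    "mud": {"mud": 1.0},
    "pebbles": {"gravel": 1.0},
    "shells": {"gravel": 0.5, "carbonate": 1.0},
}


def parse(description):
    """Split a description into (weight, object) terms.

    Words missing from the toy thesaurus (e.g. colors) are ignored.
    """
    terms = []
    for phrase in re.split(r";|\bwith\b|\band\b", description.lower()):
        weight = 1.0  # an unquantified object is taken as the main component
        for word in phrase.split():
            if word in QUANTIFIERS:
                weight = QUANTIFIERS[word]
            elif word in OBJECTS:
                terms.append((weight, word))
    return terms


def assemble(terms):
    """Linear weighted assembly: weight-normalized sum of memberships."""
    total = sum(weight for weight, _ in terms)
    memberships = {}
    for weight, obj in terms:
        for parameter, degree in OBJECTS[obj].items():
            share = weight * degree / total
            memberships[parameter] = memberships.get(parameter, 0.0) + share
    return memberships


text = "brown fine sand with abundant shells; seagrass and some pebbles; whiff of h2s"
terms = parse(text)
print(terms)
print(assemble(terms))
```

Because the results are memberships rather than proportions, they need not sum to one: in this sketch shells contribute to both gravel and carbonate.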
Statistical comparisons can be made between the EXT and PRS data outputs. The resulting calibrations are an overall guide to the accuracy of the regional mappings and highlight areas and issues in the data where improvements can be made.
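One simple form such a comparison could take is sketched below in Python. The choice of statistics (mean difference and Pearson correlation), the `compare` function, and the sample values are assumptions made for illustration; the calibration statistics actually used with dbSEABED are not specified here.

```python
from math import sqrt


def compare(ext_values, prs_values):
    """Compare paired EXT (measured) and PRS (parsed) values for one parameter.

    Pairs with a missing value (None) on either side are skipped.
    Returns the pair count, the mean difference (PRS minus EXT), and
    the Pearson correlation coefficient.
    """
    pairs = [(e, p) for e, p in zip(ext_values, prs_values)
             if e is not None and p is not None]
    n = len(pairs)
    mean_e = sum(e for e, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in pairs)
    var_e = sum((e - mean_e) ** 2 for e, _ in pairs)
    var_p = sum((p - mean_p) ** 2 for _, p in pairs)
    return {"n": n, "bias": mean_p - mean_e, "r": cov / sqrt(var_e * var_p)}


# Hypothetical percent-sand values for the same samples.
ext_sand = [10.0, 50.0, 90.0, None]
prs_sand = [20.0, 50.0, 80.0, 40.0]
stats = compare(ext_sand, prs_sand)
print(stats)
```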
Expansion of Data Coverage
A summary of the theoretical or empirical relationships that are used by dbSEABED to expand the coverages of seabed parameters (often not directly measured or calculated in individual reports) is given in the onCalculation document.