U.S. Geological Survey Data Series 146
usSEABED: Gulf of Mexico and Caribbean (Puerto Rico and U.S. Virgin Islands) Offshore Surficial Sediment Data Release, version 1.0
dbSEABED is an information processing system that can perform statistical and individual tests of accuracy across the range of output parameters.
For the thousands of samples where both analytical and descriptive data exist, a statistical comparison can be made between the EXT and PRS data outputs. The results of this calibration give an overall guide to the accuracy of the regional mappings and highlight areas and issues in the data where improvements can be made. Those improvements involve both the analytical and descriptive raw input data. Examples include grain size analyses that appear to represent the whole sediment but actually cover only the sand fraction, and analyses from which gravel or shell has been omitted.
The EXT and PRS outputs are imported into MS Access and links are created between the two files (usually based on the SampleKey). Entries with null values (-99) in either EXT or PRS are eliminated through a query. The query result is brought into MS Excel, where the frequency distribution of deviations (signed and absolute) is calculated and plotted for inspection. Percentile statistics are calculated from the absolute deviation at the 50% (Median Absolute Deviation, MAD), 68% and 95% percentiles (1σ, 2σ). Examples of the outputs are shown in the description of usSEABED. For most data sets the percentile statistics are 0.4, 0.8 and 4 phi for the 50, 68, and 95% levels, which may be acceptable over such a diverse set of input data sets but can be improved. An example of this analysis is shown in the figure below, for a data set which is under improvement.
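The comparison described above can also be sketched in a few lines of Python. This is only an illustration, not the workflow used for this data release (which relies on MS Access and MS Excel): the function names, the dictionary layout keyed by sample key, and the linear-interpolation percentile rule are all assumptions made for the example, and the sample values are invented.

```python
import math

NULL = -99  # null marker used in the EXT and PRS outputs


def paired_deviations(ext, prs, null=NULL):
    """Link EXT and PRS values on the sample key and return EXT - PRS.

    `ext` and `prs` are hypothetical dicts mapping a sample key to a
    value in phi. Keys missing from either file, or null in either
    file, are dropped, mirroring the null-elimination query.
    """
    devs = {}
    for key, ext_val in ext.items():
        prs_val = prs.get(key)
        if prs_val is None or ext_val == null or prs_val == null:
            continue
        devs[key] = ext_val - prs_val
    return devs


def percentile(values, pct):
    """Percentile by linear interpolation between the closest ranks.

    This interpolation rule is an assumption; spreadsheet percentile
    functions may use a slightly different convention.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values to summarise")
    rank = (len(ordered) - 1) * pct / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def deviation_summary(devs):
    """Absolute-deviation statistics at the 50, 68 and 95% levels."""
    abs_devs = [abs(d) for d in devs.values()]
    return {p: percentile(abs_devs, p) for p in (50, 68, 95)}


if __name__ == "__main__":
    # Invented values, for illustration only.
    ext = {"s1": 2.0, "s2": 3.5, "s3": NULL, "s4": 1.0, "s5": 4.0}
    prs = {"s1": 2.5, "s2": 3.0, "s3": 1.0, "s4": NULL, "s5": 4.0}
    print(deviation_summary(paired_deviations(ext, prs)))
```

The signed deviations returned by `paired_deviations` can be binned for a frequency plot, while `deviation_summary` gives the three percentile statistics quoted in the text.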
A second way of statistically evaluating the results uses a cross-plot of the EXT and PRS output data, shown in the figure below. This type of plot serves to highlight some of the issues that may reduce the accuracy of dbSEABED with incoming data sets. At the locations A-D these common issues are identified in populations of points:
The programs of dbSEABED have been equipped to detect problematic data, whether by values falling outside plausible limits or by mismatches between EXT and PRS results. These tools normally do not prevent the problem values from being output, but they do report detections to a diagnostics file that is particularly useful in the preparation and cleaning of incoming data sets. The statistical data shown in figure 1 are employed to set the filters, usually at the 68% (1σ) level. The original data can then be revisited, checked for issues such as those shown in figure 2, and corrected, deactivated or left alone as appropriate.
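A minimal sketch of this kind of detection is given here in Python, purely as an illustration. It is not the dbSEABED code: the function name, the record layout, and the plausible limits of -10 to 15 phi are hypothetical, and the mismatch threshold of 0.8 phi is simply the typical 68% value quoted earlier in the text.

```python
NULL = -99  # null marker used in the EXT and PRS outputs


def diagnose(records, limits=(-10.0, 15.0), max_mismatch=0.8, null=NULL):
    """Return diagnostic messages for suspicious records.

    `records` is a hypothetical list of (sample_key, ext_phi, prs_phi)
    tuples. Values are only reported, never removed, so the decision to
    correct, deactivate or keep a value stays with the data editor.
    """
    low, high = limits
    messages = []
    for key, ext_val, prs_val in records:
        # Check 1: each non-null value must fall inside plausible limits.
        for label, val in (("EXT", ext_val), ("PRS", prs_val)):
            if val != null and not (low <= val <= high):
                messages.append(
                    f"{key}: {label} value {val} outside limits {low}..{high}"
                )
        # Check 2: EXT and PRS should agree to within the chosen threshold.
        if ext_val != null and prs_val != null:
            diff = abs(ext_val - prs_val)
            if diff > max_mismatch:
                messages.append(
                    f"{key}: EXT/PRS mismatch of {diff:.2f} phi "
                    f"exceeds {max_mismatch}"
                )
    return messages


if __name__ == "__main__":
    # Invented records, for illustration only.
    sample = [("a", 2.0, 2.5), ("b", 1.0, 3.0), ("c", 40.0, NULL)]
    for line in diagnose(sample):
        print(line)
```

Writing the returned messages to a text file would give a simple stand-in for the diagnostics file described above.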
Issues of accuracy and reliability become apparent as soon as data are integrated. Tools for monitoring the integration process are required, with feedback to the input data, so that improvements can be made in the system.
Basic uncertainties that cannot be reduced exist in all the incoming data, and integrative systems cannot proceed past that uncertainty. Parallel studies in dbSEABED have determined, on the basis of replicate analyses, that analytical data such as grain size analyses (Syvitski and others, 1991) have 1-sigma uncertainties on the order of 4% of the total parameter range, or 0.8 phi. With good maintenance of the data, the outputs from dbSEABED approach those levels of reliability.