usSEABED: East Coast Offshore Surficial Sediment Data Release, Calibration


U.S. Geological Survey Data Series 118, usSEABED: Atlantic Coast Offshore Surficial Sediment Data Release, version 1.0

Calibration

dbSEABED is an information processing system that can perform statistical and individual tests of accuracy across the range of its output parameters.

Statistical Tests

For the thousands of samples where both analytical and descriptive data exist, a statistical comparison can be made between the _EXT and _PRS data outputs. The results of this calibration give an overall guide to the accuracy of the regional mappings and highlight areas and issues in the data where improvements can be made. Those improvements involve both the analytical and the descriptive raw input data. Examples include grain size analyses that appear to be of the whole sediment but are really only of the sand fraction, and analyses from which gravel or shell has been omitted.

The _EXT and _PRS outputs are imported into MS Access, and links are created between the two files (usually based on the SampleKey). Entries with null values (-99) in either _EXT or _PRS are eliminated through a query. The query result is brought into MS Excel, where the frequency distribution of the deviations (signed (±) and absolute) is calculated and plotted for inspection. Percentile statistics are calculated from the absolute deviation at the 50% (Median Absolute Deviation, MAD), 68% (1σ), and 95% (2σ) levels. Examples of the outputs are shown in the description of usSEABED. For most data sets the percentile statistics are 0.4, 0.8, and 4 phi for the 50, 68, and 95% levels, which may be acceptable over such a diverse set of input data sets but can be improved. An example of this analysis, for a data set that is under improvement, is shown in the figure below.

Figure 1. Statistical calibration of outputs for grain sizes, shown using a frequency plot of the deviations between PRS and EXT data, using the improving Atlantic data set. Deviations are the result of inaccuracies in EXT and PRS input data, as well as of identifiable issues in the data, as highlighted below and in figure 2. The 50, 68, and 95 percentile levels are 0.8, 1.3, and 4.3 phi, respectively.
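
The join, null-filter, and percentile steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the MS Access and MS Excel workflow itself: the function names `calibrate` and `percentile`, the dictionary layout, the sign convention (PRS minus EXT), the linear-interpolation percentile rule, and the sample values are all assumptions made for the example.

```python
NULL = -99.0  # null marker used in the _EXT and _PRS outputs


def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("no values to summarize")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def calibrate(ext, prs):
    """ext, prs: dicts mapping SampleKey -> grain size in phi.

    Returns the signed deviations per SampleKey and the 50/68/95
    percentiles of the absolute deviations.
    """
    # Inner join on SampleKey, dropping pairs where either side is null.
    devs = {
        key: prs[key] - ext[key]  # assumed sign convention: PRS minus EXT
        for key in ext.keys() & prs.keys()
        if ext[key] != NULL and prs[key] != NULL
    }
    abs_devs = sorted(abs(d) for d in devs.values())
    stats = {p: percentile(abs_devs, p) for p in (50, 68, 95)}
    return devs, stats


# Made-up values, for illustration only.
ext = {1: 2.0, 2: 3.5, 3: -99.0, 4: 1.0, 5: 6.0}
prs = {1: 2.5, 2: 3.0, 3: 4.0, 4: -1.0, 6: 2.0}
devs, stats = calibrate(ext, prs)
```

Here only SampleKeys 1, 2, and 4 survive the join and the null filter; `stats[50]` is the MAD, and `stats[68]` and `stats[95]` correspond to the 1σ and 2σ levels.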

A second way of statistically evaluating the results uses a cross-plot of the _EXT and _PRS output data, shown in the figure below. This type of plot highlights some of the issues that may reduce the accuracy of dbSEABED with incoming data sets. At the locations A-D these common issues are identified in populations of points:

A. PRS coarser than GRZ, apparently due to outsized shells / clasts being omitted from lab grain size analyses;
B. sediment described as very fine in PRS, but only the sand fraction is represented by the EXT analysis data;
C. detailed analyses of grain sizes do not go beyond a coarsest class of about -2.5 phi;
D. descriptive PRS data do not distinguish grades of sand, and are apparently dominated by reports of very large clasts, such as cobbles or shell, that were not analyzed.

Figure 2. Cross-plot of the EXT and PRS output data for a region not in U.S. waters. A-D mark populations of data with commonly encountered problems. Once these problems are detected and fixed, the accuracy of the outputs is substantially better than shown here. Notice that, overall, PRS values extend further into coarse grades and EXT values further into fine grades, reflecting their common observational biases.

Individual Tests

The programs of dbSEABED have been equipped to detect problematic data, whether by values falling outside plausible limits or by mismatches between EXT and PRS results. These tools normally do not prevent the problem values from being output, but they do report detections to a diagnostics file that is particularly useful in the preparation and cleaning of incoming data sets. The statistical data shown in Figure 1 are used to set the filters, usually at the 68% (1σ) level. The original data can then be revisited, checked for issues such as those shown in Figure 2, and corrected, deactivated, or left alone as appropriate.
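
The two kinds of individual test described above can be sketched as follows. This is a hypothetical illustration, not the dbSEABED code: the function name `diagnose`, the record layout, the message wording, and the plausible limits of -10 to 14 phi are assumptions. The 0.8 phi threshold is the typical 68% value quoted in the Statistical Tests section. As in the text, detections are only reported; no values are altered or removed.

```python
def diagnose(records, threshold_phi, limits=(-10.0, 14.0)):
    """Report suspect grain size values without changing them.

    records: iterable of (sample_key, ext_phi, prs_phi); None means no value.
    threshold_phi: EXT/PRS mismatch filter, e.g. the 68% (1-sigma) level.
    limits: assumed plausible range in phi (hypothetical values).
    """
    lo, hi = limits
    messages = []
    for key, ext, prs in records:
        # Test 1: values falling outside plausible limits.
        for name, value in (("EXT", ext), ("PRS", prs)):
            if value is not None and not lo <= value <= hi:
                messages.append(
                    f"{key}: {name} value {value} outside plausible limits"
                )
        # Test 2: mismatch between EXT and PRS results.
        if ext is not None and prs is not None:
            mismatch = abs(prs - ext)
            if mismatch > threshold_phi:
                messages.append(
                    f"{key}: EXT/PRS mismatch of {mismatch:.2f} phi"
                )
    return messages


# Made-up records, for illustration only.
records = [(101, 2.0, 2.3), (102, 1.0, 4.0), (103, 25.0, None)]
report = diagnose(records, threshold_phi=0.8)
for line in report:
    print(line)
```

In practice the messages would be written to a diagnostics file, and the flagged samples revisited in the original data.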

Conclusion

Issues of accuracy and reliability become apparent as soon as data are integrated. Tools for monitoring the integration process are required, with feedback to the input data, so that improvements can be made in the system.

Basic uncertainties that cannot be reduced exist in all the incoming data, and integrative systems cannot proceed past that uncertainty. Parallel studies in dbSEABED have determined, on the basis of replicate analyses, that analytical data such as grain size analyses (Syvitski and others, 1991) have 1-sigma uncertainties on the order of 4% of the total parameter range, or 0.8 phi. With good maintenance of the data, the outputs from dbSEABED approach those levels of reliability.
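
As a quick consistency check on these two figures, and assuming that the 4% and the 0.8 phi refer to the same grain size parameter, the implied total parameter range R (a symbol introduced here for illustration) is about 20 phi:

```latex
0.04 \times R \approx 0.8\ \phi
\quad\Longrightarrow\quad
R \approx \frac{0.8\ \phi}{0.04} = 20\ \phi
```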