OFR 97-492: Why a new NURE HSSR data format?


National Geochemical Database—Reformatted Data from the National Uranium Resource Evaluation (NURE) Hydrogeochemical and Stream Sediment Reconnaissance (HSSR) Program

By Steven M. Smith
Version 1.40 (2006)

 

The National Uranium Resource Evaluation (NURE) Hydrogeochemical and Stream Sediment Reconnaissance (HSSR) program produced a large amount of geochemical data. To fully understand how these data were generated, read the History of NURE HSSR Program, which summarizes the entire program.

By the time the NURE program ended, the HSSR data consisted of 894 separate data files stored in 47 different formats. Many files duplicated data found in other files. The University of Oklahoma's Information Systems Programs of the Energy Resources Institute (ISP) was contracted by the Department of Energy to enhance the accessibility and usefulness of the NURE HSSR data. ISP created a single standard-format master file to replace the 894 original files. ISP converted 817 of the 894 original files before its funding apparently ran out.

The ISP-reformatted NURE data files have been released by the USGS on CD-ROM (Lower 48 States, Hoffman and Buttleman, 1994; Alaska, Hoffman and Buttleman, 1996). A description of each NURE database field, derived from a draft NURE HSSR data format manual (unpubl. commun., Stan Moll, ISP, Oct 7, 1988), was included in a readme file on each CD-ROM. That original manual was incomplete and assumed that the reformatting process had gone to completion. Much vital information was omitted. Efforts to correct that manual and the NURE data revealed a large number of problems and missing data.

After the frustrating process of cleaning and re-cleaning data from the ISP-reformatted NURE files, a new NURE HSSR data format was developed. This work is an entirely new attempt to reformat the original NURE files, quadrangle by quadrangle, into 2 consistent database structures: one for water samples and a second for sediment samples. Although this USGS-reformatted NURE HSSR data format differs from the one created by ISP, many of ISP's ideas were incorporated and expanded in this effort. All of the data from each quadrangle are being examined thoroughly in an attempt to eliminate problems, to combine partial or duplicate records, to convert all coding to a common scheme, and to identify problems even if they cannot be solved at this time.

This work is not yet finished; the data files released here are preliminary and their format may eventually change slightly. Some fields present now may be combined or eliminated in the final version. Changes to the data contained in these files should be rare. Because this database and the accompanying manual are draft versions, any comments or additional corrections are most welcome.

The author of this document was not involved in the original NURE project. All of the historical information was derived from published NURE reports and data files. Any errors in these web pages are the responsibility of the author.



A partial list of problems is given below to justify revisiting the NURE data and creating a new USGS-reformatted NURE format.

Duplicated Records: Some work was done by ISP to reduce the amount of duplication of records and data fields but many duplications were never addressed. Many samples are represented by 2 or even 3 records. Often, one of these records will contain some or all of the information that is contained in another record. In other cases, records differ slightly. This has been seen when one file was meant to supersede a previous file. Sometimes these duplicated records have slightly different latitude and longitude coordinates because of corrections, different rounding algorithms, or re-digitizing of locations. Other duplicated records contain differences in element concentrations due to additional analyses, re-analyses, or different analytical correction factors.

The USGS-reformatted NURE data files attempt to combine data from duplicated records into a single record for each sample without losing any of the information they contain. This has been largely successful for most NURE quadrangles. Occasionally it has been necessary to retain 2 records for a site when the corresponding records have irreconcilable differences. When these situations are encountered, the corresponding records are identified in the REFORMAT comment field.
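The combining step can be pictured with a short Python sketch. It is only an illustration of the idea, not the procedure actually used for these files: the `SAMPLE_ID` key, the `LAT` and `U` field names, the use of `None` for a missing value, and the wording of the REFORMAT note are all assumptions made for the example. Two records merge when every field they both fill in agrees; otherwise both are kept and cross-referenced.

```python
def try_merge(a, b):
    """Return a merged copy of a and b, or None if a shared field conflicts."""
    merged = dict(a)
    for field, value in b.items():
        if value is None:
            continue                      # b has nothing to contribute here
        if merged.get(field) is None:
            merged[field] = value         # fill a gap from the duplicate
        elif merged[field] != value:
            return None                   # irreconcilable difference
    return merged


def combine_duplicates(records, key="SAMPLE_ID"):
    """Combine records sharing a sample ID; keep both when they conflict."""
    kept = []                             # output records, first-seen order
    index_by_id = {}                      # sample ID -> positions in `kept`
    for rec in records:
        positions = index_by_id.setdefault(rec[key], [])
        for pos in positions:
            merged = try_merge(kept[pos], rec)
            if merged is not None:
                kept[pos] = merged
                break
        else:
            # No compatible record exists yet (or this is the first one).
            new = dict(rec)
            if positions:
                note = f"Irreconcilable duplicate of sample {rec[key]}; see companion record"
                new["REFORMAT"] = note
                for pos in positions:
                    kept[pos]["REFORMAT"] = note
            positions.append(len(kept))
            kept.append(new)
    return kept
```

In this sketch a record that only fills gaps disappears into its companion, while a record with, say, a different latitude survives as a second record carrying a REFORMAT note.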

Quadrangle Names: One of the most common means of retrieving data from NURE files is by 2-degree quadrangle name. Since quadrangle names were not given in the original NURE data files, ISP added them to a new field for each record. These quadrangle names were programmatically assigned to each record based upon the recorded latitude-longitude coordinates. Because some sample sites could not be matched based on coordinates, 19,559 records reportedly could not be assigned a quadrangle name. In addition, because the names were assigned programmatically, whenever the coordinates were in error the samples were assigned to the wrong quadrangle. These mislocated samples are seldom identified by users of the NURE data and have mistakenly been used in several subsequent studies.

The USGS-reformatted NURE data files also have quadrangle names that are assigned to each sample. These names are assigned at the beginning of the formatting process based upon the quadrangle study area from which the samples were reportedly collected. When latitude-longitude coordinates do not match the specified quadrangle, the record is checked against all published reports for corrections. For all cases where a coordinate value is found to be incorrect, a comment is added to the COORDPRB comment field. When a coordinate value has been changed, the original unchanged value is also noted in the COORDPRB field along with the justification for the change.
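A minimal Python sketch of this check follows. COORDPRB is the comment field named above; everything else is an assumption for illustration: the `QUAD`, `LATITUDE`, and `LONGITUDE` field names, the made-up `EXAMPLE` quadrangle and its bounds, and the wording of the comments. The actual correction step relies on consulting published reports, which the code does not attempt.

```python
# Hypothetical bounds: (south, north, west, east) in decimal degrees.
QUAD_BOUNDS = {"EXAMPLE": (38.0, 39.0, -108.0, -106.0)}


def check_coordinates(record, bounds=QUAD_BOUNDS):
    """Flag a record whose coordinates fall outside its assigned quadrangle."""
    south, north, west, east = bounds[record["QUAD"]]
    lat, lon = record["LATITUDE"], record["LONGITUDE"]
    checked = dict(record)
    if not (south <= lat <= north and west <= lon <= east):
        checked["COORDPRB"] = (
            f"Coordinates ({lat}, {lon}) fall outside the "
            f"{record['QUAD']} quadrangle; check published reports"
        )
    return checked


def apply_correction(record, lat, lon, reason):
    """Replace coordinates, preserving the original values in COORDPRB."""
    fixed = dict(record)
    fixed["COORDPRB"] = (
        f"Original coordinates ({record['LATITUDE']}, {record['LONGITUDE']}) "
        f"changed: {reason}"
    )
    fixed["LATITUDE"], fixed["LONGITUDE"] = lat, lon
    return fixed
```

The point of the design is that the quadrangle name comes from the study area, not from the coordinates, so a bad coordinate produces a documented comment instead of a silent reassignment to the wrong quadrangle.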

Concentration Reporting Units: The original NURE data files contain data for waters and sediments, often in the same file. The concentration reporting units such as percent, parts-per-million (ppm) or parts-per-billion (ppb) were often different for the same element depending upon the original laboratory and the sample media. For consistency within the database, ISP attempted to convert all element concentration values to ppb. Most geochemical values were assumed to be in ppm and were multiplied by 1000 to convert to ppb. Since some original formats actually contained data in ppb or in percent, this change sometimes resulted in incorrect reporting units such as parts-per-trillion or parts-per-100,000. Other values were never converted and remained as ppm in the ISP database.
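The arithmetic behind these mislabeled units can be checked with a few lines of Python. The function names and the sample value are illustrative assumptions; only the multiply-by-1000 rule comes from the description above.

```python
# Number of ppb in one unit of each reporting scale.
PPB_PER_UNIT = {"pct": 10_000_000, "ppm": 1_000, "ppb": 1}


def blanket_to_ppb(value):
    """The blanket conversion: assume ppm and multiply by 1000."""
    return value * 1000


def true_ppb(value, unit):
    """The correct conversion, given the unit actually used in the file."""
    return value * PPB_PER_UNIT[unit]


value = 2.0
stored = blanket_to_ppb(value)            # what ends up labeled "ppb"

# Originally ppm: the stored number really is ppb.
assert stored == true_ppb(value, "ppm")
# Originally ppb: the stored number is 1000 times too large (parts-per-trillion).
assert stored == 1000 * true_ppb(value, "ppb")
# Originally percent: the stored number is 10,000 times too small
# (2 percent is 2000 parts-per-100,000).
assert true_ppb(value, "pct") == 10_000 * stored
```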

The USGS-reformatted NURE data files attempt to solve this problem in a different fashion. First, each original file is checked against values in the published reports to determine the reporting units actually used for each element and each sample medium. Reporting units are converted only when necessary, on a file-by-file basis. Second, sediment and water data are separated into different data files and formats. Within both file formats, concentrations are then reported in units that are appropriate for that sample medium, and the concentration unit is appended directly to the field name. For example, aluminum concentrations are reported in percent in the sediment file (AL_PCT) and in ppb in the water file (AL_PPB).
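The naming convention can be sketched in Python as follows. The AL_PCT and AL_PPB field names are taken from the example above; the lookup table, the function names, and the idea of passing the verified source unit as an argument are assumptions made for illustration.

```python
# Number of ppb in one unit of each reporting scale.
PPB_PER_UNIT = {"pct": 10_000_000, "ppm": 1_000, "ppb": 1}

# Target unit per (sample medium, element); only the aluminum entries
# are taken from the text, and a real table would cover every element.
TARGET_UNITS = {("sediment", "AL"): "PCT", ("water", "AL"): "PPB"}


def convert(value, from_unit, to_unit):
    """Convert a concentration between pct, ppm, and ppb."""
    return value * PPB_PER_UNIT[from_unit] / PPB_PER_UNIT[to_unit]


def unit_tagged_field(medium, element, value, source_unit):
    """Return (field name with unit suffix, value in the target unit).

    `source_unit` is the unit verified for this particular file,
    never a database-wide assumption.
    """
    target = TARGET_UNITS[(medium, element)]
    return f"{element}_{target}", convert(value, source_unit, target.lower())
```

For instance, `unit_tagged_field("sediment", "AL", 52000, "ppm")` yields the field name `AL_PCT` with a value of about 5.2, so the unit travels with the data instead of living in a separate manual.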

Site Characteristics Coding Schemes: The four National Laboratories responsible for the collection and analysis of NURE HSSR samples recorded field information on at least 5 different field forms, each with its own coding scheme for observations. During the ISP reformatting process, most of these coding schemes for site description variables were never converted into a single standard coding scheme. The ISP-reformatted data contain as many as 5 different coding schemes for any one field, depending on the originating laboratory. The draft NURE HSSR data format manual explains only one set of coding schemes, so it is commonly incorrect for data from entire sections of the nation.

For almost all fields, we have replaced the code values with the defined text for the original code. All of the definitions for each site description variable have been compiled and documented for the USGS-reformatted NURE data files in the On-Line Manual for USGS-Reformatted NURE HSSR Data Files.
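Replacing codes with their defined text amounts to a lookup keyed on both the originating laboratory and the field. The Python sketch below uses entirely hypothetical laboratory labels, field name, codes, and definitions; the real definitions are those compiled in the on-line manual.

```python
# Hypothetical code tables: the same code can mean different things
# depending on which laboratory's field form recorded it.
CODE_TABLES = {
    ("LAB_A", "WEATHER"): {"1": "Clear", "2": "Rain"},
    ("LAB_B", "WEATHER"): {"1": "Rain", "2": "Clear"},
}


def decode(lab, field, code):
    """Return the defined text for a code, or flag it as undefined."""
    table = CODE_TABLES.get((lab, field), {})
    return table.get(code, f"Undefined code {code!r}")
```

With tables like these, code "1" decodes to "Clear" for one laboratory and "Rain" for the other, which is why a manual documenting only one scheme misreads data from the other laboratories.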

Uranium Analyses: Because the emphasis of the NURE program was uranium exploration, many records contain the results of several types of uranium analyses. Within the ISP-reformatted files, uranium data were included in several fields and in at least 2 different formats. Unfortunately, many users of the data never discovered the additional fields that often contained uranium data and have consequently reported an absence of NURE uranium data for some study areas when such data do exist. Additionally, when uranium concentration values were reported in these other fields, the data were often mixed between different analytical methods and the reporting units were commonly in error.

The USGS-reformatted NURE data files attempt to clarify the uranium data situation by including several uranium fields differentiated by the analytical method. For example, in the sediment data files, uranium concentrations may be found in 4 adjacent fields: U_DN_PPM (uranium by delayed neutron counting methods), U_FL_PPM (uranium by fluorescence spectroscopy methods), U_MS_PPM (uranium by mass spectrometry methods), and U_XX_PPM (used for additional uranium analyses and methods as identified in the U_XX_MTHD field).
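A reader of the sediment files who wants every available uranium value might gather them as in the Python sketch below. The five field names come from the description above; representing a record as a dictionary and a missing value as `None` are assumptions, since the on-disk representation is not described here.

```python
# Method-specific uranium fields in the sediment files.
URANIUM_FIELDS = {
    "U_DN_PPM": "delayed neutron counting",
    "U_FL_PPM": "fluorescence spectroscopy",
    "U_MS_PPM": "mass spectrometry",
}


def uranium_results(record):
    """Return (method, ppm) pairs for every uranium value in a record."""
    results = []
    for field, method in URANIUM_FIELDS.items():
        value = record.get(field)
        if value is not None:
            results.append((method, value))
    # U_XX_PPM holds other analyses; its method is named in U_XX_MTHD.
    other = record.get("U_XX_PPM")
    if other is not None:
        results.append((record.get("U_XX_MTHD") or "unspecified method", other))
    return results
```

Checking all four fields, not just the first, avoids the mistake described above of reporting that no uranium data exist when values are stored under a different method.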

Additional Data: During the ISP reformatting process, some data were never included in the standardized format. These missing data include some descriptive data fields in the original files (such as weather codes and air temperature measurements) and data from entire files and studies (such as "Gold from Savannah River Lab Neutron Activation Analyses", hydrocarbons in water, and certain pilot studies or follow-up detailed studies).

During the reformatting process for the USGS-reformatted NURE data files, an attempt was made to preserve every bit of unique data from each original file and to identify every source of NURE data for each quadrangle. Occasionally, identified data were not readily available in digital form and were entered from hardcopy publications. Data from each identified source were then combined, when possible, into a single complete record for each sample. Additionally, several new fields have been added to the USGS-Reformatted NURE data files that add information only found in published reports. Some of these new fields include STUDY (identifies special samples collected for each pilot, orientation, or detailed NURE study); METHODS (identifies the analytical methods used to determine element concentrations listed in a record); REFORMAT (includes comments added during the reformatting process to document identified data problems, rationale for changing or adding data, and sources for combined or additional data); COORDPRB (includes comments about problems with or corrections to latitude and longitude coordinates); and TAPEFILE (documents the original data file used for the primary information included in the record).
