Review and Revision of the Environmental Sample Data


U.S. Geological Survey
Data Series 152
National Water-Quality Assessment Program

Water-Quality, Streamflow, and Ancillary Data for Nutrients in Streams and Rivers Across the Nation, 1992–2001

By David K. Mueller and Norman E. Spahr

Review and Revision of the Environmental Sample Data

The first step in preparing the data for analysis was to review all the records and make any necessary revisions. The initial review involved checking data codes to ensure that the sample type was properly identified for each record. The primary types for nutrient data are environmental samples, replicate samples, and blank samples. Replicates and blanks are quality-control (QC) samples. Nutrient QC data were described and analyzed by Mueller and Titus (2005). A few cases of inconsistent or obviously erroneous sample coding were noted and corrected. For example, 25 records with nutrient concentration data were coded as plant or animal tissue samples, and 8 obvious environmental samples were incorrectly coded as various QC sample types. Also, three samples coded as blanks were found to be switched with related samples that were coded as environmental.

Data for some environmental samples were stored in more than one record in the database. These records were combined into a single record for each sample. In most of these cases, one type of data, such as organic carbon, was stored in one record, and the remaining data for the sample, such as physical measurements and nutrients, were stored in a second record.
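The record-combining step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for the data set: the field names (`site_id`, `sample_time`, `organic_carbon`, `nitrate`) and the choice to identify a sample by site and collection time are assumptions made for the example.

```python
def merge_sample_records(records, key_fields=("site_id", "sample_time")):
    """Combine records that describe the same sample into one record.

    `records` is a list of dicts. Records sharing the same values for
    `key_fields` (hypothetical identifiers) are treated as one sample.
    Missing values (None) are ignored; conflicting values raise an error
    so that they can be reviewed by hand rather than silently overwritten.
    """
    merged = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value is None:
                continue
            if field in target and target[field] != value:
                raise ValueError(
                    f"Conflicting values for {field!r} in sample {key}"
                )
            target[field] = value
    return list(merged.values())


# Example: organic carbon stored in one record, nutrients in another.
records = [
    {"site_id": "A", "sample_time": "1995-06-01T10:00", "organic_carbon": 3.2},
    {"site_id": "A", "sample_time": "1995-06-01T10:00", "nitrate": 0.45},
]
combined = merge_sample_records(records)
```

Raising an error on conflicting values, rather than keeping the first or last one, is a design choice for the sketch; it keeps the merge from hiding the kind of inconsistency the review was meant to find.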

At two National Water-Quality Assessment Program sites, samples were collected in more than one location, and the individual records were combined to provide total streamflow and flow-weighted average concentrations. These sites were the Apalachicola River near Sumatra, Florida, which was sampled both in the main channel and on the flood plain during high flow, and the Rio Grande at San Marcial, New Mexico, which was sampled in the historical channel and a bypass canal.
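For reference, a flow-weighted average concentration is conventionally computed by weighting each location's concentration by its streamflow: C = sum(Q_i * C_i) / sum(Q_i), with total streamflow Q = sum(Q_i). The sketch below assumes that standard definition; the report does not give the formula explicitly, and the function name and input values are hypothetical.

```python
def combine_locations(measurements):
    """Return (total_streamflow, flow_weighted_concentration).

    `measurements` is a list of (streamflow, concentration) pairs, one per
    sampling location at the site (for example, main channel and flood
    plain). Streamflow units must match across locations.
    """
    total_flow = sum(q for q, _ in measurements)
    if total_flow <= 0:
        raise ValueError("Total streamflow must be positive")
    weighted = sum(q * c for q, c in measurements) / total_flow
    return total_flow, weighted


# Illustrative values only: 300 units of flow at 1.0 mg/L in one channel
# and 100 units of flow at 2.0 mg/L in the other.
total_q, conc = combine_locations([(300.0, 1.0), (100.0, 2.0)])
# total_q == 400.0, conc == 1.25
```

Because the larger flow carries the lower concentration in this example, the combined value (1.25) sits closer to 1.0 than a simple arithmetic mean (1.5) would.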

The resulting environmental-sample data set contains more than 28,000 samples collected from 500 sites during water years 1992–2001. About 200 of these samples contained known or obvious data errors. Known errors included laboratory errors and sampling mistakes (for example, analyses of total nutrient concentrations made on filtered water samples). The affected data were deleted from the data set. Extremely anomalous values were identified by comparing samples over time and over ranges of streamflow and by checking for consistency among constituent concentrations and physical measures. About 175 of these values were deleted. Also, about 25 anomalous “less than” remark codes (<) were deleted, but the values were retained. In about 35 samples, anomalous values were replaced by values from a replicate sample, or “dissolved” concentrations were substituted for “total” values. In addition, about 40 other data values were obvious decimal errors, which were corrected. All revisions are recorded in the environmental data file in the column labeled “Nutrient National Synthesis Team Comments.”
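One way to keep revisions traceable in the manner described above is to apply each change together with a note in the comments column. The sketch below is an assumption about how this could be done, not a description of the tools used for the data set; only the column label comes from the report, and the field names `nitrate` and `nitrate_remark` are hypothetical.

```python
COMMENT_COLUMN = "Nutrient National Synthesis Team Comments"


def apply_revision(record, field, new_value, note):
    """Return a revised copy of `record` with the change noted.

    The original record is left untouched. Setting `new_value` to None
    represents deleting a value or a remark code.
    """
    revised = dict(record)
    revised[field] = new_value
    existing = revised.get(COMMENT_COLUMN) or ""
    revised[COMMENT_COLUMN] = f"{existing}; {note}" if existing else note
    return revised


# Illustrative sample with an anomalous "<" remark code and a decimal error.
sample = {"nitrate": 45.0, "nitrate_remark": "<"}
step1 = apply_revision(sample, "nitrate_remark", None,
                       "Anomalous '<' remark deleted; value retained")
step2 = apply_revision(step1, "nitrate", 0.45,
                       "Decimal error corrected")
```

Returning a copy instead of editing in place leaves the unrevised record available for comparison during review.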
