Contaminated Sediments Database for the Gulf of Maine, OFR 02-403
Training Document: Procedure for Data Entry into Database Tables

Jamey M. Reid-Currence

Have the following items handy before beginning data entry:

1. Photocopy (if possible) of the document or file to be entered
2. Highlighter and a pencil
3. Data dictionary
4. Data entry header files (6): Station, Inorganics, General Organics, PCBs and Pesticides, PAHs, and Texture.
5. Working dictionary file or glossary file

Two types of documents are entered in the database: (A.) papers and reports (including reports, journal articles, theses, unpublished data, etc.) and (B.) Army Corps of Engineers permit files. The main key to data entry is DETAIL. Enter as much about the data as possible; there are plenty of comment and qualifier fields in which to place information.

If something seems questionable, or you don't know where a certain piece of data goes, put your question in a comments field (relevant to that data) until you can ask someone for guidance.  Also mark the question on the document, either by writing it beside the item in question or by placing it on a Post-it note stuck to that page.  These flags in the comments fields and on the paper copy let the reviewer know that the item needs further investigation.

(A.) Entry of Papers and Reports

1. Photocopy the document so that you can write on your copy.

2. Photocopy bibliographic data

(or verify that another copy is on file). Assign a short reference name and enter it in the station table, the project bibliography and status table, and all subsequent tables.

3. Use the Working Dictionary and Glossary.

Any abbreviations or acronyms that the data entry person encounters or decides to utilize (to speed up the data entry process) should be listed in a glossary.  You may use the existing files or create an EXCEL table to serve as your list or Glossary for maintaining these items.  The Glossary can be sorted in alphabetical order to look up abbreviations more quickly.  The Working Dictionary has categories already set up that different glossary items will probably fall under, such as Agency Names, Sampling Devices, Analytical Methods, etc.  Usage of abbreviations vs. complete terms can be made consistent throughout the database at a later stage as long as records are kept and there are no ambiguities.

For example, the U.S. Army Corps of Engineers is the agency identified as "USACOE."  The acronym USACOE can be used in the station table and should be recorded in the glossary and working dictionary by entering "USACOE" in a column labeled "Abbreviation" and "U.S. Army Corps of Engineers" in a column labeled "Definition."
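
If the glossary is kept in a script rather than in an EXCEL table, the same two-column idea might be sketched as follows. This is only an illustration: the function names add_entry and sorted_entries are hypothetical, and the USGS entry is included just to show the alphabetical sort. The check for conflicting definitions reflects the requirement that there be no ambiguities.

```python
def add_entry(glossary, abbreviation, definition):
    """Record an abbreviation, refusing a second, different definition."""
    existing = glossary.get(abbreviation)
    if existing is not None and existing != definition:
        raise ValueError(f"{abbreviation} already means {existing!r}")
    glossary[abbreviation] = definition


def sorted_entries(glossary):
    """Return (Abbreviation, Definition) pairs in alphabetical order."""
    return sorted(glossary.items(), key=lambda item: item[0].lower())


glossary = {}
add_entry(glossary, "USGS", "U.S. Geological Survey")
add_entry(glossary, "USACOE", "U.S. Army Corps of Engineers")
for abbreviation, definition in sorted_entries(glossary):
    print(abbreviation, "=", definition)
```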

4. Identify the data

Skim through and find the following items:

What types of data are in the report?

What is the total number of samples, regardless of what is analyzed for each sample? (This is very important because sometimes there are more samples shown for one type of data than for another, and all samples must be entered whether they contain data or not.) As you skim through the document, mark off or highlight the items that are to be entered into the data files. Rely on pre-screening commentary when available.  You will check off and date each item when its entry is completed.

5. Enter data: station table.

(Name your file with "stat" somewhere in the file name, e.g., STAT1999.xls)

    a. Sample Identification

Identify samples in the Local ID column by using your initials and a number. For example, if your name were Jamey Reid, you would enter JR1 for the first sample you enter. On the document, write down what your local ID number is for that sample. For example, if sample M-4 is the first sample you enter, and you call it JR1, write JR1 down next to it in the report.
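
The numbering scheme above can be sketched in a few lines. The function name make_local_ids is hypothetical, and the sketch assumes that samples are numbered in the order in which you enter them; the sample names other than M-4 are made up for illustration.

```python
def make_local_ids(initials, sample_names, start=1):
    """Pair each original sample name with a Local ID such as JR1, JR2, ..."""
    prefix = initials.strip().upper()
    return {name: f"{prefix}{start + i}" for i, name in enumerate(sample_names)}


# M-4 is the first sample entered, so it becomes JR1.
local_ids = make_local_ids("JR", ["M-4", "M-5", "M-6"])
for original, local_id in local_ids.items():
    print(original, "->", local_id)
```

Keeping the pairing in one place makes it easier to write the same Local ID next to each sample in the paper copy.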

Identification numbers are very important for retaining the original information in the report, and they should remain consistent throughout ALL data tables. They're also important when a reviewer attempts to verify the data that has been entered by comparing it to the original document. They make it possible to point to a sample in the database and find it easily in the report you entered it from.  Two important identification numbers are Sample ID or Original No. and Orig. Station.  These numbers can usually be found in the tables within the report or permit file along with the data.  The Unique Sample Identifier is an identification number assigned by USGS after the completely entered dataset has been reviewed and verified.  The format of this Unique Number is US0####, where each # is a digit.
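
A reviewer who wants to check the Unique Number format could use a pattern like the one below. This is a sketch that assumes the format means exactly four digits after "US0"; the function name is hypothetical.

```python
import re

# Assumption: "US0####" means the literal text US0 followed by exactly four digits.
UNIQUE_ID_PATTERN = re.compile(r"US0\d{4}")


def is_unique_sample_id(value):
    """Return True if the whole string matches the assumed US0#### format."""
    return UNIQUE_ID_PATTERN.fullmatch(value) is not None


print(is_unique_sample_id("US01234"))
print(is_unique_sample_id("JR1"))
```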

(The Preceding Database and Preceding File Name are used only if you are transferring data that is already in electronic format.  Some data may originate from an electronic file taken from an already existing data compilation or database. The name of the database would be entered into Preceding Database, e.g., USGS Contaminated Sediments Database and the actual name of the file that was used to copy the data from would be entered into Preceding File Name, e.g., data.xls.)

    b. Sources of References

The next important field is the Source of Reference, which should be the author's last name first, then initials and year. Ex: Moffet, A.M., et al, 1994. (Use ‘et al’ when there are more than two authors.) For two authors, you can use both names. Ex: Esser and Turekian, 1993. For Project Name, I recommend using the title of the document because it creates a further link between the database and the hard copy of the document.  Quad Name is more relevant to permit files.
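
The naming rules for Source of Reference can be written out as a small function. This is a sketch only: the function name is hypothetical, the single-author form (last name, initials, year) is inferred from the description above, and the co-author names in the example are placeholders.

```python
def source_of_reference(authors, year):
    """Build a short reference name from (last_name, initials) pairs."""
    if not authors:
        raise ValueError("at least one author is required")
    first_last, first_initials = authors[0]
    if len(authors) == 1:
        return f"{first_last}, {first_initials}, {year}"
    if len(authors) == 2:
        # Two authors: both last names, no initials.
        return f"{first_last} and {authors[1][0]}, {year}"
    # More than two authors: first author followed by 'et al'.
    return f"{first_last}, {first_initials}, et al, {year}"


# "Coauthor" entries are placeholders, not names from the document.
print(source_of_reference([("Moffet", "A.M."), ("Coauthor", "B."), ("Coauthor", "C.")], 1994))
print(source_of_reference([("Esser", ""), ("Turekian", "")], 1993))
```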

    c. Location Information

Most reports don’t give State Plane N and State Plane E, but if they do, enter the information in these fields. Look for a table of latitudes and longitudes. (Remember that longitude degrees in the Western hemisphere are entered with a negative sign in front.) If there is no table given, proceed to enter the data.  Find latitudes and longitudes from the map when time permits.
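
If a report lists positions in degrees, minutes, and seconds with a hemisphere letter, the sign rule can be applied as sketched below. The input layout and the function name are assumptions; the negative sign for the Southern hemisphere follows the usual convention and is not stated in this procedure.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    hemisphere = hemisphere.upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown hemisphere: {hemisphere}")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    # Western longitudes (and, by convention, southern latitudes) are negative.
    return -value if hemisphere in ("S", "W") else value


print(to_decimal_degrees(41, 15, 0, "N"))
print(to_decimal_degrees(70, 30, 0, "W"))
```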

General Location Name and Specific Location Name: "Long Island Sound" is an example of a general location name; more specifically, the location could be "New Haven Harbor" or "Guilford Marina," etc. As another example, if a data set is located in the "Connecticut River" and the document says "in the vicinity of the Coast Guard Academy," then "Connecticut River" would be the general location name and "vicinity of the Coast Guard Academy" the specific location name.

The next part is entry of the sampling date(s), that is, the date(s) on which the samples in the document were collected. This information is usually found in the text of the document or sometimes at the top of a data table. Many documents list only one date, which goes in Sampling Day/Month/Year 1. (‘1’ refers to the first day of sampling and ‘2’ refers to the last day of sampling.)

    d. Sample Collection Information

Gen. Comment re sample: when more explanation of each sample is given in the text, it should be entered here
Cruise Id: this is usually found within the text of the document, either in the introduction or under "Methods." An example is "RV Asterias."
Core or Grab #
Sampling Device: this is usually found under the "Methods" section of a report. It will probably say something like "Samples were collected using…."
Sample Type: should always be sediment; do not enter elutriate (water) or sludge data
Depth in core or sediment: would be found in a data table or in the text under methods
Core Length: sometimes given in a data table; may also be in methods
Interval Number: not usually given
Depth interval, top of core: if a sample lists 0-6 cm, then the "top" value is 0
Depth interval, bottom of core: if a sample lists 0-6 cm, then the "bottom" value is 6
Orig. depth in sediment: original values are entered here.  If the data were originally measured in feet, enter those values here.
Original depth units: the original units of this value, such as meters, feet, etc.
Sediment depth code: either "depth" or "surface"; surface being the first 0-6 centimeters of a core, depth being deeper than 6 cm in a sediment core
Sediment depth comments: notes on the individual sample
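
The depth fields can be worked out together, as in the sketch below. Several things here are assumptions rather than part of this procedure: the dictionary keys are hypothetical stand-ins for the column names, the top and bottom values are assumed to be stored in centimeters, and an interval is coded "surface" only when its bottom is no deeper than 6 cm.

```python
# Centimeters per original unit (1 in. = 2.54 cm; 1 ft = 12 in.).
CM_PER_UNIT = {"cm": 1.0, "m": 100.0, "in": 2.54, "ft": 30.48}


def depth_fields(top, bottom, units="cm"):
    """Return the original depths plus top/bottom in cm and a depth code."""
    factor = CM_PER_UNIT[units]
    top_cm = round(top * factor, 2)
    bottom_cm = round(bottom * factor, 2)
    # Assumption: "surface" means the whole interval lies within 0-6 cm.
    code = "surface" if bottom_cm <= 6 else "depth"
    return {
        "orig_top": top,
        "orig_bottom": bottom,
        "orig_units": units,
        "top_cm": top_cm,
        "bottom_cm": bottom_cm,
        "sediment_depth_code": code,
    }


print(depth_fields(0, 6))
print(depth_fields(1, 2, "ft"))
```

Keeping the original values and units alongside the converted ones lets a reviewer trace each depth back to the document.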

    e. Information about type of data and data entry

A Yes or No (entered as "Y" or "N") answer should be entered in the following fields:  Metals and other inorganics?, Organic Contams. analyzed?, Grain Size analyzed? and Bioassay data available?
Comments- Bioassay: enter the types of bioassay in the report here; for example, if there are data on winter flounder, tube worms, etc., note that here.
Bio reference: if the bioassay data are not in this reference, enter where these data can be found
Other types Analy. In Ref: what other types of data are in this reference, such as elutriate, sludge, etc.?
Data Entry day/month/year/formatted: Enter the whole date into one cell and accept the default format. The displayed format may change as files are transferred between different systems, hence the need for separate day, month, and year columns. (Example: December 29, 1999 may appear as 29-Dec-99.)
Initials of Data Enterer: enter your initials so you can be identified with the data you entered if any questions arise later.
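
The reason for the separate day, month, and year columns can be illustrated with a short sketch. The dictionary keys are hypothetical column names, and the formatted text assumes the default English month abbreviations.

```python
from datetime import date


def date_columns(d):
    """Split one date into day, month, year, and a formatted text value."""
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        # The display form may differ between systems; the three numbers will not.
        "formatted": d.strftime("%d-%b-%y"),
    }


print(date_columns(date(1999, 12, 29)))
```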

(B.) Entry of Permit Files:

Source or Reference Name for permit files should be entered in the following way:

ACE_NED permit file #regulatory file no., Project Name


Example: ACE_NED permit file #1995-00138, Groton Long Point Association
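
The same pattern can be produced consistently with a one-line helper. The function name is hypothetical; the values are those of the example above.

```python
def permit_reference_name(regulatory_file_no, project_name):
    """Build the Source or Reference Name for a permit file."""
    return f"ACE_NED permit file #{regulatory_file_no}, {project_name}"


print(permit_reference_name("1995-00138", "Groton Long Point Association"))
```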

Permit files can be very complicated; to enter agency information, you need to sort through the file to see who did what. For example, Portsmouth Yacht Club may be sponsoring the work, but it hires Braman Engineering to do the work, and Braman Engineering may in turn hire someone else to help.

Agency 1 Sponsoring (agency publishing the work)
Agency 2 Contracted (agency/researcher doing the sampling)
Agency 3 Subcontracted (agency/researcher doing sampling, not analytical labs)
Agency 4 Other (additional agencies responsible for work)
Project Name: included in the name of the file
Quad Name: this is a four-letter abbreviation for the area on the map, and can either be found on a data sheet or on the map itself.
Regulatory File Number: the number on the application, see example above
Est. vol. of material: how much material will be dredged; measured in cubic yards; usually found on the permit application
Disposal area code: a four-letter code that indicates where the material will be disposed of; can be found either on the data sheet or on the permit application

All other procedures related to papers and reports must also be followed for permit files.

6. Enter data: tables of analytical data

After the station table, data entry should proceed in this sequence: 2) inorganics, 3) general organics, 4) specific organics (parts 1 and 2), and 5) texture.  Name the files accordingly.

    a. Sample identification information

Be sure to use the same local ID number in all of the files and use the same original numbers throughout. For example, if the samples in the report are A, B, C, D, and E, then those sample names should appear in every file under either Laboratory's sample ID number or Laboratory's Job Number (even if it’s not the lab ID).  These numbers can usually be found on the lab sheets themselves, if they are included in the document, or in the text of the document.  Also use the same Source or Reference Name for all files. If you find that a report includes only inorganics and texture, or only organics, or some other variation, you must still enter every sample in ALL tables (as a placeholder) and put in a comment such as "no data of this type for this sample."

Example: "no organics data for this sample."
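
The placeholder rule can be sketched as follows. The table names come from the list of data entry header files; the function name, the dictionary keys, and the Local IDs in the example are hypothetical.

```python
DATA_TABLES = ["Inorganics", "General Organics", "PCBs and Pesticides", "PAHs", "Texture"]


def placeholder_rows(local_ids, source, tables_with_data):
    """Make one placeholder row per sample for every table that has no data."""
    rows = []
    for table in DATA_TABLES:
        if table in tables_with_data:
            continue  # real data will be entered in this table
        for local_id in local_ids:
            rows.append({
                "table": table,
                "local_id": local_id,
                "source": source,
                "comment": "no data of this type for this sample",
            })
    return rows


# A report with only inorganics and texture still needs rows in the other tables.
rows = placeholder_rows(["JR1", "JR2"], "Esser and Turekian, 1993", {"Inorganics", "Texture"})
print(len(rows))
```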

    b. Laboratory and methods information

All of the data tables have columns in which to enter the following information:

Testing Lab: This is the actual name of the testing lab found on the lab sheets themselves or in the text of the document.
Analytical technique:  Enter the technique used to analyze this sample as written on the lab sheets or in the text of the document
Analytical comments:  If there are any comments or further description pertaining to how this sample was analyzed, they can be entered here
Replicate number and number of replicates:
Test day/month/year:  This information is usually found on the lab sheets themselves or in the text of the document.
Test date formatted:  Enter the whole date into one cell and accept the default format. The displayed format may change as files are transferred between different systems, hence the need for separate day, month, and year columns. (Example: December 29, 1999 may appear as 29-Dec-99.)

    c. Measured data values

Measured data values should always be entered in the concentration field for each parameter.

Parameter concentration    Parameter Qualifier    Parameter Detection Limit
Concentration goes here


Arsenic (As) μg/g          As qualifier           As detection limit


Values should always be entered in the units given in the database tables.   Most inorganic parameters are measured in μg/g (micrograms per gram) while most organic parameters are measured in ng/g (nanograms per gram).  If the units of the data in the document do not match the units in the tables, then the data should be converted.  A comment should be made in the qualifier column stating that the data were converted and what units they were originally reported in.

Conversions are as follows:

These units:        are equal to:    and also equal to:
1.) μg/g         =  ppm           =  mg/kg
2.) ng/g         =  ppb           =  μg/kg
3.) 1 μg/g (or ppm or mg/kg) = 1000 ng/g (or ppb or μg/kg)
4.) 1 in. = 2.54 cm
5.) 10^6 μg = 1 g
6.) 10^4 μg = 0.01 g
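
The concentration conversions can be applied with a small lookup, as sketched here. The function name and the wording of the qualifier comment are assumptions; "ug" is used in place of "μg" so the unit names are easy to type.

```python
# Nanograms per gram represented by one unit of each concentration unit.
NG_PER_G = {
    "ug/g": 1000.0, "ppm": 1000.0, "mg/kg": 1000.0,
    "ng/g": 1.0, "ppb": 1.0, "ug/kg": 1.0,
}


def convert_concentration(value, reported_unit, table_unit):
    """Return (converted value, qualifier comment) for one measurement."""
    converted = value * NG_PER_G[reported_unit] / NG_PER_G[table_unit]
    if reported_unit == table_unit:
        return converted, ""
    comment = f"converted to {table_unit}; originally reported in {reported_unit}"
    return converted, comment


print(convert_concentration(2.5, "ppm", "ng/g"))
print(convert_concentration(500, "ppb", "ug/g"))
```

Returning the comment together with the value makes it harder to forget the note in the qualifier column.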

    d. Qualifiers, comments and detection limit values

Any comments in the document or data tables pertaining to a particular measurement should be entered in the qualifier column for each parameter.  Detection limit values should always be entered in the detection limit column for each parameter.  (Detection limit values will be listed in the document.)

The qualifier column should also include any notes or comments found in the text or data tables (within the document) about the sample. Only numeric values should be entered in the concentration and detection limit columns.  Abbreviations that indicate that a sample is below the detection limit are: "BDL," "ND," "below MDL," etc. If a document states that a sample was a non-detect or below the detection limit, yet no detection limit values are reported, enter a 0 for the concentration and put an appropriate comment in the qualifier field such as "reported as ND, detection limit values not reported."  When no measurement is attempted, the concentration field remains empty.

Samples that are below the detection limit are to have this information entered in 3 fields:

Parameter concentration    Parameter Qualifier    Parameter Detection Limit
0                          <                      Detection limit value goes here


Arsenic (As) μg/g          As qualifier           As detection limit
0                          <                      .001

Arsenic (As) μg/g          As qualifier                                           As detection limit
0                          reported as ND, detection limit values not reported
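
The rules for the three fields can be collected into one function, sketched below. The function name and the way a reported value is passed in (as text copied from the document) are assumptions; an empty concentration is represented by None.

```python
NON_DETECT_FLAGS = {"BDL", "ND", "BELOW MDL"}


def encode_measurement(reported, detection_limit=None):
    """Return (concentration, qualifier, detection limit) for one reported value."""
    text = reported.strip() if reported else ""
    if not text:
        # No measurement attempted: the concentration field stays empty.
        return (None, "", detection_limit)
    if text.startswith("<"):
        # Reported as "<limit": the limit itself is the detection limit value.
        return (0, "<", float(text[1:]))
    if text.upper() in NON_DETECT_FLAGS:
        if detection_limit is None:
            note = f"reported as {text}, detection limit values not reported"
            return (0, note, None)
        return (0, "<", detection_limit)
    return (float(text), "", detection_limit)


print(encode_measurement("<.001"))
print(encode_measurement("ND"))
print(encode_measurement("12.5", 0.1))
```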

