Training Document: Procedure
for Data Entry into Database Tables
Jamey M. Reid-Currence
Have the following items handy before
beginning data entry:
1. Photocopy (if possible) of the document or file
to be entered
2. Highlighter and a pencil
3. Data dictionary
4. Data entry header files (6): Station, Inorganics,
General Organics, PCBs and Pesticides, PAHs, and Texture.
5. Working dictionary file or glossary file
Two types of documents are entered in the database: (A.) papers and
reports (which include reports, journal articles, theses, unpublished
data, etc.) and (B.) Army Corps of Engineers permit
files. The main key to data entry is DETAIL. Enter as much
about the data as possible; there are plenty of comment and qualifier
fields in which to place information.
If something seems questionable, or you don't know
where a certain piece of data goes, put your question in a comments
field (relevant to that data) until you can ask someone for guidance.
Also mark the question on the document either by writing it beside the
item in question or placing it on a Post-It-Note and sticking it on
that page in the document. These flags in the comments fields
and on the paper copy let the reviewer know that the item needs further
review.
(A.) Entry of Papers and Reports
1. Photocopy the document so that you can write on your copy.
2. Photocopy bibliographic data
(or verify that another copy is on file). Assign a short reference name
and enter it in the station table, the project bibliography and status
table, and all subsequent tables.
3. Use the Working Dictionary and Glossary.
Any abbreviations or acronyms that the data entry person encounters or
decides to utilize (to speed up the data entry process) should be listed
in a glossary. You may use the existing files or create an EXCEL
table to serve as your list or Glossary for maintaining these items.
The Glossary can be sorted in alphabetical order to look up abbreviations
more quickly. The Working Dictionary has categories already set
up that different glossary items will probably fall under, such as Agency
Names, Sampling Devices, Analytical Methods, etc. Usage of abbreviations
vs. complete terms can be made consistent throughout the database at
a later stage as long as records are kept and there are no ambiguities.
For example, the U.S. Army Corps of Engineers is the agency identified as "USACOE."
The acronym USACOE can be used in the station table and recorded in
the glossary and working dictionary by entering "USACOE" in
a column labeled "Abbreviation" and "U.S. Army Corps
of Engineers" in a column labeled "Definition."
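The glossary described above can be sketched as a small sorted table. This is only an illustration: the column names "Abbreviation" and "Definition" come from the text, but the helper names add_entry and look_up are hypothetical, and the procedure itself only calls for an EXCEL table or the existing files.

```python
# Hypothetical in-memory glossary; column names follow the text above.
glossary = [
    {"Abbreviation": "USACOE", "Definition": "U.S. Army Corps of Engineers"},
]


def add_entry(rows, abbreviation, definition):
    """Record an abbreviation once and keep the list in alphabetical order."""
    for row in rows:
        if row["Abbreviation"] == abbreviation:
            if row["Definition"] != definition:
                # Two meanings for one abbreviation would be an ambiguity.
                raise ValueError(f"{abbreviation} is already defined")
            return rows  # already recorded, nothing to do
    rows.append({"Abbreviation": abbreviation, "Definition": definition})
    rows.sort(key=lambda row: row["Abbreviation"].lower())
    return rows


def look_up(rows, abbreviation):
    """Return the definition for an abbreviation, or None if it is not listed."""
    for row in rows:
        if row["Abbreviation"] == abbreviation:
            return row["Definition"]
    return None


add_entry(glossary, "USGS", "U.S. Geological Survey")
```

Refusing a second, different definition for the same abbreviation is one way to keep the "no ambiguities" condition that later consistency checks depend on.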
4. Identify the data
Skim through and find the following items:
What types of data are in the report?
What is the total number of samples, regardless
of what is analyzed for each sample? (This
is very important because sometimes there are more samples shown for
one type of data than for another, and all samples must be entered whether
they contain data or not.) As you skim through the document, mark off
or highlight the items that are to be entered into the data files. Rely
on pre-screening commentary when available. You will check and
date them when entry is completed.
5. Enter data: station table.
(Name your file with "stat" somewhere in the file name, i.e.
a. Sample Identification
Identify samples in the Local ID column by
using your initials and a number. For example, if your name were Jamey
Reid, you would enter JR1 for the first sample you enter. On the document,
write down what your local ID number is for that sample. For example,
if sample M-4 is the first sample you enter, and you call it JR1, write
JR1 down next to it in the report.
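The local ID scheme above (initials plus a running number) could be sketched as follows. The function name assign_local_ids is hypothetical, and sample M-5 is an invented second sample used only for illustration.

```python
def assign_local_ids(initials, original_ids, start=1):
    """Map each original sample number to a local ID, in order of entry."""
    return {
        original: f"{initials.upper()}{number}"
        for number, original in enumerate(original_ids, start)
    }


# Jamey Reid enters sample M-4 first, so M-4 becomes JR1.
ids = assign_local_ids("JR", ["M-4", "M-5"])
```

Keeping the mapping from original number to local ID is the electronic equivalent of writing JR1 next to M-4 on the paper copy.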
Identification numbers are very important for retaining the original
information in the report, and they should remain
consistent throughout ALL data tables. They're also important
when a reviewer attempts to verify the entered data by
comparing it to the original document: they make it possible to point
to a sample in the database and find it easily in the report it was
entered from. Two important identification numbers are Sample ID
or Original No. and Orig. Station. These are numbers
which can usually be found in the tables within the report or permit
file along with the data. The Unique Sample Identifier is an identification
number which is assigned by USGS after the completely entered dataset
has been reviewed and verified. The format of this Unique Number
is US0####, where # are numbers.
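A quick format check for the Unique Sample Identifier might look like the sketch below. It assumes each # stands for exactly one digit, which the text implies but does not state outright; the function name and the sample values are made up.

```python
import re

# Assumption: "US0####" means the literal prefix US0 followed by four digits.
UNIQUE_ID_PATTERN = re.compile(r"US0\d{4}")


def is_unique_sample_id(value):
    """Return True when value has the US0#### shape."""
    return UNIQUE_ID_PATTERN.fullmatch(value) is not None
```

A check like this only confirms the shape of the identifier; the number itself is still assigned by USGS after review.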
(The Preceding Database and Preceding
File Name are used only if you are transferring data that is already
in electronic format. Some data may originate from an electronic
file taken from an already existing data compilation or database. The
name of the database would be entered into Preceding Database,
e.g., USGS Contaminated Sediments Database and the actual name of the
file that was used to copy the data from would be entered into Preceding
File Name, e.g., data.xls.)
b. Sources of References
The next important field is the Source of Reference,
which should be the author's last name first, then initials and year. Ex:
Moffet, A.M., et al, 1994. (Use 'et al' when there are more than two
authors.) For two authors, you can use both names. Ex: Esser and Turekian,
1993. For Project Name, I recommend using the title of the document
because it creates a further link between the database and the hard
copy of the document. Quad Name is more relevant to permit files.
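The Source of Reference rules above can be written down as a small formatting function. This is a sketch, not part of the database tooling: the function name is hypothetical, the single-author form is an assumption extrapolated from the "last name, initials, year" rule, and the co-author names in the usage lines are placeholders.

```python
def source_of_reference(authors, year):
    """Format a Source of Reference from (last_name, initials) pairs."""
    if not authors:
        raise ValueError("at least one author is required")
    if len(authors) == 2:
        # Two authors: both last names, no initials.
        return f"{authors[0][0]} and {authors[1][0]}, {year}"
    last_name, initials = authors[0]
    if len(authors) > 2:
        # More than two authors: first author plus 'et al'.
        return f"{last_name}, {initials}, et al, {year}"
    # Assumed form for a single author.
    return f"{last_name}, {initials}, {year}"


# Placeholder co-authors; only the first author is named in the text.
many = source_of_reference(
    [("Moffet", "A.M."), ("Coauthor", "B."), ("Coauthor", "C.")], 1994
)
# Initials are not used in the two-author form.
two = source_of_reference([("Esser", ""), ("Turekian", "")], 1993)
```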
c. Location Information
Most reports don’t give State Plane N and
State Plane E, but if they do, enter the information in these
fields. Look for a table of latitudes and longitudes. (Remember that
longitude degrees in the Western hemisphere are entered with a negative
sign in front.) If there is no table given, proceed to enter the data.
Find latitudes and longitudes from the map when time permits.
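The sign rule for longitude can be sketched as below. The sketch assumes coordinates are stored as decimal degrees and that reports give degrees, minutes, and seconds with a hemisphere letter; neither is stated in the text. Treating southern latitudes as negative is a common convention added here as an assumption.

```python
def to_decimal_degrees(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    magnitude = abs(degrees) + minutes / 60 + seconds / 3600
    # Western longitudes are entered with a negative sign (per the text);
    # southern latitudes are treated the same way (assumption).
    if hemisphere.upper() in ("W", "S"):
        return -magnitude
    return magnitude


longitude = to_decimal_degrees(72, 30, 0, "W")
latitude = to_decimal_degrees(41, 15, 0, "N")
```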
General Location Name
and Specific Location Name: "Long Island Sound"
is an example of a general location name, but more specifically the
location could be "New Haven Harbor" or "Guilford Marina,"
etc. As another example, suppose a data set is located in the "Connecticut
River" and the document says "in the vicinity of the Coast
Guard Academy." In that case "Connecticut River" would be
the general location name and "vicinity of the Coast Guard Academy"
the specific location name.
The next part is entry of the sampling date(s),
or the date on which the data in the document was collected. This data
is usually found in the text of the document or sometimes at the top
of a data table. Many documents will only list a Sampling Day/Month/Year
1. (‘1’ refers to the first day of sampling and ‘2’ refers to the last
day of sampling.)
d. Sample Collection
Gen. Comment re sample: when more explanation
of each sample is given in the text, it should be entered here
Cruise Id: this is usually found within the text of the document
either in the introduction or under "Methods." An example
is "RV Asterias."
Core or Grab #
Sampling Device: This is usually found under the "Methods"
section of a report. It will probably say something like "Samples
were collected using…."
Sample Type: should always be sediment; do not enter
elutriate (water) or sludge data
Depth in core or sediment: would be found on data table or in
text under methods
Core Length: sometimes given in a data table; may also be in
Interval Number: not usually given
Depth interval, top of core: if a sample lists 0-6 cm, then the
"top" value is 0
Depth interval, bottom of core: if a sample lists 0-6 cm, then
the "bottom" value is 6
Orig. depth in sediment: original values are entered here.
If the data were originally measured in feet, enter those values here.
Original depth units: the original units of this value such as
meters, feet, etc.
Sediment depth code: depth code is either "depth" or
"surface;" surface being the first 0-6 centimeters of a core,
depth being deeper than 6 cm in a sediment core
Sediment depth comments: notes on the individual sample
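The depth fields above could be filled in along these lines. Several points are assumptions rather than rules from the text: that the depth interval columns hold centimeters, that an interval counts as "surface" when its bottom is no deeper than 6 cm, and the function name depth_fields. The dictionary keys reuse the field names listed above.

```python
CM_PER_UNIT = {"cm": 1.0, "meters": 100.0, "feet": 30.48}


def depth_fields(top, bottom, units="cm"):
    """Build the depth-related fields for one sample interval."""
    factor = CM_PER_UNIT[units]
    top_cm = top * factor
    bottom_cm = bottom * factor
    # Assumption: classify by the bottom of the interval.
    code = "surface" if bottom_cm <= 6 else "depth"
    return {
        "Depth interval, top of core": top_cm,
        "Depth interval, bottom of core": bottom_cm,
        "Orig. depth in sediment": f"{top}-{bottom}",  # original values kept
        "Original depth units": units,
        "Sediment depth code": code,
    }


surface_sample = depth_fields(0, 6)
deep_sample = depth_fields(1, 2, units="feet")
```

Keeping the original values and units next to the converted ones lets a reviewer check the entry against the report without redoing the arithmetic.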
e. Information about
type of data and data entry
A Yes or No (entered as "Y" or "N")
answer should be entered in the following fields: Metals and
other inorganics?, Organic Contams. analyzed?, Grain Size analyzed?
and Bioassay data available?
Comments- Bioassay: enter what types of bioassay are in the report
here, for example, if there’s data on winter flounder and tube worms,
etc. enter that here.
Bio reference: if the bioassay data are not in this reference, enter
where this data can be found
Other types Analy. In Ref: what other types of data are
in this reference? Such as elutriate, sludge, etc.
Data Entry day/month/year/formatted: Enter the whole date into
one cell and accept the default format because it may change as files
are transferred between different systems, hence the need for separate
day, month, year columns. (Example: December 29, 1999 may appear as 29-Dec-99)
Initials of Data Enterer: Enter your initials so you
can be identified with the data you entered if any questions arise later.
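The reason for keeping separate day, month, and year columns next to the formatted date can be shown with a short sketch. The function name and dictionary keys are hypothetical; the 29-Dec-99 style is one display format a spreadsheet may apply, and it is built by hand here so the result does not depend on system locale settings.

```python
from datetime import date

MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def date_columns(d):
    """Split a date into unambiguous columns plus one formatted cell."""
    return {
        "day": d.day,
        "month": d.month,
        "year": d.year,
        # Display form only; this is the part that may change between systems.
        "formatted": f"{d.day:02d}-{MONTH_ABBR[d.month - 1]}-{d.year % 100:02d}",
    }


entry_date = date_columns(date(1999, 12, 29))
```

If the formatted cell is reinterpreted after a file transfer, the three numeric columns still record the date exactly.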
(B.) Entry of Permit Files:
Source or Reference Name
for permit files should be entered in the following way:
Format: ACE_NED permit file #regulatory file no., Project Name
Example: ACE_NED permit file #1995-00138, Groton Long Point
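Building the permit file reference name from its two parts could be sketched as follows; the function name is hypothetical, and the pattern is taken from the example above.

```python
def permit_source_name(regulatory_file_no, project_name):
    """Compose the Source or Reference Name for a permit file."""
    return f"ACE_NED permit file #{regulatory_file_no}, {project_name}"


name = permit_source_name("1995-00138", "Groton Long Point")
```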
Permit files can be very complicated. To enter
agency information, you need to sort through the file to see who did what. For
example, Portsmouth Yacht Club may be sponsoring the work, but they
hire Braman Engineering to do the work, and Braman Engineering may in
turn hire someone else to help.
Agency 1 Sponsoring (agency publishing the work)
Agency 2 Contracted (agency/researcher doing the sampling)
Agency 3 Subcontracted (agency/researcher doing sampling, not
Agency 4 Other (additional agencies responsible for work)
Project Name: included in the name of the file
Quad Name: this is a four-letter abbreviation for the area on
the map, and can either be found on a data sheet or on the map itself.
Regulatory File Number: the number on the application, see example
Est. vol. of material: how much material will be dredged up;
measured in cubic yards; is usually found on the permit application
Disposal area code: a four letter code that indicates where the
material will be disposed of; can either be found on the data sheet
or on the permit application
All other procedures related to papers and reports
must also be followed for permit files.
6. Enter data: tables of analytical data
After the station table (1), data entry should proceed in this order: 2)
inorganics, 3) general organics, 4) specific organics (parts 1 and
2), and 5) texture. Name the files accordingly.
a. Sample identification
Be sure to use the same local ID number in all of
the files and use the same original numbers throughout. For example:
if the samples in the report are A, B, C, D, and E, then those sample
names should appear in every file either under Laboratory's sample
ID number or Laboratory's Job Number (even if it's not
the lab ID). These numbers can usually be found on the lab sheets
themselves if they are included in the document or in the text of the
document. Also use the same Source or Reference Name for
all files. If you find that a report includes only inorganics and texture,
or only organics, or some other variation, you must enter the data in
ALL tables (as a placeholder) and put in a comment such as "no
data of this type for this sample" or "no
organics data for this sample."
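The placeholder rule above can be sketched as a function that creates one row per sample in every analytical table the report has no data for. The table names follow the header files listed at the start of this document; the function name, the row keys other than Source or Reference Name, and the sample values are assumptions.

```python
ANALYTICAL_TABLES = ("Inorganics", "General Organics",
                     "PCBs and Pesticides", "PAHs", "Texture")


def placeholder_rows(local_ids, source_name, tables_with_data):
    """Return placeholder rows for every table the report has no data for."""
    rows = {}
    for table in ANALYTICAL_TABLES:
        if table in tables_with_data:
            continue  # real data will be entered in this table
        rows[table] = [
            {
                "Local ID": local_id,
                "Source or Reference Name": source_name,
                "Comment": "no data of this type for this sample",
            }
            for local_id in local_ids
        ]
    return rows


# A report with only inorganics and texture data (illustrative values).
fillers = placeholder_rows(["JR1", "JR2"], "Esser and Turekian, 1993",
                           {"Inorganics", "Texture"})
```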
b. Laboratory and methods
All of the data tables have columns in which to
enter the following information:
Testing Lab: This
is the actual name of the testing lab found on the lab sheets themselves
or in the text of the document.
Analytical technique: Enter the technique used to analyze
this sample as written on the lab sheets or in the text of the document
Analytical comments: If there are any comments or further
description pertaining to how this sample was analyzed, they can be
entered here.
Replicate number and number of replicates:
Test day/month/year: This information is usually found
on the lab sheets themselves or in the text of the document.
Test date formatted: Enter the whole date into one cell
and accept the default format because it may change as files are transferred
between different systems, hence the need for separate day, month, year
columns. (Example: December 29, 1999 may appear as 29-Dec-99)
c. Measured data values
Measured data values should always be entered in
the concentration field for each parameter.
Example column headings: Arsenic (As) μg/g; As detection limit
The values should always be entered in the units
that are given in the tables. Most inorganic parameters are measured
in μg/g (micrograms per gram) while most organic parameters are measured
in ng/g (nanograms per gram). If the units of the data in the
document do not match the units in the tables, then the data should
be converted. A comment should be made in the qualifier column
stating that the data was converted and what units it was originally
reported in.
Conversions are as follows:
| | are equal to: | and also equal to: |
| 1 μg/g (or ppm or mg/kg) = | 1000 ng/g (or ppb or μg/kg) | |
| 1 in. = | | |
| 106 μg = | | |
| 104 μg = | | |
| 100 g = | | |
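The one complete conversion above (1 μg/g = 1000 ng/g) and the accompanying qualifier comment could be sketched like this. The function name, the plain-ASCII unit spellings ("ug/g", "ug/kg"), and the wording of the qualifier are assumptions.

```python
# Size of each unit expressed in ng/g, from 1 ug/g (ppm, mg/kg) = 1000 ng/g (ppb, ug/kg).
NG_PER_G = {
    "ug/g": 1000, "ppm": 1000, "mg/kg": 1000,
    "ng/g": 1, "ppb": 1, "ug/kg": 1,
}


def convert_concentration(value, from_units, to_units):
    """Convert a concentration and return (value, qualifier comment)."""
    converted = value * NG_PER_G[from_units] / NG_PER_G[to_units]
    if from_units == to_units:
        return converted, ""
    # Record the original units so a reviewer can trace the value back.
    return converted, f"converted from {from_units}; originally reported as {value} {from_units}"


organics_value, organics_note = convert_concentration(2.5, "ppm", "ng/g")
metals_value, metals_note = convert_concentration(500, "ppb", "ug/g")
```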
d. Qualifiers, comments
and detection limit values
Any comments in the document or data tables pertaining
to a particular measurement should be entered in the qualifier column
for each parameter. Detection limit values should always be entered
in the detection limit column for each parameter. (Detection limit
values will be listed in the document.)
The qualifier column should also include any notes
or comments found in the text or data tables (within the document) about
the sample. Only numeric values should be entered in the concentration
and detection limit columns. Abbreviations that indicate that
a sample is below the detection limit are: "BDL," "ND,"
"below MDL," etc. If a document states that a sample was a
non-detect or below the detection limit yet no detection limit values
are reported, enter a 0 for the concentration and put an appropriate comment
in the qualifier field such as "reported as ND, detection limit
values not reported." When no measurement is attempted, the
concentration field remains empty.
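The rules in this section can be summarized in a short sketch. The function name and dictionary keys are hypothetical. The case of a non-detect with a reported detection limit is not fully spelled out in the surviving text, so entering 0 for the concentration in that case is an assumption.

```python
ND_FLAGS = {"bdl", "nd", "below mdl"}


def concentration_fields(reported, detection_limit=None):
    """Split one reported value into concentration, qualifier, detection limit."""
    text = "" if reported is None else str(reported).strip()
    if text == "":
        # No measurement attempted: the concentration field stays empty.
        return {"concentration": None, "qualifier": "",
                "detection limit": detection_limit}
    if text.lower() in ND_FLAGS:
        if detection_limit is None:
            qualifier = f"reported as {text}, detection limit values not reported"
        else:
            qualifier = f"reported as {text}"
        # 0 is stated in the text for the no-limit case and assumed otherwise.
        return {"concentration": 0.0, "qualifier": qualifier,
                "detection limit": detection_limit}
    # Only numeric values go in the concentration column.
    return {"concentration": float(text), "qualifier": "",
            "detection limit": detection_limit}


no_limit = concentration_fields("ND")
```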
Samples that are below the detection limit are to
have this information entered in 3 fields:
the concentration field (for example, Arsenic (As) μg/g), the qualifier
field (for example, "reported as ND, detection limit values not reported"),
and the detection limit value field (for example, As detection limit).