USGS
 Environmental Geochemistry and Sediment Quality in Lake Pontchartrain

II. Sediment Database Structure and Development

A. "Flat file vs. relational" database structures


Flat files

We define database terms as follows. A flat file is a collection of data organized in columns and rows, pertinent to a given subject. A data table showing various parameters from Lake Pontchartrain and surroundings (Table 1) is an example. This is the way earth scientists, who often deal with very complex systems, have traditionally collected and analyzed data. Spreadsheets like Microsoft Excel are a popular way to display, manipulate, and create plots and diagrams from flat files.

Relational databases

dbquery.gif (41493 bytes)
Figure 1

emapflow.gif (9009 bytes)
Figure 2

Potential confusion emerges from two interpretations of "relational database systems". A broad use of the term relational would refer to database systems that relate different data tables to each other and, therefore, are capable of querying these tables to extract information. Database management software like Microsoft Access serves this purpose in the current work (Fig. 1). However, database systems professionals commonly adopt the specific conventions of the "relational model" introduced by E.F. Codd of the IBM corporation in 1970 (Walters, 1987, p.46), taking the definition one step farther. This model uses formal mathematical rules of set theory and relational algebra to arrange the tables and their relationships in a manner that prevents anomalies from occurring while inserting, updating or deleting data. The guidelines for this system, called "normalization", require that every row (tuple) have a primary key; the attributes in a relational table must depend on the primary key; and the nonkey attributes in the table must be independent of each other. In other words, complex data systems are broken up into their most fundamental relationships to form tables, each of which must have a primary key, and which generally have less than 25 fields (columns). An example of such a relational system (with minor modifications) applied to sediment chemistry is provided in Figure 2 and Table 2 (data from the U.S. Environmental Protection Agency’s EMAP program).

This system allows computer database managers to manipulate very large and complex databases with flexibility and efficiency. Normalized tables are essential in a very large database, especially one containing data that requires frequent updating. A table which is not normalized may contain data that is dependent on other data or entries that are repeated. This causes significant difficulties in maintaining the internal consistency (integrity) of the database while updating the data.

The disadvantage of removing all dependencies is that these may be the principle features that scientists use to compare and analyze data.  Although such relationships could be stored or recreated in query structures, this process may require services of professional or experienced database managers. For smaller databases, it may be possible to reduce the need for such expertise and recombination of data by keeping the database structure in formats convenient to the main user needs.

"Horizontal" vs. "Vertical" format

The Pontchartrain database (containing about 90,000 descriptors and measurements) uses a mixed databasing system. It departs from the classical relational model by allowing multiple dependent fields. However, this kind of structure is not a problem since the data are essentially static. The majority of the updates involve adding new records but not changing what already has been entered. The structure displays the parameters in a "horizontal" format, meaning the chemical constituents are the field names stretching across the table horizontally (Table 3). Such structures differ from the format used in Table 2 which shows the chemical constituents listed vertically in one column, repeating for each station. The horizontal format allows the scientist to more conveniently view, compare, sort and associate data that intuitively belong together. The tables may have hundreds of fields rather than be limited to a few tens of fields as in the vertical format, but the horizontal format obviously requires a fewer number of records. Database Mangagement System (DBMS) software like Microsoft Access is nearly unlimited in the number of records it can manage (2 billion per table), but the latest version of Microsoft Excel only allows 65,536 records. The horizontal table structure allows storage and use in both Excel and Access. Even though they are related by a unique identification number and can be queried by DBMS software, some computer specialists may still refer to the tables as flat files because of the horizontal format which is not normalized.

Although the current database's total storage capacity is small in comparison with industrial or agency databases, it has many complex parameters and properties . The development and interactive use of the database throughout both the data processing and interpretative phases (Fig. 3) offers parallels to large industrial database systems with their feedback loops to data sources and customers. Industrial managers now use "data mining" to perform statistical analysis and modeling techniques to uncover patterns and relationships hidden in an organization's database (Edelstein, 1998). This intelligent data querying and exploration of data resources helps uncover new relationships rather than merely providing answers to standard queries as traditional databases have done. Data mining likewise implies the necessity for continuous modification of database structures, another feature of the present work.

dbflow2.gif (14082 bytes)

Figure 3

 

 

Back to Introduction

Forward to Data Dictionary


Beginning of Section
Sediment Database and Preliminary Interpretations

[an error occurred while processing this directive]