This readme file was created by Chris Leeth to document changes to the files and database originally published in the report "Geophysical log database for the Floridan Aquifer System and Southeastern Coastal Plain aquifer system" by Lester Williams, Jessica Raines, and Amanda Lanning (USGS Data Series 760). The original data are located in a directory named FloridanLogDB; this readme file resides in a copy of that directory named FloridanLogDbRevision.

Per direction of the South Atlantic Water Science Center (SAWSC) director, Eric Strom, Log ASCII Standard (.las) files that were scanned by students are to be removed from the database and from publication, and the report modified to reflect this change. Because of the size of the database and the number of files, several automated tools were used to find and fix erroneous files and directories. This work also addresses problems with the Web pages that arose because of the different operating systems of the staging and production servers.

Please note that I was unable to rectify a basic inconsistency between the report and the database. The report states that there were 5,636 logs from 1,292 wells, but I found no database listing these files, nor any directory containing this number of files. The most voluminous directory found was on what was termed Server 11 in the Norcross office of the SAWSC. That directory held a database, an Excel spreadsheet, and a text file, each listing 5,246 files; however, the directory itself contains 5,275 files.

Beyond the problems with the data files themselves, directories were not named consistently: mixed case was used for both directory and file names. The first task was therefore to identify directories with lower-case names, which was accomplished with the following Unix Korn shell command:

ls -Rd */* | grep "[[:lower:]]\+"

The logic here is to recursively list the directories and then use a regular expression to identify those containing lower-case letters.
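The directories found this way were renamed by hand (see below), but the same identification logic could feed an automated rename. The following is a minimal sketch, not the procedure actually used; the demo/ directory and its contents are hypothetical stand-ins for the State/County tree.

```shell
#!/bin/sh
# Sketch: find second-level directories whose names begin with a
# lower-case letter and rename them to title case. The demo/ tree
# below is a hypothetical stand-in for the State/County structure.
set -e
mkdir -p demo/FL/lee demo/FL/Okeechobee demo/SC/jasper
cd demo

# Same identification idea as in the text: list the directories and
# keep those whose basename starts with a lower-case letter.
ls -d */* | grep '/[[:lower:]]' | while read -r dir; do
    parent=$(dirname "$dir")
    name=$(basename "$dir")
    # Capitalize the first letter (title case for county names).
    first=$(printf '%s' "$name" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$name" | cut -c2-)
    mv "$dir" "$parent/$first$rest"
done

ls -d */*
```

The trailing ls confirms the tree now contains only title-case names (FL/Lee, FL/Okeechobee, SC/Jasper).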
This command identified the following directories:

FL/Lee/ FL/Okeechobee/ GA/Effingham/ GA/Quitman/ OS/Offshore/ SC/Allendale/ SC/Bamberg/ SC/Barnwell/ SC/Charleston/ SC/Colleton/ SC/Dorchester/ SC/Jasper/

These directories were renamed manually.

The next task was to identify directories with blank spaces anywhere in their paths. These were few, and they were changed by hand:

ls -Rd */* | grep "[[:blank:]]\+"

The next task was to identify the .las files that were digitized by WSC staff. This was accomplished using the following command:

grep -rl "NEURALOG PLOT DEFINITION" | tee lasDel.txt

This recursively searches the files for the term "NEURALOG PLOT DEFINITION" and writes the matching file names to lasDel.txt. It identified 362 files to be deleted.

The next task was to delete the files identified above, using the same search and piping the matching names to the rm command:

grep -rl "NEURALOG PLOT DEFINITION" | xargs rm

This leaves a directory tree with 362 fewer .las files.

Similar logic was used to identify files with blank spaces in their names:

ls -R */* | grep "[[:blank:]]\+"

There were a number of these, so to make the lists manageable the same command was run at the State directory level; the names were again fixed by hand.

Next was to identify files with either incomplete or missing extensions. Run recursively from the top-level directory, this found 154 such files, 30 of which were the auto-generated Thumbs.db files. Only three file types should appear in the directories (.las, .pdf, or .tif), thus:

ls -R */*/* | grep -v "\.[lpt][adi][sf]"

Some of these files had one or more letters of the extension and were renamed without actually opening the file. Using grep, the rest of the files were identified with variations of the regular expression above, e.g.:

ls -R */*/* | grep -v "\.[lpt]$"
ls -R */*/* | grep -v "\.[lpt][adi]$"

and so on.
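The identify-then-delete step above can be exercised end to end on sample data. This is a self-contained sketch, not a record of the actual run: the lasdemo/ files are hypothetical, and it uses GNU grep's -Z with xargs -0 so that file names containing spaces cannot be mis-split (the newline-delimited pipe in the text works as long as no names contain whitespace).

```shell
#!/bin/sh
# Sketch of the identify-then-delete step on hypothetical sample
# .las files. Two files carry the NEURALOG marker and are removed;
# the third is kept.
set -e
mkdir -p lasdemo
printf '~Version\nNEURALOG PLOT DEFINITION\n' > lasdemo/scan1.las
printf '~Version\nNEURALOG PLOT DEFINITION\n' > lasdemo/scan2.las
printf '~Version\n# vendor-digitized log\n'   > lasdemo/keep.las

# -Z emits NUL-terminated names; xargs -0 consumes them safely even
# if a name contains blanks. '--' guards against names starting '-'.
grep -rlZ "NEURALOG PLOT DEFINITION" lasdemo | xargs -0 rm --

ls lasdemo
```

The final listing shows only keep.las surviving, mirroring the "362 fewer .las files" result reported above.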
This only eliminated about 15 of the remaining 130 files. For the rest, it was necessary to discern whether they were .tif, .las, or .pdf files. This was a two-step process: the head command was used to look at the first few lines of each file to see whether it was a .las file, then the tiffinfo command was used to see whether it was a .tif; if neither, a .pdf was assumed.

The next task was to identify files with lower-case names. Unfortunately, the regular-expression skills of the reviewer were not such that this could be done directly with grep on the full name, so the file name and extension were split into separate "columns" using the cut command and the names were then piped into grep. The following command identified 83 files with lower-case letters:

ls -R */*/* | cut -d . -f1 | grep "[[:lower:]]" | wc -l

The file names and directory structure are thus consistent and can be used to build a new database and Web site.
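The head/tiffinfo classification described above can also be approximated in a single pass by inspecting each file's leading bytes ("magic numbers"): .las files open with a ~ section header, TIFFs with "II*" or "MM", and PDFs with "%PDF". This sketch substitutes that magic-byte check for tiffinfo so it needs no extra tools; the typedemo/ files are hypothetical stand-ins for logs missing their extensions.

```shell
#!/bin/sh
# Sketch: classify extensionless files by their leading bytes instead
# of head + tiffinfo. Sample files are hypothetical stand-ins.
set -e
mkdir -p typedemo
printf '~VERSION INFORMATION\n' > typedemo/log_a   # LAS section header
printf 'II*\000rest-of-tiff'    > typedemo/log_b   # little-endian TIFF magic
printf '%%PDF-1.4 rest-of-pdf'  > typedemo/log_c   # PDF header

for f in typedemo/*; do
    # tr strips the NUL byte in the TIFF magic so the comparison is
    # plain text.
    sig=$(head -c 4 "$f" | tr -d '\0')
    case $sig in
        '~'*)         echo "$f: las" ;;
        'II*'*|'MM'*) echo "$f: tif" ;;   # TIFF magic, either byte order
        '%PDF')       echo "$f: pdf" ;;
        *)            echo "$f: unknown" ;;
    esac
done
```

Falling through to "unknown" rather than assuming .pdf, as the manual process did, flags any file that matches none of the three expected types for a closer look.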