USGS OFR 02-370: Alaska DGGS Scanning Project

Digital Mapping Techniques '02 -- Workshop Proceedings
U.S. Geological Survey Open-File Report 02-370

The Alaska DGGS Scanning Project: Conception, Execution, and Reality

By Gail Davidson, Lauren Staft, and E. Ellen Daley

Alaska Division of Geological & Geophysical Surveys
794 University Avenue, Suite 200
Fairbanks, AK 99709
Telephone: (907) 451-5006
Fax: (907) 451-5050
e-mail: Gail_Davidson@dnr.state.ak.us

INTRODUCTION

The Alaska Division of Geological & Geophysical Surveys (DGGS) is engaged in an ongoing series of projects under the auspices of the Minerals Data and Information Rescue in Alaska (MDIRA) program. The mandate of this program is to recover and make easily available all mineral-related files, documents, and physical samples held in the public domain, in order to prevent this information from being lost through attrition or inaccessibility. Several different public entities, including the Alaska Department of Natural Resources (of which the Division of Geological & Geophysical Surveys is a part), the University of Alaska, the Alaska Resource Library Information System (ARLIS), the U.S. Geological Survey (USGS), and the U.S. Bureau of Land Management (BLM), are on the Liaison committee, along with private entities, including the Alaska Federation of Natives, the Alaska Miners Association, and various interested members of Alaska's mining community. Specific DGGS projects funded by the MDIRA program include the Guide to Alaska Geologic and Mineral Information, published by DGGS in 1998; Alaska Resource Data Files; Alaska State Agency Lithochemical Data; Alaskan Bedrock and Surficial Map Index; the partially completed DGGS-wide Geologic Database project; and the DGGS Scanning and Document Conversion Project, designed to scan all DGGS publications and make them available on the World Wide Web. The Scanning Project began in 1999 and is virtually complete at this time.

CONCEPTION

The conversion of DGGS publications to an electronic format has a dual purpose. First, by converting these documents, we can make them available on our Web site for easy public access. In the past, people who requested information could come to our building in Fairbanks or to a library in Anchorage or Juneau, or they could wait for a publication to be mailed to them; now publications are available at a mouse click. Second, we wanted to provide electronic backups of irreplaceable documents. Some of our publications date back as far as 1903 and are extremely fragile. We formerly had to provide low-quality photocopies of these documents for distribution; now they can be viewed online at any time, at better quality and with no damage to the original.

EXECUTION

Personnel

The following personnel have been involved in the project:

Geologist V: Project oversight
Geologist III: Project manager, in a long-term nonpermanent position
College intern: Map scanning
College intern: Databasing and Web-page production

Management and Logistics

The first issue in this project involved the relative merits of scanning documents and maps in-house versus contracting the work out. We decided to contract the document scanning for all but the most fragile of our 1,900 titles, totaling 67,000 pages. The contract required: (1) scanning and conversion of pages up to 11" x 17"; (2) conversion to Adobe Acrobat .PDF format; and (3) optical character recognition (OCR), because OCR'ed versions are smaller and more readable, as well as editable. Bindings were removed where necessary, and documents were shipped to the contractor in batches. The total time for completing the contract was about nine months. Upon their return to us, all files were run through a quality-control check. An Access database was used to store data on documents sent and returned as well as on data quality.
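The text above does not say what the quality-control check examined. As a minimal sketch of one plausible check, assuming the tracking database records how many pages were sent and how many came back, the Python below flags documents whose page counts disagree. The field names (doc_id, pages_sent, pages_returned) and the sample records are hypothetical.

```python
def flag_page_mismatches(records):
    """Return the IDs of documents whose returned page count differs from the count sent."""
    return [
        rec["doc_id"]
        for rec in records
        if rec["pages_returned"] != rec["pages_sent"]
    ]


# Made-up tracking records, for illustration only.
records = [
    {"doc_id": "A-001", "pages_sent": 24, "pages_returned": 24},
    {"doc_id": "A-002", "pages_sent": 40, "pages_returned": 20},
]

flagged = flag_page_mismatches(records)
print(flagged)
```

A check of this kind would catch a file that came back with half of its pages missing, but it says nothing about OCR accuracy, which has to be judged separately.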

The second issue concerned the advisability of contracting the scanning of approximately 3,000 oversized sheets, including maps, cross sections, tables, and other large-format published documents. Because the large differences in color and quality among our hard copies made us worry about scanning quality, we ultimately decided to purchase a 36" scanner and scan the oversized sheets in-house. We then hired a college intern to do the actual scanning. Toward the end of the map-scanning portion of the project, we contracted the scanning of 70 oversized sheets exceeding 36" in size to a local printing shop.

Next we needed to decide on an electronic storage method for the maps and documents that would allow archiving as well as delivery to the end user. We determined that Adobe Acrobat .PDF files can be read on most computer systems, both online and offline, with a free viewer (http://www.adobe.com/products/acrobat/readstep2.html). This format had been used on our Web site in the past for both text and maps, so we knew it to be reliable. We decided to store in this format all documents scanned and converted to text by the contractor. Our experience led us to conclude that .PDF files of maps would be too big for Web delivery on a large scale, so we searched for a more compact, yet equally useful, format. We chose LizardTech's MrSID because it is the most widely used compression format available, it provides very good resolution even when zoomed in, it is read directly by Arc/Info, and free readers (http://www.lizardtech.com/download/) are available to users for both online and offline use. Figure 1 shows a MrSID-compressed map, originally made at 1:250,000, that has been zoomed on-screen several times.

Figure 1. MrSID map showing resolution when zoomed in.

The next issue involved deciding upon a means of delivering the scanned documents and maps to the public via our Web site. Because a Divisionwide geologic database was in the planning stages, we looked forward to using it as a means of access to the data. We found, however, that we needed to deliver the scanned products before the database was ready, so we wrote direct Web pages to do so.

The last issue is still in the process of being resolved. Information published since the digital age began is already in electronic format, so it can be included in our archive and Web presentation by moving it to the proper format. A simple means of adding these publications is still under construction.

TECHNICAL ASPECTS

Before purchasing a wide-format scanner, we sent a mylar with various graphics on it to several vendors for testing. When the files were returned, we checked them for fuzziness, evenness of scanning over the entire map, and stretch. We then purchased a Widecom SLC 936C scanner. During the map-scanning phase of the project, this scanner broke down several times, and we had trouble getting parts for it. After about a year, we purchased a second scanner, a Contex FSC Color 36, which has served us well.

Metadata for both document scanning and map scanning were stored in an Access database (Figure 2). As each map was scanned, parameters on map corners, scales, and other such data were recorded for future entry into the National Geologic Map Database. Maps were scanned at a resolution of 400 dpi. Scanned files were archived on CD-ROM (documents in .PDF format and map files in .TIF format). The .TIF files were compressed to .SID files using parameters of c=30 and n=6.

Figure 2. Relationship table for scanning database.
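To put the 400 dpi scanning resolution in perspective, the Python arithmetic below estimates the uncompressed size of one scanned sheet. It is only a sketch: the 36" x 48" sheet size and 24-bit color depth are assumed for illustration, and reading the c=30 parameter as a nominal 30:1 compression ratio is likewise an assumption rather than something stated in the text.

```python
def raw_scan_bytes(width_in, height_in, dpi=400, bytes_per_pixel=3):
    """Uncompressed size, in bytes, of a scan of a sheet of the given size."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Hypothetical sheet: 36 x 48 inches in 24-bit color (assumed values).
raw = raw_scan_bytes(36, 48)

# Assumed nominal 30:1 compression ratio.
compressed = raw / 30

print(f"raw: {raw / 1e6:.0f} MB, compressed: {compressed / 1e6:.1f} MB")
```

Under these assumptions a single sheet occupies roughly 829 MB uncompressed and about 28 MB after compression, which suggests why a compact format mattered for Web delivery of some 3,000 sheets.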

Because we needed to deliver the scanned project to the public before it was complete, we chose to put all scanned documents on our Web site in December 2000, along with maps that had been scanned to that date. In the fall of 2001 we updated the pages to include all maps scanned. Maps and documents can be viewed directly from the Web server using free viewers. Three search methods are available at http://wwwdggs.dnr.state.ak.us/pubs.html: Quadrangle search, Publications Series search, and a keyword search that uses the Google search engine. The latest publications, which were prepared and published electronically, have not been added to the site at the date of this writing (May 2002).

Web pages listing all available publications were produced using a Visual Basic program that does much the same thing as a Microsoft Word mailmerge. The code reads a query that accesses data on all necessary variables from the database, punctuates and formats the results, and writes HTML code. An example is shown in the Appendix.
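The mailmerge-style logic described above can be sketched in a few lines. The example below is written in Python rather than the authors' Visual Basic so that it runs without Access; it is not the project's code. The field names (pub_index, authors, year, title, sheet_file), the simplified "../oversized/" path, and the sample rows are all hypothetical. It writes one citation line per publication and then one link per oversized sheet, assuming the rows arrive sorted by publication as a sorted query result would.

```python
from html import escape


def build_listing(rows):
    """Return HTML lines: one citation per publication, then a link per sheet.

    Rows belonging to the same publication must be adjacent.
    """
    lines = []
    current_pub = None
    for row in rows:
        if row["pub_index"] != current_pub:
            # First row of a new publication: write its citation once.
            current_pub = row["pub_index"]
            lines.append("<BR>")
            lines.append(
                f"{escape(row['authors'])}, {row['year']}, "
                f"{escape(row['title'])}:<BR>"
            )
        if row.get("sheet_file"):
            # Each row may carry one oversized sheet of the publication.
            name = escape(row["sheet_file"])
            lines.append(f"<a href='../oversized/{name}.SID'>{name}</a><BR>")
    return lines


# Made-up query results, for illustration only.
rows = [
    {"pub_index": 1, "authors": "Doe, J.", "year": 1975,
     "title": "Example report", "sheet_file": "ex01-sh1"},
    {"pub_index": 1, "authors": "Doe, J.", "year": 1975,
     "title": "Example report", "sheet_file": "ex01-sh2"},
    {"pub_index": 2, "authors": "Roe, R.", "year": 1980,
     "title": "Text-only report", "sheet_file": None},
]

html_lines = build_listing(rows)
print("\n".join(html_lines))
```

Tracking the current publication index plays the same role as the PubNumber variable in the Appendix: it decides whether a row starts a new publication or only adds a sheet to the one already printed.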

The Scanning Project database includes an index number that ties the scanned publications to a second Access database at DGGS, where data on authors, titles, and similar attributes of the publications of the Survey are stored. In order to query both databases simultaneously, late in the Scanning Project process we decided to combine them. This effort required changing some field names, but the combined database is useful for several purposes and will ultimately be uploaded to the planned Divisionwide Oracle database. In the future, the Web site will access the Oracle database for delivery of publications to the public.
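Once both sets of tables live in one database, the shared index number allows a single query to return publication attributes together with scanned-file data. The sketch below illustrates the idea with Python's built-in sqlite3 module instead of Access; the table names, column names, and sample rows are hypothetical and do not reflect the actual DGGS schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE publications (
        pub_index INTEGER PRIMARY KEY, authors TEXT, title TEXT);
    CREATE TABLE scans (
        scan_id INTEGER PRIMARY KEY, pub_index INTEGER, file_name TEXT);
    INSERT INTO publications VALUES (1, 'Doe, J.', 'Example report');
    INSERT INTO publications VALUES (2, 'Roe, R.', 'Unscanned report');
    INSERT INTO scans VALUES (10, 1, 'ex01.PDF');
""")

# LEFT JOIN keeps publications that have no scanned file yet.
query = """
    SELECT p.pub_index, p.title, s.file_name
    FROM publications AS p
    LEFT JOIN scans AS s ON s.pub_index = p.pub_index
    ORDER BY p.pub_index
"""
combined = con.execute(query).fetchall()
for row in combined:
    print(row)
con.close()
```

A left join is used here so that publications without a scanned file still appear in the result, with an empty file name; an inner join would silently drop them.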

REALITY

Many different issues slowed the delivery of the Scanning Project. First was the time lag of approximately two months between receiving the project money and hiring the project manager. This time lag is not uncommon in the hiring system of Alaska state government, but it made it difficult to deliver products in a short time frame. The project manager left after document scanning was complete but before map scanning was complete, leaving interns to finish the project.

Document scanning is not a perfect technology. Even documents printed on a press may not scan and OCR perfectly; many DGGS publications were typed on manual typewriters and have handwritten notes on them (Figure 3). Although the project manager examined files returned from the contractor and made a note in the database as to the quality of each, the project timeline did not allow fixing OCR errors. A decision was made early in the project to put the files online "as is" and to fix them if time and budget allowed. In the course of using the scanned documents, we have found numerous mistakes, such as documents with every other page missing. We are fixing these as we go.

Figure 3. Miscellaneous Report 003-01; note the OCR errors.

Although we have a large file of mylar originals of our maps, several are missing, and we had to scan paper copies of folded maps for this project. Because our intention was to make all published material available, we used the best copy we could find of each map. Scanner problems set us back several weeks on more than one occasion, and we had trouble getting help and parts from the manufacturer.

One of the largest problems is the integration of newer publications with the ones that have been scanned. As we have published maps using GIS and drawing programs, and text using word processors, the files have become scattered and the methods keep changing. We envision that our coming Divisionwide database will alleviate these problems by keeping track of where the various pieces of publications reside. That database will also allow us to feed data to the Web directly, using any sort of search imaginable. In the meantime, however, we are in the process of writing code to integrate these publications with those already on the Web site.

We have received very positive feedback on the availability of our publications online, in spite of the very slow Internet speed available between Fairbanks and Anchorage. That bandwidth is in the process of being upgraded now. Publication sales at DGGS have dropped dramatically due to online availability, but we find that net budget changes amount to very little because the lack of sales is balanced by the reduction in our reproduction costs.

APPENDIX

Option Compare Database
Sub QuadMailmerge()

' Dim DECLARES THE VARIABLES: db IS THE DATABASE, rs IS THE RECORDSET,
' PubNumber IS AN INTEGER HOLDING THE CURRENT PUBLICATION INDEX,
' AND FileNum IS THE NUMBER OF THE OUTPUT FILE
Dim db As Database, rs As Recordset, PubNumber As Integer, FileNum As Integer

Set db = CurrentDb
' SELECT QUERY TO EXTRACT DATA (SETS rs (recordset) AS NAMED TABLE, FOR EXAMPLE "NewQuadMailmerge")
Set rs = db.OpenRecordset("NewQuadMailmerge")

' CALL A FILE TO SEND THE TEXT TO; IN THIS CASE, TEXT IS SENT TO "C:\temp\Alaska.txt"
FileNum = FreeFile
Open "C:\temp\Alaska.txt" For Output As FileNum

' GO TO THE FIRST RECORD
rs.MoveFirst
With rs
  ' SET PUBNUMBER VALUE TO ZERO
  PubNumber = 0
  
    ' BEGIN LOOP. DOES NOT FINISH UNTIL END OF FILE IS REACHED.
    Do
      If PubNumber = 0 Then GoTo Top
      
      ' IF THE PUBNUMBER EQUALS THE SHEET INDEX NUMBER (I.E. IS A MAP BELONGING TO THAT PUBLICATION)
      ' THEN SKIP TO THE MIDDLE OF THE LOOP AND PRINT ONLY SHEET INFO.
      ' INITIAL VALUE IS SET AS ZERO, SO THIS WILL NOT BE TRUE FOR THE FIRST RECORD AND WILL DEFAULT
      ' TO PRINTING PUBLICATION INFO
Here:    If PubNumber = rs!SheetIndex Then
        GoTo Middle
        End If
        'FOR RECORDS WHERE SHEETINDEX DOES NOT MATCH PUBNUMBER, PRINT PUBLICATION INFO
Top:    PubNumber = rs!PubIndex
      Print #FileNum, "<BR>"
        ' PRINTS THE AUTHOR, THE PUBLICATION YEAR, THE TITLE, ETC., WHICH ARE ALL FIELDS IN THE "NewQuadMailmerge" QUERY
      Print #FileNum, rs!AuthSeq & ", " & rs!PubYear & ", " & rs!Title & " " & rs!Publisher & ", " & rs!QuadFileName & ":<BR>"
      If rs!InternetInfo = "!" Then Print #FileNum, "<FONT COLOR='RED'>", rs!PubComments, "</FONT><BR>"
      If rs!TextOK = "!" Then GoTo Middle Else Print #FileNum, "<a href='../" & rs!Path & "/text/" & rs!FileDesignator & _
        ".PDF'>Report</a>, " & rs!PubPages & " p., .PDF format (" & rs!PDFsize & " KB).<BR>"
      ' IF THERE ARE NO SHEETS, GOTO NEXT RECORD
      ' OTHERWISE PRINT THE SHEET INFORMATION
Middle:   If (IsNull(rs!NoSheets)) And rs!SheetQuad Like "*Alaska*" Then
      Print #FileNum, "<"; rs!SheetsOK & "a href='../" & rs!Path & "/oversized/" & rs!FileName & ".SID'>" & rs!FileName & _
        "</a>, " & rs!ActualName & ", "; rs!Comments & ", " & rs!MapScale & ", .SID format (" & rs!SIDFileSize & " KB).<BR>"
      End If
      ' GOTO THE NEXT RECORD
EndLp:   ' END LOOP, GOTO TOP
      .MoveNext
    Loop Until .EOF
End With
    ' CLOSE THE TEXT FILE
    Close #FileNum
    
End Sub

U.S. Department of the Interior, U.S. Geological Survey
URL: https://pubsdata.usgs.gov/pubs/of/2002/of02-370/davidson.html
Maintained by David R. Soller
Last modified: 19:15:35 Wed 07 Dec 2016