Alaska Division of Geological & Geophysical Surveys
794 University Avenue, Suite 200
Fairbanks, AK 99709
Telephone: (907) 451-5006
Fax: (907) 451-5050
e-mail: Gail_Davidson@dnr.state.ak.us
The second issue concerned the advisability of contracting the scanning of approximately 3,000 oversized sheets, including maps, cross sections, tables, and other large-format published documents. Because the color and quality of our hard copies varied widely, we worried about scanning quality and ultimately decided to purchase a 36" scanner and scan the oversized sheets in-house. We then hired a college intern to do the actual scanning. Toward the end of the map-scanning portion of the project, we contracted the scanning of 70 oversized sheets exceeding 36" in size to a local printing shop.
Next we needed to decide on an electronic storage format for the maps and documents that would allow archiving as well as delivery to the end user. We determined that Adobe Acrobat .PDF files can be read on most computer systems, both online and offline, with a free viewer (http://www.adobe.com/products/acrobat/readstep2.html). This format had been used on our Web site in the past for both text and maps, so we knew it to be reliable. We decided to store in this format all documents scanned and converted to text by the contractor.
Our experience led us to conclude that .PDF files of maps would be too big for Web delivery on a large scale, so we searched for a more compact, yet equally useful, format. We chose LizardTech's MrSID because it is the most widely used compression format available, it provides very good resolution even when zoomed in, it is read directly by Arc/Info, and free readers (http://www.lizardtech.com/download/) are available for both online and offline use. Figure 1 shows a MrSID-compressed map, originally made at 1:250,000, that has been zoomed on-screen several times.
Figure 1. MrSID map showing resolution when zoomed in.
The next issue involved deciding upon a means of delivering the scanned documents and maps to the public via our Web site. Because a Divisionwide geologic database was in the planning stages, we had hoped to use it as the means of access to the data. We found, however, that we needed to deliver the scanned products before the database was ready, so we built Web pages that link to the files directly.
The last issue is still being resolved. Information published since the digital age began is already in electronic form, so it can be included in our archive and Web presentation by converting it to the proper format. A simple means of adding these publications is still under construction.
Metadata for both document scanning and map scanning were stored in an Access database (Figure 2). As each map was scanned, parameters on map corners, scales, and other such data were recorded for future entry into the National Geologic Map Database. Maps were scanned at a resolution of 400 dpi. Scanned files were archived on CD-ROM (documents in .PDF format and map files in .TIF format). The .TIF files were compressed to .SID files using parameters of c=30 and n=6.
Figure 2. Relationship table for scanning database.
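To give a sense of the file sizes behind the 400 dpi scans and the c=30 compression setting, the following Python sketch estimates the storage for a single sheet. It is only an illustration: the 36" x 48" sheet size and 24-bit color depth are assumptions, and the text does not define the c and n parameters, so the sketch assumes c=30 denotes a target compression ratio of 30:1 and does not model n at all.

```python
# Rough storage estimate for one scanned sheet (illustrative values only).
DPI = 400               # scan resolution stated in the text
COMPRESSION_RATIO = 30  # ASSUMPTION: c=30 is read as a 30:1 target ratio


def scan_size_bytes(width_in, height_in, dpi=DPI, bytes_per_pixel=3):
    """Uncompressed scan size, assuming 24-bit color (3 bytes per pixel)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Hypothetical 36" x 48" sheet; actual sheet sizes vary.
raw = scan_size_bytes(36, 48)
compressed = raw / COMPRESSION_RATIO

print(f"uncompressed: {raw / 1e6:.0f} MB")                       # uncompressed: 829 MB
print(f"at {COMPRESSION_RATIO}:1: {compressed / 1e6:.1f} MB")    # at 30:1: 27.6 MB
```

Under these assumptions a single large sheet drops from several hundred megabytes to a few tens of megabytes, which is consistent with the stated motivation for choosing a compressed format for Web delivery.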
Because we needed to deliver the scanned project to the public before it was complete, we chose to put all scanned documents on our Web site in December 2000, along with the maps that had been scanned to that date. In the fall of 2001 we updated the pages to include all scanned maps. Maps and documents can be viewed directly from the Web server using free viewers. Three search methods are available at http://wwwdggs.dnr.state.ak.us/pubs.html: Quadrangle search, Publications Series search, and a keyword search that uses the Google search engine. The latest publications, which were prepared and published electronically, have not been added to the site as of this writing (May 2002).
Web pages listing all available publications were produced using a Visual Basic program that does much the same thing as a Microsoft Word mailmerge. The code reads a query that accesses data on all necessary variables from the database, punctuates and formats the results, and writes HTML code. An example is shown in the Appendix.
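The mailmerge idea can be sketched in a few lines of Python. This is not the Visual Basic program used in the project; it is a simplified illustration of the same pattern (read rows, print the publication header once, then one line per sheet). The rows below are invented placeholders, and the field names only loosely follow those in the Appendix.

```python
from html import escape

# HYPOTHETICAL rows standing in for the query results; values are invented.
rows = [
    {"PubIndex": 1, "AuthSeq": "Author, A.B.", "PubYear": 1975,
     "Title": "Example report", "Path": "pubs/example",
     "FileName": "ex001_sh1", "MapScale": "1:250,000"},
    {"PubIndex": 1, "AuthSeq": "Author, A.B.", "PubYear": 1975,
     "Title": "Example report", "Path": "pubs/example",
     "FileName": "ex001_sh2", "MapScale": "1:63,360"},
    {"PubIndex": 2, "AuthSeq": "Writer, C.D.", "PubYear": 1982,
     "Title": "Report without maps", "Path": "pubs/other",
     "FileName": None, "MapScale": None},
]


def render_listing(rows):
    """Write one header per publication, then one link line per sheet."""
    lines = []
    current_pub = None
    for row in rows:
        # A new publication index means a new header block.
        if row["PubIndex"] != current_pub:
            current_pub = row["PubIndex"]
            lines.append("<BR>")
            lines.append(
                f"{escape(row['AuthSeq'])}, {row['PubYear']}, "
                f"{escape(row['Title'])}:<BR>"
            )
        # Rows without a sheet file contribute only the header.
        if row["FileName"]:
            href = f"../{row['Path']}/oversized/{row['FileName']}.SID"
            lines.append(
                f"<a href='{href}'>{escape(row['FileName'])}</a>, "
                f"{row['MapScale']}, .SID format.<BR>"
            )
    return "\n".join(lines)


listing = render_listing(rows)
print(listing)
```

The rows must be sorted by publication so that all sheets of one publication are adjacent; otherwise the header would be repeated.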
The Scanning Project database includes an index number that ties the scanned publications to a second Access database at DGGS, where data on authors, titles, and similar attributes of the publications of the Survey are stored. In order to query both databases simultaneously, late in the Scanning Project process we decided to combine them. This effort required changing some field names, but the combined database is useful for several purposes and will ultimately be uploaded to the planned Divisionwide Oracle database. In the future, the Web site will access the Oracle database for delivery of publications to the public.
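As an illustration of what a shared index number makes possible once both sets of tables live in one database, the sketch below joins a publications table to a scanned-sheets table. It uses an in-memory SQLite database in Python purely as a stand-in for Access; the table names, column names, and rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the combined Access database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publications (PubIndex INTEGER PRIMARY KEY, AuthSeq TEXT, Title TEXT);
CREATE TABLE scanned_sheets (SheetIndex INTEGER, FileName TEXT, MapScale TEXT);
INSERT INTO publications VALUES
    (1, 'Author, A.B.', 'Example report'),
    (2, 'Writer, C.D.', 'Report without maps');
INSERT INTO scanned_sheets VALUES
    (1, 'ex001_sh1', '1:250,000'),
    (1, 'ex001_sh2', '1:63,360');
""")

# LEFT JOIN keeps publications that have no scanned sheets.
joined = conn.execute("""
    SELECT p.PubIndex, p.Title, s.FileName
    FROM publications AS p
    LEFT JOIN scanned_sheets AS s ON s.SheetIndex = p.PubIndex
    ORDER BY p.PubIndex, s.FileName
""").fetchall()

for record in joined:
    print(record)
```

A left join is used here so that text-only publications still appear in the result, with an empty sheet column.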
Document scanning is not a perfect technology. Even documents printed on a press may not scan and OCR perfectly; many DGGS publications were typed on manual typewriters and have handwritten notes on them (Figure 3). Although the project manager examined the files returned from the contractor and noted the quality of each in the database, the project timeline did not allow for fixing OCR errors. A decision was made early in the project to put the files online "as is" and to fix them if time and budget allowed. In the course of using the scanned documents, we have found numerous mistakes, such as documents with every other page missing. We are fixing these as we go.
Figure 3. Miscellaneous Report 003-01; note the OCR errors.
Although we have a large file of mylar originals of our maps, several are missing, and we had to scan paper copies of folded maps for this project. Because our intention was to make all published material available, we used the best copy we could find of each map.
Scanner problems set us back several weeks on more than one occasion, and we had trouble getting help and parts from the manufacturer.
One of the largest problems is the integration of newer publications with the ones that have been scanned. As we have published maps using GIS and drawing programs, and text using word processors, the files have become scattered and the methods keep changing. We envision that our coming Divisionwide database will alleviate these problems by keeping track of where the various pieces of publications reside. That database will also allow us to feed data to the Web directly, using any sort of search imaginable. In the meantime, however, we are writing code to integrate these publications with those already on the Web site.
We have received very positive feedback on the availability of our publications online, in spite of the very slow Internet connection between Fairbanks and Anchorage. That bandwidth is now being upgraded. Publication sales at DGGS have dropped dramatically due to online availability, but we find that the net budget change is very small because the lost sales revenue is balanced by the reduction in our reproduction costs.
Option Compare Database

Sub QuadMailmerge()
    ' Dim CALLS VALUES FOR VARIABLES, SETS db AS ABBREVIATION FOR DATABASE, rs FOR RECORDSET
    ' AND CALLS PubNumber AS AN INTEGER FIELD
    Dim db As Database, rs As Recordset, PubNumber As Integer
    Set db = CurrentDb

    ' SELECT QUERY TO EXTRACT DATA (SETS rs (recordset) AS NAMED TABLE, FOR EXAMPLE "NewQuadMailmerge")
    Set rs = db.OpenRecordset("NewQuadMailmerge")

    ' CALL A FILE TO SEND THE TEXT TO; IN THIS CASE, TEXT IS SENT TO "C:\temp\Alaska.txt"
    FileNum = FreeFile
    Open "C:\temp\Alaska.txt" For Output As FileNum

    ' GO TO THE FIRST RECORD
    rs.MoveFirst
    With rs
        ' SET PUBNUMBER VALUE TO ZERO
        PubNumber = 0

        ' BEGIN LOOP. DOES NOT FINISH UNTIL END OF FILE IS REACHED.
        Do
            If PubNumber = 0 Then GoTo Top

            ' IF THE PUBNUMBER EQUALS THE SHEET INDEX NUMBER (I.E. IS A MAP BELONGING TO THAT PUBLICATION)
            ' THEN SKIP TO THE MIDDLE OF THE LOOP AND PRINT ONLY SHEET INFO.
            ' INITIAL VALUE IS SET AS ZERO, SO THIS WILL NOT BE TRUE FOR THE FIRST RECORD AND WILL DEFAULT
            ' TO PRINTING PUBLICATION INFO
Here:
            If PubNumber = rs!SheetIndex Then
                GoTo Middle
            End If

            'FOR RECORDS WHERE SHEETINDEX DOES NOT MATCH PUBNUMBER, PRINT PUBLICATION INFO
Top:
            PubNumber = rs!PubIndex
            Print #FileNum, "<BR>"

            'PRINTS THE FILENUMBER, THE AUTHOR, THE PUBLICATION YEAR, THE TITLE, ETC. WHICH ARE ALL FIELDS IN THE "NewQuadMailmerge"
            Print #FileNum, rs!AuthSeq & ", " & rs!PubYear & ", " & rs!Title & " " & _
                rs!Publisher & ", " & rs!QuadFileName & ":<BR>"
            If rs!InternetInfo = "!" Then Print #FileNum, "<FONT COLOR='RED'>", rs!PubComments, "</FONT><BR>"
            If rs!TextOK = "!" Then GoTo Middle Else Print #FileNum, _
                "<a href='../" & rs!Path & "/text/" & rs!FileDesignator & ".PDF'>Report</a>, " & _
                rs!PubPages & " p., .PDF format (" & rs!PDFsize & " KB).<BR>"

            ' IF THERE ARE NO SHEETS, GOTO NEXT RECORD
            ' OTHERWISE PRINT THE SHEET INFORMATION
Middle:
            If (IsNull(rs!NoSheets)) And rs!SheetQuad Like "*Alaska*" Then
                Print #FileNum, "<"; rs!SheetsOK & "a href='../" & rs!Path & "/oversized/" & _
                    rs!FileName & ".SID'>" & rs!FileName & "</a>, " & rs!ActualName & ", "; _
                    rs!Comments & ", " & rs!MapScale & ", .SID format (" & rs!SIDFileSize & " KB).<BR>"
            End If

            ' GOTO THE NEXT RECORD
EndLp:
            ' END LOOP, GOTO TOP
            .MoveNext
        Loop Until .EOF
    End With

    ' CLOSE THE TEXT FILE
    Close #FileNum
End Sub