PROGRAM TO CONVERT SUDS2ASC FILES TO A SINGLE BINARY SEGY FILE

MARK GOLDMAN
OPEN-FILE REPORT 00-141
MARCH, 2000


INTRODUCTION

This program, SUDS2SEGY, converts and combines ASCII files
created using SUDS2ASC Version 2.60 into a single SEGY file.

SUDS2ASC has been used previously to create an ASCII file
of three-component seismic data for an individual recording station.
However, many seismic processing packages have difficulty
reading in ASCII data.  In addition, it may be cumbersome
to process a separate file for each recording station, particularly 
if traces from different recording stations contain a different
number of data samples and/or a different start time.

This new program - SUDS2SEGY - combines these recording station 
files into a single SEGY file.  In addition, SUDS2SEGY normalizes
the trace times so that each trace starts at a given time and 
consists of a fixed number of samples.  This normalization allows 
seismic data from many different stations to be read in as a single
"data gather".  SUDS2SEGY also produces a report summarizing the
offset and maximum absolute amplitude for each component in a station
file.  These data are output separately to an ASCII file and
can be subsequently input to a plotting package.


SUDS2ASC FORMAT

The input files must have the following format, as created
by the program SUDS2ASC Version 2.60.  There must be data for
three components in each file, each component section containing
header lines followed by data lines.  Header lines must be present
for the keywords "initial sample time", "number of samples", 
and "samples per second".  Each keyword must be on a separate
line, with the value for that keyword in the first field of
that line.  However, for the header line containing "initial 
sample time", the first field is the date, which is not used.  
Instead, the program uses the second field, which gives the 
hour, minute, and second at which the trace begins.
The last header line must contain the keyword "rate correction".  
The data lines must start immediately after this last header line.  
After the last data line, there must be at least one blank line.  
The header lines for the next component start after these blank 
lines.  Additional header lines may be present, as long as the 
last header line is as described above.

An example input file would be (with extraneous lines removed,
including the required "rate correction" header line):

;SUDS2ASC - Version 2.60

n                          ;component
09/20/99  17:47:11.000     ;initial sample time
17810                      ;number of samples
200.00000                  ;samples per second
0  0  0  0  0  0  0  0 -1  0
0 -1  0  0  0  0  0  0 -1  0

e                          ;component
09/20/99  17:47:11.000     ;initial sample time
17810                      ;number of samples
200.00000                  ;samples per second
0  0  0  0  0  0  0  0 -1  0
0 -1  0  0  0  0  0  0 -1  0

v                          ;component
09/20/99  17:47:11.000     ;initial sample time
17810                      ;number of samples
200.00000                  ;samples per second
0  0  0  0  0  0  0  0 -1  0
0 -1  0  0  0  0  0  0 -1  0

Note that input files may contain a ^M character at the end
of every line, including "blank" lines.  This character will
be automatically removed by the program and will not affect
determination of blank or header lines.  Also note that
the "component" header value is not read by the program.
The program will process components in the order in which
they appear in the input files.

The SUDS2SEGY source code defines constant values for many of
the above definitions.  If the input ASCII files have a slightly
different format, it may be possible to easily modify the
program by changing one or more constants in the source code.
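
The format rules in this section can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not
the SUDS2SEGY source code: the function name, the constant names, and
the returned dictionary layout are hypothetical, and the value on the
"rate correction" line is simply ignored.

```python
# Keyword strings are held in constants so that a slightly different
# input format can be handled by editing them (names are hypothetical).
VALUE_KEYWORDS = ("initial sample time", "number of samples", "samples per second")
LAST_HEADER_KEYWORD = "rate correction"


def parse_suds2asc(text):
    """Return one dict per component: header values plus integer samples."""
    components = []
    header, samples, in_data = {}, [], False
    for raw_line in text.split("\n"):
        line = raw_line.replace("\r", "").strip()  # remove ^M characters
        if in_data:
            if line:
                samples.extend(int(token) for token in line.split())
            else:
                # A blank line closes the data section of this component.
                components.append({"header": header, "samples": samples})
                header, samples, in_data = {}, [], False
            continue
        fields = line.split()
        for keyword in VALUE_KEYWORDS:
            if keyword in line and fields:
                # The time line carries the date first, so take field two.
                use_second = keyword == "initial sample time"
                header[keyword] = fields[1] if use_second else fields[0]
        if LAST_HEADER_KEYWORD in line:
            in_data = True  # data lines start on the next line
    if in_data and samples:
        # Tolerate a file that ends without the trailing blank line.
        components.append({"header": header, "samples": samples})
    return components
```

Components are returned in the order in which they appear, and the
"component" header value is never consulted, as described above.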


STEPS IN CONVERTING AND COMBINING SUDS2ASC DATA TO A SEGY FILE

1) Place the input SUDS2ASC files into a separate directory.
   This input directory should contain only these input files 
   and no other files.  Each input file must contain the 
   following header information for each of the three components:

      initial sample time
      number of samples
      samples per second
   
2) Place or create in the current directory a table file providing the 
   full station name and offset for each input file in the input directory.
   Each input file in the input directory must have a corresponding line
   in the table file.  

   The first field in the table file must match a filename 
   in the input directory up to the first "." character.
   For convenience, the matching is case-insensitive.
   For example, input filename "pe0001.asc.bin" will match
   a line starting with "PE0001.ASC" in the table file.

   The second field gives the full station name, and the third field 
   gives the station's offset in km from the source.  Additional fields 
   on the same line are ignored, as are lines not starting with a matching
   filename.  Table files of this kind are often provided with the data.
   An example line of a typical table file is:

   PE004014.ASC ILA035 143.55 ........

   In this example, a file of the form "pe004014" must exist
   in the input directory.  The station name for this file
   will be "ILA035", and the offset will be 143550 meters.
   
3) Next, decide on a starting time.  This starting time will be compared 
   to the "initial sample time" as given in the input file header.  
   If the initial sample time begins after the starting time, then dummy 
   samples (value of zero) will be padded to the beginning of the data samples.  
   If the initial sample time begins before the starting time, then data samples 
   will be skipped until the sample corresponding to the starting time
   is reached.  The starting time of the event itself is often a good
   choice for this parameter.

4) Next, decide on the number of samples per trace.  If the number 
   of samples in the file (after padding or truncating samples as 
   described in step 3 above) is less than this value, then dummy 
   samples (value of zero) will be padded to the end of the data 
   samples.  If the number of samples in the file (after step 3) is 
   greater than this value, then data samples will be skipped once 
   this number of samples is reached.  In general, the number of 
   samples should be set large enough to encompass most of the data 
   from most of the stations.  However, there is a maximum limit of 
   32,000 samples per trace.  Some trial-and-error may be necessary 
   to arrive at an optimal value for both the starting time and the 
   number of samples per trace.  The output diagnostics file will contain 
   diagnostic information on the number of samples padded or truncated 
   for each trace.

   Example:  input data has "initial sample time" of 17:45:00,
             and 5000 samples per trace, 100 samples per second.

   If the user wishes all traces to start at 17:45:01 (one second,
   or 100 samples, after the initial sample time), with
   1000 samples per trace, then
   the program will skip the first 100 samples, read in the next
   1000 samples, and skip the remaining 3900 samples.

   If the user wishes all traces to start at 17:44:59 (one second,
   or 100 samples, before the initial sample time), with
   1000 samples per trace, then
   the program will pad 100 dummy samples to the beginning of the trace, 
   read in the next 900 samples, and skip the remaining 4100 samples.

   Note that by entering a starting time value and a number of samples
   value, the user is in effect creating a "time window".  Only data
   samples that fall within this user-defined time window will be copied
   to the output SEGY file.  In addition, the maximum absolute amplitude
   will be computed only from data samples that fall within this window.

5) Now you are ready to run SUDS2SEGY, specifying the input directory, 
   the table filename, the output SEGY filename, the output report file, 
   the output diagnostics file, the starting time for the data samples, 
   and the number of data samples per trace.  An example is:

   SUDS2SEGY  input_files  input_table  SEGY_output  report_output  diag_file  17:47:12.6  20000

   In this example, input station files are read from a
   directory called input_files, the table file is called
   input_table, the output SEGY file is called SEGY_output,
   the report file is called report_output, the diagnostics
   file is called diag_file, and each trace starts at time 
   17:47:12.6 and contains 20,000 samples.
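
The time-window arithmetic of steps 3 and 4 can be sketched as
follows.  This is a minimal illustration, not the SUDS2SEGY source
code: the function names and the returned dictionary are hypothetical,
and the sketch assumes that both times fall on the same day, since the
date field is not used.

```python
def seconds_of_day(hms):
    """Convert 'HH:MM:SS[.fff]' to seconds since midnight."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def plan_window(initial_time, n_file, rate, start_time, n_out):
    """Work out how many samples to pad, skip, and copy for one trace."""
    # Positive shift: the trace begins before the starting time.
    shift = round((seconds_of_day(start_time) - seconds_of_day(initial_time)) * rate)
    skip_front = min(max(shift, 0), n_file)
    pad_front = min(max(-shift, 0), n_out)
    copied = min(n_file - skip_front, n_out - pad_front)
    return {
        "pad_front": pad_front,
        "skip_front": skip_front,
        "copied": copied,
        "pad_end": n_out - pad_front - copied,
        "skip_end": n_file - skip_front - copied,
    }


def apply_window(samples, plan):
    """Return exactly pad_front + copied + pad_end samples."""
    first = plan["skip_front"]
    kept = samples[first:first + plan["copied"]]
    return [0] * plan["pad_front"] + kept + [0] * plan["pad_end"]
```

For a trace of 5000 samples at 100 samples per second that begins at
17:45:00, a starting time one second later with 1000 samples per trace
gives 100 samples skipped, 1000 copied, and 3900 skipped at the end.
A starting time one second earlier gives 100 samples padded, 900
copied, and 4100 skipped.  A plan in which "copied" is zero
corresponds to a trace whose data samples were all truncated.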


OUTPUT SEGY FILE:

The SEGY output file will have a separate FFID (field file 
identification number, e.g. station number, shot number, etc) 
for each station file in the input directory.  
Each FFID will contain three traces, with the first component in
the input file as channel 1, the second component as channel 2, 
and the third component as channel 3.

The SEGY output format is a binary format, based on:  
"Recommended Standards for Digital Tape Formats"
by K. M. Barry, D. A. Cavers, C. W. Kneale
Geophysics, Vol 40, # 2 (April 1975) p 344-352

All header values must be integers, which requires multiplying
offset by 1000 to convert from kilometers to meters.
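
As an illustration of this conversion, the sketch below looks up an
input filename in the table file and returns the station name together
with the offset as an integer number of meters.  The function name is
hypothetical, and the sketch assumes that both the filename and the
first table field are compared up to their first "." character.

```python
def lookup_station(table_text, input_filename):
    """Return (station_name, offset_in_meters), or None if no line matches."""
    stem = input_filename.split(".")[0].lower()
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # not a usable table line
        if fields[0].split(".")[0].lower() == stem:
            # round() rather than int(), so that floating-point error in
            # the multiplication cannot truncate the result by one meter.
            return fields[1], round(float(fields[2]) * 1000)
    return None
```

With a table line of "PE004014.ASC ILA035 143.55", the input file
"pe004014.asc" yields station "ILA035" and an offset of 143550 meters.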

The SEGY file can be broken up into the following parts:

1) An initial 3600-byte file header

   byte        header
   offset      value
   ---------   --------------------------------
   3213-3214   number of data traces per record
   3217-3218   sample interval in microseconds (this file)
   3219-3220   sample interval in microseconds (field records)
   3221-3222   number of samples per trace (this file)
   3223-3224   number of samples per trace (field records)
   3225-3226   sample format code (set to 2 - i.e. 4-byte fixed point)

2) For each trace, a 240-byte trace header

   byte        header
   offset      value
   ---------   --------------------------------
     1-4       trace sequence number within line (increments starting from 1)
     5-8       trace sequence number within reel (increments starting from 1)
     9-12      ffid
    13-16      chan
    29-30      trace id code (set to 1 - i.e. seismic data)
    37-40      offset in meters
    41-44      receiver elevation 
    69-70      scalar (set to 1)
    71-72      scalar (set to 1)
   115-116     number of samples per trace
   117-118     sample interval in microseconds
   161-162     hour of day (24 hour clock)
   163-164     minute of hour
   165-166     second of minute
   167-168     time basis code (set to 2 - i.e. GMT)
   169-170     trace weighting factor (set to 0)

3) For each trace, the trace data samples are written as 4-byte integers

Note that the size of the output SEGY file, in bytes, will be:
      3600 
      + (number_of_traces) * 240
      + (number_of_traces) * (samples_per_trace) * 4
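
The layout above can be sketched with Python's struct module.  This is
an illustration only, not the SUDS2SEGY source code: the function
names are hypothetical, big-endian byte order is assumed (it is
conventional for SEGY but is not stated here), the number of traces
per record is assumed to be 3, and fields not listed in the tables are
left as zero.

```python
import struct


def put(buffer, first_byte, fmt, value):
    """Write value at a 1-based byte position, as used in the tables above."""
    struct.pack_into(fmt, buffer, first_byte - 1, value)


def build_segy(traces, sample_interval_us, hms=(0, 0, 0)):
    """traces: list of (ffid, chan, offset_m, samples), all of equal length."""
    n_samples = len(traces[0][3])
    out = bytearray(3600)                      # file header
    put(out, 3213, ">h", 3)                    # traces per record (assumed)
    put(out, 3217, ">h", sample_interval_us)   # this file
    put(out, 3219, ">h", sample_interval_us)   # field records
    put(out, 3221, ">h", n_samples)            # this file
    put(out, 3223, ">h", n_samples)            # field records
    put(out, 3225, ">h", 2)                    # 4-byte fixed point
    for sequence, (ffid, chan, offset_m, samples) in enumerate(traces, start=1):
        header = bytearray(240)                # trace header
        put(header, 1, ">i", sequence)         # sequence within line
        put(header, 5, ">i", sequence)         # sequence within reel
        put(header, 9, ">i", ffid)
        put(header, 13, ">i", chan)
        put(header, 29, ">h", 1)               # seismic data
        put(header, 37, ">i", offset_m)
        put(header, 69, ">h", 1)               # scalar
        put(header, 71, ">h", 1)               # scalar
        put(header, 115, ">h", n_samples)
        put(header, 117, ">h", sample_interval_us)
        put(header, 161, ">h", hms[0])         # hour
        put(header, 163, ">h", hms[1])         # minute
        put(header, 165, ">h", hms[2])         # second
        put(header, 167, ">h", 2)              # GMT
        out += header
        out += struct.pack(">%di" % n_samples, *samples)
    return bytes(out)
```

The length of the returned byte string equals the file size given by
the formula above, which provides a quick consistency check on an
output file.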


OUTPUT REPORT FILE:

The report file will contain information regarding each
station file read from the input directory.  This file 
can be used to match FFID's in the SEGY file to filenames 
in the input directory.  The report file will also contain, 
for each station, the offset of the station from the source, 
the maximum absolute amplitude for each of the three components, 
and the time of maximum amplitude for each component.
Note that the maximum amplitude will be calculated only
from samples that begin after the user-specified starting
time and fall within the user-specified number of samples.
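
A minimal sketch of this calculation is given below.  It is not the
SUDS2SEGY source code: the function name is hypothetical, and the
sketch assumes that the trace has already been cut to the time window
and that the time of maximum amplitude is reported in seconds on the
same clock as the starting time.

```python
def max_abs_amplitude(window_samples, rate, start_seconds):
    """Return (maximum absolute amplitude, time of that sample in seconds)."""
    if not window_samples:
        return 0, None
    # max() with a key returns the first index at which the maximum occurs.
    index = max(range(len(window_samples)), key=lambda i: abs(window_samples[i]))
    return abs(window_samples[index]), start_seconds + index / rate
```

Because only windowed samples are passed in, a large amplitude that
falls outside the time window does not affect the reported value.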


OUTPUT DIAGNOSTICS FILE:

The diagnostics file will contain information concerning 
the number of samples padded or truncated at the beginning 
and end of each trace.  Warnings may be issued
if a trace's "initial sample time" was so different
from the user-supplied starting time that all data
samples were truncated by the program.


SUMMARY:

In summary, this program will allow for easier importation of
seismic data produced by SUDS2ASC into a seismic processing 
system, such as ProMAX.  In addition, all traces will be
normalized to begin at a specified time, and each trace will
contain a fixed number of samples.  An output summary gives the 
offset and maximum amplitude values for each trace, which can 
then be read into a plotting program.