PROGRAM TO CONVERT SUDS2ASC FILES TO A SINGLE BINARY SEGY FILE

MARK GOLDMAN

OPEN-FILE REPORT 00-141

MARCH, 2000

INTRODUCTION

This program, SUDS2SEGY, converts and combines ASCII files created using SUDS2ASC Version 2.60 into a single SEGY file. SUDS2ASC has been used previously to create an ASCII file of three-component seismic data for an individual recording station. However, many seismic processing packages have difficulty reading ASCII data. In addition, it may be cumbersome to process a separate file for each recording station, particularly if traces from different recording stations contain different numbers of data samples and/or have different start times.

SUDS2SEGY combines these recording station files into a single SEGY file. In addition, SUDS2SEGY normalizes the trace times so that each trace starts at a given time and consists of a fixed number of samples. This normalization allows seismic data from many different stations to be read in as a single "data gather".

SUDS2SEGY also produces a report summarizing the offset and maximum absolute amplitude for each component in a station file. These data are output separately to an ASCII file and can subsequently be input to a plotting package.

SUDS2ASC FORMAT

The input files must have the following format, as created by the program SUDS2ASC Version 2.60. Each file must contain data for three components, with each component section containing header lines followed by data lines.

Header lines must be present for the keywords "initial sample time", "number of samples", and "samples per second". Each keyword must be on a separate line, with the value for that keyword in the first field of that line. The one exception is the header line containing "initial sample time": its first field is the date, which is not used. Instead, the program uses the second field, which gives the hour, minute, and second at which the trace begins.

The last header line must contain the keyword "rate correction".
The data lines must start immediately after this last header line. After the last data line, there must be at least one blank line. The header lines for the next component should start after these blank lines. Additional header lines may be present, as long as the last header line is as described above.

An example input file (with extraneous lines removed) is:

    ;SUDS2ASC - Version 2.60
    n ;component
    09/20/99 17:47:11.000 ;initial sample time
    17810 ;number of samples
    200.00000 ;samples per second
    0 0 0 0 0 0 0 0 -1 0
    0 -1 0 0 0 0 0 0 -1 0

    e ;component
    09/20/99 17:47:11.000 ;initial sample time
    17810 ;number of samples
    200.00000 ;samples per second
    0 0 0 0 0 0 0 0 -1 0
    0 -1 0 0 0 0 0 0 -1 0

    v ;component
    09/20/99 17:47:11.000 ;initial sample time
    17810 ;number of samples
    200.00000 ;samples per second
    0 0 0 0 0 0 0 0 -1 0
    0 -1 0 0 0 0 0 0 -1 0

Note that input files may contain a ^M (carriage return) character at the end of every line, including "blank" lines. The program removes this character automatically, and it does not affect the determination of blank or header lines.

Also note that the "component" header value is not read by the program. The program processes components in the order in which they appear in the input files.

The SUDS2SEGY source code defines constant values for many of the above definitions. If the input ASCII files have a slightly different format, it may be possible to adapt the program by changing one or more constants in the source code.

STEPS IN CONVERTING AND COMBINING SUDS2ASC DATA TO A SEGY FILE

1) Place the input SUDS2ASC files into a separate directory. This input directory should contain only these input files and no other files. Each input file must contain the following header information for each of the three components:

    initial sample time
    number of samples
    samples per second

2) Place or create in the current directory a table file providing the full station name and offset for each input file in the input directory.
Each input file in the input directory must have a corresponding line in the table file. The first field in the table file must match a filename in the input directory up to the first "." character. For convenience, the matching is case-insensitive. For example, input filename "pe0001.asc.bin" will match a line starting with "PE0001.ASC" in the table file. The second field gives the full station name, and the third field gives the station's offset in km from the source. Additional fields on the same line are ignored, as are lines not starting with a matching filename. Such table files are often provided with the data.

An example line of a typical table file is:

    PE004014.ASC ILA035 143.55 ........

In this example, a file whose name up to the first "." character is "pe004014" must exist in the input directory. The station name for this file will be "ILA035", and the offset will be 143550 meters.

3) Next, decide on a starting time. This starting time will be compared to the "initial sample time" given in the input file header. If the initial sample time is later than the starting time, dummy samples (value of zero) will be padded to the beginning of the data samples. If the initial sample time is earlier than the starting time, data samples will be skipped until the sample corresponding to the starting time is reached. The starting time of the event itself is often a good choice for this parameter.

4) Next, decide on the number of samples per trace. If the number of samples in the file (after padding or truncating samples as described in step 3 above) is less than this value, dummy samples (value of zero) will be padded to the end of the data samples. If the number of samples in the file (after step 3) is greater than this value, the remaining data samples will be skipped once this number of samples is reached. In general, the number of samples should be set large enough to encompass most of the data from most of the stations. However, there is a maximum limit of 32,000 samples per trace.
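
The padding and skipping rules of steps 3 and 4 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the SUDS2SEGY source code: the function names parse_time and window_trace are hypothetical, and rounding the time difference to the nearest sample is an assumption about how fractional offsets are handled. The values in the usage lines combine the header of the example input file (17:47:11.000, 17810 samples, 200 samples per second) with the starting time and trace length of the example command in step 5.

```python
def parse_time(hms):
    """Convert 'HH:MM:SS[.fff]' to seconds since midnight."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def window_trace(samples, initial_time, start_time, rate, n_out):
    """Fit one trace to the user's time window.

    Returns (trace, pad_front, skip_front, pad_end, skip_end), where
    trace always holds exactly n_out samples.
    """
    # Offset of the first data sample from the window start, in samples.
    # Rounding to the nearest sample is an assumption of this sketch.
    shift = round((parse_time(initial_time) - parse_time(start_time)) * rate)

    # Data begins after the window start: pad zeros in front (step 3).
    pad_front = min(max(shift, 0), n_out)
    # Data begins before the window start: skip leading samples (step 3).
    skip_front = min(max(-shift, 0), len(samples))

    # Copy as many samples as still fit in the window (step 4).
    kept = samples[skip_front:skip_front + (n_out - pad_front)]
    pad_end = n_out - pad_front - len(kept)
    skip_end = len(samples) - skip_front - len(kept)

    trace = [0] * pad_front + list(kept) + [0] * pad_end
    return trace, pad_front, skip_front, pad_end, skip_end


# Usage with the values quoted in this report.
data = [1] * 17810
trace, pad_front, skip_front, pad_end, skip_end = window_trace(
    data, "17:47:11.000", "17:47:12.6", 200.0, 20000)
print(pad_front, skip_front, pad_end, skip_end)  # 0 320 2510 0
```

In this case the trace begins 1.6 seconds (320 samples) before the starting time, so 320 samples are skipped, the remaining 17490 samples are copied, and 2510 zero samples are padded to the end to reach 20000 samples.
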

Some trial and error may be necessary to arrive at an optimal value for both the starting time and the number of samples per trace. The output diagnostics file will contain diagnostic information on the number of samples padded or truncated for each trace.

Example: the input data has an "initial sample time" of 17:45:00, with 5000 samples per trace and 100 samples per second. If the user wishes all traces to start at 17:45:01 (one second, or 100 samples, after the initial sample time), with 1000 samples per trace, then the program will skip the first 100 samples, read in the next 1000 samples, and skip the remaining 3900 samples. If the user wishes all traces to start at 17:44:59 (one second before the initial sample time), with 1000 samples per trace, then the program will pad 100 dummy samples to the beginning of the trace, read in the first 900 samples, and skip the remaining 4100 samples.

Note that by entering a starting time and a number of samples, the user is in effect creating a "time window". Only data samples that fall within this user-defined time window will be copied to the output SEGY file. In addition, the maximum absolute amplitude will be computed only from data samples that fall within this window.

5) Now you are ready to run SUDS2SEGY, specifying the input directory, the table filename, the output SEGY filename, the output report file, the output diagnostics file, the starting time for the data samples, and the number of data samples per trace. An example is:

    SUDS2SEGY input_files input_table SEGY_output report_output diag_file 17:47:12.6 20000

In this example, input station files are read from a directory called input_files, the table file is called input_table, the output SEGY file is called SEGY_output, the report file is called report_output, the diagnostics file is called diag_file, and each trace starts at time 17:47:12.6 and contains 20,000 samples.

OUTPUT SEGY FILE:

The SEGY output file will have a separate FFID (field file identification number, e.g. station number, shot number, etc.) for each station file in the input directory.
Each FFID will contain three traces, with the first component in the input file as channel 1, the second component as channel 2, and the third component as channel 3.

The SEGY output format is a binary format, based on:

    "Recommended Standards for Digital Tape Formats"
    by K. M. Barry, D. A. Cavers, C. W. Kneale
    Geophysics, Vol 40, # 2 (April 1975) p 344-352

All header values must be integers, which requires multiplying offset by 1000 to convert from kilometers to meters.

The SEGY file can be broken up into the following parts:

1) An initial 3600-byte file header

    byte         header
    offset       value
    ---------    --------------------------------
    3213-3214    number of data traces per record
    3217-3218    sample interval in microseconds (this file)
    3219-3220    sample interval in microseconds (field records)
    3221-3222    number of samples per trace (this file)
    3223-3224    number of samples per trace (field records)
    3225-3226    sample format code (set to 2 - i.e. 4-byte fixed point)

2) For each trace, a 240-byte trace header

    byte         header
    offset       value
    ---------    --------------------------------
    1-4          trace sequence number within line (increments starting from 1)
    5-8          trace sequence number within reel (increments starting from 1)
    9-12         ffid
    13-16        chan
    29-30        trace id code (set to 1 - i.e. seismic data)
    37-40        offset in meters
    41-44        receiver elevation
    69-70        scalar (set to 1)
    71-72        scalar (set to 1)
    115-116      number of samples per trace
    117-118      sample interval in microseconds
    161-162      hour of day (24 hour clock)
    163-164      minute of hour
    165-166      second of minute
    167-168      time basis code (set to 2 - i.e. GMT)
    169-170      trace weighting factor (set to 0)

3) For each trace, the trace data samples are written as 4-byte integers

Note that the size of the output SEGY file, in bytes, will be:

    3600 + (number_of_traces) * 240 + (number_of_traces) * (samples_per_trace) * 4

OUTPUT REPORT FILE:

The report file will contain information regarding each station file read from the input directory.
This file can be used to match FFIDs in the SEGY file to filenames in the input directory. The report file will also contain, for each station, the offset of the station from the source, the maximum absolute amplitude for each of the three components, and the time of maximum amplitude for each component. Note that the maximum amplitude will be calculated only from samples that fall within the time window defined by the user-specified starting time and number of samples.

OUTPUT DIAGNOSTICS FILE:

The diagnostics file will contain information concerning the number of samples padded or truncated at the beginning and end of each trace. Warnings may be issued if a trace's "initial sample time" was so different from the user-supplied starting time that all data samples were truncated by the program.

SUMMARY:

In summary, this program allows easier importation of seismic data produced by SUDS2ASC into a seismic processing system, such as Promax. In addition, all traces are normalized to begin at a specified time, and each trace contains a fixed number of samples. An output report gives the offset and maximum amplitude values for each trace, which can then be read into a plotting program.
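
As a supplement to the OUTPUT SEGY FILE section, the following Python sketch shows how the file-size formula and a few of the trace header fields listed there could be checked. It is an illustration under stated assumptions, not the SUDS2SEGY source code: the function names segy_file_size and build_trace_header are hypothetical, big-endian byte order is assumed (this report does not state the byte order), and the same sequence number is assumed for the line and reel counters. Only a subset of the header fields is filled in; the rest are left as zero.

```python
import struct

FILE_HEADER_BYTES = 3600   # initial file header
TRACE_HEADER_BYTES = 240   # header written before each trace
BYTES_PER_SAMPLE = 4       # samples are 4-byte integers


def segy_file_size(n_traces, samples_per_trace):
    """Expected size in bytes of the output SEGY file."""
    return (FILE_HEADER_BYTES
            + n_traces * TRACE_HEADER_BYTES
            + n_traces * samples_per_trace * BYTES_PER_SAMPLE)


def build_trace_header(seq, ffid, chan, offset_km, n_samples, rate_hz):
    """Fill a few fields of a 240-byte trace header.

    Byte positions in this report are 1-based, so bytes 37-40
    correspond to index 36 here. Big-endian order is an assumption.
    """
    header = bytearray(TRACE_HEADER_BYTES)
    struct.pack_into(">i", header, 0, seq)    # bytes 1-4: sequence within line
    struct.pack_into(">i", header, 4, seq)    # bytes 5-8: sequence within reel
    struct.pack_into(">i", header, 8, ffid)   # bytes 9-12: ffid
    struct.pack_into(">i", header, 12, chan)  # bytes 13-16: chan
    struct.pack_into(">h", header, 28, 1)     # bytes 29-30: trace id code
    # Header values must be integers, so km are converted to meters.
    struct.pack_into(">i", header, 36, round(offset_km * 1000))
    struct.pack_into(">h", header, 114, n_samples)  # bytes 115-116
    # bytes 117-118: sample interval in microseconds
    struct.pack_into(">h", header, 116, round(1_000_000 / rate_hz))
    return bytes(header)


# One station file (three traces) of 20000 samples each.
size = segy_file_size(3, 20000)
print(size)  # 244320

# Station ILA035 from the example table line: offset 143.55 km, 200 samples/s.
hdr = build_trace_header(seq=1, ffid=1, chan=1, offset_km=143.55,
                         n_samples=20000, rate_hz=200.0)
```

Comparing the computed size with the actual size of the output file is a quick way to confirm that the expected number of traces was written.
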