U.S. Geological Survey Data Series 74, Version 3.0
Long-Term Oceanographic Observations in Massachusetts Bay, 1989-2006
The names of the files containing the time-series data are coded to provide information on the mooring number, instrument location in the mooring, instrument or variable, and processing stage (table 8). The data-file names begin with a four-digit number: the first three digits are the mooring number and the fourth digit indicates the vertical position of the instrument in the mooring. For example, 5541 identifies the top-most instrument on mooring 554, 5542 the next instrument down on mooring 554, etc. Additional identifiers are added to the four-digit identifier to indicate instrument or sensor, variable, processing steps, averaging, and data position on a platform.
Table 8. Explanation of data-file names.
[The data-file names are in the format nnni[A][bbbb][c]-d[dd]_[ee].ff (for example, 6831adc-a1h.nc). [ ], optional elements]
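As an illustration of the four-digit prefix described above, the following Python sketch splits a data-file name into its mooring number and vertical position. It is a minimal example under stated assumptions, not one of the tools used to prepare this report: the names `FilePrefix` and `parse_prefix` are hypothetical, and everything after the four digits is returned unparsed rather than interpreted.

```python
import re
from typing import NamedTuple


class FilePrefix(NamedTuple):
    mooring: int     # first three digits: mooring number
    position: int    # fourth digit: vertical position (1 = top-most instrument)
    remainder: str   # optional identifiers after the four digits, left unparsed
    extension: str   # text after the final dot, for example "nc"


# Three digits, one digit, the rest of the stem, then a dot and the extension.
_PATTERN = re.compile(r"^(\d{3})(\d)(.*)\.([^.]+)$")


def parse_prefix(file_name: str) -> FilePrefix:
    """Split a data-file name into its nnni prefix and the unparsed remainder."""
    match = _PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"{file_name!r} does not start with a four-digit identifier")
    mooring, position, remainder, extension = match.groups()
    return FilePrefix(int(mooring), int(position), remainder, extension)


# The example file name given in the table heading.
print(parse_prefix("6831adc-a1h.nc"))
```

Only the four-digit prefix is decoded here; the sketch deliberately makes no attempt to interpret the optional fields.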
The original intent of the data file-naming convention was to make the content of a file identifiable from its name. However, the systems used to process the data and the file-naming conventions evolved over the 16 years of data collection and processing. As a result, the file names are not consistent over the entire data set, and the content of a data file cannot always be determined easily by examining the file name. For example, some file names have no identifier in the [bbbb] field, some have an instrument identifier, and some have a variable identifier. In some cases the [bbbb] identifier does not indicate all the variables in that file. The graphic data catalog and list of files by variable (see Overview of Data Set), and the tables for downloading the digital data files (see Digital Data Files), were generated using tools in MATLAB that searched for variable names, depth, and time in the file headers.
The data-processing strategy was generally to keep all variables collected by a data logger in the same data file. However, in some cases the variables are in several files. For example, if one sensor failed early in a deployment while others collected data for the entire deployment, the data are in separate files.
Data processed prior to 2000 and stored in the WHOI Buoy Group system format were converted to netCDF format. The conversion program automatically separated variables collected by a single data logger (for example, MIDAS) and originally stored in a single file into multiple files if the variables were obtained at different depths. The program retained the original nnni[bbbb] in the new file name, but appended d0, d1, etc., for the variables at different depths. In addition to introducing multiple files, this conversion means that the originally accurate [bbbb] identifier in the new file name may no longer accurately identify the data.
For example, temperature, conductivity, and light transmission data, originally in a single file labeled [tct], were placed in separate new files with the same [tct] label, but with temperature in one file, conductivity and salinity data in another, and beam-attenuation data in a third.
In preparing this report, consideration was given to renaming and recombining data files to reduce the number of files and introduce consistent labeling. However, the file names were left unchanged because of the very large number of files with a wide variety of names and the potential to introduce further confusion. With such a large data set, search tools are the most efficient means of finding the data of interest.
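The search-based approach can be illustrated with a small Python sketch. It assumes the variable names have already been read from each file header into a dictionary (the tools used for this report did that step in MATLAB, and reading netCDF headers is not shown here). The function name `index_by_variable`, the file names, and the variable names are hypothetical placeholders, not actual entries in the data set.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def index_by_variable(headers: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Invert {file name: variable names} into {variable name: sorted file names}."""
    index = defaultdict(list)
    for file_name, variables in headers.items():
        for variable in set(variables):  # ignore duplicates within one header
            index[variable].append(file_name)
    return {variable: sorted(files) for variable, files in index.items()}


# Hypothetical header contents: the file names alone do not reveal the variables.
headers = {
    "file_a.nc": ["temperature"],
    "file_b.nc": ["conductivity", "salinity"],
    "file_c.nc": ["beam_attenuation"],
    "file_d.nc": ["temperature", "salinity"],
}

catalog = index_by_variable(headers)
print(catalog["temperature"])  # ['file_a.nc', 'file_d.nc']
```

Looking files up by the variables recorded in their headers, rather than by file name, avoids relying on the inconsistent [bbbb] labels.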