
Standard formats

Overview

The specification and use of a standard format, or set of formats, enables data to be exchanged easily between investigators. SOHO will use the Standard Formatted Data Unit (SFDU), which is becoming more common in data archives. For example, all data in the ISTP CDHF at NASA/GSFC must be SFDU-conforming data objects. Documents describing formatting standards may be obtained from:

NASA/OSSA Office of Standards and Technology
Code 933
NASA Goddard Space Flight Center
Greenbelt MD 20771
USA

SFDU

SFDU is an international standard that facilitates the exchange of information between users. The SFDU formalism enables a description of the data to be specified in a standard way, so that anyone, possibly years later, can obtain that description from the appropriate international agency. Such agencies are called Control Authorities, two of which are the NASA/NSSDC and ESA/ESOC. A data description that is registered with such a Control Authority is given a unique identifier that is included with the data as an SFDU label (either as a separate file or at the beginning of the data itself). Thus any user who is unfamiliar with the data can obtain a description by contacting a Control Authority. A FORTRAN procedure is available to generate SFDU labels.
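To make the idea of an SFDU label concrete, the following minimal sketch splits a label into its fields. It assumes the 20-character layout of version 3 labels in the CCSDS SFDU recommendation (Control Authority identifier, version, class, delimitation, spare, data description identifier, delimitation parameter); the layout, the function name parse_sfdu_label, and the example label are not taken from this document and are illustrative only.

```python
from typing import NamedTuple


class SfduLabel(NamedTuple):
    control_authority_id: str    # characters 0-3, identifies the Control Authority
    version_id: str              # character 4
    class_id: str                # character 5
    delimitation_id: str         # character 6
    spare: str                   # character 7
    data_description_id: str     # characters 8-11, assigned at registration
    delimitation_parameter: str  # characters 12-19


def parse_sfdu_label(label: str) -> SfduLabel:
    """Split a 20-character SFDU label into its fields (assumed layout)."""
    if len(label) != 20:
        raise ValueError("an SFDU label is expected to be 20 characters long")
    return SfduLabel(label[0:4], label[4], label[5], label[6], label[7],
                     label[8:12], label[12:20])


# Illustrative label only; it is not taken from any SOHO product.
example = parse_sfdu_label("CCSD3ZF0000100000001")
print(example.control_authority_id, example.data_description_id)
```

The Control Authority identifier and the data description identifier together form the unique identifier under which a description would be requested.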

The SOHO Science Operations Working Group has adopted the SFDU formalism for any product that is going to be distributed to the community. For example, the summary data will have detached SFDU labels, as will the orbit and attitude files.

The SFDU is described in the following documents:

The SFDU does not in itself specify the format of the data. It permits any format, registered or not, to be used. If a non-registered format is used, then the format specification needs to be included with the data. Three registered data formats are the Parameter Value Language (PVL), the Flexible Image Transport System (FITS), and the Common Data Format (CDF), all of which will be used in SOHO files.

PVL

The SFDU uses PVL to specify required information. It is a generalization of the format used in the header of FITS files, and is of the form "Parameter = Value". It is also an international standard and is described in the following documents:

The catalogs that will be generated by SOHO experimenters will use PVL/FITS concepts. In order to ensure that everyone is using the keywords (parameters) in a consistent way, the keywords and their definitions will be registered with a Control Authority. A draft document of the keywords has been circulated (see Annex 6 in the minutes of the 8th SOWG meeting).
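The "Parameter = Value" form can be illustrated with a small reader. This is a sketch of a simplified subset only, not the full PVL standard (no comments, groups, units, or multi-line values); the function name parse_simple_pvl and the keywords in the sample are hypothetical and are not the registered SOHO keywords.

```python
from typing import Dict


def parse_simple_pvl(text: str) -> Dict[str, str]:
    """Read 'Parameter = Value' lines into a dictionary (simplified subset)."""
    result: Dict[str, str] = {}
    for raw_line in text.splitlines():
        # Drop surrounding blanks and an optional trailing semicolon.
        line = raw_line.strip().rstrip(";").strip()
        if not line or "=" not in line:
            continue  # skip blank lines and anything that is not an assignment
        name, value = line.split("=", 1)
        result[name.strip()] = value.strip().strip('"')
    return result


# Hypothetical catalog entry, for illustration only.
sample = """
INSTRUME = "EXAMPLE"
DATE_OBS = 1995-04-28T14:32:42
"""
catalog_entry = parse_simple_pvl(sample)
print(catalog_entry)
```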

FITS

All scientific data files generated by the PI teams will be in FITS format. In particular, this applies to the summary data, and to level-1 (and higher) data files. The exception is the summary data files of the three particle experiments CELIAS, COSTEP, and ERNE, which will be in CDF.

A formal description of the FITS standard can be found in ``Implementation of the Flexible Image Transport System (FITS)'', available as publication NOST 100-0.3b from the Office of Standards and Technology, or by anonymous ftp from nssdca.gsfc.nasa.gov (128.183.36.23), or via DECnet from NSSDCA::ANON_DIR:[FITS] (15548::).

FITS files facilitate interoperability by using a specified binary standard for encoding data values independent of the computer platform. In other words, FITS files look the same regardless of what computer the file is sitting on, and can be copied from computer to computer without modification. FITS files are also used in a wide range of astronomical applications, and are directly supported in such astronomical software packages as IRAF, and indirectly supported in some broader data analysis packages such as IDL.
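As a small illustration of a platform-independent encoding, the sketch below packs floating-point values with an explicit byte order, so the resulting bytes are the same on any machine. It relies on the fact that FITS stores multi-byte values most significant byte first (big-endian); the sample values are arbitrary.

```python
import struct

values = [1.5, -2.25, 1024.0]

# ">" forces big-endian byte order regardless of the host computer;
# "f" is a 32-bit IEEE floating-point number.
encoded = struct.pack(">3f", *values)

# Reading the bytes back with the same explicit format recovers the values.
decoded = list(struct.unpack(">3f", encoded))
print(encoded.hex(), decoded)
```

Because the byte order is stated explicitly on both writing and reading, no conversion step is needed when the file moves between computers.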

Some standardized software for reading and writing FITS files is available in the public domain. The FITSIO package by William Pence is a set of FORTRAN subroutines available by anonymous ftp from tetra.gsfc.nasa.gov (128.183.8.77).

There are also IDL routines available for reading and writing FITS files, as part of the IDL Astronomy User's Library. These are available via anonymous ftp from idlastro.gsfc.nasa.gov (128.183.84.71), or by DECnet copy from uit::$1$DUA5:[IDLUSER] (15384::).

Primary FITS files

The simplest form of FITS file consists of a single FITS header and data unit. FITS headers are a series of eighty-character card images of the form keyword=value. The keywords are restricted to a maximum of eight characters, and include a standard set of predefined keywords, some of which are required, and whatever additional keywords the experimenter wishes to define.
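A header card can be sketched as follows. The helper make_card is hypothetical and handles only logical, integer, and string values in the fixed format of the FITS standard (keyword in columns 1-8, "= " in columns 9-10, numbers and logicals right-justified to column 30, strings quoted from column 11); comments and floating-point values are left out.

```python
def make_card(keyword: str, value) -> str:
    """Format one 80-character FITS header card (simplified, fixed format)."""
    if len(keyword) > 8:
        raise ValueError("FITS keywords are limited to eight characters")
    if isinstance(value, bool):
        text = ("T" if value else "F").rjust(20)   # logical value in column 30
    elif isinstance(value, int):
        text = str(value).rjust(20)                # right-justified to column 30
    else:
        escaped = str(value).replace("'", "''")    # a quote is written twice
        text = "'" + escaped.ljust(8) + "'"        # padded to eight characters
    card = f"{keyword.upper():<8}= {text}"
    if len(card) > 80:
        raise ValueError("value does not fit in a single card")
    return card.ljust(80)


# Illustrative header fragment; the values are examples only.
cards = [
    make_card("SIMPLE", True),
    make_card("BITPIX", 16),
    make_card("NAXIS", 0),
    make_card("TELESCOP", "SOHO"),
]
for card in cards:
    print(card.rstrip())
```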

The data unit consists of an N-dimensional data array. The size, dimensions, and datatype of the array are described by standard FITS keywords in the header. IEEE standards are used for the binary representation of the data.
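The size of a data unit follows directly from the header keywords. The sketch below assumes the standard keywords BITPIX (bits per value, negative for floating point) and NAXISn (length of each axis), and the FITS convention that header and data units are padded to a whole number of 2880-byte blocks; the function name data_unit_size is hypothetical.

```python
import math
from typing import Sequence, Tuple

FITS_BLOCK = 2880  # FITS files are organised in blocks of 2880 bytes


def data_unit_size(bitpix: int, axes: Sequence[int]) -> Tuple[int, int]:
    """Return (bytes of data, bytes occupied after padding to whole blocks)."""
    if bitpix == 0 or bitpix % 8 != 0:
        raise ValueError("BITPIX must be a non-zero multiple of 8")
    # With no axes (NAXIS = 0) there is no data array at all.
    n_elements = math.prod(axes) if axes else 0
    n_bytes = (abs(bitpix) // 8) * n_elements
    n_blocks = -(-n_bytes // FITS_BLOCK)  # ceiling division
    return n_bytes, n_blocks * FITS_BLOCK


# A 1024 x 1024 image of 16-bit integers (BITPIX = 16, NAXIS1 = NAXIS2 = 1024).
print(data_unit_size(16, [1024, 1024]))
```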

The primary FITS header and data unit can be followed by one or more FITS extensions. In that case it is not required that there be a primary data array; the number of elements can be given as zero. There are a number of different kinds of standard extension types, and there is also the possibility of defining new kinds of extensions.

ASCII tables

One standard extension type, the ``TABLE'' extension, allows the experimenter to store an ASCII encoded table. The format of each column in the table is defined individually. This extension could be used to store catalog-type information.

Binary tables

Similar to the ``TABLE'' extension, the ``BINTABLE'' extension allows the storage of data organized into a table with rows and columns. However, the data are stored with a binary representation (although ASCII fields are allowed), and individual items in the table can be arrays rather than scalar values.

At the moment there is no formal standard for describing the dimensions of an array. This is principally because there is no one ``right'' way to do this. However, there is a proposal for one way to do this, the ``Multidimensional Array Facility'', which is given as an Appendix in the NOST FITS document, and uses TDIMn keywords in the header to describe the dimensions. This TDIM approach should meet the needs of any SOHO instrument team that wants to use binary tables to store their data.
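The TDIMn convention can be sketched as follows, assuming the keyword value is a string of the form '(d1,d2,...)' and that, as for other FITS arrays, the first index varies fastest. The helper names parse_tdim and flat_index are hypothetical.

```python
from typing import Sequence, Tuple


def parse_tdim(value: str) -> Tuple[int, ...]:
    """Turn a TDIMn value such as '(4,3,2)' into a tuple of axis lengths."""
    inner = value.strip().strip("'").strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("expected a value of the form '(d1,d2,...)'")
    return tuple(int(part) for part in inner[1:-1].split(","))


def flat_index(indices: Sequence[int], dims: Sequence[int]) -> int:
    """Zero-based position of an element in the flat table cell.

    The first axis varies fastest, so its stride is 1.
    """
    offset, stride = 0, 1
    for index, length in zip(indices, dims):
        if not 0 <= index < length:
            raise IndexError("index outside the declared dimensions")
        offset += index * stride
        stride *= length
    return offset


dims = parse_tdim("'(4,3,2)'")          # a 4 x 3 x 2 array stored in one cell
print(dims, flat_index((1, 2, 1), dims))
```

A cell declared this way holds 4 x 3 x 2 = 24 values in a single table entry, and the header keyword is all that is needed to recover the array shape.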

Binary tables represent a powerful and efficient way of associating together a number of different data variables in a single FITS file.

The IMAGE extension

The ``IMAGE'' extension has been proposed by the IUE (International Ultraviolet Explorer) team as a standard for storing multiple arrays in a single FITS file. Each IMAGE extension is basically of the same format as the primary FITS header and data unit.

IMAGE extensions are appropriate when the number of data arrays, and hence the number of extensions, to store together in a single FITS file is small. If the number of non-scalar variables is large, or the data structure is complex, then binary tables are more appropriate.

CDF

The GGS/ISTP (Global Geospace Science / International Solar-Terrestrial Physics) project has adopted the NSSDC (National Space Science Data Center) CDF for use in key parameters and some other data products maintained at the CDHF. The exact role of the CDHF in storing and distributing SOHO summary data still needs to be worked out, but at the very least key parameters from certain SOHO instruments will be incorporated into the CDHF database. Since that database uses CDF, and SOHO uses FITS, some conversion will be necessary.

CDF has some properties in common with FITS, in that it is self-describing, and that it allows information about the data (units, description of data axes, etc.) to be stored together with the data arrays. The underlying physical representation and the basic data model, however, are different.

The NSSDC supplies a set of standard FORTRAN and C libraries for reading and writing CDF files on VMS and Unix computers. These are available via anonymous ftp for VMS from nssdca.gsfc.nasa.gov (128.183.36.23), or by DECnet copy from NSSDCA::ANON_DIR:[CDF.CDF21-DIST] (15548::). Software for various Unix workstations is available using anonymous ftp from ncgl.gsfc.nasa.gov (128.183.10.238).

The CDHF also supplies software to aid in the generation of key parameter software in ISTP/CDF format. This software is available via DECnet from ISTP::SYS$PUBLIC:[SFDU_TOOLS.BLD_SFDU] (15461::) or by anonymous ftp from either istp1.gsfc.nasa.gov (128.183.92.58) or from istp2.gsfc.nasa.gov (128.183.92.59) in the directory SYS$PUBLIC:[SFDU_TOOLS.BLD_SFDU].

The format used by the ISTP/CDHF is a subset of the complete CDF specification, and further specifies the format to promote uniformity between the different ISTP data sets. This uniformity extends to such things as the binary representation of data (e.g. IEEE format for floating point numbers, the same as FITS), and the representation of times.

Both FITS and CDF are supported in IDL.






SOHO Archive
Fri Apr 28 14:32:42 EDT 1995