
Time stream data format


The data comes from the telescope in fits file format. Generally it is broken into scans (in the current observing strategy there are 8 of these in each data file) and frequency windows (for spectrometer data there are also 8 of these). Each scan contains a number of integration time bins (60 of 1 second each), and each time bin has 4 polarizations, 2 noise cal states (on and off) and a spectrum of frequencies (2048 channels). One of the key design philosophies of this pipeline is that the time stream data should never change format. This means that for every stage of the pipeline the input format is fits files similar to those produced by GBT, and the output format is the same. This allows stages of the pipeline to be easily dropped in and out of the analysis while the whole thing still runs.

For example: let's say we have two stages in the pipeline, one that applies some filter to the time stream data and another that flags bad data. The pipeline looks as follows.

  • raw gbt fits data -> flags bad data -> fits data -> filter -> fits data -> map making

Then, without changing any code, we also want to be able to do the following:

  • raw gbt fits data -> filter -> fits data -> map making

or:

  • raw gbt fits data -> flags bad data -> fits data -> map making

or even:

  • raw gbt fits data -> map making

The advantages are as follows:

  • Can add and remove modules at will to see what effect they have.
  • Can replace modules, make better ones and compare results trivially.
  • Can write modules without understanding or breaking the other ones.
  • Can independently test modules.
  • Don't have to wait for a monolithic code to be written before we start testing and evaluating algorithms.

Note that with this modular setup there is no reason that a module would have to be written in a particular language. However, if you do decide to use python for your module, I have written some infrastructure that will greatly facilitate things.

Modules for time stream format

core.DataBlock

The most important module in the time stream part of the pipeline is the core.data_block module, which defines the DataBlock object. A DataBlock object holds exactly 1 scan's and 1 frequency window's worth of data. That means that reading a fits file with 8 scans and 8 windows will give you 64 DataBlock objects. Each DataBlock holds ALL the information required to make a new fits file. This class enforces the policy that while data is in time stream format, each part of the pipeline should read and write data in the same (fits) format.

The actual data may be found in an attribute of the DataBlock object: DataBlock.data. This is a numpy masked array, which to a first approximation may be treated as a normal numpy array. The data array is 4 dimensional, with the 4 axes corresponding to time, polarization, cal state and frequency. Feel free to modify and update the entries of DataBlock.data in place, but to completely replace the data array (with, for instance, an array of a different shape or data type), use the DataBlock.set_data method. You should not need to do this very often.
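To make this concrete, here is a minimal sketch (assuming db is a DataBlock obtained from a fits file, as described under core.fitsGBT below):

```python
import numpy.ma as ma

# db is a DataBlock; its axes are (time, polarization, cal state, frequency).
n_time, n_pol, n_cal, n_freq = db.data.shape

# Modifying entries in place is fine: mask a suspect frequency channel
# at all times, polarizations and cal states.
db.data[:, :, :, 1024] = ma.masked

# Replacing the whole array (for instance after decimating in frequency)
# should go through set_data so the object stays consistent.
decimated = db.data[:, :, :, ::2]  # naive decimation, illustration only
db.set_data(decimated)
```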

The fits file stores much more information than just the data. For instance it stores time stamps and pointing information. I call these extra data 'fields'. You can access this information through the DataBlock.field attribute. The field attribute is a python dictionary and is thus normally indexed with strings, not integers (i.e. DataBlock.field['field_name']). Every entry in field is a 1D numpy array, and the length of the array is always the length of one of the data array dimensions.

This is best illustrated with examples. One of the entries in DataBlock.field is 'CAL', an array of strings giving the cal state ('T' or 'F'). The 3rd dimension of the data array is normally length 2, so DataBlock.field['CAL'] = array(['T', 'F']). The local sidereal time is found in the 'LST' field. Obviously this quantity varies over the data time axis. Let's assume that the 1st axis of the data attribute is length 60 (a 60 second scan with 1 second integrations); then DataBlock.field['LST'] is an array with length 60. The DataBlock object keeps track of which axis corresponds to each field in another dictionary, DataBlock.field_axes: DataBlock.field_axes['CAL'] = ('cal',) and DataBlock.field_axes['LST'] = ('time',) (these are stored as tuples in case I want to eventually implement multidimensional fields). For many fields, the field_axes entry is just an empty tuple, in which case the associated field data is a scalar. The 'SCAN' field is an example, since each DataBlock object has a single scan number (ex. DataBlock.field['SCAN'] = 42, DataBlock.field_axes['SCAN'] = ()).

Fields may be read or modified in place by accessing the DataBlock.field dictionary, but to replace a field entirely or to make a new field, use the DataBlock.set_field method. For a full list of the fields that are normally read from a fits file see the source code for core.fitsGBT.
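Here is a short sketch of reading and writing fields. I'm assuming set_field takes the field name, the new data and the corresponding axes tuple; the exact signature (it may take a fits format string as well) is documented in core.data_block.

```python
import numpy as np

# Read: the cal states and their axis, exactly as described above.
cal_states = db.field['CAL']       # e.g. array(['T', 'F'])
cal_axes = db.field_axes['CAL']    # ('cal',)

# Modify in place: shift all the LST values (illustration only).
db.field['LST'] += 0.5

# Replace a field entirely, or create a new one, through set_field.
# 'AZIMUTH_MODEL' is a made-up field name for illustration.
new_vals = np.zeros(db.data.shape[0])
db.set_field('AZIMUTH_MODEL', new_vals, ('time',))
```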

The frequency of each channel is not stored in the fits file, only the information required to calculate it (the centre frequency and the frequency channel spacing). For this reason, the DataBlock object has a calc_freq method. After calling DataBlock.calc_freq(), there will be a DataBlock.freq attribute: a 1D array with the same length as the frequency dimension of the data array. (We don't store this in the field dictionary because all entries of the field dictionary are stored in fits files when written to disk.) Frequencies are in Hz. Similarly, pointing is stored in the inconvenient Az/El coordinates, and time as a UTC string instead of a more convenient float. The conversions are made using DataBlock.calc_pointing() and DataBlock.calc_time(), and the results can then be found in DataBlock.ra, DataBlock.dec and DataBlock.time. All are 1D arrays with length shape(DataBlock.data)[0].
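A sketch of these calculated quantities in use:

```python
# None of these are read directly from the file; they are calculated.
db.calc_freq()      # creates db.freq, in Hz, length = db.data.shape[3]
db.calc_pointing()  # creates db.ra and db.dec from the Az/El information
db.calc_time()      # creates db.time as floats from the UTC strings

first_channel_mhz = db.freq[0] / 1e6
ra0, dec0 = db.ra[0], db.dec[0]  # pointing of the first time bin
```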

Finally, DataBlock has a history attribute (also implemented as a python dictionary) that keeps a complete history of the data, including every time it has been read, written or modified. Every time you do something to the data, you should update the history using the DataBlock.add_history method. You can inspect the history with the DataBlock.print_history method.
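For example (assuming add_history takes a short message string; it may accept additional detail strings as well, see the docstrings):

```python
# Record what was just done to the data.
db.add_history('Masked frequency channel 1024.')

# Inspect everything that has happened to this DataBlock so far.
db.print_history()
```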

core.fitsGBT

The next module you have to worry about is core.fitsGBT, which allows you to read and write DataBlocks from and to fits files. The module docstrings should be relatively self-explanatory so I won't go into details.
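As a rough sketch of the intended usage (the Reader and Writer class names and call signatures here are my assumptions based on the module's purpose; the docstrings are authoritative):

```python
from core import fitsGBT

# Read every scan and frequency window into DataBlock objects.
reader = fitsGBT.Reader('input_data.fits')
blocks = reader.read()  # one DataBlock per (scan, window) pair

for db in blocks:
    db.calc_freq()      # work with each block as described above

# Write the (possibly modified) blocks back out in the same format.
writer = fitsGBT.Writer(blocks)
writer.write('output_data.fits')
```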

time_stream.base_single

Base class for making an analysis stage. Eliminates all the boilerplate of reading in fits files and looping over them. More on this to come.
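Since this isn't documented yet, here is a purely illustrative sketch of the pattern such a base class enables; the class name BaseSingle and the action method below are hypothetical, not the actual base_single interface:

```python
import numpy.ma as ma

from time_stream import base_single

# Hypothetical: the base class would handle reading the fits files,
# looping over DataBlocks and writing the output; a new stage would
# only override the method that transforms a single block.
class MedianSubtract(base_single.BaseSingle):

    def action(self, data_block):
        # Subtract the median over the time axis from one block.
        data_block.data -= ma.median(data_block.data, 0)
        data_block.add_history('Subtracted time median.')
        return data_block
```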
