DP3
===
`Go to source documentation `_.
DP3 (the Default Preprocessing Pipeline, previously NDPPP for New Preprocessing Pipeline) is the LOFAR program for pipelined data processing. It can perform all kinds of operations on the data in a pipelined way, so that the data are read and written only once.
DP3 started as a new and faster version of IDPPP.
ASTRON users can see the original differences `here `__.
DP3 preprocesses the data of a LOFAR observation by executing steps such as flagging or averaging. These steps can be applied to the raw data as well as to calibrated data by selecting the data column to use. One or more of the steps listed below can be combined into a pipeline. DP3 has an implicit input and output step; it is also possible to add intermediate output steps.

DP3 comes with many predefined steps, but it is also possible to plug in arbitrary steps implemented in either C++ or Python.

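
The pipelined design can be illustrated with a toy Python model. The names ``Step``, ``Writer`` and ``run_pipeline`` are hypothetical and do not reproduce DP3's C++ or Python plugin API; the sketch only shows the idea that each chunk of data is read once, handed from step to step, and written once by an implicit output step at the end.

```python
from typing import Callable, Iterable, List, Optional

# One buffer stands for the data of one time slot; here simply a list of
# visibility amplitudes. This is a toy model, not DP3's own buffer type.
Buffer = List[float]


class Step:
    """A processing step that transforms a buffer and hands it to the next step."""

    def __init__(self, func: Callable[[Buffer], Buffer]) -> None:
        self.func = func
        self.next_step: Optional["Step"] = None

    def process(self, buf: Buffer) -> None:
        result = self.func(buf)
        if self.next_step is not None:
            self.next_step.process(result)


class Writer(Step):
    """Stand-in for the implicit output step: it collects every buffer it receives."""

    def __init__(self) -> None:
        super().__init__(lambda b: b)
        self.written: List[Buffer] = []

    def process(self, buf: Buffer) -> None:
        self.written.append(buf)


def run_pipeline(reader: Iterable[Buffer], steps: List[Step]) -> List[Buffer]:
    """Link the steps, then push each buffer from the reader through the chain once."""
    writer = Writer()
    chain = steps + [writer]
    for current, following in zip(chain, chain[1:]):
        current.next_step = following
    for buf in reader:
        chain[0].process(buf)
    return writer.written


# Example: replace NaN values by zero, then scale by two.
zero_nan = Step(lambda b: [0.0 if x != x else x for x in b])
double = Step(lambda b: [2.0 * x for x in b])
output = run_pipeline([[1.0, float("nan")], [3.0, 4.0]], [zero_nan, double])
print(output)  # [[2.0, 0.0], [6.0, 8.0]]
```

In this toy model every step emits exactly one buffer per input buffer; real steps such as averaging in time have to accumulate several time slots before passing data on, which the sketch does not model.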
.. toctree::
   :maxdepth: 2
   :caption: Home
   :hidden:

   self
   changelog

.. toctree::
   :maxdepth: 2
   :caption: Steps
   :hidden:
   :glob:

   steps/Description of all parameters
   steps/*

The following steps are available:

* :ref:`Flagging` and :ref:`Filtering`:

  * :ref:`AOFlagger` for automatic flagging in time/frequency windows using Andre Offringa's advanced AOFlagger.
  * :ref:`Preflagger` to flag given baselines, time slots, etc.
  * :ref:`UVWFlagger` to flag based on UVW coordinates, possibly in the direction of another source.
  * :ref:`MADFlagger` for automatic flagging in time/frequency windows based on median filtering.
  * :ref:`Filter` to filter on baseline and/or channel (only the given baselines/channels are kept). The reader step has an implicit filter.

* :ref:`Averaging`

  * :ref:`Averager` to average data in time and/or frequency.

* :ref:`Phase Shifting`

  * :ref:`PhaseShift` to shift data to another phase center.

* :ref:`Demixing` to remove strong sources (A-team) from the data.

  * :ref:`Demixer` to demix in the old way.
  * :ref:`SmartDemixer` to demix in a new, smarter way.

* :ref:`Station summation`

  * :ref:`StationAdder` to add stations (usually the superterp stations), forming new station(s) and baselines.

* :ref:`Counter` to count the number of flags per baseline, frequency, and correlation. A flagging step also counts how many visibilities it flagged. Counts can be saved to a table and plotted later using the function ``plotflags`` in the Python module ``lofar.dppp``.
* **Data calibration** and :ref:`Data scaling`

  * :ref:`ApplyCal` to apply an existing calibration to a MeasurementSet.
  * :ref:`GainCal` to calibrate gains using StefCal.
  * :ref:`DDECal` to calibrate direction-dependent gains.
  * :ref:`Predict` to predict the visibilities of a given sky model.
  * :ref:`H5ParmPredict` to subtract the visibilities of multiple directions, corrupted by an instrument model (in H5Parm format) generated by DDECal.
  * :ref:`SagecalPredict` to use SAGECal routines for model prediction, as a replacement for the normal Predict and H5ParmPredict steps and also within DDECal.
  * `ApplyBeam `__ to apply the LOFAR beam model, or the inverse of it.
  * :ref:`SetBeam` to set the beam keywords after prediction.
  * :ref:`ScaleData` to scale the data with a polynomial in frequency (based on the SEFD of LOFAR stations).

* :ref:`Upsample` to upsample visibilities in time.
* :ref:`Output` to add intermediate output steps.
* :ref:`Interpolate` for improving the accuracy of data averaging.
* :ref:`User defined ` steps provide a plugin mechanism for arbitrary steps implemented in C++.
* :ref:`Python defined ` steps provide a plugin mechanism for arbitrary steps implemented in Python.

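
The bookkeeping done by the Counter step listed above can be sketched in a few lines of Python. The nested-list flag layout and the function name ``count_flags`` are assumptions made for this illustration; they are not DP3's internal data structures or implementation.

```python
from typing import List, Tuple

# Assumed layout: flags[baseline][channel][correlation] is True when that
# visibility is flagged. DP3 stores its flags differently internally.
Flags = List[List[List[bool]]]


def count_flags(time_slots: List[Flags]) -> Tuple[List[int], List[int], List[int]]:
    """Accumulate flag counts per baseline, per channel and per correlation."""
    n_bl = len(time_slots[0])
    n_chan = len(time_slots[0][0])
    n_corr = len(time_slots[0][0][0])
    per_baseline = [0] * n_bl
    per_channel = [0] * n_chan
    per_corr = [0] * n_corr
    for flags in time_slots:
        for bl in range(n_bl):
            for ch in range(n_chan):
                for corr in range(n_corr):
                    if flags[bl][ch][corr]:
                        per_baseline[bl] += 1
                        per_channel[ch] += 1
                        per_corr[corr] += 1
    return per_baseline, per_channel, per_corr


# Two time slots, two baselines, two channels, two correlations.
slot1 = [[[True, False], [False, False]], [[True, True], [False, False]]]
slot2 = [[[False, False], [False, True]], [[False, False], [False, False]]]
print(count_flags([slot1, slot2]))  # ([2, 2], [3, 1], [2, 2])
```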
The input is one or more (regularly shaped) MeasurementSets (MSs). The data in the given column are piped through the steps defined in the parset file and finally written, if needed. This makes it possible, for example, to flag at full resolution, average, flag at a lower resolution, average further, and finally write the data.
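
The flag, average, flag, average sequence described above could be written as a parset along the lines generated below. The helper ``make_parset`` is hypothetical and only emits ``key=value`` lines; the MS names, step names (``flag1``, ``avg1``, ...) and step sizes are arbitrary examples, and the keys used (``msin``, ``msin.datacolumn``, ``msout``, ``steps``, ``<name>.type``, ``timestep``, ``freqstep``) should be checked against the parameter description of each step before use.

```python
from typing import Dict, List, Tuple


def make_parset(msin: str, msout: str, steps: List[Tuple[str, Dict[str, str]]],
                datacolumn: str = "DATA") -> str:
    """Build parset text from (step_name, {key: value}) pairs, keeping the step order."""
    lines = [f"msin={msin}", f"msin.datacolumn={datacolumn}", f"msout={msout}"]
    lines.append("steps=[" + ",".join(name for name, _ in steps) + "]")
    for name, params in steps:
        for key, value in params.items():
            lines.append(f"{name}.{key}={value}")
    return "\n".join(lines) + "\n"


parset = make_parset(
    "input.MS",
    "output.MS",
    [
        ("flag1", {"type": "aoflagger"}),  # flag at full resolution
        ("avg1", {"type": "averager", "timestep": "2", "freqstep": "4"}),
        ("flag2", {"type": "aoflagger"}),  # flag again at the lower resolution
        ("avg2", {"type": "averager", "timestep": "5", "freqstep": "4"}),
    ],
)
print(parset)
```

The order of the names in ``steps`` is the order in which the data flow through the pipeline, so swapping ``flag2`` and ``avg2`` would flag at the final, coarsest resolution instead.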
Regularly shaped means that all time slots in the MS must contain the same baselines and channels. DP3 can handle only one spectral window. If the MS has multiple spectral windows, one has to be selected.
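
What "regularly shaped" means can be expressed as a small check. In this sketch each row is a simplified tuple ``(time, (antenna1, antenna2), n_channels)``; that layout and the function ``is_regular`` are assumptions for illustration, whereas a real MS keeps times and antennas in the TIME, ANTENNA1 and ANTENNA2 columns and the channel count in its SPECTRAL_WINDOW subtable.

```python
from typing import Dict, List, Tuple

Baseline = Tuple[int, int]


def is_regular(rows: List[Tuple[float, Baseline, int]]) -> bool:
    """Return True if every time slot has the same baselines with the same channel count."""
    slots: Dict[float, Dict[Baseline, int]] = {}
    for time, baseline, n_chan in rows:
        slots.setdefault(time, {})[baseline] = n_chan
    layouts = list(slots.values())
    # Each time slot's {baseline: n_channels} mapping must equal the first one.
    return all(layout == layouts[0] for layout in layouts)


rows_ok = [(0.0, (0, 1), 64), (0.0, (0, 2), 64), (1.0, (0, 1), 64), (1.0, (0, 2), 64)]
rows_missing = [(0.0, (0, 1), 64), (0.0, (0, 2), 64), (1.0, (0, 1), 64)]
print(is_regular(rows_ok), is_regular(rows_missing))  # True False
```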
If multiple MSs are given as input, their data are combined in frequency. This means that the times, phase direction, etc. of the different MSs must be the same. Note that other steps (like averaging) can still be used.

When combining MSs (thus combining subbands), one or more of them may not exist. Flagged data are inserted for them, and the missing frequency information is deduced from the other subbands.
Note that in order to insert missing subbands in the data, the names of the missing MSs have to be given at the right place in the list of MS names. Otherwise DP3 does not know that subbands are missing.
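
The role of the list position can be shown with a Python sketch that concatenates subbands in frequency and fills a gap with fully flagged data. It assumes, for illustration only, a hypothetical ``Subband`` record holding channel frequencies and flags, ``None`` as the marker for a missing MS at its place in the list, and contiguous subbands with equal channel width and channel count, so that a missing subband can be extrapolated from its nearest neighbour. DP3's own deduction of the missing frequency information may work differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Subband:
    freqs: List[float]  # channel centre frequencies (MHz here), at least two channels
    flags: List[bool]   # one flag per channel


def combine_subbands(subbands: List[Optional[Subband]]) -> Tuple[List[float], List[bool]]:
    """Concatenate subbands in frequency, inserting fully flagged data for missing ones."""
    present = [i for i, sb in enumerate(subbands) if sb is not None]
    if not present:
        raise ValueError("at least one subband must exist")
    freqs: List[float] = []
    flags: List[bool] = []
    for i, sb in enumerate(subbands):
        if sb is None:
            # Deduce the frequencies from the nearest existing subband, assuming
            # contiguous subbands with equal channel width and channel count.
            j = min(present, key=lambda k: abs(k - i))
            ref = subbands[j]
            width = ref.freqs[1] - ref.freqs[0]
            offset = (i - j) * len(ref.freqs) * width
            sb = Subband([f + offset for f in ref.freqs], [True] * len(ref.freqs))
        freqs.extend(sb.freqs)
        flags.extend(sb.flags)
    return freqs, flags


sb0 = Subband([100.0, 110.0], [False, False])
sb2 = Subband([140.0, 150.0], [False, True])
print(combine_subbands([sb0, None, sb2]))
# ([100.0, 110.0, 120.0, 130.0, 140.0, 150.0], [False, False, True, True, False, True])
```

Leaving out the ``None`` entry would silently produce a frequency axis with a gap between 110 and 140, which is why the name of a missing MS has to appear at its proper place in the list.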
The output can be a new MeasurementSet, but it is also possible to update the flags if the input is a single MS. If averaging or phase-shifting to another phase center is done, the only option is to create a new MeasurementSet.
At the end of a run, the run time is shown. Note that on a multi-core machine the user time can exceed the elapsed time, because user time is accumulated over all cores. By default, the percentage of time taken by each step is also shown.

The AOFlagger, MADFlagger, and Demixer, by far the most expensive parts of DP3, can run multi-threaded if DP3 is built with OpenMP. The number of threads can be set with the global key ``numthreads``. If that key is not set, the environment variable ``OMP_NUM_THREADS`` is used. If that variable is also undefined, a DP3 run uses as many threads as there are CPU cores. Thus, if multiple DP3 runs are started on one machine, the total number of threads will by default exceed the number of CPU cores.
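
The thread-count fallback described above can be summarised in a short Python sketch. The function ``resolve_num_threads`` is hypothetical and only mirrors the documented order (the ``numthreads`` key, then ``OMP_NUM_THREADS``, then the number of CPU cores); it assumes both values are plain integers.

```python
import os
from typing import Mapping, Optional


def resolve_num_threads(parset: Mapping[str, str],
                        env: Optional[Mapping[str, str]] = None) -> int:
    """Return the thread count following the documented fallback order."""
    env = os.environ if env is None else env
    if "numthreads" in parset:
        return int(parset["numthreads"])
    if "OMP_NUM_THREADS" in env:
        return int(env["OMP_NUM_THREADS"])
    return os.cpu_count() or 1  # os.cpu_count() can return None


print(resolve_num_threads({"numthreads": "3"}, {"OMP_NUM_THREADS": "8"}))  # 3
print(resolve_num_threads({}, {"OMP_NUM_THREADS": "8"}))  # 8
```

For example, four concurrent runs on an 8-core machine with neither value set would start 4 × 8 = 32 threads in total; giving each run its own ``numthreads`` so that the totals add up to the number of cores avoids this oversubscription.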
MeasurementSet Access
---------------------
* The :ref:`msin step` defines which MS and which data column to use. It is possible to specify multiple MSs using a glob pattern or a vector of MS names.
* If multiple MSs are given, they are concatenated in frequency. This means that all MSs must have the same times, baselines, etc. Flagged data can be inserted for MSs that are specified but do not exist.
* It is possible to select baselines and/or a band (spectral window) and/or skip leading or trailing channels. The same selection is applied to each input MS.
* Optionally, proper weights can be calculated from the auto-correlation data.
* It sets flags for invalid data (NaN or infinite).
* Dummy, fully flagged data with correct UVW coordinates will be inserted for missing time slots in the MS. This can only be done if a single input MS is used.
* Missing time slots at the beginning or end of the MS can be detected by giving the correct start and end time. This is particularly useful for the imaging pipeline where BBS requires that the MSs of all subbands of an observation have the same time slots. When updating an MS, those inserted slots are temporary and not put back into the MS.
* The :ref:`ms_out