add/update csv2bufr plugin documentation (#839)
maaikelimper authored Jan 10, 2025
1 parent 538903a commit bbafee3
Showing 2 changed files with 35 additions and 40 deletions.
18 changes: 12 additions & 6 deletions docs/source/reference/running/data-pipeline-plugins.rst
@@ -18,13 +18,16 @@ Default pipeline plugins
wis2box provides a number of data pipeline plugins by default, which can be used "out of the box". The
list below describes each plugin and provides an example data mappings configuration.

.. _csv2bufr-plugin:

``wis2box.data.csv2bufr.ObservationDataCSV2BUFR``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This plugin converts CSV observation data into BUFR using ``csv2bufr``. A csv2bufr template
can be configured to process the data accordingly. In addition, ``file-pattern`` can be used
to filter on incoming data based on a regular expression. Consult the `csv2bufr`_ documentation
for more information on configuration and templating.
This plugin converts CSV observation data into BUFR using `csv2bufr`_.

A ``template`` is required to convert the CSV columns to BUFR encoded values. See `csv2bufr-examples`_ for how to create a template, or use one of the built-in templates.

A ``file-pattern`` is used to filter incoming data based on a regular expression.
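As a rough illustration of how a ``file-pattern`` selects files, the following Python sketch applies the pattern ``'^.*\.csv$'`` (the value used in the plugin configuration example in this file) to a few file names. It assumes the pattern is evaluated with Python's ``re`` module against the file name; the ``matches_file_pattern`` helper and the file names are hypothetical and do not reproduce wis2box's internal code.

```python
import re

# Pattern from the example plugin configuration; assumed here to be
# evaluated with Python's ``re`` module against the incoming file name.
FILE_PATTERN = r'^.*\.csv$'


def matches_file_pattern(filename: str, pattern: str = FILE_PATTERN) -> bool:
    """Return True if ``filename`` matches ``pattern`` (hypothetical helper)."""
    return re.match(pattern, filename) is not None


if __name__ == '__main__':
    # Hypothetical incoming file names: only the first ends in '.csv'.
    for name in ['station_20250110.csv', 'station_20250110.bufr4', 'notes.csv.bak']:
        print(name, matches_file_pattern(name))
```

Because the pattern is anchored with ``$``, a name such as ``notes.csv.bak`` is rejected even though it contains ``.csv``.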

A typical csv2bufr plugin workflow definition would be defined as follows:

@@ -38,7 +41,7 @@ A typical csv2bufr plugin workflow definition would by defined as follows:
The default templates are defined by the `csv2bufr-templates`_ repository.

If the user wants to use a custom template, the template should be located in the ``$WIS2BOX_HOST_DATADIR/mappings`` directory.
To use a custom template, place it in the ``$WIS2BOX_HOST_DATADIR/mappings`` directory and add ``CSV2BUFR_TEMPLATES=${WIS2BOX_DATADIR}/mappings`` to the ``wis2box.env`` file.

The plugin configuration would then be defined as follows:

@@ -50,6 +53,7 @@ The plugin configuration would then be defined as follows:
notify: true # trigger GeoJSON publishing for API and UI
file-pattern: '^.*\.csv$'
Environment variables can be set in ``wis2box.env`` to customize the behavior of the csv2bufr plugin within wis2box; see `csv2bufr-environment-variables`_ for the full list of environment variables.

``wis2box.data.bufr4.ObservationDataBUFR2GeoJSON``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -165,7 +169,9 @@ For example, to publish GRIB2 data matching the file-pattern ``^.*_(\d{8})\d{2}.
See :ref:`data-mappings` for a full example data mapping configuration.

.. _`csv2bufr`: https://csv2bufr.readthedocs.io
.. _`csv2bufr`: https://csv2bufr.readthedocs.io/en/v0.8.5/
.. _`csv2bufr-examples`: https://csv2bufr.readthedocs.io/en/v0.8.5/example.html
.. _`csv2bufr-environment-variables`: https://csv2bufr.readthedocs.io/en/v0.8.5/installation.html#environment-variables
.. _`csv2bufr-templates`: https://github.com/wmo-im/csv2bufr-templates
.. _`bufr2geojson`: https://github.com/wmo-im/bufr2geojson
.. _`synop2bufr`: https://synop2bufr.readthedocs.io
57 changes: 23 additions & 34 deletions docs/source/user/data-ingest.rst
@@ -10,52 +10,20 @@ The wis2box storage is provided using a `MinIO`_ container that provides S3-comp
Any file received in the ``wis2box-incoming`` storage bucket will trigger an action to process the file.
What action to take is determined by the data mappings that were set up in the previous section.
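The routing step can be pictured with the following Python sketch. It assumes that a file is uploaded under a prefix naming its dataset (for example ``urn:wmo:md:it-meteoam:surface-weather-observations.synop/datafile.csv``) and that each dataset lists ``(file-pattern, plugin)`` pairs. The ``DATA_MAPPINGS`` dictionary and the ``select_plugins`` function are hypothetical simplifications, not the wis2box data mappings format or its dispatch code.

```python
import re

# Hypothetical, simplified data mappings: dataset id -> list of
# (file-pattern, plugin class path). Not the real wis2box configuration format.
DATA_MAPPINGS = {
    'urn:wmo:md:it-meteoam:surface-weather-observations.synop': [
        (r'^.*\.csv$', 'wis2box.data.csv2bufr.ObservationDataCSV2BUFR'),
    ],
}


def select_plugins(object_key: str) -> list[str]:
    """Return the plugins whose file-pattern matches an object key in wis2box-incoming."""
    # Assumption: the key has the form '<dataset-id>/<filename>'.
    dataset_id, _, filename = object_key.partition('/')
    plugins = []
    for pattern, plugin in DATA_MAPPINGS.get(dataset_id, []):
        if re.match(pattern, filename):
            plugins.append(plugin)
    return plugins
```

In this sketch a file under an unknown dataset prefix, or one whose name matches no ``file-pattern``, selects no plugin and is therefore not processed.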

wis2box-webapp
--------------

The wis2box-webapp is a web application that includes the following forms for data validation and ingestion:

* user interface to ingest `FM-12 SYNOP data <https://library.wmo.int/idviewer/35713/33>`_
* user interface to ingest CSV data using the :ref:`AWS template<aws-template>`

The wis2box-webapp is available on your host at ``http://<your-public-ip>/wis2box-webapp``.

Interactive data ingestion requires an execution token, which can be generated using the ``wis2box auth add-token`` command inside the wis2box-management container:

.. code-block:: bash

   python3 wis2box-ctl.py login
   wis2box auth add-token --path processes/wis2box

.. note::

   Be sure to record the token value, as it will not be shown again. If you lose the token, you can generate a new one.

Data mapping plugins
---------------------
^^^^^^^^^^^^^^^^^^^^^

The plugins you have configured for your dataset mappings will determine the actions taken when data is received in the MinIO storage bucket.

wis2box provides three built-in plugins to publish data in BUFR format:

* ``bufr2bufr``: the input is received in BUFR format and split by subset; each subset is published as a separate BUFR message.
* ``synop2bufr``: the input is received in `FM-12 SYNOP format <https://library.wmo.int/idviewer/35713/33>`_ and converted to BUFR format. The year and month are extracted from the file pattern.
* ``csv2bufr``: the input is received in CSV format and converted to BUFR format; a mapping template is used to convert the CSV columns to BUFR encoded values. Custom mapping templates need to be placed in the ``$WIS2BOX_HOST_DATADIR/mappings`` directory. See :ref:`csv2bufr-templates` for examples of mapping templates.
* ``csv2bufr``: the input is received in CSV format and converted to BUFR format; a mapping template is required to convert the CSV columns to BUFR encoded values. See :ref:`csv2bufr-plugin` for information on how to configure the csv2bufr plugin.

To publish data in other formats, you can use the 'Universal' plugin, which passes the data through without any conversion.
When using this plugin, make sure that the date timestamp can be extracted from the file pattern.
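What "extracting the date timestamp from the file pattern" can look like is sketched below in Python. It assumes the ``file-pattern`` contains a capture group holding the date as ``YYYYMMDD``; the pattern ``^.*_(\d{8})\.grib2$`` and the ``extract_date`` helper are hypothetical illustrations, not the Universal plugin's actual implementation.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical file-pattern: the first capture group holds the date (YYYYMMDD).
FILE_PATTERN = r'^.*_(\d{8})\.grib2$'


def extract_date(filename: str, pattern: str = FILE_PATTERN) -> Optional[datetime]:
    """Return the date encoded in ``filename``, or None if the pattern does not match."""
    match = re.match(pattern, filename)
    if match is None:
        return None
    # Interpret the first capture group as a calendar date.
    return datetime.strptime(match.group(1), '%Y%m%d')
```

A file name that matches the pattern but carries an impossible date (for example month ``13``) makes ``datetime.strptime`` raise ``ValueError``, so the capture group should cover only the date digits.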

.. _aws-template:

The AWS template in csv2bufr plugin
-----------------------------------

When using the csv2bufr plugin, the columns are mapped to BUFR encoded values using a template as defined in the repository `csv2bufr-templates`_.

An example of a CSV file that can be ingested using the 'AWS' mappings template can be downloaded here: :download:`AWS-example <../_static/aws-example.csv>`

The CSV columns description of the AWS template can be downloaded here: :download:`AWS-reference <../_static/aws-minimal.csv>`


MinIO user interface
--------------------
@@ -194,6 +162,27 @@ For example using the command line from the host running wis2box:
put /path/to/your/datafile.csv wis2box-incoming/urn:wmo:md:it-meteoam:surface-weather-observations.synop
EOF

wis2box-webapp
--------------

The wis2box-webapp is a web application that includes the following forms for data validation and ingestion:

* user interface to ingest `FM-12 SYNOP data <https://library.wmo.int/idviewer/35713/33>`_
* user interface to ingest CSV data using the csv2bufr plugin with the predefined "AWS-template" mapping

The wis2box-webapp is available on your host at ``http://<your-public-ip>/wis2box-webapp``.

Interactive data ingestion requires an execution token, which can be generated using the ``wis2box auth add-token`` command inside the wis2box-management container:

.. code-block:: bash

   # on the host: log in to the wis2box-management container
   python3 wis2box-ctl.py login
   # inside the container: generate an execution token for the wis2box processes
   wis2box auth add-token --path processes/wis2box

.. note::

   Be sure to record the token value, as it will not be shown again. If you lose the token, you can generate a new one.

wis2box-data-subscriber
-----------------------
