Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features/get bufr restructure #223

Merged
merged 26 commits into from
Feb 16, 2024
Merged

Features/get bufr restructure #223

merged 26 commits into from
Feb 16, 2024

Conversation

ladsmund
Copy link
Contributor

@ladsmund ladsmund commented Jan 24, 2024

I have updated thet BUFR export functionalities to

  • Parameterize relative sensor heights. Eg anometer relative to sonic ranger and barometer relative to gps to enable station specific configurations.
  • Make it easier to (re)create BUFR files for specific timestamps
    • DMI requests missing BUFR files from downtime period
    • Testable code

In order to achieve that I have

  • Organised the code Into three main modules:
    • bufr_utilities contains functionalities related to BUFR serialisation including both a read and write functions.
    • real_time_utilities contains functionality for filtering and selecting instantaneous data from time series.
    • get_bufr contains main scripts for generating bufr files AWS L3 csv data - aligning with out current operational pipeline.
  • Implemented unit tests to ensure the updated version don't change the output.
  • Parameterized get_bufr and restructured the use of global variables (WMO configs)
  • Stoped using global context such as datetime.utcnow()

* Updated get_buffer to become more testable by changing global variables to parameters
* Added csv file with l3 data for testing
* Implemented unit tests covering the current get_buffer imlemented
* Fixed issue where find_positions were incorrectly expected to return a single value instead of a tuple. It could have been solved by adding parenthesis but is better to use a consistent signature of the return values.
@ladsmund ladsmund force-pushed the features/get_bufr_restructure branch from c65fe66 to 7a80dcf Compare January 30, 2024 13:44
Copy link
Member

@BaptisteVandecrux BaptisteVandecrux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Mads for this rework!

I've always struggled to understand this part of the code and I have made comments about structure and readability that will hopefully make the code clearer.

I did not see anything wrong and I expect the code to work for 80% of the stations. I have listed some weird station geometries and weird cases where SR50s are misbehaving. It is fine for now to use the standard geometries everywhere and to add more flexibility to the representation of the station design in the code at a later point.

I have a problem with eccodes on my machine and therefore cannot test the code myself.

src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/real_time_utilities.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/station_dimensions.csv Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Outdated Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Show resolved Hide resolved
src/pypromice/postprocess/csv2bufr.py Show resolved Hide resolved
src/pypromice/postprocess/get_bufr.py Show resolved Hide resolved
src/pypromice/test/bufr_export/test_bufr_export.py Outdated Show resolved Hide resolved
* Made variables more descriptive
* Applied auto clean (black)
* Updated logging
* Added earliest date as parameter
* Extracted functionalities for determine bufr variables
* Restructured get current data
    * Extracted last_data_row functionality
    * Updated selecting current_timestamp
* Added csv-table to support station specific dimensions
* Created unit tests that explicitly checks bufr variables
* Reorder processing steps
* Fix spelling
* Spell correction anometer -> anemometer
* Removed dev argument
* renamed csv2bufr -> bufr_utilities.py
* Various cleanup and renaming
* Moved realtime helper functions to real_time_utilities.py
* Renamed and updated getBUFR -> write_bufr_message
* Changed signature to take only variable Series and a writable fileobject as input
* Add __all__ list to bufr_utilities.py
* Implemented bufr_reader

Configuration handling
* Eliminate pypromice.postprocess.wmo_config
* Add wmo id to BUFRVariables to remove station specific configurations
* Moved BUFR template configurations to bufr_utilities.py
* Added BUFRVariables attrs class to manage configurations instead of pd.Series

Unittests
* Added BUFR export tests
* Test exception handling
* test_land_station_export, WEB_B: e87f17d029d76cda45ee23172933a4dcca554fb3d7235116eb8ecc68105902ec
* Renaemd test_bufr_export.py -> test_get_bufr
* Add process_station(..) to isolate functionalities for single stations for readability and testability
* use Path as inputs
* Use position seeds as input

StationConfiguration
* Added attrs class for handling station configurations
  * Implemented functions for reading and writing toml files
  * Renamed station_dimensions.csv -> station_configurations.toml
* Move configuration from wmo_config to station_configuration
* Added skip stid to station_configuration
* Added skipped_variables to station_configuration

Compile data checks and position export from real_time_utilities.py
* Removed time filtering from get_latest
* Removed position storage from get_latest.

Other
* Moved tx_l3_test1.csv
The previous version incorrectly used older variables for fill nan.
@ladsmund ladsmund force-pushed the features/get_bufr_restructure branch from 526fac2 to 6b3391f Compare February 15, 2024 10:55
Copy link
Contributor Author

@ladsmund ladsmund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished refactoring and addressed change requests

FutureWarning: last is deprecated and will be removed in a future version. Please create a mask and filter using `.loc` instead
  df_limited = df.last(time_limit).copy()
Handle case where station dimension attributes were None: station_configuration.barometer_from_gps

Removed default path in ArgumentParser
Copy link
Member

@BaptisteVandecrux BaptisteVandecrux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work.
It is indeed clearer and will definitely be easier to test, adapt and debug.

Unfortunately, I do not have the time to go through the main scripts (bufr_utilities.py and get_bufr.py) until mid-next week. If you are ready to make it operational, and if we are still waiting for DMI to fix technical issues, then please go ahead. I'd personally switch to this new version and debug it now rather than after the initiation of data flow to GTS.

The only things I'd be careful with are potential wrong settings for export_bufr and skipped_variables attributes in the configuration file. By "wrong" I mean different from the previous version but also potential new station/sensors failures. But I guess you went through that file and through the data a couple of times already!

So I approve the PR for now and will have a second look whenever I have the time.
I let you decide how you want to proceed.

@ladsmund ladsmund merged commit dda3da0 into main Feb 16, 2024
4 checks passed
@ladsmund
Copy link
Contributor Author

I have made a slight update to aws-operational-processing to provide explicit cli parameters since I removed the default paths.

https://github.com/GEUS-Glaciology-and-Climate/aws-operational-processing/pull/23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants