implement data storage directory structure #31

tomkralidis · 2021-11-16T13:03:23Z

Create a storage directory structure for wis2node to support the various data/files managed within an installation.

data
- local data processing
  - incoming
  - outgoing
- downloaded data from other wis2node services
metadata
- discovery
- station

tomkralidis · 2021-11-25T19:50:42Z

Initial proposal to feed discussion:

|____data
| |____outgoing  # for publication
| |____public  # data available via PubSub and STAC
| | |____YYYY-MM-DD
| | | |____originating-centre
| | | | |____data-category
| | | | | |____dataset-name
| |____incoming  # incoming (CSV) data to be processed
|____metadata  # to feed API publication and OSCAR/Surface caching
| |____discovery
| |____station

tomkralidis · 2021-11-29T21:48:45Z

Based on discussions with @petersilva

Example structure: /<RFC3339>/<source>/<ISO-3166-2>/<originating-centre>

originating-centre: https://github.com/wmo-im/GTStoWIS2/blob/main/GTStoWIS2/TableCCCC.json
relative sarra path starts at <source>
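
A minimal sketch of how the example structure above could be assembled, assuming the `<RFC3339>` level is a full-date (`YYYY-MM-DD`) directory. The function names `public_path` and `sarra_relative_path` are hypothetical and not part of wis2node or sarracenia:

```python
from datetime import date
from pathlib import PurePosixPath


def public_path(day: date, source: str, country_code: str,
                originating_centre: str) -> PurePosixPath:
    """Build /<RFC3339 date>/<source>/<country-code>/<originating-centre>."""
    # date.isoformat() yields YYYY-MM-DD, the RFC 3339 full-date form
    return PurePosixPath("/", day.isoformat(), source, country_code,
                         originating_centre)


def sarra_relative_path(path: PurePosixPath) -> PurePosixPath:
    """Drop the root and the date so the relative path starts at <source>."""
    # parts[0] is "/", parts[1] is the date directory
    return PurePosixPath(*path.parts[2:])


if __name__ == "__main__":
    full = public_path(date(2021, 11, 29), "wis", "ca", "cwao")
    print(full)
    print(sarra_relative_path(full))
```

The values `wis`, `ca` and `cwao` are taken from examples elsewhere in this thread; any other values work the same way.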

tomkralidis · 2021-12-01T22:46:25Z

updated proposal:

|____data
| |____config
| |____outgoing
| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____country-code
| | | | | |____originating-centre
| | | | | | |____data-category
| | | | | | | |____dataset-name
| |____incoming
| | |____YYYY-MM-DD
| |____errors
| | |____YYYY-MM-DD
|____metadata
| |____discovery
| |____station

efucile · 2021-12-02T07:57:49Z

my suggestion

|____data
| |____config
| |____outgoing
| |____public
| | |____YYYY-MM-DD
| | | |____source #please define source. Is it an identifier for the broker?
| | | | |____tree type = land observations #different tree structure depending on land, ocean, satellite obs or other products
| | | | | |____country-code #ocean, satellite, NWP data don't have country-code
| | | | | | |____originating-centre
| | | | | | | |____data-category
| | | | | | | | |____dataset-name
| |____incoming
| | |____YYYY-MM-DD
| |____errors
| | |____YYYY-MM-DD
|____metadata
| |____discovery
| |____station

tomkralidis · 2021-12-02T12:43:44Z

/data/public/YYYY-MM-DD/source would be a fixed value of wis here
tree type: the thinking was that data-category would cover this? Having said this
country-code: perhaps we omit in lieu of originating-centre only?

Update (narrowed to /data/public):

| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____data-category/tree-type
| | | | | |____originating-centre
| | | | | | |____dataset-name

One issue would be that "data-category/tree-type" could potentially have varying levels of nesting.

cc @petersilva for comments/insight

efucile · 2021-12-02T13:29:23Z

The originating center is not a country. You may want to have a country level split and know who is generating the data

tomkralidis · 2021-12-02T15:53:25Z

Update based on discussion with ECMWF:

|____public
| |____YYYY-MM-DD
| | |____source
| | | |____tree-type
| | | | |____country-code
| | | | | |____originating-centre
| | | | | | |____data-category

Notes: the dataset-name (now removed) would be a composite of tree-type.country-code.originating-centre.data-category

tomkralidis · 2021-12-02T21:32:11Z

Update:

there will be varying levels of hierarchy based on the data category (observations, NWP, satellite, ocean, etc.)

Building with surface observations as critical path:

data_category.country_code.originating_centre.station_type.representation

where:

data_category: new codelist
country_code: ISO 3166-2
originating_centre: keys from https://github.com/wmo-im/GTStoWIS2/blob/main/GTStoWIS2/TableCCCC.json
station_type: https://codes.wmo.int/wmdr/_FacilityType
representation: new codelist

Example: observations-surface-land.ca.cwao.landFixed.bufr
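
As a rough illustration of the five-level surface observation hierarchy above, the sketch below splits a dot-separated topic into named levels and maps it to a relative directory. The class and function names are hypothetical, and no codelist validation is attempted because the new codelists do not exist yet:

```python
from typing import NamedTuple


class SurfaceObsTopic(NamedTuple):
    """Levels of the proposed surface observation hierarchy, in order."""
    data_category: str
    country_code: str
    originating_centre: str
    station_type: str
    representation: str


def parse_topic(topic: str) -> SurfaceObsTopic:
    """Split a dot-separated topic into its named levels."""
    parts = topic.split(".")
    expected = len(SurfaceObsTopic._fields)
    if len(parts) != expected:
        raise ValueError(
            f"expected {expected} dot-separated levels, got {len(parts)}")
    return SurfaceObsTopic(*parts)


def to_relative_dir(topic: SurfaceObsTopic) -> str:
    """Each topic level becomes one directory level."""
    return "/".join(topic)


if __name__ == "__main__":
    t = parse_topic("observations-surface-land.ca.cwao.landFixed.bufr")
    print(t.station_type)
    print(to_relative_dir(t))
```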

petersilva · 2021-12-02T23:58:26Z

That looks really helpful, but it does not have complete coverage: station_type only mentions obs. what about: nwp outputs, forecasts, synthetic satellite imagery? or text forecasts and warnings? Also, is RADAR landFixed?

tomkralidis · 2021-12-03T00:00:01Z

Good points @petersilva: land surface obs is our initial iteration. The tree will evolve as we build out NWP, forecast, satellite.

petersilva · 2021-12-03T00:02:16Z

If that is the hierarchy you want, then we can attempt to modify the tables in GTStoWIS2 to generate them for matching WMO386 AHLs. This is exactly the kind of feedback we have been trying to solicit on the proposed topic tree.

petersilva · 2021-12-03T00:25:08Z

In discussions in GTStoWIS (TT protocols team) we have agreed to omit file format (aka representation) as it was agreed that files should have, as is universal convention outside the weather world, appropriate file extensions. As the topics correspond to file folders, having type in the topic (which is also a folder) will result in all files having the representation twice. e.g:

a/b/type/c/filename.type

The file type/data representation will unavoidably show up twice. Ideally, one looks up some form of information, say alerts, and the directory offers .cap files, .geojson files, and .crex files, and perhaps .txt ones. The committee considers the fact that data with different representations does not show up in the same directory currently to be a bug in our proposed topic tree. One that we have issues open to address ( e.g.: wmo-im/GTStoWIS2#55, wmo-im/GTStoWIS2#39 )
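
To illustrate the point about file extensions, here is a small sketch, assuming representation is read from the file extension rather than from a topic level. The function name and the sample filenames are hypothetical:

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, Iterable, List


def group_by_representation(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Group files from one directory by their extension."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in filenames:
        # suffix is "" when the file has no extension
        suffix = PurePosixPath(name).suffix.lstrip(".").lower()
        groups[suffix or "unknown"].append(name)
    return dict(groups)


if __name__ == "__main__":
    # One alerts directory offering several representations side by side
    listing = ["alert1.cap", "alert1.geojson", "alert2.cap", "README"]
    print(group_by_representation(listing))
```

With this approach a single directory can hold `.cap`, `.geojson` and `.txt` files for the same information, and the representation appears only once in the path.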

tomkralidis · 2021-12-06T13:24:51Z

Update standup 2021-12-03:

HH: example for SYNOP, do we extend the tree
DB: leave that for data inspection
DB: for ocean, we can use regions as named geographies
DB: collaborating networks are not originating centres, how do we include them as well?
PM: how do we deal with lake buoy data on Lake Malawi?
XC: what is the origin of https://github.com/wmo-im/GTStoWIS2/blob/main/GTStoWIS2/TableCCCC.json ? Seems 2x bigger than from the GTS manual (@petersilva comment?)
DB: ...public/YYYY-MM-DD: is this the publish date or date of observation? Should be the latter
DB/XC: output data filename convention can be more indicative: TODO in csv2bufr

david-i-berry · 2021-12-06T15:21:48Z

For marine regions the high level geographies from https://www.marineregions.org/gazetteer.php?p=details&id=23616 may be useful.

david-i-berry · 2021-12-06T15:25:23Z

DB: collaborating networks are not originating centres, how do we include them as well?

In terms of my comment, one of the big advantages of WIS2.0 is that it opens up the system to observations from other communities. For example, if an oceanographic data centre wanted to set up a WIS2.0 node and make data available via that node, the proposed originating centre hierarchy may not be appropriate.

tomkralidis · 2021-12-07T00:24:58Z

Current state of directory structure (based on initial iteration of surface weather data):

|____data
| |____config
| |____outgoing
| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____tree-type
| | | | | |____country-code
| | | | | | |____originating-centre
| | | | | | | |____data-category
| |____incoming
| | |____YYYY-MM-DD
| |____errors
| | |____YYYY-MM-DD
|____metadata
| |____discovery
| |____station

Remaining points:

file type as part of topic hierarchy (@petersilva / @efucile): based on TT-Protocols recommendation, we should not include the data representation as part of the tree per se
collaborating networks are not always originating centres (@petersilva / @david-i-berry). Should there be guidance on how non-originating centres can define themselves at this part of the tree?

petersilva · 2021-12-07T12:40:12Z

Table CCCC:

derivation of current table: identify authoritative source for CCCC. GTStoWIS2#4
looking at notes in GTStoWIS2 module, another source is: http://weather.rap.ucar.edu/surface/stations.txt
suggesting further additions: Add missing CCCC in TableCCCC.json if only country is known? GTStoWIS2#43

The table was built by converting the online PDF to a text file, transposing it, and cleaning it manually, followed by merging with the file from UCAR and manual additions: as the data flowed for weeks, we saw CCCCs show up that were unaccounted for. The flow is the normal bulletin flow of UCAR/UNIDATA (see the deriving_CCCC subdir in the GTStoWIS2 repo).

petersilva · 2021-12-07T12:56:29Z

CCCC is just a GTS transition mechanism... The mapping being done by the GTStoWIS2 module is to the "centre" field in the table, which holds simplified, lower-case, much more readable names: "AMRF" -> "melbourne_regional_forecasting_centre" ... This use of ASCII-constrained, simplified place names, far more readable for most than the four-letter CCCCs, was invented as part of, and is implicit in, the GTStoWIS2 proposal. This proposal is made in the absence of a higher quality source.
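
A sketch of the CCCC-to-centre lookup described above. The table layout (CCCC keys, each with a "centre" field) is an assumption based on this thread rather than on the actual TableCCCC.json file, and `centre_name` is a hypothetical helper, not a GTStoWIS2 function:

```python
from typing import Dict, Optional

# Assumed layout: CCCC key -> record with a "centre" field.
# Only the AMRF entry comes from this thread.
EXAMPLE_TABLE: Dict[str, Dict[str, str]] = {
    "AMRF": {"centre": "melbourne_regional_forecasting_centre"},
}


def centre_name(cccc: str, table: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Return the readable centre name for a CCCC, or None if unknown."""
    entry = table.get(cccc.upper())
    if entry is None:
        return None
    return entry.get("centre")


if __name__ == "__main__":
    print(centre_name("AMRF", EXAMPLE_TABLE))
    print(centre_name("ZZZZ", EXAMPLE_TABLE))
```

Returning `None` for an unknown CCCC leaves it to the caller to decide how to handle identifiers that are not yet accounted for in the table.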

efucile · 2021-12-07T13:05:42Z

we could probably replace originating-center with originator. Just to clarify that it may be an entity that is not a center and that is not the previous existing table. I guess that we will need a controlled vocabulary for this and I would suggest it to go in codes.wmo.int

petersilva · 2021-12-07T13:53:28Z

@efucile are you pointing to adding to http://codes.wmo.int/common/centre ? or creating a new table?

efucile · 2021-12-07T13:57:34Z

we need new tables for WIS2

tomkralidis · 2021-12-07T16:20:00Z

cc @amilan17

In TT-WISMD, we will start working on WCMP 2.0 codelists at https://github.com/wmo-im/wcmp2-codelists, and will kick off this activity at our next meeting.

tomkralidis · 2021-12-07T16:51:03Z

Updated tree:

|____data
| |____config
| |____outgoing
| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____tree-type
| | | | | |____country-code
| | | | | | |____originator
| | | | | | | |____data-category
| |____incoming
| | |____YYYY-MM-DD
| |____errors
| | |____YYYY-MM-DD
|____metadata
| |____discovery
| |____station

Closing this issue, given the implementation in wis2node proper. Formal codelist creation will be put forth at TT-WISMD.
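
For reference, a minimal sketch of creating the fixed part of the updated tree and computing a dated directory under `data/public`. This is not the wis2node implementation; the names `TOP_LEVEL_DIRS`, `create_skeleton` and `public_dir` are hypothetical:

```python
from pathlib import Path
from typing import List

# Fixed directories from the updated tree; dated and topic levels
# are created on demand as data arrives.
TOP_LEVEL_DIRS = [
    "data/config",
    "data/outgoing",
    "data/public",
    "data/incoming",
    "data/errors",
    "metadata/discovery",
    "metadata/station",
]


def create_skeleton(root: Path) -> List[Path]:
    """Create the fixed directories under root; safe to call repeatedly."""
    created = []
    for rel in TOP_LEVEL_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def public_dir(root: Path, day: str, source: str, tree_type: str,
               country_code: str, originator: str,
               data_category: str) -> Path:
    """Directory for one day of published data, following the tree order."""
    return (root / "data" / "public" / day / source / tree_type
            / country_code / originator / data_category)


if __name__ == "__main__":
    print(public_dir(Path("wis2node-data"), "2021-12-07", "wis",
                     "observations-surface-land", "ca", "cwao", "landFixed"))
```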

tomkralidis · 2021-12-17T03:47:07Z

Notes based on 2021-12-15 discussion:

mirror config and incoming directory structures with public

Next steps:

implement directory structure in incoming and config

@petersilva / @efucile thinking more:

| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____tree-type
| | | | | |____country-code
| | | | | | |____originator
| | | | | | | |____data-category

Example: observations-surface-land.ca.cwao.landFixed

Should we consider the data category higher up in the hierarchy, i.e.:

Example

| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____tree-type
| | | | | |____data-category
| | | | | | |____country-code
| | | | | | | |____originator

Example: observations-surface-land.landFixed.ca.cwao

Thoughts?

petersilva · 2021-12-17T13:14:41Z

in hierarchies, normally, each level of the hierarchy has "control" of lower levels of the hierarchy. This principle is expressed both in the OID mechanism ( discussed here: wmo-im/GTStoWIS2#37 ) and was given by @remygiraud as a constraint on the hierarchy.

The country is supposed to be top... the national authority then can permit / assign / control / refuse the next level of the tree (centre names within the country.) Each centre has control of what it publishes under their centre-id.

So country-code.centre-id was a starting point for our committee work.

This control is also a kind of permission to write in the tree. It is natural to implement permissions that align with the hierarchy, but it is hard to see how it would be done if bulletins from every originator are scattered throughout the tree.
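
One way to picture permissions that align with the hierarchy is a prefix check on whole topic levels: a centre granted `country-code.centre-id` may write anywhere beneath that prefix. The sketch below is an illustration only, with a hypothetical function name, and does not describe any existing broker's access control:

```python
def may_publish(granted_prefix: str, topic: str) -> bool:
    """True if topic lies at or below the granted prefix."""
    granted = granted_prefix.split(".")
    requested = topic.split(".")
    # Compare whole levels so that "ca.cw" does not match "ca.cwao"
    return requested[:len(granted)] == granted


if __name__ == "__main__":
    print(may_publish("ca.cwao", "ca.cwao.observations-surface-land"))
    print(may_publish("ca.cwao", "ca.other-centre.observations-surface-land"))
```

A check like this only stays simple when country and centre are the leading levels; if the data category came first, each centre's grant would have to be repeated under every category.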

tomkralidis · 2022-01-27T18:17:22Z

After discussion with @efucile, should we add a level to the topic hierarchy based on the WMO Unified Data Policy, so:

core
recommended
other ?

Example: core.observations-surface-land.ca.cwao.landFixed

Or is this a function of the discovery metadata evaluation (to assess whether to further bind to a resource)?

cc @petersilva @kaiwirt

petersilva · 2022-01-27T20:22:53Z

I don't know enough about the Unified Data Policy to understand the implications. I suspect that "core/recommended/other" is a distinction that is not material to most subscribers, who will not know what it means, or why it matters. Questions:

Are we saying that "core" roughly corresponds to traditional WMO 386/Volume C1 data?
I guess a particular format of land observation is considered "core" by the WMO?
Presumably, a BUFR ob would be in core, and a geojson for the same location in "other"?
If someone defines a new template, departing from the standard one, does that BUFR have to go under "other"?
When the policy goes through revisions, and a new type gets accepted, do we have two locations and a transition period (progression from "other" to "recommended" and later "core" in successive versions)?

david-i-berry · 2022-01-28T11:15:40Z

Core data is described here:

https://meetings.wmo.int/Cg-Ext-2021/_layouts/15/WopiFrame.aspx?sourcedoc=/Cg-Ext-2021/InformationDocuments/Cg-Ext(2021)-INF04-1-CATALOGUE-OF-CORE-DATA_en.docx&action=default

and resolution 1 / UDP here:

https://ane4bf-datap1.s3-eu-west-1.amazonaws.com/wmocms/s3fs-public/ckeditor/files/Cg-Ext2021-d04-1-WMO-UNIFIED-POLICY-FOR-THE-INTERNATIONAL-approved_en_0.pdf?4pv38FtU6R4fDNtwqOxjBCndLIfntWeR

The format or message type is not important by my reading and I would have thought (hoped) that the BUFR and geojson for the same observation would have the same classification, i.e. if the BUFR is classed as core data then the geojson should also be considered core. If this is not the case it then gets very messy.

I would have thought the easiest way to flag/control would be to have the classification within the topic hierarchy. If we do this do we also need to consider other data licensing models, for example provision under one of the creative commons licenses? We considered this issue when making data available through the C3S data store, code table here:

https://glamod.github.io/cdm-obs-documentation/tables/code_tables/data_policy_licence/data_policy_licence.html

petersilva · 2022-01-28T18:20:20Z

I don't know that format is irrelevant... I gather (perhaps wrongly) that there are discussions between aviation and meteorological people, where the aviation community tends to want the Aviation XML to be limited access, perhaps only commercially available, and the Met community circulates BUFR obs publicly with no restrictions for the same location. BUFR being harder for non-met users to deal with, both communities are happy. So I guess AvXML stuff would be non-core...

From @david-i-berry's links the core/etc... thing is more about distribution rights and requirements (must be available at no cost, vs. potentially restricted.) I don't think most users care what the IP regime for data they are obtaining is (beyond whether they can access it or not.)

Looking at the last link david provided... I guess we get a topic corresponding to "data policy license" with values like "Attribution-NonCommercial-ShareAlike-CC-BY-NC-SA" (likely shortened to CC-BY-NC-SA ?) to distinguish between data set licensing, and then potentially identical trees under them.. with different content depending on how each product is licensed.

hmm... Is that what people intend?

tomkralidis · 2022-01-31T01:37:59Z

I agree with @david-i-berry. File format/representation should be independent from data management/identification.

david-i-berry · 2022-01-31T14:02:30Z

My takeaway from the WIS2node standup call today is that we only want to worry about WMO data and that other sources are out of scope. In defining the topic tree we don't want to include WIS but do want to include the data category / license.

From the WMO Unified Data Policy we have the following definitions:

Members shall provide on a free and unrestricted basis the core data that are necessary for the provision of services in support of the protection of life and property and for the well-being of all nations, at a minimum those data described in Annex 1 to this resolution which are required to monitor and predict seamlessly and accurately weather, climate, water and related environmental conditions;
Members should also provide the recommended data that are required to support Earth system monitoring and prediction activities at the global, regional and national levels and to further assist other Members with the provision of weather, climate, water and related environmental services in their States and Territories. Conditions may be placed on the use of recommended data;

giving rise to two branches

| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____tree-type
| | | | | |____core
| | | | | | |____country-code
| | | | | | | |____originator

and

| |____public
| | |____YYYY-MM-DD
| | | |____source
| | | | |____tree-type
| | | | | |____recommended
| | | | | | |____country-code
| | | | | | | |____originator

For data with restrictions in the recommended branch those restrictions would be specified in the metadata and not the tree itself.
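
The two branches above differ only in the policy level, so they can be sketched with one path builder and a two-value enumeration. The names `DataPolicy` and `policy_branch` are hypothetical, and any restrictions on recommended data are assumed to live in the metadata, as stated above:

```python
from enum import Enum
from pathlib import PurePosixPath


class DataPolicy(Enum):
    """Branches derived from the WMO Unified Data Policy."""
    CORE = "core"
    RECOMMENDED = "recommended"


def policy_branch(day: str, source: str, tree_type: str, policy: DataPolicy,
                  country_code: str, originator: str) -> PurePosixPath:
    """Build public/<date>/<source>/<tree-type>/<policy>/<country>/<originator>."""
    return PurePosixPath("public", day, source, tree_type, policy.value,
                         country_code, originator)


if __name__ == "__main__":
    for policy in DataPolicy:
        print(policy_branch("2022-01-31", "wis", "observations-surface-land",
                            policy, "ca", "cwao"))
```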

petersilva · 2022-01-31T14:26:57Z

@david-i-berry ... that sounds like what I heard...possible optimization: source in the above tree is "WIS" ... from the discussions, @efucile was advocating getting rid of WIS, so we could just promote 'core' and 'restricted' up three levels, and have... say... WMO-Core just under the date, and WMO-Recommended, WMO-Other... etc... we save a level in the tree... and still get the concept in there.

petersilva · 2022-01-31T14:30:22Z

sorry, I just noticed tree-type is separate from "recommended" or "core" ... perhaps I misunderstood... I initially thought they were the same thing... I don't know what tree-type is. I edited the previous comment... to omit discussion of tree-type.

tomkralidis · 2022-02-28T22:25:38Z

Implemented in initial iteration. Will evolve in parallel with direction from WMO topic hierarchy efforts.

tomkralidis added the data persistence label Nov 16, 2021

tomkralidis added this to the sprint-001 milestone Nov 16, 2021

tomkralidis self-assigned this Nov 16, 2021

tomkralidis mentioned this issue Nov 25, 2021

add micro-specification to discovery MCFs #26

Closed

tomkralidis assigned efucile and david-i-berry Nov 25, 2021

tomkralidis closed this as completed Dec 7, 2021

tomkralidis reopened this Dec 17, 2021

tomkralidis closed this as completed Feb 28, 2022

tomkralidis modified the milestones: sprint-001, sprint-002 Feb 28, 2022

davidgondwe mentioned this issue Aug 30, 2023

wis2box not starting well #502

Closed
