implement data storage directory structure #31
Comments
Initial proposal to feed discussion:
|
Based on discussions with @petersilva Example structure:
|
updated proposal:
|
my suggestion
|
Update (narrowed to
One issue would be that "data-category/tree-type" could potentially have varying levels of nesting. cc @petersilva for comments/insight |
The originating center is not a country. You may want to have a country level split and know who is generating the data |
Update based on discussion with ECMWF:
Notes: the dataset-name (now removed) would be a composite of |
Update:
Building with surface observations as critical path:
where:
Example: |
That looks really helpful, but it does not have complete coverage: station_type only mentions obs. What about NWP outputs, forecasts, synthetic satellite imagery, or text forecasts and warnings? Also, is RADAR landFixed? |
Good points @petersilva: land surface obs is our initial iteration. The tree will evolve as we build out NWP, forecast, satellite. |
If that is the hierarchy you want, then we can attempt to modify the tables in GTStoWIS2 to generate them for matching WMO386 AHL's. This is exactly the kind of feedback we have been trying to solicit to the proposed topic tree. |
In discussions in GTStoWIS (TT protocols team) we have agreed to omit file format (aka representation) as it was agreed that files should have, as is universal convention outside the weather world, appropriate file extensions. As the topics correspond to file folders, having type in the topic (which is also a folder) will result in all files having the representation twice. e.g:
The file type/data representation will unavoidably show up twice. Ideally, one looks up some form of information, say alerts, and the directory offers .cap files, .geojson files, .crex files, and perhaps .txt ones. The committee considers the fact that data with different representations does not currently show up in the same directory to be a bug in our proposed topic tree, one that we have issues open to address (e.g. wmo-im/GTStoWIS2#55, wmo-im/GTStoWIS2#39). |
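To illustrate the duplication described above, here is a minimal sketch. The directory levels and file names are hypothetical and are not taken from the proposed topic tree; the function simply counts how often a representation name appears in a full path.

```python
from pathlib import PurePosixPath


def representation_count(topic_dir: str, filename: str, representation: str) -> int:
    """Count occurrences of a representation name in a full path:
    once per matching directory level, plus once if it is the file extension."""
    path = PurePosixPath(topic_dir) / filename
    in_dirs = sum(1 for part in path.parent.parts if part == representation)
    in_ext = 1 if path.suffix.lstrip(".") == representation else 0
    return in_dirs + in_ext


# Hypothetical layouts, for illustration only.
with_format_level = representation_count("alerts/cap", "warning_001.cap", "cap")
without_format_level = representation_count("alerts", "warning_001.cap", "cap")
```

With a format level in the tree the representation is counted twice; without it, only the file extension carries it, and files in different representations can share one directory.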
Update standup 2021-12-03:
|
For marine regions the high level geographies from https://www.marineregions.org/gazetteer.php?p=details&id=23616 may be useful. |
In terms of my comment, one of the big advantages of WIS2.0 is that it opens up the system to observations from other communities. For example, if an oceanographic data centre wanted to set up a WIS2.0 node and make data available via that node, the proposed originating centre hierarchy may not be appropriate. |
Current state of directory structure (based on initial iteration of surface weather data):
Remaining points:
|
Table CCCC:
The table was built by converting the online PDF to a text file, transposing it, and cleaning it manually, followed by merging with the file from UCAR and making manual additions as the data flowed for weeks and we saw CCCCs show up that were unaccounted for. The flow is the normal bulletin flow of UCAR/UNIDATA (see the deriving_CCCC subdir in the GTStoWIS2 repo). |
CCCC is just a GTS transition mechanism... The mapping done by the GTStoWIS2 module is to the "centre" field in the table, which holds simplified, lower-case, but much more readable names: "AMRF" -> "melbourne_regional_forecasting_centre" ... This use of ASCII-constrained simplified place names, far more readable for most people than the four-letter CCCCs, is invented as part of, and implicit in, the GTStoWIS2 proposal. This proposal is made in the absence of a higher quality source. |
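A minimal sketch of the lookup described above, assuming a plain dictionary stands in for the GTStoWIS2 table. Only the "AMRF" entry comes from this thread; `simplify_name` and `centre_for` are hypothetical helpers and not part of GTStoWIS2.

```python
import re
import unicodedata
from typing import Optional

# Only the "AMRF" entry is taken from this thread; a real table would be
# loaded from the GTStoWIS2 data rather than hard-coded.
CCCC_TO_CENTRE = {"AMRF": "melbourne_regional_forecasting_centre"}


def simplify_name(name: str) -> str:
    """Hypothetical normaliser: ASCII only, lower case, underscores between words."""
    ascii_only = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_only.lower()).strip("_")


def centre_for(cccc: str) -> Optional[str]:
    """Return the readable centre name for a CCCC, or None if it is unaccounted for."""
    return CCCC_TO_CENTRE.get(cccc.strip().upper())
```

Returning `None` for an unknown CCCC makes the unaccounted-for codes easy to collect and add to the table later.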
We could probably replace originating-center with originator, just to clarify that it may be an entity that is not a center and that does not come from the previously existing table. I guess that we will need a controlled vocabulary for this, and I would suggest that it go in codes.wmo.int |
@efucile are you pointing to adding to http://codes.wmo.int/common/centre ? or creating a new table? |
we need new tables for WIS2 |
cc @amilan17 In TT-WISMD, we will start working on WCMP 2.0 codelists at https://github.com/wmo-im/wcmp2-codelists, and will kick off this activity at our next meeting. |
Updated tree:
Closing this issue, given the implementation in wis2node proper. Formal codelist creation will be put forth at TT-WISMD. |
Notes based on 2021-12-15 discussion:
Next steps:
@petersilva / @efucile thinking more:
Example: Should we consider the data category higher up in the hierarchy, i.e.: Example
Example: Thoughts? |
In hierarchies, each level normally has "control" of the levels below it. This principle is expressed in the OID mechanism (discussed here: wmo-im/GTStoWIS2#37) and was given by @remygiraud as a constraint on the hierarchy. The country is supposed to be at the top... the national authority can then permit / assign / control / refuse the next level of the tree (centre names within the country). Each centre has control of what it publishes under its centre-id. So country-code.centre-id was a starting point for our committee work. This control is also a kind of permission to write in the tree. It is natural to implement permissions that align with the hierarchy, but it is hard to see how that would be done if bulletins from every originator are scattered throughout the tree. |
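The alignment between hierarchy and write permission can be sketched as a prefix check. This is an illustration under assumptions: the `/` separator, the `may_publish` name, and the example identifiers are hypothetical, not part of any agreed topic tree.

```python
def may_publish(granted_prefix: str, topic: str, sep: str = "/") -> bool:
    """True if `topic` is at or below `granted_prefix` in the hierarchy.

    Levels are compared one by one rather than as raw strings, so a grant
    on "xx/centre_a" does not accidentally cover "xx/centre_ab".
    """
    granted = granted_prefix.strip(sep).split(sep)
    target = topic.strip(sep).split(sep)
    return target[: len(granted)] == granted


# Hypothetical country code "xx" and centre ids, for illustration only.
own_subtree = may_publish("xx/centre_a", "xx/centre_a/observations/surface")
other_centre = may_publish("xx/centre_a", "xx/centre_b/observations/surface")
```

A single grant per centre is enough when everything a centre publishes sits under its own country-code and centre-id; if bulletins were scattered across the tree, each centre would need many separate grants.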
After discussion with @efucile, should we add a level to the topic hierarchy based on the WMO Unified Data Policy, so:
Example: Or is this a function of the discovery metadata evaluation (to assess whether to further bind to a resource)? |
I don't know enough about the Unified Data Policy to understand the implications. I suspect that "core/recommended/other" is a distinction that is not material to most subscribers, who will not know what it means, or why it matters. Questions:
|
Core data is described here: and resolution 1 / UDP here: The format or message type is not important by my reading and I would have thought (hoped) that the BUFR and geojson for the same observation would have the same classification, i.e. if the BUFR is classed as core data then the geojson should also be considered core. If this is not the case it then gets very messy. I would have thought the easiest way to flag/control would be to have the classification within the topic hierarchy. If we do this do we also need to consider other data licensing models, for example provision under one of the creative commons licenses? We considered this issue when making data available through the C3S data store, code table here: |
I don't know that format is irrelevant... I gather (perhaps wrongly) that there are discussions between aviation and meteorological people, where the aviation community tends to want the Aviation XML to be limited access, perhaps only commercially available, while the Met community circulates BUFR obs publicly with no restrictions for the same location. BUFR being harder for non-met users to deal with, both communities are happy. So I guess AvXML stuff would be non-core... From @david-i-berry's links, the core/etc... thing is more about distribution rights and requirements (must be available at no cost vs. potentially restricted). I don't think most users care what the IP regime is for the data they are obtaining (beyond whether they can access it or not). Looking at the last link David provided... I guess we get a topic corresponding to "data policy license" with values like "Attribution-NonCommercial-ShareAlike-CC-BY-NC-SA" (likely shortened to CC-BY-NC-SA ?) to distinguish between data set licensing, and then potentially identical trees under them, with different content depending on how each product is licensed. hmm... Is that what people intend? |
I agree with @david-i-berry. File format/representation should be independent from data management/identification. |
My takeaway from the WIS2node standup call today is that we only want to worry about WMO data and that other sources are out of scope. In defining the topic tree we don't want to include WIS but do want to include the data category / license. From the WMO Unified Data Policy we have the following definitions:
giving rise to two branches
and
For data with restrictions in the recommended branch those restrictions would be specified in the metadata and not the tree itself. |
@david-i-berry ... that sounds like what I heard...possible optimization: source in the above tree is "WIS" ... from the discussions, @efucile was advocating getting rid of WIS, so we could just promote 'core' and 'restricted' up three levels, and have... say... WMO-Core just under the date, and WMO-Recommended, WMO-Other... etc... we save a level in the tree... and still get the concept in there. |
sorry, I just noticed tree-type is separate from "recommended" or "core" ... perhaps I misunderstood... I initially thought they were the same thing... I don't know what tree-type is. I edited the previous comment... to omit discussion of tree-type. |
Implemented in initial iteration. Will evolve in parallel with direction from WMO topic hierarchy efforts. |
Create a storage directory structure for wis2node to support the various data/files managed within an installation.