Character set permitted for variable and attribute names. #307
Comments
As far as I can tell there is an implicit restriction on the use of spaces, as the [...] To indicate a unit, as in your example, the appropriate attribute should be used; CF provides the mechanisms for that. It is also a usability issue, as some characters may be hard to type, or easy to confuse with characters in this range (o or ο, spot the difference).
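The "spot the difference" point can be made concrete with a small Python sketch. The helper name `describe` is hypothetical, and the second character is assumed to be the Greek omicron (U+03BF), written as an escape so the code does not depend on how this page was copied.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_o = "o"             # plain ASCII letter
greek_omicron = "\u03bf"  # assumed look-alike from the comment above

# The two glyphs look alike in most fonts but are distinct code points.
print(describe(latin_o))
print(describe(greek_omicron))
print(latin_o == greek_omicron)
```

Two variable names differing only in such a character would be distinct names in a file while appearing identical on screen.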
I would agree that it would be nice if any variable name found in a CF-compliant file could be adopted without modification in the programming languages used in climate science. This would facilitate code documentation and generally make it easier for users, I think. So I support the view expressed by @MaartenSneepKNMI and would restrict the character set. I don't know whether UTF-8 is the right set.
The bit patterns of UTF-8 and ASCII are identical for the characters that I list as available for variable names, so there is no distinction. I don't think any choice other than UTF-8 should be made at this point in time.
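As a quick illustration of that overlap, here is a minimal Python sketch. The function name `same_bytes_in_ascii_and_utf8` and the sample names are hypothetical, chosen only for this example.

```python
def same_bytes_in_ascii_and_utf8(name: str) -> bool:
    """True if `name` encodes to identical bytes in ASCII and UTF-8."""
    try:
        return name.encode("ascii") == name.encode("utf-8")
    except UnicodeEncodeError:
        # The name contains a character that ASCII cannot represent.
        return False


# An ASCII-only name is byte-for-byte the same in both encodings.
print(same_bytes_in_ascii_and_utf8("air_temperature"))

# The degree sign (U+00B0) has no ASCII form and takes two bytes in UTF-8.
degree_bytes = "\u00b0".encode("utf-8")
print(degree_bytes, len(degree_bytes))
```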
UTF-8 is not a character set; it is an encoding for Unicode. How these names are actually stored in the files is specified by the CDM Identifiers section and cannot be decided by CF. CF doesn't support string attributes yet, and some libraries treat string attributes specially (e.g. netcdf4-python will force a string attribute if the text attribute cannot be converted to ASCII). As a result, the implicit, in-practice restriction is that variable names are limited to Unicode code points up to U+007F (i.e. ASCII) if their name is going to appear in a CF standardized attribute. I think CF should only go so far as to warn about this limitation for names which will appear in these attributes, but not care beyond that.
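A warning of the kind suggested here could be driven by a check along these lines. This is only a sketch: the helper `non_ascii_code_points` is a hypothetical name, not part of any netCDF or CF tooling, and the degree sign in the sample name is assumed to be U+00B0.

```python
def non_ascii_code_points(name: str) -> list[str]:
    """List the code points in `name` that lie above U+007F, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in name if ord(ch) > 0x7F]


# A name that stays within ASCII yields an empty list.
print(non_ascii_code_points("air_temperature"))

# The example name from the issue contains one non-ASCII character.
print(non_ascii_code_points("Temperature (\u00b0C)"))
```

A checker could emit a warning whenever the returned list is non-empty for a name that is referenced from a text attribute.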
Just to be clear, I wasn't suggesting that [...] I tend to agree with @MaartenSneepKNMI and @taylor13: the point that inclusion of spaces in variable names would break other parts of the convention is a good one. I support the restriction proposed by @MaartenSneepKNMI above. This matches, I think, the approach used in all the CDL examples in the current convention.
I will also second @MaartenSneepKNMI's recommendation to restrict legal CF variable names to ones that could be used as variable names in important programming languages. It enables some very useful programming paradigms. I think it would also be good to mention that reasoning explicitly in the text describing the restriction.
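To make the "usable as a variable name in a programming language" idea concrete, here is a minimal Python sketch. The exact character set proposed in this thread is not reproduced above, so the pattern below (an ASCII letter followed by ASCII letters, digits, or underscores) is an assumption for illustration, and `is_portable_name` is a hypothetical helper.

```python
import re

# Assumed rule for illustration: start with an ASCII letter, then ASCII
# letters, digits, or underscores. This is not quoted from the proposal.
PORTABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_portable_name(name: str) -> bool:
    """True if `name` matches the assumed identifier-style pattern."""
    return PORTABLE_NAME.fullmatch(name) is not None


for candidate in ["air_temperature", "Temperature (\u00b0C)", "2m_temp"]:
    print(candidate, is_portable_name(candidate))
```

Names that pass such a check can be used directly as identifiers or attribute names in most languages without quoting or renaming.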
@martinjuckes, CF section 2.3 says:
However, your opening statement seems to contradict:
What is your interpretation? Is section 2.3 advisory only, or did you miss that?
@Dave-Allured : sorry, my mistake. Apologies for an unneeded discussion.
@martinjuckes, I refer you to #237, Remove restrictions on netCDF object names. Please support that proposal, and continue further discussion there.
The CF Convention does not impose any restrictions on the character set used in variable and attribute names. I have been, for a long time, working on the assumption that there were restrictions inherited from NetCDF, but that is no longer the case. The current NUG states that any UTF8 characters other than / can be used. This means that, for instance, a variable could be named Temperature (°C), as in the following code fragment:
Should CF impose some restriction, or is it OK to use the whole range of UTF8 in variable names?