NIAC has decided about how to report errors/uncertainties #634
Comments
Also, describe that this pattern can be used throughout the NeXus structure.

---

At least these files should be reviewed for consistency with this decision:

---
Well, that's a pretty horrid decision, and it sets us back quite a while. I've had a chat with Tim at Diamond, and perhaps it is possible to summarize our `@uncertainties` proposal and seek multilateral support from Diamond and ESS.

---
Some things to be considered:

- In small-angle scattering, we are extremely dependent on the uncertainties: they define the quality of our measurement, need to be propagated through the data corrections, and have a direct impact on the (estimated) accuracy of the results. We have put a lot of thought into how these uncertainties can be estimated, and how they and further derived estimates are to be propagated through the correction procedures.
- Defining uncertainties by a special suffix on a dataset name, as opposed to using attributes, is not a clean solution, as it limits the dataset namespace that can be used. It harkens back to using different file extensions for different purposes: a stopgap measure at best. With this rule, dataset names cannot contain `_errors`, because it has a special meaning; a dataset named `logging_errors` or `movement_errors` is suddenly dangerous, as software may interpret it as the uncertainties of a dataset. I understand that this was the previous definition, but that is (unfortunately) an argumentative fallacy: just because it has been done this way in the past does not necessarily mean it is a good solution.
- I don't quite understand the second point. The `@uncertainties` attribute would point to the dataset containing the most appropriate uncertainty estimate for the data in question (which can be internally or externally linked). If you have multiple estimates (which we normally have), it would point at your best estimate, but by no means needs to be the only estimate.
- If you add the `@uncertainties` attribute to your raw data, how would that modify your original data? In any case, for our purposes the uncertainties attribute is not so much intended for raw data as for corrected data (i.e. NXcanSAS).
- The `@uncertainties` attribute, combined with a `@scaling` attribute, would allow us to define the two main types of uncertainties in a clear yet flexible way. Imagine we have an array of corrected detector pixels A. Some correction steps apply an identical scaling to all pixels (for example a normalization by thickness), while others are performed on a per-pixel basis (such as a flat-field correction). These uncertainties cannot be combined into a single final uncertainty: if they were, the uncertainties of the thickness correction would swamp the flat-field uncertainties, and the result would be useless, as data fits should care about the inter-pixel uncertainty differences rather than the overall scaling factor. So we propose to use two attributes, `@uncertainties` and `@scaling`, which would work as follows (see the sketch after this comment):
  - The corrected data array A would have an `@uncertainties` attribute defining the inter-pixel uncertainties. These can be used to fit the data to a model.
  - If the data is in absolute units, `@scaling` would point at a scaling factor to be applied to array A. This scaling factor would have a nominal value of 1, but carry its own `@uncertainties` attribute pointing to the scaling uncertainty.
  - The parameters used in the data correction would carry one or both types of uncertainties (inter-pixel and/or scaling), making it clear how they are to be propagated through the data correction steps.
  - When a model fit derives particle size distributions, the uncertainties on array A define the accuracy of the distribution modes. If the fit also determines a volume fraction of particles, the accuracy of that fraction is determined by the scaling uncertainty.

With the `@uncertainties` attribute, every scalar or array has the ability to point at its own best uncertainty estimate. These uncertainty scalars could then be further provided with metadata on how they were estimated, and could point in turn at a collection of uncertainty estimates from which the main uncertainty was derived.
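A minimal h5py sketch (an editorial illustration, not part of the original comment) of how the proposed `@uncertainties`/`@scaling` layout might look; all file, group, and dataset names here are hypothetical:

```python
import h5py
import numpy as np

# Illustrative layout for the proposed @uncertainties/@scaling attributes.
with h5py.File("corrected.nxs", "w") as f:
    data = f.create_group("entry/data")

    # Corrected detector array A with its inter-pixel uncertainties.
    A = data.create_dataset("I", data=np.random.rand(128, 128))
    data.create_dataset("I_pixel_unc", data=0.05 * A[...])
    A.attrs["uncertainties"] = "I_pixel_unc"  # best inter-pixel estimate

    # Overall scaling factor (nominal value 1), e.g. from a thickness
    # normalization, carrying its own uncertainty estimate.
    scale = data.create_dataset("scaling", data=1.0)
    data.create_dataset("scaling_unc", data=0.02)
    scale.attrs["uncertainties"] = "scaling_unc"
    A.attrs["scaling"] = "scaling"  # link A to its scaling factor
```

A fit to `I` would then use `I_pixel_unc` for the per-pixel weights, while anything derived from the absolute intensity would also propagate `scaling_unc`.

---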
My 2c worth of opinion on this: at NIAC2018 we decided upon the `_errors` scheme, and revising this would require a NIAC vote. The reasoning behind the vote was that we may not be able to apply an attribute to an externally linked dataset. `logging_errors` and `movement_errors` as dataset names are not standard NeXus. One problem with NeXus is that there are always special cases and exceptions to consider. The art of NeXus, IMHO, is then to apply the 20-80 rule: cover 80% of use cases with an easy rule like the one we devised at NIAC2018. And the NIAC has at some stage decided to allow almost anything in an application definition that has community support. Thus I believe there is leeway to cover this special use case in the NXcanSAS application definition, if the canSAS community endorses it.

---
I would just like to add to @mkoennecke's comment that the `_errors` approach already existed in NeXus; it was not a newly introduced rule.

---
Alright, so we can continue using the flexible `@uncertainties` attribute in NXcanSAS. One more comment on this issue, though: any operation that adds or modifies uncertainties for a given dataset should have the ability to access and modify the actual dataset as well. We are processing NeXus files into corrected NXcanSAS files, calculating uncertainty estimates in the process; in the new files being generated, I cannot see why it would be impossible to add an attribute to this data. The NIAC may be thinking of adding uncertainties post hoc to NeXus datasets, but again, if you are adding to your raw data, you should make sure you have the necessary ability to do so.

---
I strongly disagree with the first sentence of your last paragraph. You can add or modify uncertainties without modifying the actual dataset by using new files and external links, as sketched below. The uncertainties can be unknown or even wrong at the time of data collection; adding them as an attribute, however, implies modifying an already-acquired file that may even be archived and referenced with a DOI, and whose integrity is guaranteed.
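For instance (an editorial sketch, not from the thread), the uncertainty estimates can live in a new file that refers back to the untouched raw data through an external link; the file and path names are hypothetical:

```python
import h5py
import numpy as np

# A new file carrying post-hoc uncertainty estimates; the archived raw
# file (raw.nxs) is never opened for writing.
with h5py.File("uncertainties.nxs", "w") as f:
    grp = f.create_group("entry/data")
    # External link back to the original, possibly read-only, dataset.
    grp["I"] = h5py.ExternalLink("raw.nxs", "/entry/data/I")
    # Uncertainty estimate stored alongside the link in the new file.
    grp.create_dataset("I_errors", data=np.full((128, 128), 0.05))
```

---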
The NIAC just decided, amongst the three possibilities in the current NXdata, how errors (uncertainties) should be reported. The choice is to use field(s) named `VARIABLE_errors`, instead of `errors` or the `@uncertainties` attribute. The other two possibilities should be marked as deprecated, with reference to the chosen method. The documentation can point out that, for those files which use `@uncertainties`, a link can be created to satisfy this same choice. For example, NXcanSAS files would add `I_errors --> Idev` (see the sketch below).
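A minimal h5py sketch (an editorial illustration under assumed file and group names) of adding such a link, so that an existing `Idev` dataset is also reachable as `I_errors`:

```python
import h5py

# Add an in-file link I_errors --> Idev so that readers following the
# VARIABLE_errors convention find the existing Idev uncertainties.
with h5py.File("cansas.nxs", "r+") as f:
    data = f["entry/sasdata"]
    data["I_errors"] = h5py.SoftLink(data["Idev"].name)
```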
The major problem with attribute-based directives is that they do not propagate properly with externally-linked data. Treating local data and external links identically thus becomes a general motivation for avoiding attribute-based directives, as illustrated below.
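To illustrate the point (an editorial sketch with hypothetical names): writing an attribute through an external link resolves into the target file, so it fails when the linked file is read-only or archived, whereas a sibling `VARIABLE_errors` dataset lives entirely in the local file:

```python
import h5py

with h5py.File("local.nxs", "w") as f:
    grp = f.create_group("entry/data")
    grp["I"] = h5py.ExternalLink("archived.nxs", "/entry/data/I")

    # Writing an attribute through the link resolves into archived.nxs
    # and fails if that file is read-only:
    #     grp["I"].attrs["uncertainties"] = "I_errors"

    # A sibling VARIABLE_errors dataset, by contrast, is stored entirely
    # in the local file:
    grp.create_dataset("I_errors", data=[0.1, 0.2, 0.3])
```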
References:

- `errors` should be `ERRORS`