NIAC has decided about how to report errors/uncertainties #634

prjemian · 2018-10-25T18:17:57Z

The NIAC just decided, amongst the three possibilities in the current NXdata, how errors (uncertainties) should be reported.

The choice now is to use field(s) named VARIABLE_errors instead of errors or the @uncertainties attribute. The other two possibilities should be marked as deprecated with reference to the chosen method.

The documentation can point out that, for those files which use @uncertainties, a link can be created to satisfy this same choice. For example, NXcanSAS files would add I_errors --> Idev

The major problem with using attribute-based directives is that they do not propagate properly with externally-linked data. Similar treatment of local data and external links becomes a general motivation for not using attribute-based directives.

References:

2014 proposition
NIAC discussion
NXdata
NXcanSAS
add "uncertainties" attribute to fieldType in nxdl.xsd #532 (add attribute to nxdl.xsd)
Suggested improvements to the NXdata base class definition #602 (suggested improvements)
in NXdata, field "errors" should be "ERRORS" #633 (field errors should be ERRORS)
early description

The text was updated successfully, but these errors were encountered:

prjemian · 2018-10-25T18:39:16Z

Also, describe that this pattern can be used throughout the NeXus structure

prjemian · 2018-11-05T17:31:24Z

At least these files should be reviewed for consistency with this decision:

base_classes/NXdata.nxdl.xml
applications/NXcanSAS.nxdl.xml
manual/source/datarules.rst
manual/source/defs_intro.rst
manual/source/design.rst

toqduj · 2018-12-12T14:45:08Z

Well, that's a pretty horrid decision, and sets us back quite a while. I've had a chat with Tim at Diamond, and perhaps it's possible to summarize our @uncertainties proposal and search for multilateral support from Diamond and ESS.

vasole · 2018-12-12T17:08:57Z

@toqduj

Somethings to be considered:

The _errors approach was already a NeXus specification. It is not new. We would have preferred to use _uncertainties but it was already there and therefore we decided to keep the existing solution instead of creating yet another way to specify uncertainties.
Perhaps you consider it not horrid but it is certainly dangerous to add attributes to datasets because implies modification of the original data if they are externally linked.

toqduj · 2019-07-21T10:35:12Z

In small-angle scattering, we are extremely dependent on the uncertainties: these uncertainties define the quality of our measurement, need to be propagated through the data corrections, and have a direct impact on the accuracy (estimate) of the results. We have put a lot of thought into how these can be estimated and how they are to be propagated (and further estimates determined) through the correction procedures.

Defining uncertainties by a special ending to a dataset name, as opposed to using attributes is not a clean solution, as it limits the dataset namespace that can be used. It harkens back to using different file extensions for different purposes; a stopgap measure at best. With this rule, dataset names can not have _errors in the name as it has a special meaning. So having a dataset named "logging_errors" or "movement_errors" is suddenly dangerous as it might cause unexpected behavior in software that interprets it as the uncertainties of a dataset.

I understand that that was the previous definition, but this is (unfortunately) an argumentative fallacy: just because it has been done so in the past doesn't necessarily mean it is a good solution.

I don't quite understand the second point. The @uncertainties attribute would point to the dataset containing the most appropriate uncertainty estimate for the data in question (which can be internally or externally linked). If you have multiple estimates (which we normally have), it would point at your best estimate, but by no means needs to be the only estimate. If you use the @uncertainties attribute to your raw data uncertainties, how would that modify your original data?

In any case, for our purposes, the uncertainties attribute isn't so much intended for the raw data, but for corrected data (i.e. NXcanSAS).

The uncertainties attribute would be linked with a scaling attribute, would allow us to define the two main types of uncertainties in a clear yet flexible way. Imagine we have an array of corrected detector pixels A. These have been corrected by applying an identical scaling to all pixels (for example to normalize by thickness), and there are correction steps that are performed on a per-pixel basis (such as a flat-field correction).

These uncertainties cannot be combined into a single final uncertainty, if so done, the uncertainties of the thickness correction will swamp the flat-field uncertainties. They will also be now useless, as data fits should care about the inter-pixel uncertainty differences rather than the overall scaling factor. So we propose to use two attributes: @uncertainties and @scaling. This would work as follows:

The corrected data array A would have an @uncertainties attribute, defining the inter-pixel uncertainties. These can be used to fit the data to a model. If the data is in absolute units, @scaling would point at a scaling factor to be applied to array A. This scaling factor would have a nominal value of 1, but have its own @uncertainties attribute, pointing to the scaling uncertainty. For the parameters that are used in the data correction, these would carry one or both types of uncertainties (inter-pixel and/or scaling uncertainties), making it clear how they are to be propagated through the data correction steps.

When a model fit would derive particle size distributions, the uncertainties on the array A define the accuracy of the distribution modes. If it would also determine a volume fraction of particles, the accuracy of that would be determined by the scaling uncertainty.

With the uncertainty attribute, every scalar or array has the ability to point at its own best uncertainty estimates. These uncertainty scalars could then be further provided with metadata on how they were estimated, and point further at a collection of various uncertainty estimates from which the main uncertainty was derived.

mkoennecke · 2020-01-23T10:16:38Z

My 2c worth of opinion on this:

At NIAC2018 we decided upon the _errors scheme. Revising this would require a NIAC vote. The reasoning behind the vote was that we may not be able to apply an attribute to an externally linked dataset.

Logging_errors and movement_errors as dataset names are not standard NeXus.

One problem with NeXus is that there are always special cases and exceptions to consider. The art of Nexus, IMHO, then is to apply the 20-80 rule where you cover 80% of use cases with an easy rule like the one we have devised at NIAC2018.

And the NIAC has decided in some stage to allow almost anything in an application definition which has community support. Thus I believe there is leeway to cover that special use case in the NXcansas application definition. If the cansas community endorses it.

vasole · 2020-01-23T10:22:22Z

I would just like to add to @mkoennecke's comment that the _errors approach was already existing in NeXus and that it was not a newly introduced rule.

toqduj · 2020-02-28T07:18:16Z

Alright, so we can continue using the flexible @uncertainties attribute in NXcanSAS. One more comment on this issue though:

The reasoning behind the vote was that we may not be able to apply an attribute to an externally linked dataset.

Any operation that adds or modifies uncertainties to a given dataset should have the ability to access and modify the actual dataset as well. We are processing NeXus files into corrected NXcanSAS files, and calculating uncertainty estimates in the process. In the new files being generated, I cannot see why it would now be impossible to add an attribute to this data. The NIAC may be thinking of adding uncertainties post-hoc to NeXus datasets, but again, if you are adding to your raw data, you should make sure you have the necessary ability to do so.

vasole · 2020-08-26T21:20:39Z

I strongly disagree about the first sentence of your last paragraph. You can add or modify uncertainties without modifying the actual dataset by using new files and external links.

The uncertainties can be unknown or even wrong at the time of data collection. However, adding them as an attribute implies modifying an already acquired file that can be even archived and referenced with a DOI and whose integrity is warranted.

prjemian added definitions high priority NIAC has requested The NIAC has requested this issue to be considered documentation labels Oct 25, 2018

prjemian added this to the NXDL 2019.1 milestone Oct 25, 2018

prjemian self-assigned this Oct 25, 2018

prjemian mentioned this issue Oct 25, 2018

[NIAC 2018 Proposals] Clarify/generalize use of uncertainties nexusformat/NIAC#27

Closed

This was referenced Nov 5, 2018

adjust NXlog to be consistent with #634 #639

Closed

adjust NXdata to be consistent with #634 #640

Closed

prjemian added a commit that referenced this issue Nov 5, 2018

#634 NXdata

f582f23

prjemian added a commit that referenced this issue Nov 5, 2018

#634

a30df1f

prjemian mentioned this issue Sep 11, 2019

integrate NIAC decision on errors & uncertainties #685

Merged

prjemian mentioned this issue Jan 22, 2020

revise NXcanSAS regarding use DATASET_errors v. @uncertainties #709

Open

prjemian added a commit that referenced this issue Jan 23, 2020

TST #634 GitHub API creds changed

b58410c

prjemian closed this as completed in #685 Jan 23, 2020

prjemian mentioned this issue Jan 26, 2020

Revise release note scripting #735

Merged

vasole mentioned this issue Aug 26, 2020

Changes to definitions for NXmx Gold Standard. Includes changes to NXdl #793

Merged

This was referenced Oct 19, 2020

adjust NXmonochromator to be consistent with #634 #820

Closed

adjust NXsnsevent and NXsnshisto to be consistent with #634 #821

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

NIAC has decided about how to report errors/uncertainties #634

NIAC has decided about how to report errors/uncertainties #634

prjemian commented Oct 25, 2018

prjemian commented Oct 25, 2018 •

edited

Loading

prjemian commented Nov 5, 2018

toqduj commented Dec 12, 2018

vasole commented Dec 12, 2018

toqduj commented Jul 21, 2019

mkoennecke commented Jan 23, 2020

vasole commented Jan 23, 2020 •

edited

Loading

toqduj commented Feb 28, 2020

vasole commented Aug 26, 2020

NIAC has decided about how to report errors/uncertainties #634

NIAC has decided about how to report errors/uncertainties #634

Comments

prjemian commented Oct 25, 2018

prjemian commented Oct 25, 2018 • edited Loading

prjemian commented Nov 5, 2018

toqduj commented Dec 12, 2018

vasole commented Dec 12, 2018

toqduj commented Jul 21, 2019

mkoennecke commented Jan 23, 2020

vasole commented Jan 23, 2020 • edited Loading

toqduj commented Feb 28, 2020

vasole commented Aug 26, 2020

prjemian commented Oct 25, 2018 •

edited

Loading

vasole commented Jan 23, 2020 •

edited

Loading