Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add spectrum reading to FormatNXmx and fix noncontiguous mask arrays #538

Merged
merged 14 commits into from
Sep 10, 2022

Conversation

phyy-nx
Copy link
Contributor

@phyy-nx phyy-nx commented Aug 9, 2022

This is mostly a port (copy) of the spectrum reading code from FormatNexus. I imagine we'll want to eliminate one of the two copies at some point.

@rjgildea this was my first time reading FormatNXmx.py and nexus/nxmx.py in detail. I like the H5Mapping class, which maps strings as handle names to h5 fields while doing unit conversion. However it breaks down for the variants available for incident_wavelength that we are using for SwissFEL data. These data have both a calibrated wavelength derived (I think) from analyzing the spectrum and doing some fitting, and they also have the full spectra available. Therefore, the way I set this up in NeXus is using these three datasets:

/entry/instrument/beam   Group
/entry/instrument/beam/incident_wavelength Dataset {1448}
/entry/instrument/beam/incident_wavelength_1Dspectrum Dataset {2560}
/entry/instrument/beam/incident_wavelength_1Dspectrum_weight Dataset {1448, 2560}

The dataset has 1448 images and the calibrated wavelengths are in incident_wavelength. That field has an attribute variant=incident_wavelength_1Dspectrum, which represents the spectrum that the wavelengths were derived from, one for each image. For each spectrum, the 2560 channels are the same. Therefore the weights array for the spectra is of shape Nimages, Nchannels.

This is implemented in FormatNexus by checking for the different permutations available for 0D, 1D and 2D wavelength arrays with possible variants. Note however that the key incident_wavelength_1Dspectrum isn't in the NXmx spec, as it's just presumed that this is how variants work. Because of this we can't use a direct mapping like the H5Mapping class does.

This is a draft PR because things are not quite working yet. This is fine:

import dxtbx
img = dxtbx.load("run_000795.JF07T32V01_master_spectrum.h5")
print(img.get_beam())
print(img.get_spectrum())

Output:

Beam:
    wavelength: 1.30886
    sample to source direction : {0,0,1}
    divergence: 0
    sigma divergence: 0
    polarization normal: {0,1,0}
    polarization fraction: 0.999
    flux: 0
    transmission: 1

Spectrum:
    weighted wavelength: 1.30791

Note the beam wavelength comes from the calibrated array, which is different than the weighted wavelength from the spectrum.

The image viewer is failing with

ValueError: Please report this error to [email protected]: Not a location (invalid object ID)

when reading incident_wavelength from nxbeam.handle. Inspecting the handle it is being reported as closed. Is CachedWavelengthBeamFactory closing the handle somewhere that I can't see?

Thanks, and feel free to revise the PR to better match FormatNXmx as desired (especially dropping my use of convert_units from FormatNexus). I'll try to get SwissFEL test data with spectra uploaded to Zenodo tomorrow.

When merged, this will close #537.

@phyy-nx phyy-nx requested a review from rjgildea August 9, 2022 01:11
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 11, 2022

Ran out of time to get the SwissFEL data with spectra uploaded. I'll try again early next week.

@rjgildea
Copy link
Contributor

to get SwissFEL test data with spectra uploaded to Zenodo tomorrow.

Would it be possible to construct a self-contained minimal example in the tests, as done e.g. here? We've tried where possible to make sure that the core functionality of the new NXmx code is covered by self-contained test that don't have dependencies on a huge array of different datasets. Although it would still be useful to have a full dataset on Zenodo for full testing and to better understand what the file structure looks like.

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 18, 2022

From Dials core: found the cache that was causing the handle to be closed. Should be fixable. Also, change the variant reading to return the newest version of the spectrum, not the oldest.

phyy-nx added a commit that referenced this pull request Aug 19, 2022
Requires #538 to be merged before it can be marked as passing

Fixes #93
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 19, 2022

This PR now depends on #541 as it modifies the xfailing test added by #541.

phyy-nx added a commit to dials/data that referenced this pull request Aug 23, 2022
Dataset reimported using new off by one pixel correction fix in FormatNexus from cctbx/dxtbx#538
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 23, 2022

@ndevenish @rjgildea re-reading the original code, I was wrong about the variant behavior. It bails out as soon as it finds a valid set of spectra. It doesn't go to the next one in the chain. So that's not a concern.

Additionally, I've uploaded a 4 image dataset with spectra here:
https://doi.org/10.5281/zenodo.7011529

@phyy-nx phyy-nx marked this pull request as ready for review August 24, 2022 00:02
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 24, 2022

Ok this is ready for review. Here is the order of things to review and merge.

  1. Review and merge Add 4 image SwissFEL JF16M dataset dials/data#389. This adds 2 datasets to dials.data, a simple SwissFEL 16M dataset, and a more complex one with spectra.
  2. Review and merge Add xfailing test for the JF16M #541. Tests only the first 16M dataset.
  3. Review and merge this PR. Fixes issues reading the 16M dataset and adds spectra support, with in-memory spectra tests.

Only thing left to consider is adding a test to specifically test the second Zenodo dataset, with spectra. That would add about 200 MB between the two tests to an Azure footprint. Might be better to switch the test added in #541 to use the second Zenodo dataset. But regardless, these PRs are ready for review.

Thanks folks.

@codecov
Copy link

codecov bot commented Aug 24, 2022

Codecov Report

Merging #538 (6f7a104) into main (757b060) will increase coverage by 0.18%.
The diff coverage is 80.61%.

@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
+ Coverage   40.67%   40.85%   +0.18%     
==========================================
  Files         176      176              
  Lines       15552    15613      +61     
  Branches     2806     2818      +12     
==========================================
+ Hits         6325     6378      +53     
- Misses       8664     8666       +2     
- Partials      563      569       +6     

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Aug 25, 2022

@ndevenish not sure what these MacOS test failures are about. Maybe it just needs to be re-run?

phyy-nx added a commit to dials/data that referenced this pull request Sep 8, 2022
* Add 4 image SwissFEL JF16M dataset

* Add lysozyme_JF16M_4img imported.expt file

Made using FormatNexus as a baseline to compare to FormatNXmx

* Update to new version

Dataset reimported using new off by one pixel correction fix in FormatNexus from cctbx/dxtbx#538

* Add further JF16M dataset, this one with per-shot spectra
This is mostly a port (copy) of the spectrum reading code from FormatNexus.  I imagine we'll want to eliminate one of the two copies at some point.
Needed since new implementation of beam/spectra needs access to the file handle outside of get_raw_data
Test now reads the JF16M data using FormatNXmx, but there is a one pixel offset compared to FormatNexus
Two problems here when constructing the panel object as a leaf of the detector hierarchy using the fast_pixel_direction NeXus field. The fast and slow directions are ready out correctly, but the origin was read out using get_cumulative_change_of_basis in order to include any offsets.  However, since the pixel size is the value of this field, it was included as an offset, leading to an extra pixel added in the fast direction. Additionally, get_cumulative_change_of_basis relies on the equipment_component chain, which fast_pixel_direction doesn't use, so only the fast_pixel_direction offset was picked up.  This is why there isn't an extra pixel in the slow direciton.

Fix is to not include fast_pixel_direction field value when reading out the offset and to explicitly also read the slow_pixel_direction offset.
Off-by-one error fixed in in FormatNexus
phyy-nx and others added 4 commits September 9, 2022 16:34
Uses the in memory version of testing. Tests both a single spectra for a dataset and a set of 3 spectra using variants derived from calibrated energies.
@phyy-nx
Copy link
Contributor Author

phyy-nx commented Sep 9, 2022

Rebased onto main to pick up #541

@phyy-nx
Copy link
Contributor Author

phyy-nx commented Sep 9, 2022

Again, I think the build failure is the same one that @ndevenish reported in #548 (comment) which is known and unrelated so I presume this is safe to merge

Merging after confirming this is ok with @graeme-winter. Thanks everyone for the review on this.

@phyy-nx phyy-nx merged commit effd664 into main Sep 10, 2022
@phyy-nx phyy-nx deleted the nxmx_spectra branch September 10, 2022 00:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Errors reading SwissFEL JF16M NeXus data
3 participants