First submission for NUrF at LoKI #129
base: main
Conversation
Thanks for the contribution, you have obviously put a lot of effort into this 👍
See comments and questions below. I have at this point not looked into all the details of the more specific functions, since I anticipate that they may change significantly if/once some of my more general comments are addressed.
docs/instruments/loki/nurf/nurf.py
print("This is scippneutron ", scn.__version__) | ||
print("This is scippnexus ", snx.__version__) | ||
print("This is scipp ", sc.__version__) |
We should not have print statements at module-level. If output is needed we should use the logging facilities. I would not advocate logging the versions though.
The versions get logged when you configure the ess logger for a workflow. Have a look at cell 2 in https://scipp.github.io/ess/instruments/amor/amor_reduction.html
But this should be done in a workflow, not in a module like nurf.py.
Ok, I will remove them. It was more for myself. I am not familiar with the logging facilities, but would like to see them.
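For reference, a minimal sketch of how module-level prints could be replaced with Python's standard logging module; the logger name "ess.loki.nurf" is only an illustrative choice, and the ess logging configuration mentioned above would normally be set up by the workflow notebook, not by this module:

import logging

import scipp as sc

# A module-level logger instead of module-level print statements.
# The name "ess.loki.nurf" is an assumption for illustration only.
logger = logging.getLogger("ess.loki.nurf")

# This message is emitted only if the calling workflow configures logging,
# e.g. logging.basicConfig(level=logging.INFO) in the notebook.
logger.info("Using scipp %s", sc.__version__)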
docs/instruments/loki/nurf/nurf.py
def complete_fname(scan_numbers):
    """Converts a list of input numbers to a filename used at ILL.
Is this relevant for ESS? If it is just for testing now while you have sample files from ILL I think we should consider moving it elsewhere?
No. That is a function I wrote for my ILL test data. I could put all ILL related functions in a different .py file.
docs/instruments/loki/nurf/nurf.py
    Parameters
    ----------
    scan_numbers: list of int or a single int
We are using a Sphinx extension so the type information here is handled automatically based on type-hints. That is, instead of
    scan_numbers: list of int or a single int
we should just have
    scan_numbers:
and the corresponding type hint needs to be written in the function signature.
For reference, see line 45 in d08178f:
    'sphinx_autodoc_typehints',
Ok. All data types will go into the function signature as type hints, and in the docstring only the parameter name (followed by its explanation) remains.
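A sketch of how complete_fname could then look, with the types in the signature and only descriptions in the docstring. The body, the zero-padding and the file extension are placeholders (not the real ILL naming scheme), and the docstring layout loosely follows the conventions linked further below:

from typing import List, Union


def complete_fname(scan_numbers: Union[int, List[int]]) -> Union[str, List[str]]:
    """Convert scan numbers to the corresponding raw file names.

    Parameters
    ----------
    scan_numbers:
        One scan number or a list of scan numbers.

    Returns
    -------
    :
        One file name or a list of file names.
    """
    # Placeholder formatting, not the actual ILL file naming convention.
    if isinstance(scan_numbers, int):
        return f"{scan_numbers:06d}.nxs"
    return [f"{n:06d}.nxs" for n in scan_numbers]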
docs/instruments/loki/nurf/nurf.py
    Returns:
    ----------
    flist_num: list of str or one str
See above: do not include type info in the docstring; use type hints and our extension does the rest. That is,
    flist_num: list of str or one str
becomes
    flist_num:
See also https://scipp.github.io/reference/developer/coding-conventions.html#docstrings.
docs/instruments/loki/nurf/nurf.py
#TODO: wpotrzebowski points out that dark, reference are also data. Should we convert the entry into the proposed LoKI.nxs to 'is_sample'?
data = da[da.coords["is_data"]]
is_sample sounds ok to me.
Ok. It will be is_sample in the next iteration.
I tried this, but then I got into trouble while loading the file. It suddenly asked for other parameters not applicable to my case. For the moment, I will keep is_data.
for i in range(num_spec):
    # start parameters
    p0 = [b0[i], m0[i]]
    popt, pcov = leastsq(
        residual,
        p0,
        args=(
            uv_da_filt.coords["wavelength"].values,
            uv_da_filt["spectrum", i].values,
        ),
    )
Can we use scipp.optimize.curve_fit (which uses scipy) instead?
I am not sure. I had a look at the scipy.optimize.curve_fit function while writing my function. But here I need to minimise the difference between the measured spectral data and the fitted turbidity. I was not able to realize this with the curve_fit function, hence I chose scipy.optimize.leastsq.
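For comparison, a rough sketch of how a residual-based leastsq call can usually be rewritten for scipy.optimize.curve_fit, which forms the difference between measured data and model internally. The linear model and all numbers here are made up for illustration; they are not the turbidity model from nurf.py:

import numpy as np
from scipy.optimize import curve_fit, leastsq


def model(wavelength, b, m):
    # Placeholder model; the actual turbidity expression may differ.
    return b + m * wavelength


def residual(p, wavelength, measured):
    # Residual minimised by leastsq: measured spectrum minus fitted model.
    return measured - model(wavelength, *p)


rng = np.random.default_rng(0)
wavelength = np.linspace(300.0, 400.0, 101)
measured = model(wavelength, 1.0, 0.02) + 0.01 * rng.normal(size=wavelength.size)

# leastsq works on the residual function directly ...
p_lsq, _ = leastsq(residual, x0=[1.0, 0.0], args=(wavelength, measured))

# ... whereas curve_fit builds (measured - model) internally, so only the model is passed.
p_cf, _ = curve_fit(model, wavelength, measured, p0=[1.0, 0.0])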
# Switch on plots for verification
if plot_corrections:
Having such options on a function is often not a good design choice. Is this necessary? Can it be moved to a separate function that can be called on the return value of this function?
I chose it to declutter the output in the notebook. Sometimes it is useful to see the in-between plotting steps, sometimes not. How would you do this, Simon, please? Would you have some dummy code for a design pattern, by chance?
And yes, I agree with your other comment - to separate processing tools from plotting workflows.
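Not the author's code, but one possible pattern, sketched with hypothetical names: the processing function returns the corrected data without producing any output, and a separate helper plots the returned object only when the notebook user asks for it:

import scipp as sc


def correct_turbidity(uv_da: sc.DataArray) -> sc.DataArray:
    """Return turbidity-corrected UV spectra (processing only, no plotting)."""
    # ... fitting and subtraction would go here ...
    return uv_da  # placeholder


def plot_turbidity_correction(corrected: sc.DataArray):
    """Optional inspection plot, called explicitly from the notebook."""
    return sc.plot(sc.collapse(corrected, keep="wavelength"), grid=True)


# In the notebook:
# corrected = correct_turbidity(uv_da)
# plot_turbidity_correction(corrected)  # only if the intermediate plot is wanted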
display(multi_uv_turb_corr_da)

fig, ax = plt.subplots(ncols=3, figsize=(21, 7))
legend_props = {"show": True, "loc": 1}
num_spectra = multi_uv_turb_corr_da.sizes["spectrum"]

out = sc.plot(
    sc.collapse(multi_uv_turb_corr_da, keep="wavelength"),
    grid=True,
    ax=ax[0],
    legend=legend_props,
    title=f"All turbidity corrected UV spectra for {num_spectra} spectra",
)
As mentioned above, I think we should avoid mixing the "business logic" (such as fitting) with visualization/plotting. Keep in mind that we may want to run the processing code in a script instead of a notebook, and we don't want to create plots then (or we want to plot to files instead). Can you find a way to disentangle all the code that creates outputs (HTML or plots) from the rest? This applies to many functions here, not just this one.
Yes, I agree to separate the processing business (fitting, reducing, normalising, regrouping) from the plotting business. I might have to repeat some code.
Yes, I know it applies to many functions I wrote. The plots were helpful for visual inspection as well.
assert (
    wllim < wulim
), "Lower wavelength limit needs to be smaller than upper wavelength limit"
If this error is meant for the user, raise an exception instead of using assert.
I am not sure, it can be if I want. At least it makes the user aware of their wrong choice.
What is the rule, please? assert has the possibility to break the code, and I was taught that if the code breaks in a controlled way, that is a good thing. I don't know when to go for raise ValueError and when for assert.
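A rough rule of thumb: assert guards internal invariants and is removed entirely when Python runs with the -O flag, so errors caused by user input should raise an exception such as ValueError. A minimal sketch (the function name is hypothetical):

def select_wavelength_range(da, wllim, wulim):
    # User-facing validation: always executed, gives a clear error type and message.
    if not wllim < wulim:
        raise ValueError(
            f"Lower wavelength limit ({wllim}) must be smaller than "
            f"the upper wavelength limit ({wulim})."
        )
    # assert is for conditions that should be impossible if the code itself is
    # correct; it is stripped when Python is run with -O.
    ...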
General suggestion: Have you considered submitting the main "logic part" of this as a separate PR? That is, exclude all the plotting and debugging stuff, just keep the actual code for loading, normalization, filtering, fitting, etc. This would make it easier to bring it into a state everyone is happy with to include in the package. You can then, as a second step, work out which plotting utilities and helpers the user/scientist may need and do all that in follow-up work.
Yes, it occurred to me while writing the functions. I am constantly torn between writing code "quick and dirty" (to quote somebody else) and writing better code following good programming standards, or at least I am trying. The idea was first to show the functionality/workflow and get an idea of the data. Please don't forget that my data comes from an experiment where not everything worked well technically. That affected the datasets (= multiple nxs files). Priority was on the beamtime, and resources to support the implementation of the spectrometer were limited. Ensuring that data from the spectrometers are correctly recorded and that errors are handled will maybe be a task for ESS in the future. You and the other commentators are the first people I can discuss my code with in terms of code design/strategy.
@gudlot Yes, I know it is ILL data, but I think there was a typo in the doc string ("at" missing), which I think has been corrected now.
I think this is mostly a question for @nvaytet. Sorry if I missed this in the conversation, but is there an intention to keep all files (including python files) in docs/instruments/loki/nurf/? As far as I know, NUrF is available both at ISIS and ILL, and I think in the future they may also be interested in these scripts for data processing. It can always be addressed later, but it may be worth thinking about it now.
def nurf_file_creator(loki_file, path_to_loki_file, data):
    """
    Appends NUrF group to LOKI NeXus file for ESS
Is this the way NUrF data will always be added to the LoKI files? Or is this a temporary way to do things until all the data is properly stored in the Nexus files?
If the latter, I am not sure this should become part of the ess package. To me it looks like a (still experimental?) utility script that patches the data?
When I started this project, we discussed how the NUrF data could be part of a LoKI file, and I appended my content to an example LoKI file. At the moment it is more of a temporary way, and my current nxs files contain only the uv and fluo relevant parts, nothing else. But for Kenan I append my data to the LoKI file.
No, it does not have to become part of any package. Please consider all functions in this file as helper functions that I needed to convert my data into a better format.
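For context, a bare-bones sketch of how appending a group to an existing NeXus/HDF5 file typically looks with h5py; the group path, the NX_class value and the dataset names are placeholders, not the actual layout proposed for LoKI:

import h5py


def append_nurf_group(loki_file: str, wavelength, spectra) -> None:
    # Open the existing LoKI file in append mode and add a NUrF subgroup.
    with h5py.File(loki_file, "a") as f:
        nurf = f.require_group("entry/nurf")     # placeholder location in the file
        nurf.attrs["NX_class"] = "NXcollection"  # placeholder NeXus class
        nurf.create_dataset("uv/wavelength", data=wavelength)
        nurf.create_dataset("uv/spectra", data=spectra)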
wllim=300,
wulim=400,
wl_unit=None,
medfilter=True,
You seem to have several functions that say they do one thing, "keep the data between min and max wavelength" in this case, but are actually doing more under the hood, since you can also apply a median filter.
This is usually a sign that you should either make two functions and call both one after the other, or change the function name.
In this case, I feel that selecting a wavelength range is quite different from applying a filter. So I would vote for two functions.
Try to have a look at other places where you are doing something similar, and change accordingly.
Thank you. My thoughts were that using a medfilter is optional, depending on the data quality. Sometimes the data has a lot of outliers. With the medfilter I smooth the data first and then I select the maximum intensity in a given wavelength range. If there were an outlier, that could be misinterpreted.
I could separate them into two functions, but would probably need a third to bring the two together.
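One way the split could look, with assumed function names and a wavelength dimension coordinate in nm: the range selection then does only one thing, and the smoothing lives in a separate function that callers chain explicitly.

import scipp as sc


def select_wavelength_range(
    da: sc.DataArray, wllim: float = 300.0, wulim: float = 400.0, unit: str = "nm"
) -> sc.DataArray:
    """Keep only the data between the lower and upper wavelength limits."""
    if not wllim < wulim:
        raise ValueError("Lower wavelength limit must be smaller than the upper limit.")
    # Label-based slicing on the wavelength dimension coordinate.
    return da["wavelength", sc.scalar(wllim, unit=unit) : sc.scalar(wulim, unit=unit)]


# The median filter would live in a second, independent function (see the wrapper
# discussed in the next comment); callers then chain the two steps explicitly, e.g.
#   selected = select_wavelength_range(nurf_median_filter(da, kernel_size=3))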
I have now written a little wrapper nurf_median_filter(da, kernel_size) for utils.py to apply scipp.ndimage.median_filter in the direction of the wavelength.
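A possible shape for such a wrapper (a sketch, assuming the data array has a dimension named "wavelength"):

import scipp as sc
from scipp.ndimage import median_filter


def nurf_median_filter(da: sc.DataArray, kernel_size: int) -> sc.DataArray:
    """Median-filter the data along the wavelength dimension only."""
    # Kernel size 1 in every other dimension leaves those directions untouched.
    size = {dim: kernel_size if dim == "wavelength" else 1 for dim in da.dims}
    return median_filter(da, size=size)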
@@ -0,0 +1,1892 @@
# standard library imports
As Simon suggested, we need to split things up before we can move forward here.
Splitting up the processing from the plotting is a good start. I think I would like to break it up even further.
You seem to have two different workflows, UV and fluo. Are the two related (i.e. you cannot process one without the other?), or could you run them independently? Basically, could you submit changes for just one of these first (e.g. just the UV?), and once we nail the design, the general layout could be copied for processing the other one?
My suggestion:
- Identify the minimal list of steps you need to do for a workflow (I think for now we should just start with processing and leave exporting data to ascii out of this PR? Maybe things like fitting can also be added later?)
- Ideally, your notebook should just read like a list of function calls from the ess.loki.nurf module.
- Move the nurf.py file out of the docs and into its own submodule in ess.loki (i.e. in a folder called src/ess/loki/nurf)
- Strip out all the plotting code for now, and just have the list of processing steps in a notebook that you put in the docs
- Add some final simple plotting in the notebook for now (we will add plotting of intermediate steps later)
- Note that if a user wants a highly customized figure made for a paper, it will not be stored here. We will just have simple plots
Don't do all this in one go. Start with item 1 here; once we have a list, we can look at the other items.
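To illustrate the second bullet, the trimmed-down notebook might then read roughly like this; every function name and the file name below are placeholders for the steps to be identified, not an existing API:

import scipp as sc
from ess.loki import nurf  # hypothetical submodule location

# All processing steps as plain function calls; names are placeholders.
uv = nurf.load_uv("loki_with_nurf_data.nxs")
uv = nurf.normalize_uv(uv)
uv = nurf.select_wavelength_range(uv, wllim=300, wulim=400)
uv = nurf.correct_turbidity(uv)

sc.plot(sc.collapse(uv, keep="wavelength"))  # one simple final plot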
Thank you very much.

> You seem to have two different workflows, UV and fluo. Are the two related (i.e. you cannot process one without the other?), or could you run them independently?

Yes, there are two workflows, one for UV spectroscopy and one for fluo spectroscopy. I can process each without the other. Only later in the data interpretation do both come together, as they are complementary.

> Basically, could you submit changes for just one of these first (e.g. just the UV?),

Does that mean two or even three separate python files? UV, fluo, medfilter (because that is a function that can be applied to both workflows)?

> and once we nail the design,

Should this be graphically outlined?

> the general layout could be copied for processing the other one?

No, the workflows for UV and fluo are really different and extract different parts of each spectrum.

> Identify the minimal list of steps you need to do for a workflow (I think for now we should just start with processing and leave exporting data to ascii out of this PR? Maybe things like fitting can also be added later?)

Ok to the minimal list of steps.
Re export to ascii: This is a wish of the user. The idea behind it is that the user may need to further process the data with another program that cannot handle hdf5. The users wish to have access to all raw data and all processed data.
Re fitting: Turbidity correction is a standard correction in UV spectroscopy, as I understood it.

> Ideally, your notebook should just read like a list of function calls from the ess.loki.nurf module.

What will then be part of the nurf module? Will it be composed of UV, fluo, plotting?
Will the plot functions not be part of ess.loki.nurf? They are also part of the data interpretation/visualisation and an important part during the experiment.

> Strip out all the plotting code for now, and just have the list of processing steps in a notebook that you put in the docs

Okay, that can be done.

> Add some final simple plotting in the notebook for now (we will add plotting of intermediate steps later)

What would be considered a final simple plot? The UV signal as a function of wavelength, maybe processed? The same for the fluo spectra?

> Note that if a user wants a highly customized figure made for a paper, it will not be stored here. We will just have simple plots

The user does not request highly customized figures. The plots that I generated are standard plots for UV and fluo spectroscopy. They are just a bit more complicated than an I(q) plot, for example. The idea is to measure a perturbation, record SANS, fluo, and UV along with it, and show this to the user during the experiment.

> Don't do all this in one go. Start with item 1 here; once we have a list, we can look at the other items.

Okay. I could provide a workflow list.
A few comments.
For the file/module organisation, I think separating uv, fluo and plotting is good.
So something like:

src/ess/loki/
|--> nurf/
     |--> __init__.py
     |--> fluo.py
     |--> uv.py
     |--> tools.py  # or utils.py, or common.py, however you want to name it
     |--> plot.py

?
We would start with either fluo.py or uv.py, probably with tools.py as well, and then add the rest later.

> Should this be graphically outlined?

No need to draw things, just outlining a list of steps or function calls should be fine.
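For completeness, the __init__.py in such a layout could simply re-export the submodules; a sketch, assuming the folder structure above:

# src/ess/loki/nurf/__init__.py  (sketch for the proposed layout)
from . import fluo, plot, tools, uv

__all__ = ["fluo", "plot", "tools", "uv"]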
@gudlot What is the status on this? Can I help you with something to get this moving again?
This is my first submission for the NUrF module. Work in progress.
nurf.py contains all functions and scipp_947.ipynb contains a Jupyter notebook. This is a PR to open the discussion on code for the NUrF module.
New_fluo_plot_function is my current feature development branch.