First submission for NUrF at LoKI #129

Open
wants to merge 59 commits into base: main

Conversation

@gudlot commented Jun 21, 2022

This is my first submission for the NUrF module. Work in progress. nurf.py contains all functions and scipp_947.ipynb contains a Jupyter notebook.
This is a PR to open the discussion on code for the NUrF module. New_fluo_plot_function is my current feature development branch.

@SimonHeybrock (Member) left a comment

Thanks for the contribution, you have obviously put a lot of effort into this 👍

See comments and questions below. I have at this point not looked into all the details of the more specific functions, since I anticipate that they may change significantly if/once some of my more general comments are addressed.

Comment on lines 22 to 24
print("This is scippneutron ", scn.__version__)
print("This is scippnexus ", snx.__version__)
print("This is scipp ", sc.__version__)
Member

We should not have print statements at module-level. If output is needed we should use the logging facilities. I would not advocate logging the versions though.

Member

The versions get logged when you configure the ess logger for a workflow. Have a look at cell 2 in https://scipp.github.io/ess/instruments/amor/amor_reduction.html

But this should be done in a workflow, not in a module like nurf.py.
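To illustrate the suggestion, a minimal sketch of the logging-based alternative (the function name and message are hypothetical, not from the PR):

```python
import logging

# Module-level logger; the module itself configures no handlers or levels --
# that is left to the workflow/application that imports it.
logger = logging.getLogger(__name__)


def load_spectra(filename):
    # Hypothetical example function: emit a debug-level message on demand,
    # instead of printing at import time.
    logger.debug("Loading spectra from %s", filename)
    return filename  # placeholder for the actual loading logic
```

The workflow (e.g. a notebook) then decides whether these messages are shown by configuring the logger, so importing the module stays silent.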

Author

Ok, I will remove them. They were more for myself. I am not familiar with the logging facilities, but would like to see them.

Comment on lines 27 to 28
def complete_fname(scan_numbers):
"""Converts a list of input numbers to a filename uses at ILL.
Member

Is this relevant for ESS? If it is just for testing now while you have sample files from ILL I think we should consider moving it elsewhere?

Author

No. That is a function I wrote for my ILL test data. I could put all ILL related functions in a different .py file.


Parameters
----------
scan_numbers: list of int or a single int
Member

We are using a Sphinx extension so the type information here is handled automatically based on type-hints. That is, here we should just have:

Suggested change
scan_numbers: list of int or a single int
scan_numbers:

and the corresponding type hints need to be added to the function signature.

For reference, see the 'sphinx_autodoc_typehints' entry in our Sphinx configuration for the extension.
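As a rough sketch of what the suggestion amounts to (the zero-padded filename pattern here is invented for illustration, not the actual ILL convention):

```python
from typing import List, Union


def complete_fname(scan_numbers: Union[int, List[int]]) -> Union[str, List[str]]:
    """Converts a list of input numbers to a filename used at ILL.

    Parameters
    ----------
    scan_numbers:
        One scan number or a list of scan numbers.

    Returns
    -------
    :
        The corresponding filename or list of filenames.
    """
    # Hypothetical naming scheme, purely for illustration.
    if isinstance(scan_numbers, int):
        return f"{scan_numbers:06d}.nxs"
    return [f"{n:06d}.nxs" for n in scan_numbers]
```

With sphinx_autodoc_typehints, the types in the signature are rendered into the generated docs automatically, so the docstring carries only the descriptions.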

Author

Ok. All data types will go into the function signature, and only the description (with an empty type slot) remains in the docstring.


Returns:
----------
flist_num: list of str or one str
Member

See above: do not include type info in the docstring; use type hints and our extension does the rest.

Suggested change
flist_num: list of str or one str
flist_num:

See also https://scipp.github.io/reference/developer/coding-conventions.html#docstrings.

Comment on lines 74 to 75
#TODO: wpotrzebowski points out that dark, reference are also data. Should we convert the entry into the proposed LoKI.nxs to 'is_sample'?
data = da[da.coords["is_data"]]
Member

is_sample sounds ok to me.

Author

Ok. It will be is_sample in the next iteration.

Author

I tried this, but then I got into trouble while loading the file: it suddenly asked for other parameters not applicable to my case. For the moment, I will keep is_data.

Comment on lines +1344 to +1354
for i in range(num_spec):
    # start parameters
    p0 = [b0[i], m0[i]]
    popt, pcov = leastsq(
        residual,
        p0,
        args=(
            uv_da_filt.coords["wavelength"].values,
            uv_da_filt["spectrum", i].values,
        ),
    )
Member

Can we use scipp.optimize.curve_fit (which uses scipy) instead?

@gudlot (Author) commented Jun 22, 2022

I am not sure. I had a look at the scipy.optimize.curve_fit function while writing my function. But here I need to minimise the difference between the measured spectral data and the fitted turbidity. I was not able to achieve this with the curve_fit function, hence I chose scipy.optimize.leastsq.
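For what it is worth, curve_fit internally minimises the same least-squares objective, sum((ydata - f(xdata, *p))**2), that leastsq minimises when the residual is data minus model. A sketch with an invented power-law turbidity model and synthetic data (the model and numbers are assumptions for illustration, not the PR's actual fit):

```python
import numpy as np
from scipy.optimize import curve_fit


# Hypothetical turbidity model: scattering background ~ b * wavelength**(-m).
def turbidity(wavelength, b, m):
    return b * wavelength ** (-m)


# Synthetic "measured" spectrum for demonstration.
rng = np.random.default_rng(0)
wavelength = np.linspace(300.0, 800.0, 200)
data = turbidity(wavelength, 2.0e5, 2.5) + rng.normal(0.0, 1e-3, wavelength.size)

# curve_fit minimises sum((data - model)**2), i.e. the same objective as
# leastsq with residual = data - model.
popt, pcov = curve_fit(turbidity, wavelength, data, p0=[1.0e5, 2.0])

# Subtract the fitted turbidity from the measured spectrum.
corrected = data - turbidity(wavelength, *popt)
```

Whether this covers the actual minimisation in the PR depends on the exact residual used there; if the residual is anything other than data minus model, leastsq (or scipy.optimize.least_squares) remains the right tool.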

Comment on lines +1383 to +1384
# Switch on plots for verification
if plot_corrections:
Member

Having such options on a function is often not a good design choice. Is this necessary? Can it be moved to a separate function that can be called on the return value of this function?

Author

I chose it to declutter the output in the notebook. Sometimes it is useful to see the in-between plotting steps, sometimes not. How would you do this, Simon? Would you have some dummy code for a design pattern, by chance?
And yes, I agree with your other comment - to separate processing tools from plotting workflows.
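One common pattern for this separation (a hypothetical sketch, not code from the PR): the processing function returns only data, and a separate plotting helper takes that return value, so the notebook decides whether to plot.

```python
import numpy as np


def correct_spectrum(spectrum: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Processing only: return the corrected spectrum, no plotting side effects."""
    return spectrum - background


def plot_correction(original: np.ndarray, corrected: np.ndarray):
    """Optional inspection helper, called on the returned value when wanted."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(original, label="original")
    ax.plot(corrected, label="corrected")
    ax.legend()
    return fig


corrected = correct_spectrum(np.ones(5), np.full(5, 0.2))
# In a notebook, only when in-between inspection is wanted:
# plot_correction(np.ones(5), corrected)
```

This removes the plot_corrections flag entirely: running the processing in a script produces no figures, and the notebook opts in by calling the plot helper.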

Comment on lines +1515 to +1527
display(multi_uv_turb_corr_da)

fig, ax = plt.subplots(ncols=3, figsize=(21, 7))
legend_props = {"show": True, "loc": 1}
num_spectra = multi_uv_turb_corr_da.sizes["spectrum"]

out = sc.plot(
sc.collapse(multi_uv_turb_corr_da, keep="wavelength"),
grid=True,
ax=ax[0],
legend=legend_props,
title=f"All turbidity corrected UV spectra for {num_spectra} spectra",
)
Member

As mentioned above, I think we should avoid mixing the "business logic" (such as fitting) with visualization/plotting. Keep in mind that we may want to run the processing code in a script instead of a notebook, and we don't want to create plots then (or plot to files). Can you find a way to disentangle all the code that creates outputs (HTML or plots) from the rest? This applies to many functions here, not just this one.

Author

Yes, I agree to separate the business logic (fitting, reducing, normalising, regrouping) from the plotting. I might have to repeat some code.
Yes, I know it applies to many functions I wrote. The plots were helpful for visual inspection as well.

docs/instruments/loki/nurf/nurf.py (outdated, resolved)
Comment on lines +1627 to +1629
assert (
wllim < wulim
), "Lower wavelength limit needs to be smaller than upper wavelength limit"
Member

If this error is meant for the user, raise an exception instead of using assert.

@gudlot (Author) commented Jun 22, 2022

I am not sure; it can be if I want. At least it makes the user aware of their wrong choice.
What is the rule, please? assert has the possibility to break the code, and I was taught that if the code breaks in a controlled way, that is a good thing.
I don't know when to go for raise ValueError versus assert.
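The usual rule of thumb: assert is for internal invariants that should never fail in correct code (and assertions are removed entirely when Python runs with the -O flag), while invalid user input should raise an exception. A sketch using the wavelength limits from this function (the helper name is invented):

```python
def check_wavelength_limits(wllim: float, wulim: float) -> None:
    # User-facing validation: raise, don't assert, because assert
    # statements are stripped when Python runs with -O.
    if wllim >= wulim:
        raise ValueError(
            f"Lower wavelength limit ({wllim}) must be smaller than "
            f"the upper wavelength limit ({wulim})."
        )


check_wavelength_limits(300, 400)  # valid input: no exception
```

A caller passing swapped limits then gets a ValueError with a readable message instead of an AssertionError that may silently disappear under -O.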

@SimonHeybrock (Member)

General suggestion: Have you considered submitting the main "logic part" of this as a separate PR? That is, exclude all the plotting and debugging stuff, just keep the actual code for loading, normalization, filtering, fitting, etc. This would make it easier to bring it into a state everyone is happy with to include in the package. You can then, as a second step, work out which plotting utilities and helpers the user/scientist may need and do all that in follow-up work.

@gudlot (Author) commented Jun 22, 2022

General suggestion: Have you considered submitting the main "logic part" of this as a separate PR? That is, exclude all the plotting and debugging stuff, just keep the actual code for loading, normalization, filtering, fitting, etc. This would make it easier to bring it into a state everyone is happy with to include in the package. You can then, as a second step, work out which plotting utilities and helpers the user/scientist may need and do all that in follow-up work.

Yes, it occurred to me while writing the functions. I am constantly torn between writing code "quick and dirty" (to quote somebody else) and writing better code following good programming standards; at least I am trying. The idea was first to show the functionality/workflow and get an idea of the data. Please don't forget that my data comes from an experiment where not everything worked well technically. That affected the datasets (=multiple nxs files). Priority was on the beamtime, and resources to support the implementation of the spectrometer were limited. It will perhaps be a task for ESS in the future to ensure that data from the spectrometers is correctly recorded and that errors are handled. You and the other commentators are the first people I can discuss my code with in terms of code design/strategy.
My idea would be to write a process function that returns a DataArray or Dataset (or a function layer in between that gathers the values from multiple files) and build the corresponding plot function around it. I would prefix all plot function names with plot_.
Thank you very much for the discussion, the helpful comments, and your patience. I try my best to improve and learn.

@wpotrzebowski (Contributor)

used at ILL

@gudlot Yes, I know it is ILL data, but I think there was a typo in the docstring ("at" missing), which I think has been corrected now.

@wpotrzebowski (Contributor) commented Jun 23, 2022

I think this is mostly a question for @nvaytet. Sorry if I missed this in the conversation, but is there an intention to keep all files (including Python files) in docs/instruments/loki/nurf/? As far as I know, NUrF is available both at ISIS and ILL, and I think in the future they may also be interested in these scripts for data processing. It can always be addressed later, but it may be worth thinking about it now.


def nurf_file_creator(loki_file, path_to_loki_file, data):
"""
Appends NUrF group to LOKI NeXus file for ESS
Member

Is this the way NUrF data will always be added to the LoKI files? Or is this a temporary way to do things until all the data is properly stored in the Nexus files?
If the latter, I am not sure this should become part of the ess package. To me it looks like a (still experimental?) utility script that patches the data?

@gudlot (Author) commented Jun 23, 2022

When I started this project, we discussed how the NUrF data could be part of a LoKI file, and I appended my content to an example LoKI file. At the moment it is more of a temporary solution, and my current nxs files contain only the UV and fluo relevant parts, nothing else. But for Kenan I append my data to the LoKI file.
No, it does not have to become part of any package. Please consider all functions in this file as helper functions that I needed to convert my data into a better format.
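For context, appending such a group typically looks something like the sketch below (the group path "entry/nurf" and dataset names are assumptions for illustration, not the actual LoKI/NUrF NeXus layout):

```python
import os
import tempfile

import h5py
import numpy as np


def nurf_file_creator(loki_file: str, uv_signal, fluo_signal):
    """Append a NUrF group to an existing NeXus/HDF5 file (hypothetical layout)."""
    # Open in append mode so the existing LoKI content is preserved.
    with h5py.File(loki_file, "a") as f:
        # Group path is an assumption, purely for illustration.
        grp = f.require_group("entry/nurf")
        grp.create_dataset("uv_signal", data=uv_signal)
        grp.create_dataset("fluo_signal", data=fluo_signal)


# Example usage with a scratch file standing in for an existing LoKI file:
path = os.path.join(tempfile.mkdtemp(), "loki_example.nxs")
with h5py.File(path, "w"):
    pass
nurf_file_creator(path, np.arange(3.0), np.arange(4.0))
```

As discussed above, this is a patching utility for the transition period, not something that necessarily belongs in the ess package once the data is written properly at the source.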

wllim=300,
wulim=400,
wl_unit=None,
medfilter=True,
Member

You seem to have several functions that say they do one thing, "keep the data between min and max wavelength" in this case, but actually do more under the hood, since you can also apply a median filter.
This is usually a sign that you should either make two functions and call both one after the other, or change the function name.
In this case, I feel that selecting a wavelength range is quite different from applying a filter, so I would vote for two functions.

Try to have a look at other places where you are doing something similar, and change accordingly.
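A sketch of the suggested split (function names invented), with the caller composing the two single-purpose steps explicitly:

```python
import numpy as np
from scipy.ndimage import median_filter


def apply_median_filter(spectrum: np.ndarray, kernel_size: int = 3) -> np.ndarray:
    """Smooth outliers only; does not touch the wavelength range."""
    return median_filter(spectrum, size=kernel_size)


def select_wavelength_range(wavelength, spectrum, wllim, wulim):
    """Select only; keep data between the wavelength limits."""
    mask = (wavelength >= wllim) & (wavelength <= wulim)
    return wavelength[mask], spectrum[mask]


# The caller chains them in whatever order the data quality requires:
wavelength = np.array([250.0, 300.0, 350.0, 400.0, 450.0])
spectrum = np.array([1.0, 1.0, 100.0, 1.0, 1.0])  # one outlier
smoothed = apply_median_filter(spectrum, kernel_size=3)
wl, sp = select_wavelength_range(wavelength, smoothed, 300.0, 400.0)
```

With this split, data that does not need smoothing simply skips the filter call, and no boolean flag is needed on the selection function.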

Author

Thank you. My thoughts were that using a median filter is optional, depending on the data quality. Sometimes the data has a lot of outliers. With the median filter I smooth it first, and then I select the maximum intensity in a given wavelength range. If there were an outlier, that could be misinterpreted.
I could separate them into two functions, but would probably need a third to bring the two together.

Author

I have now written a little wrapper nurf_median_filter(da, kernel_size) for utils.py to apply scipp.ndimage.median_filter in the direction of the wavelength.

@@ -0,0 +1,1892 @@
# standard library imports
Member

As Simon suggested, we need to split things up before we can move forward here.
Splitting up the processing from the plotting is a good start. I think I would like to break it up even further.

You seem to have two different workflows, UV and fluo. Are the two related (i.e. you cannot process one without the other?), or could you run them independently? Basically, could you submit changes for just one of these first (e.g. just the UV?), and once we nail the design, the general layout could be copied for processing the other one?

My suggestion:

  1. Identify the minimal list of steps you need to do for a workflow (I think for now we should just start with processing and leave exporting data to ascii out of this PR? Maybe things like fitting can also be added later?)
  2. Ideally, your notebook should just read like a list of function calls from the ess.loki.nurf module.
  3. Move the nurf.py file out of the docs and into its own submodule in ess.loki (i.e. in a folder called src/ess/loki/nurf)
  4. Strip out all the plotting code for now, and just have the list of processing steps in a notebook that you put in the docs
  5. Add some final simple plotting in the notebook for now (we will add plotting of intermediate steps later)
  6. Note that if a user wants a highly customized figure made for a paper, it will not be stored here. We will just have simple plots

Don't do all this in one go. Start with item 1. here, once we have a list, we can look at the other items.

@gudlot (Author) commented Jun 23, 2022

Thank you very much.

You seem to have two different workflows, UV and fluo. Are the two related (i.e. you cannot process one without the other?), or could you run them independently?

Yes, there are two workflows, one for the UV spectroscopy and one for the fluo spectroscopy. I can process each without the other. Only later in the data interpretation will both come together, as they are complementary.

Basically, could you submit changes for just one of these first (e.g. just the UV?),

Does that mean two or even three separate python files? UV, fluo, medfilter (because the latter is a function that can be applied to both workflows)?

and once we nail the design,

Should this be graphically outlined?

the general layout could be copied for processing the other one?

No, the workflows for UV and fluo are really different and extract different parts of each spectrum.

  1. Identify the minimal list of steps you need to do for a workflow (I think for now we should just start with processing and leave exporting data to ascii out of this PR? Maybe things like fitting can also be added later?)

Ok to the minimal list of steps.
Re: export to ascii: this is a wish of the user. The idea behind it is that the user may need to further process the data with another program that cannot handle hdf5. The users wish to have access to all raw data and processed data.
Re fitting: turbidity correction is, as I understand it, a standard correction in UV spectroscopy.

  2. Ideally, your notebook should just read like a list of function calls from the ess.loki.nurf module.

What will be part then of the nurf module? Will it be composed of UV, fluo, plotting?
Will the plot functions not be part of ess.loki.nurf? They are also part of the data interpretation/visualisation and an important part during the experiment.

  4. Strip out all the plotting code for now, and just have the list of processing steps in a notebook that you put in the docs

Okay, that can be done.

  5. Add some final simple plotting in the notebook for now (we will add plotting of intermediate steps later)

What would be considered a final simple plot? UV signal as a function of wavelength, maybe processed? Same for fluo spectra?

  6. Note that if a user wants a highly customized figure made for a paper, it will not be stored here. We will just have simple plots

The user does not request highly customized figures. The plots that I generated are standard plots for UV, fluo spectroscopy. They are just a bit more complicated than an I(q) plot for example. The idea is to measure a perturbation, record SANS, fluo, UV along with it and show this during the experiment to the user.

Don't do all this in one go. Start with item 1. here, once we have a list, we can look at the other items.

Okay. I could provide a workflow list.

Member

A few comments.

For the file/module organisation, I think separating uv, fluo and plotting is good.
So something like:

src/ess/loki/
        |--> nurf/
              |--> __init__.py
              |--> fluo.py
              |--> uv.py
              |--> tools.py  # or utils.py, or common.py, however you want to name it
              |--> plot.py

?

We would start with either fluo.py or uv.py, probably with the tools.py as well, and then add the rest later.

Should this be graphically outlined?

No need to draw things, just outlining a list of steps or function calls should be fine.

@nvaytet (Member) commented Sep 28, 2022

@gudlot What is the status on this? Can I help you with something to get this moving again?

@wpotrzebowski mentioned this pull request Dec 1, 2022

6 participants