Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First implementation of an nT-like likelihood #32

Merged
merged 48 commits into from
Jul 17, 2023
Merged

Conversation

hammannr
Copy link
Collaborator

This PR adds the nT-like blueice-based likelihood structure to alea (solves #23 ) and also introduces a suggestion for split configs (#15)
The model seems to work for my first basic tests. I'll test it again with the changes on the current master and then add some docstrings, clean up, and some examples for how to use it 😊 But of course, if you can already spot something that you would suggest doing differently: Let me know! 😄

@github-actions
Copy link

Pull Request Test Coverage Report for Build 5534394288

  • 0 of 149 (0.0%) changed or added relevant lines in 5 files are covered.
  • 2 unchanged lines in 2 files lost coverage.
  • Overall coverage remained the same at 0.0%

Changes Missing Coverage Covered Lines Changed/Added Lines %
alea/statistical_model.py 0 7 0.0%
alea/simulators.py 0 8 0.0%
alea/parameters.py 0 10 0.0%
alea/utils.py 0 17 0.0%
alea/blueice_extended_model.py 0 107 0.0%
Files with Coverage Reduction New Missed Lines %
alea/simulators.py 1 0%
alea/utils.py 1 0%
Totals Coverage Status
Change from base Build 5529398966: 0.0%
Covered Lines: 0
Relevant Lines: 2801

💛 - Coveralls

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

pep8

alea/blueice_extended_model.py|142 col 1| S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
alea/blueice_extended_model.py|151 col 1| D102 Missing docstring in public method
alea/blueice_extended_model.py|154 col 1| D102 Missing docstring in public method
alea/blueice_extended_model.py|157 col 1| S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.
alea/blueice_extended_model.py|177 col 1| D102 Missing docstring in public method
alea/blueice_extended_model.py|178 col 9| WPS221 Found line with high Jones Complexity: 17 > 14
alea/utils.py|266 col 1| D103 Missing docstring in public function
alea/utils.py|744 col 1| D103 Missing docstring in public function
alea/utils.py|750 col 1| S307 Use of possibly insecure function - consider using safer ast.literal_eval.
alea/statistical_model.py|152 col 1| D102 Missing docstring in public method
alea/statistical_model.py|168 col 22| E231 missing whitespace after ':'
alea/statistical_model.py|168 col 30| E231 missing whitespace after ','
alea/parameters.py|218 col 1| D200 One-line docstring should fit on one line with quotes
alea/parameters.py|221 col 9| WPS221 Found line with high Jones Complexity: 16 > 14
alea/parameters.py|229 col 9| WPS221 Found line with high Jones Complexity: 16 > 14

@github-actions
Copy link

github-actions bot commented Jul 12, 2023

Pull Request Test Coverage Report for Build 5572804010

  • 0 of 213 (0.0%) changed or added relevant lines in 5 files are covered.
  • 3 unchanged lines in 3 files lost coverage.
  • Overall coverage remained the same at 0.0%

Changes Missing Coverage Covered Lines Changed/Added Lines %
alea/simulators.py 0 9 0.0%
alea/parameters.py 0 13 0.0%
alea/statistical_model.py 0 20 0.0%
alea/utils.py 0 31 0.0%
alea/blueice_extended_model.py 0 140 0.0%
Files with Coverage Reduction New Missed Lines %
alea/simulators.py 1 0%
alea/statistical_model.py 1 0%
alea/utils.py 1 0%
Totals Coverage Status
Change from base Build 5529398966: 0.0%
Covered Lines: 0
Relevant Lines: 2852

💛 - Coveralls

@kdund
Copy link
Collaborator

kdund commented Jul 12, 2023

Tried shortly :)!

I hit some path issues (it could not find the templates) that lead me to this:

https://github.com/XENONnT/alea/blob/0faf6a304ef2bae89ac35152b48b9d97d70f26bc/alea/utils.py#L273C18-L273C18

This will then fail unless you're running it in the base alea folder.
I think in most cases, these paths will be absolute paths (or paths relative to the placement of the yaml file, not to alea-- I think I would remove the prefixed alea. (possibly allow some global template_folder arg instead?)

@hammannr
Copy link
Collaborator Author

The basic functionality of the blueice-based likelihood is now fully implemented (and now also cleaned up). There are a few open points that I marked with TODO or IDEA that I would implement in a dedicated PR since I don't think they are crucial for now and this way we can continue implementing things collaboratively.
If everyone agrees, I'd then add those open items as issues.

Below you can find an example code snippet on how to run the likelihood and what one can do with it.
Looking forward to your comments! 😊

from alea.blueice_extended_model import BlueiceExtendedModel
import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt
import os
import alea
from tqdm import tqdm

alea_dir = os.path.dirname(alea.__file__)
config_path = os.path.join(alea_dir, "examples/unbinned_wimp_statistical_model.yaml")

# Initialize the statistical model
statistical_model = BlueiceExtendedModel.from_config(config_path)

##### 1) Simple example #####
# Generate and assign some data
data = statistical_model.generate_data()
statistical_model.data = data

# Fit the model
best_result, best_ll = statistical_model.fit(verbose=True)

# Plot the data
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
for i, ax in enumerate(axes):
    for source in range(2):
        source_selection = data[i]["source"] == source
        source_name = statistical_model._likelihood.likelihood_list[i].source_name_list[source]
        ax.plot(data[i]["cs1"][source_selection],
                 data[i]["cs2"][source_selection],
                 "o", label=source_name)
        ax.legend()
        ax.semilogy()
        ax.set_xlabel("cs1 [PE]")
        ax.set_ylabel("cs2 [PE]")
        ax.set_ylim(1e2, 1e4)
        ax.set_title(f"{statistical_model.likelihood_names[i]}")
plt.show()

##### 2) Compute the confidence interval #####
dl, ul = statistical_model.confidence_interval("wimp_rate_multiplier",
                                      parameter_interval_bounds=[0, 50],
                                      )

# Check the result
rate_vals = np.linspace(max(0, dl - .5), ul + .5, 200)
lls = []
for rate in rate_vals:
    res, ll = statistical_model.fit(wimp_rate_multiplier=rate)
    lls.append(ll)
lls = np.array(lls)

plt.plot(rate_vals, 2 * (best_ll - lls))
plt.axhline(sps.chi2(1).ppf(.9), c="orange", ls="--")
plt.axvline(best_result["wimp_rate_multiplier"], c="r")
plt.axvline(dl, c="orange")
plt.axvline(ul, c="orange")
plt.ylim(0, 4)
plt.xlim(0)
plt.xlabel("wimp_rate_multiplier")
plt.ylabel("-2 * LLR")
plt.show()

##### 3) Compute median upper limit distribution for b-only toys #####
n_mc = 200
dls = []
uls = []
for i in tqdm(range(n_mc), desc="Computing confidence intervals"):
    # generate background only data
    data = statistical_model.generate_data(wimp_rate_multiplier=0)
    statistical_model.data = data
    dl, ul = statistical_model.confidence_interval(
        "wimp_rate_multiplier",
        parameter_interval_bounds=[0, 50]
        )
    dls.append(dl)
    uls.append(ul)
dls = np.array(dls)
uls = np.array(uls)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
dls[dls == -np.inf] = 0
axes[0].hist(dls, bins=20)
axes[0].set_xlabel("dl [wimp_rate_multiplier]")

axes[1].hist(uls, bins=20)
axes[1].set_xlabel("ul [wimp_rate_multiplier]")
axes[1].axvline(np.median(uls), c="crimson")

for ax in axes:
    ax.semilogy()
    ax.set_xlim(0)

@hammannr hammannr marked this pull request as ready for review July 13, 2023 14:27
@hammannr hammannr requested review from kdund and dachengx July 13, 2023 14:28
eval_analysis_space.append(eval_element)
return eval_analysis_space

def find_file_resource(file: str, possible_locations: list) -> str:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E302 expected 2 blank lines, found 1


def find_file_resource(file: str, possible_locations: list) -> str:
"""
Function to look for a file in a list of folders (the file name may include folder structure too).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (102 > 100 characters)

fname = Path(loc, file)
if fname.is_file():
return str(fname)
raise FileNotFoundError("{:s} not found in any of {:s}".format(file, ",".join(possible_locations)))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (103 > 100 characters)

@hammannr hammannr added the enhancement New feature or request label Jul 13, 2023
This was linked to issues Jul 13, 2023
Copy link
Collaborator

@kdund kdund left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runs sensibly, code sensible, thanks!

if isinstance(likelihood_config["template_folder"], str):
template_folder_list = [likelihood_config["template_folder"]]
elif isinstance(likelihood_config["template_folder"], list):
template_folder_list = likelihood_config["template_folder"]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about adding the alea example data folder as a default search folder here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you mean always just add "alea/examples/templates/wimp_templates" to the end of the list? It probably doesn't hurt adding it, though I'm not sure how often this will be used (for the examples I think it is good to write things explicitly so that people know what to change when implementing their own model).

ret = dict()
for ll in self._likelihood.likelihood_list[:-1]: # ancillary likelihood does not contribute

ll_pars = list(ll.rate_parameters.keys()) + list(ll.shape_parameters.keys())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS221 Found line with high Jones Complexity: 15 > 14


# add all parameters to extra_dont_hash for each source unless it is used:
for i, source in enumerate(config["sources"]):
parameters_to_ignore: List[str] = [p.name for p in self.parameters if (p.ptype == "shape")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p


# add all parameters to extra_dont_hash for each source unless it is used:
for i, source in enumerate(config["sources"]):
parameters_to_ignore: List[str] = [p.name for p in self.parameters if (p.ptype == "shape")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS465 Found likely bitwise and boolean operation mixup


# add all parameters to extra_dont_hash for each source unless it is used:
for i, source in enumerate(config["sources"]):
parameters_to_ignore: List[str] = [p.name for p in self.parameters if (p.ptype == "shape")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS441 Found control variable used after block: p


# add all parameters to extra_dont_hash for each source unless it is used:
for i, source in enumerate(config["sources"]):
parameters_to_ignore: List[str] = [p.name for p in self.parameters if (p.ptype == "shape")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
E501 line too long (106 > 100 characters)

assert efficiency_parameter.ptype == "efficiency", "The parameter {:s} must" \
" be an efficiency".format(efficiency_name)
limits = efficiency_parameter.fit_limits
assert 0 <= limits[0], 'Efficiency parameters including {:s} must be' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
N400: Found backslash that is used for line breaking

assert efficiency_parameter.ptype == "efficiency", "The parameter {:s} must" \
" be an efficiency".format(efficiency_name)
limits = efficiency_parameter.fit_limits
assert 0 <= limits[0], 'Efficiency parameters including {:s} must be' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
WPS309 Found reversed compare order

limits = efficiency_parameter.fit_limits
assert 0 <= limits[0], 'Efficiency parameters including {:s} must be' \
' constrained to be nonnegative'.format(efficiency_name)
assert np.isfinite(limits[1]), 'Efficiency parameters including {:s} must be' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
S101 Use of assert detected. The enclosed code will be removed when compiling to optimised byte code.

limits = efficiency_parameter.fit_limits
assert 0 <= limits[0], 'Efficiency parameters including {:s} must be' \
' constrained to be nonnegative'.format(efficiency_name)
assert np.isfinite(limits[1]), 'Efficiency parameters including {:s} must be' \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
N400: Found backslash that is used for line breaking

@@ -150,39 +163,33 @@ def store_data(
kw = {'metadata': metadata} if metadata is not None else dict()
toydata_to_file(file_name, data_list, data_name_list, **kw)

def get_expectations(self):
return NotImplementedError("get_expectation is optional to implement")
def get_expectation_values(self, **parameter_values):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[pep8] reported by reviewdog 🐶
D102 Missing docstring in public method

@hammannr hammannr merged commit 257ab91 into master Jul 17, 2023
@hammannr hammannr deleted the nt_likelihood branch July 17, 2023 07:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement blueice-extended likelihood Split Configs
2 participants