Read poi and expectation directly from `output_filename` to accelerate `NeymanConstructor` (#108)
Conversation
Pull Request Test Coverage Report for Build 6851918194
💛 - Coveralls
I didn't go through it line-by-line yet but I think it looks good!
I already added a few comments and I'm now running a small toy study to see if everything works as expected. I'll let you know as soon as it's done 😊
alea/submitters/local.py (outdated diff)
```python
_output_filename = fpat_split[0] + "_*" + fpat_split[1]
if len(sorted(glob(_output_filename))) != 0:
    output_filename = _output_filename
output_filename_list = sorted(glob(output_filename))
if len(output_filename_list) == 0:
    raise ValueError(f"Can not find any output file {output_filename}!")

# read metadata including generate_values
metadata_list = []
for _output_filename in output_filename_list:
    with h5py.File(_output_filename, "r", libver="latest", swmr=True) as ipt:
        metadata = dict(
            zip(
                ipt.attrs.keys(),
                [json.loads(ipt.attrs[key]) for key in ipt.attrs.keys()],
            )
        )
    metadata.pop("date", None)
    metadata_list.append(metadata)
if len(set([deterministic_hash(m) for m in metadata_list])) != 1:
    raise ValueError(
        "The metadata is not the same for all "
        f"of the {len(output_filename_list)} outputs!"
    )
metadata = metadata_list[0]
if metadata["poi"] != self.poi:
    raise ValueError(
        f"The poi in the metadata {metadata['poi']} is not "
        f"the same as the poi {self.poi}!"
    )

# read poi and poi_expectation
needed_kwargs = {**metadata["generate_values"], **needed_kwargs}
poi_expectation = needed_kwargs.pop("poi_expectation", None)
poi_value = needed_kwargs.get(self.poi, None)
if poi_value is None:
    raise ValueError("Can not find the poi value in the generate_values in metadata!")
# read expectation_values from metadata
expectation_values = metadata["expectation_values"]
# check if the poi_expectation is in expectation_values
source = self.poi.replace("_rate_multiplier", "")
if source not in expectation_values:
    warnings.warn(
        f"poi {self.poi} does not correspond to any source in the model,"
        " so the poi_expectation can not be calculated!"
    )
else:
    _poi_expectation = expectation_values[source]
    if poi_expectation is not None:
        # check if the poi_expectation is consistent
        if not np.isclose(poi_expectation, _poi_expectation):
            raise ValueError(
                f"The poi_expectation from the model {poi_expectation} is not "
                f"the same as the poi_expectation from the toymc {_poi_expectation}!"
            )
    else:
        warnings.warn(
            "Can not find the poi_expectation in the generate_values "
            f"{generate_values}, so will not check whether the poi_expectation "
            "values are consistent!"
        )
    poi_expectation = _poi_expectation

# read the likelihood ratio
results = toyfiles_to_numpy(output_filename)
```
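For context on the snippet above: the toyMC metadata travels as JSON-encoded strings in the HDF5 attributes, which is why the reader wraps every `ipt.attrs[key]` in `json.loads`. Below is a minimal sketch of that round-trip with a hypothetical filename and metadata, assuming only `h5py` and the standard library:

```python
import json
import h5py

# Hypothetical metadata mirroring the keys the reader above expects.
metadata = {
    "poi": "wimp_rate_multiplier",
    "generate_values": {"wimp_rate_multiplier": 1.0},
}

# Write: each attribute value is stored as a JSON string.
with h5py.File("toymc_example.h5", "w", libver="latest") as opt:
    for key, value in metadata.items():
        opt.attrs[key] = json.dumps(value)

# Read it back the way the snippet does (swmr omitted here; the real code
# passes swmr=True so files can be read while a writer still has them open).
with h5py.File("toymc_example.h5", "r") as ipt:
    recovered = {key: json.loads(ipt.attrs[key]) for key in ipt.attrs.keys()}

assert recovered == metadata
```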
Can we test some of this?
In general, this method has become pretty lengthy. Do you think it could make sense to break out some functionality into smaller functions that you can then call in `submit`? That would make it easier to test and to read 😊
The functions are now split. I currently have no idea how to test the code of `NeymanConstructor`, because we would need to save the output files or run the configs when testing. That can be time-consuming.
You're right. Maybe we can write and load some trivial example or something? But it's not that straightforward, I agree.
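One possible direction, sketched under the assumption that the metadata readout is factored into a small helper (`read_poi_from_file` below is hypothetical, not the actual alea API): write a trivial file carrying only the attributes the reader needs, then assert on the round-trip with pytest's `tmp_path` fixture.

```python
import json
import h5py

def read_poi_from_file(filename):
    """Hypothetical stand-in for a factored-out metadata reader."""
    with h5py.File(filename, "r") as ipt:
        return json.loads(ipt.attrs["poi"])

def test_poi_roundtrip(tmp_path):
    # Write a trivial toyMC output that carries only the poi metadata.
    filename = str(tmp_path / "toymc.h5")
    with h5py.File(filename, "w") as opt:
        opt.attrs["poi"] = json.dumps("wimp_rate_multiplier")
    # The reader should recover exactly what was written.
    assert read_poi_from_file(filename) == "wimp_rate_multiplier"
```

This avoids saving real output files or running configs, so it stays fast.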
The split version is now much more readable -- I like it! I think we should think about unit tests for the `NeymanConstructor`, but I'd suggest merging this now without them.
I ran the code with some examples and the results were sensible. Thanks a lot @dachengx for this improvement 😊
We do not need to initialize a runner for each `**{**nominal_values, **generate_values}`; instead we can read the poi directly from the toymc files `output_filename`. This was initially an idea from @hammannr, #100 (comment).

- Support both single `output_filename` mode and multiple (batch) `output_filename` mode
- Collect the `output_filename` list and make sure that the metadata is the same across the list (see alea/runner.py, Line 340 in 80bd030)

This PR supersedes #103.
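The single-versus-batch handling described above boils down to turning one output name into a glob pattern and preferring its matches over the literal name. A minimal sketch of that mechanism, with hypothetical filenames:

```python
import os
from glob import glob

output_filename = "toymc_free.h5"  # hypothetical single-mode name

# Batch outputs are expected to sit next to the single-mode name as
# toymc_free_0.h5, toymc_free_1.h5, ..., so insert a wildcard before
# the extension.
root, ext = os.path.splitext(output_filename)
batch_pattern = root + "_*" + ext  # -> "toymc_free_*.h5"

# Prefer batch files when any exist; otherwise fall back to single mode.
matches = sorted(glob(batch_pattern))
output_filename_list = matches if matches else sorted(glob(output_filename))
print(output_filename_list)
```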