Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read poi and expectation directly from output_filename to accelerate NeymanConstructor #108

Merged
merged 8 commits into from
Nov 13, 2023

Conversation

dachengx
Copy link
Collaborator

@dachengx dachengx commented Nov 10, 2023

We do not need to initialize runner for each **{**nominal_values, **generate_values}, but we can read the poi directly from toymc files output_filename.

This was initially an idea from @hammannr, #100 (comment).

  1. Save expectation_values to metadata
  2. First figure out whether it is single output_filename mode or multiple(batch) output_filename mode
  3. Read metadata from the list of output_filename and make sure that the metadata is the same in the list
  4. Take the first entry of the metadata list as metadata
  5. Assert it contains poi, which is guaranteed by
    metadata["generate_values"] = self.generate_values
  6. Read poi and expectation_values from metadata
  7. Read poi_expectation from expectation_values

This PR is superseding #103.

Copy link

github-actions bot commented Nov 10, 2023

Pull Request Test Coverage Report for Build 6851918194

  • 8 of 23 (34.78%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.8%) to 90.842%

Changes Missing Coverage Covered Lines Changed/Added Lines %
alea/utils.py 3 18 16.67%
Totals Coverage Status
Change from base Build 6831651108: -0.8%
Covered Lines: 1359
Relevant Lines: 1496

💛 - Coveralls

@dachengx dachengx requested review from kdund and hammannr and removed request for kdund November 10, 2023 06:37
Copy link
Collaborator

@hammannr hammannr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't go through it line-by-line yet but I think it looks good!
I already added a few comments and I'm now running a small toy study to see if everything works as expected. I'll let you know as soon as it's done 😊

Comment on lines 172 to 236
_output_filename = fpat_split[0] + "_*" + fpat_split[1]
if len(sorted(glob(_output_filename))) != 0:
output_filename = _output_filename
output_filename_list = sorted(glob(output_filename))
if len(output_filename_list) == 0:
raise ValueError(f"Can not find any output file {output_filename}!")

# read metadata including generate_values
metadata_list = []
for _output_filename in output_filename_list:
with h5py.File(_output_filename, "r", libver="latest", swmr=True) as ipt:
metadata = dict(
zip(
ipt.attrs.keys(),
[json.loads(ipt.attrs[key]) for key in ipt.attrs.keys()],
)
)
metadata.pop("date", None)
metadata_list.append(metadata)
if len(set([deterministic_hash(m) for m in metadata_list])) != 1:
raise ValueError(
"The metadata are not the same for all "
f"the {len(output_filename_list)} output!"
)
metadata = metadata_list[0]
if metadata["poi"] != self.poi:
raise ValueError(
f"The poi in the metadata {metadata['poi']} is not "
f"the same as the poi {self.poi}!"
)

# read poi and poi_expectation
needed_kwargs = {**metadata["generate_values"], **needed_kwargs}
poi_expectation = needed_kwargs.pop("poi_expectation", None)
poi_value = needed_kwargs.get(self.poi, None)
if poi_value is None:
raise ValueError("Can not find the poi value in the generate_values in metadata!")
# read expectation_values from metadata
expectation_values = metadata["expectation_values"]
# check if the poi_expectation is in expectation_values
source = self.poi.replace("_rate_multiplier", "")
if source not in expectation_values:
warnings.warn(
f"poi {self.poi} does not corresponds to any source in the model!"
" so can not calculate the poi_expectation!"
)
else:
_poi_expectation = expectation_values[source]
if poi_expectation is not None:
# check if the poi_expectation is consistent
if not np.isclose(poi_expectation, _poi_expectation):
raise ValueError(
f"The poi_expectation from model {poi_expectation} is not "
f"the same as the poi_expectation from toymc {_poi_expectation}!"
)
else:
warnings.warn(
"Can not find the poi_expectation in the generate_values "
f"{generate_values}, so will not check if the poi_expectation "
"are consistent!"
)
poi_expectation = _poi_expectation

# read the likelihood ratio
results = toyfiles_to_numpy(output_filename)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we test some of this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, this method has become pretty lengthy. Do you think it could make sense to break out some functionality into smaller functions that you can then call in submit? This makes it easier to test and to read 😊

Copy link
Collaborator Author

@dachengx dachengx Nov 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The functions are split. I have no idea currently how to test the code of NeymanConstructor because then we need to save the output files or run the configs when testing. That can be time-consuming.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. Maybe we can write and load some trivial example or something? But it's not that straightforward, I agree.

@dachengx dachengx changed the title Read poi directly from output_filename to accelerate NeymanConstructor Read poi and expectation directly from output_filename to accelerate NeymanConstructor Nov 13, 2023
Copy link
Collaborator

@hammannr hammannr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The split version is now much more readable -- I like it! I think we should think about unittests for the NeymanConstructor but I'd suggest to merge this now without them.

I ran the code with some examples and the results were sensible. Thanks a lot @dachengx for this improvement 😊

@dachengx dachengx merged commit 372fa14 into main Nov 13, 2023
@dachengx dachengx deleted the fast_neyman_2 branch November 13, 2023 20:40
@dachengx dachengx added the enhancement New feature or request label Nov 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants