Read poi and expectation directly from `output_filename` to accelerate `NeymanConstructor` (#108)
Conversation
Pull Request Test Coverage Report for Build 6851918194
💛 - Coveralls
I didn't go through it line-by-line yet but I think it looks good!
I already added a few comments and I'm now running a small toy study to see if everything works as expected. I'll let you know as soon as it's done 😊
alea/submitters/local.py (outdated diff)
```python
_output_filename = fpat_split[0] + "_*" + fpat_split[1]
if len(sorted(glob(_output_filename))) != 0:
    output_filename = _output_filename
output_filename_list = sorted(glob(output_filename))
if len(output_filename_list) == 0:
    raise ValueError(f"Can not find any output file {output_filename}!")

# read metadata including generate_values
metadata_list = []
for _output_filename in output_filename_list:
    with h5py.File(_output_filename, "r", libver="latest", swmr=True) as ipt:
        metadata = dict(
            zip(
                ipt.attrs.keys(),
                [json.loads(ipt.attrs[key]) for key in ipt.attrs.keys()],
            )
        )
    metadata.pop("date", None)
    metadata_list.append(metadata)
if len(set([deterministic_hash(m) for m in metadata_list])) != 1:
    raise ValueError(
        "The metadata is not the same for all "
        f"of the {len(output_filename_list)} outputs!"
    )
metadata = metadata_list[0]
if metadata["poi"] != self.poi:
    raise ValueError(
        f"The poi in the metadata {metadata['poi']} is not "
        f"the same as the poi {self.poi}!"
    )

# read poi and poi_expectation
needed_kwargs = {**metadata["generate_values"], **needed_kwargs}
poi_expectation = needed_kwargs.pop("poi_expectation", None)
poi_value = needed_kwargs.get(self.poi, None)
if poi_value is None:
    raise ValueError("Can not find the poi value in the generate_values in metadata!")
# read expectation_values from metadata
expectation_values = metadata["expectation_values"]
# check if the poi_expectation is in expectation_values
source = self.poi.replace("_rate_multiplier", "")
if source not in expectation_values:
    warnings.warn(
        f"poi {self.poi} does not correspond to any source in the model,"
        " so the poi_expectation can not be calculated!"
    )
else:
    _poi_expectation = expectation_values[source]
    if poi_expectation is not None:
        # check if the poi_expectation is consistent
        if not np.isclose(poi_expectation, _poi_expectation):
            raise ValueError(
                f"The poi_expectation from the model {poi_expectation} is not "
                f"the same as the poi_expectation from the toymc {_poi_expectation}!"
            )
    else:
        warnings.warn(
            "Can not find the poi_expectation in the generate_values "
            f"{generate_values}, so will not check whether the poi_expectation "
            "values are consistent!"
        )
    poi_expectation = _poi_expectation

# read the likelihood ratio
results = toyfiles_to_numpy(output_filename)
```
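For context on the snippet above: the toyMC metadata travels as JSON-encoded strings in the HDF5 attributes, which is why the reader wraps every `ipt.attrs[key]` in `json.loads`. Below is a minimal sketch of that round-trip with a hypothetical filename and metadata, assuming only `h5py` and the standard library:

```python
import json
import h5py

# Hypothetical metadata mirroring the keys the reader above expects.
metadata = {
    "poi": "wimp_rate_multiplier",
    "generate_values": {"wimp_rate_multiplier": 1.0},
}

# Write: each attribute value is stored as a JSON string.
with h5py.File("toymc_example.h5", "w", libver="latest") as opt:
    for key, value in metadata.items():
        opt.attrs[key] = json.dumps(value)

# Read it back the way the snippet does (swmr omitted here; the real code
# passes swmr=True so files can be read while a writer still has them open).
with h5py.File("toymc_example.h5", "r") as ipt:
    recovered = {key: json.loads(ipt.attrs[key]) for key in ipt.attrs.keys()}

assert recovered == metadata
```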
Can we test some of this?
In general, this method has become pretty lengthy. Do you think it could make sense to break out some functionality into smaller functions that you can then call in `submit`? That would make it easier to test and to read 😊
The functions are now split. I currently have no idea how to test the code of `NeymanConstructor`, because we would need to save the output files or run the configs when testing. That can be time-consuming.
You're right. Maybe we can write and load some trivial example or something? But it's not that straightforward, I agree.
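One possible direction, sketched under the assumption that the metadata readout is factored into a small helper (`read_poi_from_file` below is hypothetical, not the actual alea API): write a trivial file carrying only the attributes the reader needs, then assert on the round-trip with pytest's `tmp_path` fixture.

```python
import json
import h5py

def read_poi_from_file(filename):
    """Hypothetical stand-in for a factored-out metadata reader."""
    with h5py.File(filename, "r") as ipt:
        return json.loads(ipt.attrs["poi"])

def test_poi_roundtrip(tmp_path):
    # Write a trivial toyMC output that carries only the poi metadata.
    filename = str(tmp_path / "toymc.h5")
    with h5py.File(filename, "w") as opt:
        opt.attrs["poi"] = json.dumps("wimp_rate_multiplier")
    # The reader should recover exactly what was written.
    assert read_poi_from_file(filename) == "wimp_rate_multiplier"
```

This avoids saving real output files or running configs, so it stays fast.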
The split version is now much more readable -- I like it! I think we should think about unit tests for the `NeymanConstructor`, but I'd suggest merging this now without them.
I ran the code with some examples and the results were sensible. Thanks a lot @dachengx for this improvement 😊
We do not need to initialize a runner for each `**{**nominal_values, **generate_values}`; instead we can read the poi directly from the toymc files `output_filename`. This was initially an idea from @hammannr, #100 (comment).

- Support both single `output_filename` mode and multiple (batch) `output_filename` mode
- Collect the `output_filename` list and make sure that the metadata is the same across the list (see alea/runner.py, Line 340 in 80bd030)

This PR supersedes #103.
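The single-versus-batch handling described above boils down to turning one output name into a glob pattern and preferring its matches over the literal name. A minimal sketch of that mechanism, with hypothetical filenames:

```python
import os
from glob import glob

output_filename = "toymc_free.h5"  # hypothetical single-mode name

# Batch outputs are expected to sit next to the single-mode name as
# toymc_free_0.h5, toymc_free_1.h5, ..., so insert a wildcard before
# the extension.
root, ext = os.path.splitext(output_filename)
batch_pattern = root + "_*" + ext  # -> "toymc_free_*.h5"

# Prefer batch files when any exist; otherwise fall back to single mode.
matches = sorted(glob(batch_pattern))
output_filename_list = matches if matches else sorted(glob(output_filename))
print(output_filename_list)
```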