Fix `Channel.hash` in multiprocessing contexts #11251

jakelishman · 2023-11-15T16:08:31Z

Summary

Storing an explicit hash key is fragile in cases that a channel might be created in a different process to where it might be compared or the hash used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by fork.

In this case, transmitting the stored _hash over pickle meant that a DriveChannel(0) created in the main process of a macOS runner could compare equal to a DriveChannel(0) created in a separate process (standard start method spawn) and pickled over the wire to the main process, but have different hashes, violating the Python data model.

Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe.

Details and comments

This fixes a test failure that appears on Mac and Windows runners locally (but for some reason not in CI) in test.python.compiler.test_transpiler.TestTranspileParallel.test_parallel_dispatch_lazy_cal_loading.

Storing an explicit hash key is fragile in cases that a channel might be created in a different process to where it might be compared or the hash used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model. Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe.

qiskit-bot · 2023-11-15T16:08:36Z

One or more of the the following people are requested to review this:

@Qiskit/terra-core
@nkanazawa1989

coveralls · 2023-11-15T16:28:23Z

Pull Request Test Coverage Report for Build 6880788672

13 of 19 (68.42%) changed or added relevant lines in 5 files are covered.
22 unchanged lines in 4 files lost coverage.
Overall coverage decreased (-0.03%) to 85.902%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
qiskit/pulse/model/mixed_frames.py	0	1	0.0%
qiskit/pulse/model/pulse_target.py	2	4	50.0%
qiskit/pulse/model/frames.py	9	12	75.0%

Files with Coverage Reduction	New Missed Lines	%
crates/qasm2/src/expr.rs	1	93.76%
qiskit/pulse/library/waveform.py	3	93.75%
crates/qasm2/src/lex.rs	6	91.92%
crates/qasm2/src/parse.rs	12	96.67%

Totals
Change from base Build 6879718232:	-0.03%
Covered Lines:	65938
Relevant Lines:	76760

💛 - Coveralls

wshanks

Weird. I have never thought about hash values getting compared across subprocesses with different hash seeds. It's nice that a test flagged this.

Checking the rest of the pulse code, should we fix Instruction while we are at it? That one takes more work to trigger the problem because it caches the hash on first call rather than in __init__.

qiskit/qiskit/pulse/instructions/instruction.py

Lines 303 to 306 in f7f6cb1

    
           def __hash__(self) -> int: 
        
               if self._hash is None: 
        
                   self._hash = hash((type(self), self.operands, self.name)) 
        
               return self._hash

jakelishman · 2023-11-15T16:45:11Z

I can do if you prefer:

[~/code/qiskit/terra] (qiskit/3.10) (main *)
jake@ninetales$ rg '_hash =' qiskit/pulse
qiskit/pulse/channels.py
99:        self._hash = hash((self.__class__.__name__, self._index))

qiskit/pulse/instructions/instruction.py
56:        self._hash = None
305:            self._hash = hash((type(self), self.operands, self.name))

qiskit/pulse/model/frames.py
42:        self._hash = hash((type(self), identifier))
53:        return type(self) is type(other) and self._hash == other._hash

qiskit/pulse/model/logical_elements.py
42:        self._hash = hash((self._index, type(self)))

qiskit/pulse/model/mixed_frames.py
50:        self._hash = hash((self._logical_element, self._frame))

Looks like there's a few places where it could be happening.

Our CI test suite didn't actually catch this, it's just something that happens locally for me. Matt and I are trying to track down why the CI isn't catching it, because as far as we can tell, it should be. It's probably something to do with multiprocessing detection. It popped up again when Matt was trying to get PGO set up for our wheel builds, which caused us to use GHA runners to run the test suite (as a load for profiling), and they also show the problem.

wshanks · 2023-11-15T17:02:26Z

Looks like there's a few places where it could be happening.

Good catch. I was grepping the stable channel and missed the new instances in the new model package.

Our CI test suite didn't actually catch this, it's just something that happens locally for me. Matt and I are trying to track down why the CI isn't catching it, because as far as we can tell, it should be.

Linux defaults to using fork which you noted uses the same hash seed across subprocesses and Windows and macOS default to disabling multiprocessing for transpilation, so I don't think it is surprising that the CI didn't catch this, unless CI changes one of these defaults?

jakelishman · 2023-11-15T17:10:18Z

We override the parallel default in the relevant test, so it should appear. On macOS, I'm running the same stestr run command that Azure CI runs (as far as I recall / saw), so in theory if I see it locally, CI should see it as well. It's all a bit weird.

The new models implementations of __eq__ are actually quite wrong and definitely need to be changed - __eq__ absolutely should not depend on hash as its principal source of equality; it's totally possible for there to be a hash collision without implying the two objects are equal. I can fix that in this PR - just running the test suite locally now. Good to know that that code is new, though, so doesn't need a release note, thanks.

wshanks · 2023-11-15T17:17:19Z

It's all a bit weird.

What is the failure? I am guessing that self.assertEqual(added_cal, ref_cal) fails, but I thought that would just call __eq__ which doesn't use the hash (unlike the model classes that you pointed out).

jakelishman · 2023-11-15T17:23:12Z

Yeah, that's the failure, and the trick is that __eq__ somewhere in there calls set(Tuple[Channel, ...]), and Python's set equality is implemented by effectively len(x) == len(y) and all(x_item in y for x_item in x), which is why the hash of the channels comes in.

This removes all caching of items' `hash`es. This practice is quite fraught in multiprocessing contexts, and should only be done when it is absolutely performance critical. In a couple of cases, the pulse objects were using the cached `hash` as the main component of their `__eq__` methods, which is not correct; it's totally valid to have hash collisions without implying that two objects are equal.

jakelishman · 2023-11-15T17:26:41Z

All objects now updated in the latest commit.

wshanks · 2023-11-15T17:58:35Z

the trick is that eq somewhere in there calls set(Tuple[Channel, ...])

Right, that makes sense because we are focused on Channel.__eq__ here but the test is comparing ScheduleBlock objects that have channels nested inside of them.

wshanks

This looks good to me. We could mention Instruction in the release note but it is a pretty minor case.

@TsafrirA should approve since he just added the model classes. One thing I wonder about with those is if the Frame base class wants to codify storing an unhashed string identifier. With the changes here, the base class no longer indicates that subclasses need to be hashable.

jakelishman · 2023-11-15T18:36:21Z

The magic methods interact a bit unusually with subclassing, especially __hash__. For safety in the data model, Python actually sets __hash__ = None for subclasses if they override __eq__; hashing isn't inherited if equality is overriden, and that's something a subclass would frequently do, so it can be a little misleading to think that a base class having __hash__ ensures everything is hashable.

Of course, well behaved subclasses should know that if they override equality they should override hashing as well, but I've seen enough examples across Qiskit to know that this is an area of the Python data model that's not super well understood by everyone.

wshanks · 2023-11-15T18:55:10Z

I've seen enough examples across Qiskit to know that this is an area of the Python data model that's not super well understood by everyone.

Definitely something I forget about. Doing another code search of the pulse package, I see that RelativeBarrier and Call (which is deprecated any way) fall into this trap. Also, ScheduleBlock explicitly sets __hash__ = None though it doesn't need to.

jakelishman · 2023-11-15T18:59:31Z

Yeah, even if you understand it fully, it's definitely very easy to overlook or get wrong. Hashing is not my favourite corner of Python as a language, for sure.

TsafrirA · 2023-11-15T19:51:27Z

Thanks @jakelishman . To be honest, I didn't have a good reason to use self._hash outside of the fact that the legacy Channel used that pattern.

jakelishman · 2023-11-15T20:27:21Z

Thanks both - Tsafrir, are you happy for me to tag this for merge?

TsafrirA · 2023-11-15T20:28:20Z

@jakelishman sure.

* Fix `Channel.__hash__` in multiprocessing contexts Storing an explicit hash key is fragile in cases that a channel might be created in a different process to where it might be compared or the hash used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model. Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe. * Fix `hash` and equality in other pulse objects This removes all caching of items' `hash`es. This practice is quite fraught in multiprocessing contexts, and should only be done when it is absolutely performance critical. In a couple of cases, the pulse objects were using the cached `hash` as the main component of their `__eq__` methods, which is not correct; it's totally valid to have hash collisions without implying that two objects are equal. (cherry picked from commit 3c1a87c) # Conflicts: # qiskit/pulse/model/frames.py # qiskit/pulse/model/mixed_frames.py # qiskit/pulse/model/pulse_target.py

* Fix `Channel.__hash__` in multiprocessing contexts Storing an explicit hash key is fragile in cases that a channel might be created in a different process to where it might be compared or the hash used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model. Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe. * Fix `hash` and equality in other pulse objects This removes all caching of items' `hash`es. This practice is quite fraught in multiprocessing contexts, and should only be done when it is absolutely performance critical. In a couple of cases, the pulse objects were using the cached `hash` as the main component of their `__eq__` methods, which is not correct; it's totally valid to have hash collisions without implying that two objects are equal. (cherry picked from commit 3c1a87c)

* Fix `Channel.__hash__` in multiprocessing contexts Storing an explicit hash key is fragile in cases that a channel might be created in a different process to where it might be compared or the hash used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model. Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe. * Fix `hash` and equality in other pulse objects This removes all caching of items' `hash`es. This practice is quite fraught in multiprocessing contexts, and should only be done when it is absolutely performance critical. In a couple of cases, the pulse objects were using the cached `hash` as the main component of their `__eq__` methods, which is not correct; it's totally valid to have hash collisions without implying that two objects are equal. (cherry picked from commit 3c1a87c) Co-authored-by: Jake Lishman <[email protected]>

jakelishman added stable backport potential The bug might be minimal and/or import enough to be port to stable Changelog: Bugfix Include in the "Fixed" section of the changelog mod: pulse Related to the Pulse module labels Nov 15, 2023

jakelishman requested review from eggerdj, wshanks and a team as code owners November 15, 2023 16:08

wshanks previously approved these changes Nov 15, 2023

View reviewed changes

jakelishman dismissed wshanks’s stale review via 14d7f89 November 15, 2023 17:26

wshanks approved these changes Nov 15, 2023

View reviewed changes

jakelishman added this pull request to the merge queue Nov 15, 2023

Merged via the queue into Qiskit:main with commit 3c1a87c Nov 15, 2023
13 checks passed

mergify bot mentioned this pull request Nov 15, 2023

Fix Channel.__hash__ in multiprocessing contexts (backport #11251) #11255

Merged

jakelishman deleted the fix-pulse-channel-hash branch November 15, 2023 21:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `Channel.hash` in multiprocessing contexts #11251

Fix `Channel.hash` in multiprocessing contexts #11251

jakelishman commented Nov 15, 2023

qiskit-bot commented Nov 15, 2023

coveralls commented Nov 15, 2023 •

edited

Loading

wshanks left a comment

jakelishman commented Nov 15, 2023 •

edited

Loading

wshanks commented Nov 15, 2023

jakelishman commented Nov 15, 2023

wshanks commented Nov 15, 2023

jakelishman commented Nov 15, 2023

jakelishman commented Nov 15, 2023

wshanks commented Nov 15, 2023

wshanks left a comment

jakelishman commented Nov 15, 2023

wshanks commented Nov 15, 2023

jakelishman commented Nov 15, 2023

TsafrirA commented Nov 15, 2023

jakelishman commented Nov 15, 2023

TsafrirA commented Nov 15, 2023

	def __hash__(self) -> int:
	if self._hash is None:
	self._hash = hash((type(self), self.operands, self.name))
	return self._hash

Fix Channel.__hash__ in multiprocessing contexts #11251

Fix Channel.__hash__ in multiprocessing contexts #11251

Conversation

jakelishman commented Nov 15, 2023

Summary

Details and comments

qiskit-bot commented Nov 15, 2023

coveralls commented Nov 15, 2023 • edited Loading

Pull Request Test Coverage Report for Build 6880788672

💛 - Coveralls

wshanks left a comment

Choose a reason for hiding this comment

jakelishman commented Nov 15, 2023 • edited Loading

wshanks commented Nov 15, 2023

jakelishman commented Nov 15, 2023

wshanks commented Nov 15, 2023

jakelishman commented Nov 15, 2023

jakelishman commented Nov 15, 2023

wshanks commented Nov 15, 2023

wshanks left a comment

Choose a reason for hiding this comment

jakelishman commented Nov 15, 2023

wshanks commented Nov 15, 2023

jakelishman commented Nov 15, 2023

TsafrirA commented Nov 15, 2023

jakelishman commented Nov 15, 2023

TsafrirA commented Nov 15, 2023

Fix `Channel.hash` in multiprocessing contexts #11251

Fix `Channel.hash` in multiprocessing contexts #11251

coveralls commented Nov 15, 2023 •

edited

Loading

jakelishman commented Nov 15, 2023 •

edited

Loading