Fix `Channel.__hash__` in multiprocessing contexts #11251
Conversation
Storing an explicit hash key is fragile in cases where a channel might be created in a different process from the one where it is compared or its hash is used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model. Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe.
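The hash-seed variability described above can be reproduced without `multiprocessing` at all; this sketch (the tuple contents are illustrative, not Qiskit internals) asks two fresh interpreters, pinned to different `PYTHONHASHSEED` values, for the hash of the same tuple:

```python
import os
import subprocess
import sys

def remote_hash(seed: int) -> int:
    # Compute hash(("DriveChannel", 0)) in a fresh interpreter whose string
    # hashing is seeded by the given PYTHONHASHSEED value.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", 'print(hash(("DriveChannel", 0)))'],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout)

# Differently seeded interpreters disagree on the hash of the same value,
# which is the situation a `spawn`-started worker and its parent are in by
# default, so a cached `_hash` carried over pickle is stale on arrival.
h1 = remote_hash(1)
h2 = remote_hash(2)
```

Under the on-demand approach, the receiving process simply recomputes the hash with its own seed, so equal objects always hash equal within any single interpreter, which is all the data model requires.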
One or more of the following people are requested to review this:
Pull Request Test Coverage Report for Build 6880788672
💛 - Coveralls
Weird. I have never thought about hash values getting compared across subprocesses with different hash seeds. It's nice that a test flagged this.
Checking the rest of the pulse code, should we fix `Instruction` while we are at it? That one takes more work to trigger the problem because it caches the hash on first call rather than in `__init__`.
qiskit/qiskit/pulse/instructions/instruction.py, lines 303 to 306 in f7f6cb1:

```python
def __hash__(self) -> int:
    if self._hash is None:
        self._hash = hash((type(self), self.operands, self.name))
    return self._hash
```
I can do that if you prefer.
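For reference, the on-demand shape the fix moves towards looks roughly like this; a simplified, self-contained sketch rather than the actual Qiskit diff:

```python
class Instruction:
    # Simplified stand-in for the real class: note there is no stored _hash,
    # so nothing stale can be pickled into another process.
    def __init__(self, operands, name):
        self.operands = operands
        self.name = name

    def __eq__(self, other):
        return (
            type(self) is type(other)
            and self.operands == other.operands
            and self.name == other.name
        )

    def __hash__(self):
        # Recomputed whenever Python asks for it, in whichever process asks.
        return hash((type(self), self.operands, self.name))
```

Equal instances then agree on their hash inside any one interpreter, and no hash value ever travels across a process boundary.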
Looks like there are a few places where it could be happening. Our CI test suite didn't actually catch this; it's just something that happens locally for me. Matt and I are trying to track down why the CI isn't catching it, because as far as we can tell, it should be. It's probably something to do with multiprocessing detection. It popped up again when Matt was trying to get PGO set up for our wheel builds, which caused us to use GHA runners to run the test suite (as a load for profiling), and they also show the problem.
Good catch. I was grepping the stable channel and missed the new instances in the new model package.
Linux defaults to using `fork`.
We override the parallel default in the relevant test, so it should appear.
What is the failure?
Yeah, that's the failure.
This removes all caching of items' `hash`es. This practice is quite fraught in multiprocessing contexts, and should only be done when it is absolutely performance critical. In a couple of cases, the pulse objects were using the cached `hash` as the main component of their `__eq__` methods, which is not correct; it's totally valid to have hash collisions without implying that two objects are equal.
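The collision point is concrete even without multiprocessing: in CPython, `hash(-1)` and `hash(-2)` are both `-2` (`-1` is reserved as an error marker in the C API). A toy class (illustrative only, not from the PR) with a hash-based `__eq__` shows the failure:

```python
class HashEq:
    # Anti-pattern: __eq__ delegates to the hash, so any hash collision
    # makes unrelated objects compare "equal".
    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, HashEq) and hash(self) == hash(other)

# -1 and -2 collide under CPython's int hash, so these distinct objects
# wrongly compare equal:
wrongly_equal = HashEq(-1) == HashEq(-2)
```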
All objects now updated in the latest commit.
Right, that makes sense.
This looks good to me. We could mention `Instruction` in the release note, but it is a pretty minor case.
@TsafrirA should approve since he just added the model classes. One thing I wonder about with those is if the `Frame` base class wants to codify storing an unhashed string identifier. With the changes here, the base class no longer indicates that subclasses need to be hashable.
The magic methods interact a bit unusually with subclassing. Of course, well-behaved subclasses should know that if they override equality they should override hashing as well, but I've seen enough examples across Qiskit to know that this is an area of the Python data model that's not super well understood by everyone.
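The subclassing wrinkle being referred to: defining `__eq__` in a class body implicitly sets `__hash__` to `None`, making instances unhashable even when a parent class defines hashing. A minimal demonstration:

```python
class Base:
    def __hash__(self):
        return 42

class Child(Base):
    # Python sets Child.__hash__ = None because __eq__ is defined here
    # without an accompanying __hash__, so hash(Child()) raises TypeError.
    def __eq__(self, other):
        return isinstance(other, Child)

child_unhashable = Child.__hash__ is None
```

A subclass that overrides `__eq__` must therefore restate `__hash__` (even as `__hash__ = Base.__hash__`) to stay hashable.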
Definitely something I forget about.
Yeah, even if you understand it fully, it's definitely very easy to overlook or get wrong. Hashing is not my favourite corner of Python as a language, for sure.
Thanks @jakelishman. To be honest, I didn't have a good reason to use the cached hash in the first place.
Thanks both - Tsafrir, are you happy for me to tag this for merge?
@jakelishman sure. |
* Fix `Channel.__hash__` in multiprocessing contexts

  Storing an explicit hash key is fragile in cases where a channel might be created in a different process from the one where it is compared or its hash is used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model. Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe.

* Fix `hash` and equality in other pulse objects

  This removes all caching of items' `hash`es. This practice is quite fraught in multiprocessing contexts, and should only be done when it is absolutely performance critical. In a couple of cases, the pulse objects were using the cached `hash` as the main component of their `__eq__` methods, which is not correct; it's totally valid to have hash collisions without implying that two objects are equal.

(cherry picked from commit 3c1a87c)

# Conflicts:
#	qiskit/pulse/model/frames.py
#	qiskit/pulse/model/mixed_frames.py
#	qiskit/pulse/model/pulse_target.py
Summary
Storing an explicit hash key is fragile in cases where a channel might be created in a different process from the one where it is compared or its hash is used, because the hash seeding can vary depending on how the new interpreter process was created, especially if it's not done by `fork`. In this case, transmitting the stored `_hash` over pickle meant that a `DriveChannel(0)` created in the main process of a macOS runner could compare equal to a `DriveChannel(0)` created in a separate process (standard start method `spawn`) and pickled over the wire to the main process, but have different hashes, violating the Python data model.

Instead, we can just use the standard Python behaviour of creating the hash on demand when requested; this should typically be preferred unless absolutely necessary for critical performance reasons, because it will generally fail safe.
Details and comments
This fixes a test failure that appears on Mac and Windows runners locally (but for some reason not in CI) in `test.python.compiler.test_transpiler.TestTranspileParallel.test_parallel_dispatch_lazy_cal_loading`.