Harden vs. TaskState collisions #6593

Merged: 2 commits, Jun 25, 2022

@crusaderky crusaderky commented Jun 17, 2022

#6525 removed the assertion that multiple TaskState objects can't exist at the same time, because it was incompatible with the new instances weakset.
This PR reintroduces the check, limited to the WorkerState and applied more methodically. It also makes sure that validate_state fails if the previous incarnation of a task remained in the WorkerState for any reason, whereas before it would only fail in case of a key hash collision.
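
A minimal sketch of the kind of check described above, not the code in this PR. The `TaskState` stand-in, the `validate_no_stale_tasks` helper, and the `tasks` mapping are hypothetical names chosen for illustration. The idea is that every instance found in a state collection must be the very same object that is registered under its key.

```python
from __future__ import annotations

from collections.abc import Iterable


class TaskState:
    """Hypothetical stand-in for the real TaskState; only the key matters here."""

    def __init__(self, key: str) -> None:
        self.key = key


def validate_no_stale_tasks(
    tasks: dict[str, TaskState], *collections: Iterable[TaskState]
) -> None:
    """Raise if any collection holds an instance other than tasks[key]."""
    for collection in collections:
        for ts in collection:
            # Identity, not equality: a previous incarnation with the same key
            # is a different object and must be rejected.
            if tasks.get(ts.key) is not ts:
                raise AssertionError(f"stale or unknown TaskState for key {ts.key!r}")


current = TaskState("x")
stale = TaskState("x")  # previous incarnation of the same key
tasks = {"x": current}

validate_no_stale_tasks(tasks, {current})  # passes silently

try:
    validate_no_stale_tasks(tasks, {current, stale})
except AssertionError:
    stale_detected = True
else:
    stale_detected = False
```

Because the stand-in class uses the default identity-based hash, the stale instance sits in the set next to the current one and the check can find it.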

@crusaderky crusaderky self-assigned this Jun 17, 2022
@crusaderky crusaderky added the stability Issue or feature related to cluster stability (e.g. deadlock) label Jun 17, 2022
@crusaderky crusaderky linked an issue Jun 17, 2022 that may be closed by this pull request

github-actions bot commented Jun 17, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files (+15), 15 suites (+15), duration 10h 20m 22s (+10h 20m 22s)
2 892 tests (+2 892): 2 803 passed (+2 803), 84 skipped (+84), 5 failed (+5)
21 423 runs (+21 423): 20 451 passed (+20 451), 967 skipped (+967), 5 failed (+5)

For more details on these failures, see this check.

Results for commit 6294428. ± Comparison against base commit 3551d15.

♻️ This comment has been updated with latest results.

crusaderky commented Jun 20, 2022

This PR happens to be related to #6585, which deals with a very similar design issue on the scheduler side:

  • the SchedulerState holding sets of scheduler.WorkerState objects there
  • the worker_state_machine.WorkerState holding sets of TaskState objects here

In both cases, the state sets should never contain duplicate objects.
We should agree on a coherent design for the two.

@fjetter fjetter left a comment

We should not merge this until the conversation in #6585 (comment) is settled.

Comment on lines 285 to 293
     def __hash__(self) -> int:
-        return hash(self.key)
+        # See note in __eq__
+        return id(self)

Member

Is this change required to get the validation passing?

Collaborator

Strictly speaking, both __eq__ and __hash__ are unnecessary to get the behavior we want:

User-defined classes have __eq__() and __hash__() methods by default; with them, all objects compare unequal (except with themselves) and x.__hash__() returns an appropriate value such that x == y implies both that x is y and hash(x) == hash(y).
https://docs.python.org/3/reference/datamodel.html#object.__hash__

I'd kinda prefer to not implement them at all and just use the defaults. We can keep a comment explaining this, but having the explicit methods feels redundant.
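
To illustrate the default behavior quoted above, here is a small sketch. `DefaultTask` and `KeyedTask` are hypothetical classes, not code from this PR; `KeyedTask` is only a contrast showing how key-based comparison would collapse two incarnations of the same key into a single set entry.

```python
class DefaultTask:
    """Relies on the object defaults: identity-based __eq__ and __hash__."""

    def __init__(self, key: str) -> None:
        self.key = key


class KeyedTask:
    """Contrast case: compares and hashes by key."""

    def __init__(self, key: str) -> None:
        self.key = key

    def __eq__(self, other: object) -> bool:
        return isinstance(other, KeyedTask) and self.key == other.key

    def __hash__(self) -> int:
        return hash(self.key)


old, new = DefaultTask("x"), DefaultTask("x")

# With the defaults, two incarnations of the same key stay distinct,
# so a leftover instance remains visible in the set.
default_set = {old, new}

# With key-based comparison, the set silently keeps only one of them.
keyed_set = {KeyedTask("x"), KeyedTask("x")}
```

With the defaults, a stale instance cannot hide behind a newer one that has the same key.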

Collaborator Author

You must implement __hash__; otherwise @dataclass (with its default eq=True) will make the class unhashable.
I removed __eq__ and overhauled the PR.
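
A short sketch of the standard-library behavior referred to here, using hypothetical class names rather than the real TaskState. With the default `eq=True` and no explicit `__hash__`, `@dataclass` sets `__hash__` to `None`. Defining `__hash__` in the class body keeps the class hashable. Passing `eq=False` is shown as one alternative that keeps both defaults; whether the PR uses it is not stated in this thread.

```python
from dataclasses import dataclass


@dataclass
class Unhashable:
    # eq=True by default and no explicit __hash__: dataclass sets __hash__ = None
    key: str


@dataclass
class IdentityHashed:
    key: str

    # An explicit __hash__ in the class body is left untouched by @dataclass
    def __hash__(self) -> int:
        return id(self)


@dataclass(eq=False)
class NoEq:
    # No __eq__ is generated, so object's identity-based __eq__ and __hash__ apply
    key: str


try:
    hash(Unhashable("x"))
    unhashable_raises = False
except TypeError:
    unhashable_raises = True

a, b = NoEq("x"), NoEq("x")
distinct_in_set = len({a, b}) == 2
```

Note that `IdentityHashed` still gets the generated field-based `__eq__`, so two instances with the same key compare equal while hashing differently. Dropping the generated `__eq__` as well avoids that mismatch.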

crusaderky added a commit to crusaderky/distributed that referenced this pull request Jun 22, 2022
@crusaderky crusaderky force-pushed the WSMR/task_hash branch 6 times, most recently from 49ffe66 to 6f9acb8 Compare June 22, 2022 10:11