Transition tracing for scheduler task transitions #5849
@crusaderky, it looks like there are already stimuli generated by

They capture what the AMM decided to do, but not why. The "why" is currently recorded only by enabling the (extremely verbose) task_logger; there may be more structured ways to capture it.
The worker currently implements a tracing system to link cause and effect and to follow all transitions that were triggered by a given stimulus. This trace ID is usually referred to as stimulus_id.

The scheduler generates some of these stimulus_ids and includes them in RPC calls to the worker in a few places. However, it does not trace its own transitions, which makes it very hard to infer why such a stimulus was generated. Introducing the same system on the scheduler side and including the appropriate IDs in requests to the worker would allow us to close the circle: we could reconstruct a cluster-wide history and link all transitions that were caused by an event.

The most difficult part is deciding where to generate the unique stimulus_ids. If we simply keep passing the same ID through every call, every transition would end up linked by that one ID. My thinking is that new events/stimulus IDs should be generated on the following events (please correct me if I missed anything):
All other state-modifying handlers should accept a stimulus ID and forward it through the transition engine.
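A minimal sketch of this split, assuming a toy scheduler model: entry-point handlers mint a new stimulus ID exactly once, while the transition engine only forwards the ID it is given and attaches it to every log entry and every outgoing worker request. ToyScheduler, handle_task_finished, new_stimulus_id and the message dictionary are hypothetical names chosen for illustration; they are not the distributed API.

```python
import uuid
from dataclasses import dataclass, field


def new_stimulus_id(event: str) -> str:
    """Create a fresh trace ID; meant to be called only at entry points."""
    # Hypothetical naming scheme: event name plus a random unique suffix.
    return f"{event}-{uuid.uuid4().hex}"


@dataclass
class ToyScheduler:
    """Toy model for illustration only; not distributed.Scheduler."""

    tasks: dict = field(default_factory=dict)  # key -> current state
    transition_log: list = field(default_factory=list)  # (key, start, finish, stimulus_id)
    worker_messages: list = field(default_factory=list)  # outgoing requests to workers

    def handle_task_finished(self, key: str, dependents: list) -> str:
        """Hypothetical entry point: a worker reports that `key` finished."""
        # A new stimulus starts here, so this is where the ID is minted.
        stimulus_id = new_stimulus_id("task-finished")
        recommendations = {key: "memory"}
        for dep in dependents:
            recommendations[dep] = "processing"
        self.transitions(recommendations, stimulus_id=stimulus_id)
        return stimulus_id

    def transitions(self, recommendations: dict, *, stimulus_id: str) -> None:
        """Transition engine: never mints IDs, only forwards the caller's."""
        recs = dict(recommendations)
        while recs:
            key, finish = recs.popitem()
            start = self.tasks.get(key, "released")
            self.tasks[key] = finish
            self.transition_log.append((key, start, finish, stimulus_id))
            if finish == "processing":
                # Requests to workers carry the same ID, so worker-side
                # transitions can be linked back to this scheduler stimulus.
                self.worker_messages.append(
                    {"op": "compute-task", "key": key, "stimulus_id": stimulus_id}
                )
```

For example, ToyScheduler().handle_task_finished("x", ["y", "z"]) records three transitions and two worker requests, all sharing one stimulus ID, while a second handler call gets a different ID.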
As on the worker, the story should filter not only on keys but also on stimulus IDs.
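Filtering the story on either keys or stimulus IDs could look like the following sketch, assuming each log entry records the stimulus ID alongside the key. Transition, story and causal_story are hypothetical names for illustration, not the distributed API, and the log layout is an assumption.

```python
from typing import NamedTuple, Sequence


class Transition(NamedTuple):
    key: str
    start: str
    finish: str
    stimulus_id: str


def story(log: Sequence[Transition], *keys_or_stimulus_ids: str) -> list:
    """Return entries whose key OR stimulus ID matches any argument."""
    wanted = set(keys_or_stimulus_ids)
    return [t for t in log if t.key in wanted or t.stimulus_id in wanted]


def causal_story(log: Sequence[Transition], key: str) -> list:
    """Return every transition triggered by the stimuli that touched `key`."""
    stimulus_ids = {t.stimulus_id for t in story(log, key)}
    return story(log, *stimulus_ids)


# Hypothetical example log.
log = [
    Transition("x", "processing", "memory", "task-finished-1"),
    Transition("y", "waiting", "processing", "task-finished-1"),
    Transition("z", "released", "waiting", "update-graph-7"),
]
```

Looking up "y" alone returns only its own transition, whereas causal_story(log, "y") also returns the transition of "x", because both were caused by the stimulus task-finished-1.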