-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove no listeners #717
Remove no listeners #717
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 1 files at r5.
Reviewable status: 0 of 1 LGTMs obtained
a discussion (no related file):
We have a couple options here.
- We could call
notify_channel.send()
to tell the worker to terminate the job. - We could cancel only queued actions.
I prefer 1, but it'll be more difficult.
a discussion (no related file):
At this phase, I suggest writing some tests.
nativelink-config/src/schedulers.rs
line 124 at r6 (raw file):
/// Remove action from queue after this much time has elapsed without a listener /// amount of time in seconds.
nit: End the sentence above and make this line say:
Units in seconds.
nativelink-scheduler/src/simple_scheduler.rs
line 356 at r6 (raw file):
self.queued_actions_set.insert(action_info.clone()); self.queued_actions.insert(action_info.clone(), awaited_action); self.metrics.time_without_listeners.reset();
Don't store this number on metrics
. Metrics is only for statistics. We should not rely on it for any required features. Users can disable all metrics everywhere, so this may cause issues in weird ways.
Got it, my instinct was that using notifications would be a better way but need to read more into how to get it working, the original issue only referenced the queue so got a little confused. I think doing it right will pay dividends in the long run so lets go with number 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04
a discussion (no related file):
Previously, allada (Nathan (Blaise) Bruer) wrote…
At this phase, I suggest writing some tests.
Please reword the github description, as it will become the commit message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 4 files at r7.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04
nativelink-scheduler/src/simple_scheduler.rs
line 210 at r8 (raw file):
workers: Workers, active_actions: HashMap<Arc<ActionInfo>, RunningAction>, actions_no_listeners: HashMap<Arc<ActionInfoHashKey>, SystemTime>,
nit: Lets track this in field in AwaitedAction
with something like last_client_event
. This field will need to be updated on every connect of a client and a timer will need to update it periodically if it still has listeners.
nativelink-scheduler/src/worker.rs
line 208 at r8 (raw file):
} pub fn drop_action(&mut self, action_info: &Arc<ActionInfo>) {
nit: We don't want to do anything here, we need to worker to send the RPC message back to the scheduler saying it errored/completed and let the normal flow take it.
nativelink-util/src/action_messages.rs
line 634 at r8 (raw file):
/// Worker is executing the action. Executing, // TODO(zbirenbaum) Add dropped status and rpc handling for actions which were cancelled due to disconnection
nit: This is not the RPC message you are looking for, check worker_api.proto
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 1 LGTMs obtained, and pending CI: Bazel Dev / ubuntu-22.04, asan / ubuntu-22.04, docker-compose-compiles-nativelink (20.04), docker-compose-compiles-nativelink (22.04), macos-13, ubuntu-22.04, zig-cc ubuntu-20.04, zig-cc ubuntu-22.04
nativelink-scheduler/src/simple_scheduler.rs
line 61 at r2 (raw file):
/// Default timeout for actions without any listeners /// If this changes, remember to change the documentation in the config. const DEFAULT_NO_LISTENERS_TIMEOUT_S: u64 = 10;
nit: Lets make this 60 seconds.
Closing this and continuing in #778 which is much more polished and addresses most of the problems here |
Fixes #338
There might be a better way to do this if listeners can be checked before trying to send off the result of an action to them, but I don't know of a way to do that so I'm opening this as a draft. Also was wondering what the easiest way to test for this behavior would be, as I couldn't find a test case for actions without listeners.
This change isdata:image/s3,"s3://crabby-images/d0bb7/d0bb7f7625ca5bf5c3cf7a2b7a514cf841ab8395" alt="Reviewable"