Slurm Collector: Handle job state / missing timestamps #811

Closed
rkleinem opened this issue May 8, 2024 · 0 comments · Fixed by #844

Collaborator

rkleinem commented May 8, 2024

First part of the problem

Currently, the collector naively takes the list of job states from the config file and tries to collect all jobs in those states.
But not every job state makes sense in this context. For example, Pending makes no sense, because a pending job has not started yet.
I think we should document the list of job states that can sensibly be used with this collector.
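
Beyond documenting the list, the collector could also check the configured states against it at startup. The following is only a minimal sketch of that idea: `SUPPORTED_STATES` and `partition_states` are hypothetical names, and the states in the allowlist are placeholders chosen for illustration, not a decision about which states the collector should support.

```rust
/// Hypothetical allowlist of job states the collector accepts.
/// Which states belong here is an assumption made for illustration only.
const SUPPORTED_STATES: &[&str] = &["completed", "failed", "timeout", "cancelled"];

/// Splits the states from the config into (accepted, rejected),
/// comparing case-insensitively and ignoring surrounding whitespace.
fn partition_states(configured: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut accepted = Vec::new();
    let mut rejected = Vec::new();
    for state in configured {
        let normalized = state.trim().to_ascii_lowercase();
        if SUPPORTED_STATES.iter().any(|s| *s == normalized.as_str()) {
            accepted.push(normalized);
        } else {
            rejected.push(normalized);
        }
    }
    (accepted, rejected)
}
```

With this, a config containing Pending would land in the rejected list, so the collector could warn or refuse to start instead of querying jobs that cannot be accounted for.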

Second part

Some job states are a little more involved, like Cancelled. You might want to account for cancelled jobs when they were cancelled after running for a few days.
On the other hand, there is no guarantee that a cancelled job was ever started. In that case the start_time is Unknown and tokenization of the sacct output will fail at

let v = match kc.key_type.parse(&v) {

I believe we need to define which fields may be missing for which job states and have the collector ignore such entries (for example, a Cancelled job with no start_time) instead of crashing.
I haven't gone through the whole list of job states yet (https://slurm.schedmd.com/sacct.html#SECTION_JOB-STATE-CODES). Cancelled might be the only problem.
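
To illustrate the "ignore instead of crash" behaviour, here is a rough sketch, not the collector's actual parsing code. It assumes pipe-separated input with the four fields `JobID|State|Start|End` (roughly what `sacct --parsable2 --format=JobID,State,Start,End` prints); `JobRecord` and `parse_line` are hypothetical names. A line whose start time is `Unknown` is skipped by returning `Ok(None)`, while a malformed line is still reported as an error.

```rust
/// Hypothetical record type; the field names are illustrative only.
#[derive(Debug, PartialEq)]
struct JobRecord {
    job_id: String,
    state: String,
    start: String,
    end: String,
}

/// Parses one line of the assumed form `JobID|State|Start|End`.
/// Returns Ok(None) for a job that never started (Start is "Unknown"),
/// so the caller can skip it instead of failing on the timestamp.
fn parse_line(line: &str) -> Result<Option<JobRecord>, String> {
    let fields: Vec<&str> = line.trim().split('|').collect();
    if fields.len() != 4 {
        return Err(format!("expected 4 fields, got {}", fields.len()));
    }
    let (job_id, state, start, end) = (fields[0], fields[1], fields[2], fields[3]);
    if start == "Unknown" {
        // The job was never started, so there is nothing to account for.
        return Ok(None);
    }
    Ok(Some(JobRecord {
        job_id: job_id.to_string(),
        state: state.to_string(),
        start: start.to_string(),
        end: end.to_string(),
    }))
}
```

Returning `Option` inside the `Result` keeps "this entry is intentionally ignored" separate from "this entry is broken", which would let the collector log skipped jobs without aborting the whole run.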
