External clusters: jobs sent at interval do not get a machine even if there is still room #5437

Closed
Tracked by #950
sanderegg opened this issue Mar 7, 2024 · 3 comments
Labels: a:autoscaling (autoscaling service in simcore's stack), bug (buggy, it does not work as expected)


sanderegg commented Mar 7, 2024

Issue demonstrated by @mguidon:

  • From s4l, start a simulation with 8 jobs: the cluster is started, and 8 workers are started and take the jobs.
  • From a 2nd instance of s4l, start a simulation with X jobs: the jobs get queued instead of using the 2 remaining machines.
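
To make the expected behaviour concrete, here is a minimal sketch of the scaling decision one would expect in this scenario. It is not the autoscaling service's actual code: the function name `machines_to_start`, the one-job-per-machine rule, and the limit of 10 machines (inferred from 8 busy workers plus "2 remaining machines") are all assumptions for illustration.

```python
def machines_to_start(
    queued_jobs: int,
    idle_workers: int,
    running_machines: int,
    max_machines: int,
) -> int:
    """Hypothetical helper: how many new machines should be requested.

    Assumes each machine runs exactly one job at a time.
    """
    if min(queued_jobs, idle_workers, running_machines, max_machines) < 0:
        raise ValueError("all arguments must be non-negative")
    # Jobs that no already-running idle worker can pick up.
    unserved_jobs = max(queued_jobs - idle_workers, 0)
    # Room left in the cluster before hitting the machine limit.
    free_slots = max(max_machines - running_machines, 0)
    return min(unserved_jobs, free_slots)


MAX_MACHINES = 10  # assumption: 8 busy workers + the "2 remaining machines"

# First s4l instance: 8 jobs on an empty cluster.
first_batch = machines_to_start(8, 0, 0, MAX_MACHINES)
# Second s4l instance: X = 3 jobs while all 8 workers are busy.
second_batch = machines_to_start(3, 0, 8, MAX_MACHINES)
print(first_batch, second_batch)
```

Under these assumptions the second submission should trigger 2 new machines and leave only 1 job queued, whereas the reported behaviour is that no machine is started and all X jobs stay queued.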
sanderegg self-assigned this Mar 7, 2024
sanderegg transferred this issue from ITISFoundation/osparc-issues Mar 7, 2024
sanderegg added the a:autoscaling and bug labels Mar 7, 2024
sanderegg commented

Current status: trying to reproduce with a Jupyter notebook and sleepers.

  • deployed sleeper 2.2.0 to be able to sleep longer than 60 seconds
  • fixed other issues along the way
  • work is ongoing

sanderegg added this to the Schoggilebe milestone Mar 8, 2024
sanderegg commented

#5474 improves the situation:

  • machines are now created when needed
  • job balancing is improved, although it is still not optimal


sanderegg commented Mar 15, 2024

Closing this issue, as the main problem of machines not starting was fixed.
