How to scale from 1 to 2 jobs in parallel #1186
-
I have this configuration now:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: my-scaled-job
  namespace: simulations
spec:
  pollingInterval: 5             # Optional. Default: 30 seconds
  successfulJobsHistoryLimit: 50 # Optional. Default: 100. How many completed jobs should be kept.
  failedJobsHistoryLimit: 50     # Optional. Default: 100. How many failed jobs should be kept.
  maxReplicaCount: 100           # Optional. Default: 100
  jobTargetRef:
    parallelism: 1               # Max number of desired pods (https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#controlling-parallelism)
    completions: 1               # Desired number of successfully finished pods (same link as above)
    activeDeadlineSeconds: 3600  # Duration in seconds, relative to startTime, that the job may be active before the system tries to terminate it; must be a positive integer
    backoffLimit: 6              # Number of retries before marking this job failed. Defaults to 6
    template:
      # Describes the job template (https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/)
      metadata:
        labels:
          jobgroup: my-sim
      spec:
        containers:
          - name: longrunningsimulation
            image: mycontainerregistry.azurecr.io/mycompanyname/simulationservice:{imagetag}
            command: ['node', 'heavy-simulation-code.js']
            env:
              - name: SIMULATION_JOB_QUEUE_NAME
                value: simulation-job-queue
              - name: STORAGE_ACCOUNT_CONNECTION_STRING
                valueFrom:
                  secretKeyRef:
                    name: my-secrets
                    key: STORAGE_ACCOUNT_CONNECTION_STRING
        restartPolicy: Never
        terminationGracePeriodSeconds: 3600
  triggers:
    - type: azure-queue
      metadata:
        queueName: simulation-job-queue
        queueLength: '1'         # Optional. Queue length target for HPA. Default: 5 messages
        connectionFromEnv: STORAGE_ACCOUNT_CONNECTION_STRING
```

The docker image referred to in this job is a Node.js application doing the following:
It seems to work fine most of the time, but I have encountered the following issues:

A lot of different cases seem to work: adding 10 messages "at the same time" makes all of them start at once and complete OK. But gradually adding messages does not work the way I want. I would really like KEDA to spin up new jobs as soon as possible after a new message arrives, as long as there are resources available etc. But now, the last item added is not processed until the second-to-last item is done. What am I doing wrong? I guess I would like to set the …

I also tried setting a long lease period on my message and deleting it at the end, in case the scaling metric looked at the sum of visible and invisible messages in the queue, but it seems like it scales solely on the number of visible queue items (at least that is my hunch). So compared to the tables from https://keda.sh/docs/2.0/concepts/scaling-jobs/#details, I think our case is this:
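To make the behaviour concrete, here is a rough sketch of the "default" vs. "accurate" scaling strategies, based on the formulas in the scaling-jobs docs linked above. Function and variable names are mine, and this simplifies what the scaler actually computes; treat it as an illustration, not the real implementation.

```python
import math


def max_scale(queue_length: int, target: int, max_replicas: int) -> int:
    """Jobs the visible queue length alone would justify, capped at maxReplicaCount."""
    return min(math.ceil(queue_length / target), max_replicas)


def default_strategy(queue_length: int, target: int, max_replicas: int,
                     running_jobs: int) -> int:
    # "default": subtract already-running jobs from the queue-derived scale.
    return max(max_scale(queue_length, target, max_replicas) - running_jobs, 0)


def accurate_strategy(queue_length: int, target: int, max_replicas: int,
                      running_jobs: int, pending_jobs: int = 0) -> int:
    # "accurate": running jobs are assumed to have already consumed their
    # messages, so visible messages map to new jobs (minus not-yet-running pods).
    scale = max_scale(queue_length, target, max_replicas)
    if scale + running_jobs > max_replicas:
        return max_replicas - running_jobs
    return scale - pending_jobs


# The case from this thread: one new visible message while one
# long-running job is still busy (queueLength target = 1, max = 100).
print(default_strategy(1, 1, 100, running_jobs=1))   # -> 0 (no new job until the running one finishes)
print(accurate_strategy(1, 1, 100, running_jobs=1))  # -> 1 (one new job right away)
```

This matches the symptom described above: with the default strategy the running job "eats" the scale target, so the new message waits until it finishes, while "accurate" spins up a job per visible message.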
-
Maybe it is better if I move this into an issue instead? I can't really understand how this is supposed to be the desired behaviour for anyone scaling jobs (though I am open to more context/examples on how the scale calculation works or is supposed to be used, in case my mental model is just wrong). But if it is not an obvious config mistake on my part, I think I either have to get it resolved or worked around, or, worst case, rip KEDA out of our systems again. There are just so few alternatives solving the job-scaling approach, so I was very excited when I saw the job-scaling feature coming into KEDA. I really hope this can serve my use case for long-running simulations in Kubernetes, but right now it just doesn't.
-
Just tested the following combinations:
-
I am also facing the same issue; please find my job.yaml. If I gradually add messages it is not scaling: only once an existing running pod completes does it scale. For example, if I have 6 messages in the queue and 4 pods are already running, then when any one of the 4 pods completes it picks another message from the queue and scales, while the remaining 3 messages are still waiting in the queue. I would like it to scale one pod as soon as a message arrives in the queue. Can you please help me fix this? Thanks.

```yaml
apiVersion: keda.sh/v1alpha1
kind: Service
```
-
This is not working for me as described here for RabbitMQ messages. What scalingStrategy should I use to get this behaviour? I'm using "accurate" as recommended above and still facing the same issue: when a job is currently running and a new message is received in the queue, it doesn't scale up a new job. I would like to spin up a new job as long as there is a message in the queue, regardless of whether a job is already running.
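For reference, the strategy is set under `spec.scalingStrategy` on the ScaledJob. A minimal fragment (field names from the KEDA ScaledJob spec; everything else elided, so this is not a complete manifest):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: my-scaled-job
spec:
  scalingStrategy:
    strategy: accurate   # one of "default", "custom", "accurate"
  # jobTargetRef, triggers, etc. as in the examples above
```

Worth double-checking that the field actually landed under `spec` and not under `jobTargetRef`; a misplaced `scalingStrategy` is silently ignored and the default strategy applies.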
-
@audunsol Thank you for your reply. It's the latest version of KEDA, 2.9. I provided the … I don't think this has anything to do with acknowledgement of messages. My problem is when there is already a long-running job processing a message that had already been pulled from the queue/acknowledged ---> so the queue is now empty, but a job is running, processing it ---> then a new single message is received (1 message in the queue) ---> no new job gets created to process this new message.
@audunsol
Here is the job.yaml referenced above:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: extract-feature-scaledjob
  namespace: test
spec:
  jobTargetRef:
    parallelism: 1 # max number of desired pods
    com…
```