[3.x] Trim job IDs from failed job tags #743
Conversation
Every time a job fails, the ID is added to each tag. If there is no other failure in the next 48 hours, the tags expire. Otherwise, `StoreTagsForFailedJob` keeps adding job IDs and pushing the expiration date forward, but the previous IDs are **never** removed, which probably causes memory issues. In this commit we'll remove, for each tag, the job IDs that are older than 48 hours, respecting the TTL in `StoreTagsForFailedJob`.
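For illustration, here is a minimal sketch of that accumulation, written against raw phpredis rather than Horizon's actual repository code (the helper name, connection details, and the score of `0` are assumptions for the sketch):

```php
<?php

$redis = new Redis();
$redis->connect('127.0.0.1');

// Hypothetical helper mirroring the pre-PR behaviour: every failure
// appends one more member to the tag's sorted set and refreshes the TTL.
function storeFailedJobTag(Redis $redis, string $tag, string $jobId): void
{
    // Members are stored without a meaningful score, so there is
    // nothing to trim by later.
    $redis->zAdd('failed:'.$tag, 0, $jobId);

    // Refreshing the TTL on every failure keeps the key, and every ID
    // it has accumulated, alive for as long as failures keep arriving.
    $redis->expire('failed:'.$tag, 2880 * 60); // 2880 minutes, in seconds
}

storeFailedJobTag($redis, 'App\User:1', '1');
storeFailedJobTag($redis, 'App\User:1', '2'); // ID 1 is still in the set
```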
I think you are mostly correct, but you are only solving half of the issue. For example, in my production environment, the Redis keys that are filling up are:
In my case, each of these keys has a TTL of 15 minutes. The problem is that I create a job for each ID of each model every 2 minutes (our workflow polls a remote system for each model every 2 minutes). So the TTL starts at 15 minutes, and when it hits ~13 minutes, a new job is dispatched for the model, which adds a new job ID to that key and resets the TTL to 15 minutes. Since it never times out, my keys have 50,000 or more job IDs each, even though Horizon only shows about 1,000 recent jobs. I have not tested your PR, but, based on your writeup, you are only removing IDs from the `failed:` keys.

Thanks for the PR. I am excited to have it!!
@travisaustin how did you set the TTL to 15 minutes? The TTL for the failed tags is hardcoded to 2880 minutes here. Regarding your example: if this PR gets merged, you won't accumulate 50,000 IDs per key, because each time a job ID is added to the tag, the old IDs (older than now() - 2880 minutes) will be removed. Redis knows which IDs to remove because they are now added with a score that matches the timestamp at which they were added. I hope that makes sense to you!
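A minimal sketch of that score-based trim, again with raw phpredis calls (the key, the variable names, and the `time()`-based score are illustrative assumptions, not the PR's exact code):

```php
<?php

$redis = new Redis();
$redis->connect('127.0.0.1');

$key = 'failed:App\User:1';
$ttl = 2880 * 60; // the 2880-minute failed-tag TTL, in seconds
$now = time();

// Store the new job ID with the current timestamp as its score ...
$redis->zAdd($key, $now, '42');

// ... then drop every member whose score falls outside the TTL window.
$redis->zRemRangeByScore($key, '-inf', (string) ($now - $ttl));

$redis->expire($key, $ttl);
```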
```php
 * @param array $tags
 * @return void
 */
public function trim(array $tags);
```
We can't add this in a minor release because it'll break implementations that don't have this method defined.
That means when `trim` is called it will also break those implementations, since they don't define this method?
It'll break as soon as they install this package because the implementation won't adhere to the contract.
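To make that concrete, here is a hypothetical userland implementation (all names invented for illustration) showing why adding `trim` to the contract is a breaking change even if the method is never called:

```php
<?php

// The contract after this PR adds trim() in a minor release.
interface TagRepository
{
    public function trim(array $tags);
    // ... the methods that existed before the PR ...
}

// A userland repository written against the old contract. PHP refuses
// to load it as soon as the updated package is installed, with a fatal
// error along the lines of "Class CustomTagRepository contains 1
// abstract method and must therefore be declared abstract or implement
// the remaining methods (TagRepository::trim)", regardless of whether
// trim() is ever called.
class CustomTagRepository implements TagRepository
{
    // The pre-existing methods would be implemented here, but trim() is not.
}
```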
I mean, even if we didn't add it to the contract, the `trim` method would still fail at runtime when the listener calls it, because it's probably not defined on the implementation.
I think I will just hold off on this for now.
In light of #715 and #740, and after reading the insights from @travisaustin, I think I found where the memory hog comes from.
For example:
- On `2020-01-01 00:00:00`, a job with an ID of `1` and tagged `App\User:1` fails. The tag `failed:App\User:1` is created with the job ID `1`.
- On `2020-01-03 00:00:00`, the `failed:App\User:1` tag should expire.
- But on `2020-01-02 00:00:00`, another job with an ID of `2` and tagged `App\User:1` fails. The ID `2` is added to the `failed:App\User:1` tag, and the expiration of the tag is delayed for 2880 minutes, i.e. until `2020-01-04 00:00:00`.

Now with this PR:

- On `2020-01-03 00:00:00`, a job with an ID of `3` will be added as usual to the `failed:App\User:1` tag, along with the existing IDs `1` and `2`, but now the score is the precise timestamp at which it was added, like in `RedisJobRepository@storeJobReference`.
- The IDs whose score falls outside the TTL window are then trimmed from the tag, like in `RedisJobRepository@trimFailedJobs`. So the ID `1` will be removed.

If this is correct, then we could revert #741 and trim the job IDs from the `recent:` tags each time a job is added, like with the `failed:` tags.
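To tie the example together, here is a hypothetical walkthrough of that timeline with raw phpredis calls (the timestamps and key come from the example above; the exact commands are an assumption, not the PR's code):

```php
<?php

$redis = new Redis();
$redis->connect('127.0.0.1');

$key = 'failed:App\User:1';
$ttl = 2880 * 60; // 48 hours, in seconds

// 2020-01-01 00:00:00: job 1 fails and the tag is created.
$redis->zAdd($key, strtotime('2020-01-01 00:00:00'), '1');
$redis->expire($key, $ttl);

// 2020-01-02 00:00:00: job 2 fails; the expiration moves to 2020-01-04.
$redis->zAdd($key, strtotime('2020-01-02 00:00:00'), '2');
$redis->expire($key, $ttl);

// 2020-01-03 00:00:00: job 3 fails. With this PR, members scored at or
// before now - 48h are trimmed, so ID 1 (scored 2020-01-01) is removed
// while IDs 2 and 3 survive.
$now = strtotime('2020-01-03 00:00:00');
$redis->zAdd($key, $now, '3');
$redis->zRemRangeByScore($key, '-inf', (string) ($now - $ttl));
$redis->expire($key, $ttl);
```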