Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Trim job IDs from failed job tags #743

Closed
wants to merge 1 commit into from
Closed

[3.x] Trim job IDs from failed job tags #743

wants to merge 1 commit into from

Conversation

sebdesign
Copy link
Contributor

@sebdesign sebdesign commented Jan 13, 2020

In light of #715 and #740, and after reading the insights from @travisaustin, I think I found where the memory hog comes from.

Everytime a job fails, the ID is added to each tag. If there is no other failure in the next 48 hours, the tags expires. But otherwise, the StoreTagsForFailedJob keeps adding job IDs and updates the expiration date, but the previous IDs were never removed, which probably causes memory issues.

In this commit we'll remove the job IDs for that are older than 48 hours for each tag, in respect of the TTL in StoreTagsForFailedJob.

For example:

  • On 2020-01-01 00:00:00 a job with ID of 1 and tagged App\User:1 fails. The tag failed:App\User:1 is added with the job ID 1.
  • So after 2880 minutes, i.e. on 2020-01-03 00:00:00 the failed:User:1 tag should expire.
  • Meanwhile, on 2020-01-02 00:00:00 another job with ID of 2 and tagged App\User:1 fails. The ID 2 is added to the failed:App\User:1 tag, and the expiration of the tag is delayed for 2880 minutes, i.e. 2020-01-04 00:00:00.
  • If there are more failures in the next 2880 minutes, the new IDs will keep getting appended, and the previous IDs are never removed.

Now with this PR:

  • If there is a failure on 2020-01-03 00:00:00 for a job with ID of 3, it will be added as usual to the failed:App\User:1 tag, along with the existing IDs 1 and 2, but now the score is the precise timestamp that it was added, like in RedisJobRepository@storeJobReference.
  • Then, it will trim all the job IDs that were added 2880 minutes ago or earlier from this tag, like in RedisJobRepository@trimFailedJobs. So the id 1 will be removed.

If this is correct, then we could revert #741 and trim the job IDs from the recent: tags each time a job is added, like with the failed: tags.

Everytime a job fails, the ID is added to each tag. If there is no other failure in the next 48 hours, the tags expires.  But otherwise, the `StoreTagsForFailedJob` keeps adding job IDs and updates the expiration date, but the previous IDs were **never** removed, which probably causes memory issues.

In this commit we'll remove the job IDs for that are older than 48 hours for each tag, in respect of the TTL in `StoreTagsForFailedJob`.
@travisaustin
Copy link

travisaustin commented Jan 14, 2020

I think you are mostly correct, but I think you are only solving half the issue.

For example, in my production environment, the Redis keys that are filling up are:

horizon:failed:<MODEL CLASS NAME>:<MODEL ID>
horizon:recent:<MODEL CLASS NAME>:<MODEL ID>

In my case, each of these keys has a TTL of 15 minutes. The problem is that I create a job of each ID of each model every 2 minutes (our workflow polls a remote system for each model every 2 minutes). So the TTL starts at 15 minutes, and when it hits ~13 minutes, a new job is dispatched for the model, which adds a new Job ID to that key, and the TTL is reset to 15 minutes. Since it never times out, my keys have 50,000 or more job IDs each even though Horizon only shows about 1,000 recent jobs.

I have not tested your PR, but, based on your writeup, you are only removing keys from horizon:failed. I would like to see horizon:recent trimmed, as well.

Thanks for the PR. I am excited to have it!!

@sebdesign
Copy link
Contributor Author

@travisaustin the horizon:recent tags feature was removed in #741, that's why i'ts not implemented in this PR.

How did you set the TTL to 15 minutes? The TTL for the failed tags is hardcoded to 2880 seconds here.

Regarding your example, if this PR gets merged, you won't accumulate 50,000 IDs per key, because each time a job ID is added to the queue, the old IDs (now() - 2880 min) will be removed. Redis will know which IDs to remove because they are now added with a score that matches the timestamp they were added. I hope that makes sense to you!

@driesvints driesvints changed the title Trim job IDs from failed job tags [3.x] Trim job IDs from failed job tags Jan 14, 2020
* @param array $tags
* @return void
*/
public function trim(array $tags);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't add this in a minor release because it'll break implementations who don't have this method defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That means when trim is called it will also break those implementations, since they don't define this method?

Copy link
Member

@driesvints driesvints Jan 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll break as soon as they install this package because the implementation won't adhere to the contract.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if we didn't add it to the contract, the trim method would fail during runtime because it's probably not defined on the implementation, when it's called on the listener.

@taylorotwell
Copy link
Member

I think I will just hold off on this for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants