Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] - Nightly job run twice #789

Open
wandering-tales opened this issue Oct 22, 2024 · 8 comments
Open

[BUG] - Nightly job run twice #789

wandering-tales opened this issue Oct 22, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@wandering-tales
Copy link

wandering-tales commented Oct 22, 2024

Describe the bug

It seems this long standing issue (see #52, #132, #159, and #601) is still unresolved.

I have configured a daily job running at 3:00, which processes a batch of entities relevant to my domain. The job completes in around 170ms:

_, _ = scheduler.NewJob(
		gocron.DailyJob(1, gocron.NewAtTimes(gocron.NewAtTime(3, 0, 0))),
		gocron.NewTask(task.MyTask, myParam1, myParam2),
		gocron.WithEventListeners(
			gocron.AfterJobRunsWithError(func(jobID uuid.UUID, jobName string, err error) {
				log.Printf("failed my task: %v", err)
			})),
)

but this occurred last night:

2024-10-22T03:00:00.245294425Z stderr F 2024/10/22 03:00:00 processing thing with ID 59e50053-0d33-4729-9732-9e69ade399a2
...
2024-10-22T03:00:00.414909773Z stderr F 2024/10/22 03:00:00 processing thing with ID 59e50053-0d33-4729-9732-9e69ade399a2

Version

  • gocron: 2.12.1

Expected behavior

The task should be run only once.

Additional context

I have tested the scheduler behavior locally and the NextRuns method of the job returned what is expected.

@wandering-tales wandering-tales added the bug Something isn't working label Oct 22, 2024
@rsvix
Copy link

rsvix commented Nov 4, 2024

EDIT: Removing my comment since in my case it was an implementation error, sorry guys

@JohnRoesler
Copy link
Contributor

heyo @wandering-tales

Yes, this has been a very hard to track down issue, and not reproducible with any regularity, so it's been incredibly difficult to debug.

I've considered different ways to try and account for this but don't have a solid solution at this point.

One idea, is check if the time returned by calling the job type's .next() method is too close to the lastRun or time.Now(). But, I have take a couple stabs at writing up some logic for that and it's gotten quite hairy.

The bug here, at least in my estimation, seems to be that the time being passed into the .next() method to schedule from, is somehow wrong, leading to the same next value being returned. Either that, or somehow two instances of the job are being scheduled 🤔

@wandering-tales
Copy link
Author

Hi @JohnRoesler

Sorry for the late reply. I confirm this is keep happening systematically. Here's an excerpts of relevant Grafana logs entries (in the end there are no sensitive information, so I can paste it here):

Screenshot 2024-12-20 at 15 52 15

Considering that those ID's are processed in a deterministic order and doing some cheap math, it seems that the two job instances starts within ~36ms (691849517ns - 655338486ns).

Luckily those operations performed by the job are idempotent, so no big deal in this case.

@JohnRoesler
Copy link
Contributor

One thing I noted in the code is that for DailyJobs, it's patching on the nanosecond value as well, which, I believe could be the cause of the jobs running twice like you are seeing. Let me put up a PR, and we can look it over, and then get it released so you can test.

@JohnRoesler
Copy link
Contributor

Hey @wandering-tales - would you be willing to test out https://github.com/go-co-op/gocron/releases/tag/v2.14.2-rc1?

@JohnRoesler
Copy link
Contributor

One other thought I had, was that we could introduce a "minimum duration" config on the scheduler to allow tuning the minimum amount of time that can pass between job runs. For example, if you are running only Daily jobs on your scheduler, and only once a day, you could set a fairly high value for your minimum value (e.g. 23 hours, to be safe for daylight time, but even a minimum duration of 1 min or even 1 second, would ideally prevent this issue). This really shouldn't be needed, and the implementation would be interesting/tricky, but I don't recall that I have ever been able to reproduce this bug, so I'm really trying to identify any potential ways that slight time inconsistencies could be protected from.

@wandering-tales
Copy link
Author

Hey @wandering-tales - would you be willing to test out https://github.com/go-co-op/gocron/releases/tag/v2.14.2-rc1?

Sure thing! I'll try it out next week.

@wandering-tales
Copy link
Author

One other thought I had, was that we could introduce a "minimum duration" config on the scheduler to allow tuning the minimum amount of time that can pass between job runs. For example, if you are running only Daily jobs on your scheduler, and only once a day, you could set a fairly high value for your minimum value (e.g. 23 hours, to be safe for daylight time, but even a minimum duration of 1 min or even 1 second, would ideally prevent this issue). This really shouldn't be needed, and the implementation would be interesting/tricky, but I don't recall that I have ever been able to reproduce this bug, so I'm really trying to identify any potential ways that slight time inconsistencies could be protected from.

For sure adding such an extra guard would help, but I agree with you it could result in an overcomplication. Let's see if your latest fixes are enough to actually address this edge case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants