How do we set `limit`? #890

Comments
I'd say this is purely a missing API, as there is no reason not to allow tasks to redefine the batch size that is being used under the hood. The batch size will need to be passed to the

So I'll try to put it on our roadmap, or PRs are always welcome! Unless someone else from the maintainers remembers or has concerns about why this feature should not exist. The only requirement for this feature is that tasks shouldn't be forced to always specify the batch size, and it should still default to
I'm curious if you have any examples where a query may benefit from a larger batch size, or is it just about reducing the number of queries performed by the task overall? I also just realized that we need to be careful and make sure people don't use way too big batch sizes, because, for example, if the task uses batches of

UPD:
Ok, thanks, good to know I wasn't missing something. I'll try to get up a PR!

An example would be: say you have to process 50 million records. If you fetch 100 per query, that's 500,000 queries, as opposed to 50,000 or 10,000 if you fetch 1k or 5k per query. If the query takes only negligibly longer at the larger batch size, it's probably better to use the larger size and greatly reduce the number of queries in a potentially small time window. This is presuming each

That's a good point re:

So in our case, a big batch size is irrelevant. But I understand we have to keep that in mind for changes to the gem.
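The arithmetic above can be sketched as a quick cost model in plain Ruby (nothing gem-specific is assumed here):

```ruby
# Number of fetch queries needed to iterate total_records in batches
# of batch_size: ceil(total / batch), via integer arithmetic.
def query_count(total_records, batch_size)
  (total_records + batch_size - 1) / batch_size
end

# 50 million records at job-iteration's default of 100 per batch:
query_count(50_000_000, 100)    # => 500000
query_count(50_000_000, 1_000)  # => 50000
query_count(50_000_000, 5_000)  # => 10000
```

Each 10x increase in batch size cuts the query count by 10x, which is the whole appeal when per-query overhead dominates.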
This issue has been marked as stale because it has not been commented on in two months.
Maple and I hacked a bit on this during the RailsConf hack day and came up with a simple proof of concept: https://github.com/Shopify/maintenance_tasks/tree/custom-relation-limit-size

Thoughts on the API, @nvasilevski @bray? Opted for a task-level DSL
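For readers without access to the branch, a task-level batch-size DSL could be shaped roughly like this. The macro name `collection_batch_size` and the implementation below are illustrative assumptions, not the prototype's actual API:

```ruby
# Hypothetical sketch: a class-level macro stores the batch size as a
# per-class ivar and falls back to job-iteration's default of 100 when
# a task never opts in.
class Task
  def self.collection_batch_size(size = nil)
    @collection_batch_size = size if size
    @collection_batch_size || 100 # default stays 100 unless overridden
  end
end

class BackfillTask < Task
  collection_batch_size 2_000 # opt in to larger batches
end

class OtherTask < Task; end # untouched tasks keep the default

BackfillTask.collection_batch_size # => 2000
OtherTask.collection_batch_size    # => 100
```

Keeping the default when the macro is not called satisfies the earlier requirement that tasks shouldn't be forced to specify a batch size.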
I'm a little worried that any name with

But the prototype looks clean!
I agree with Nikita. Looks good, but
Naming is hard 😅 I was trying to stay clear of
When `collection` returns an `ActiveRecord::Relation`, how do we set the `limit` for each "batch"? `job-iteration`'s Best Practices say it defaults to 100. 100 is too small for expensive queries and/or large datasets. It looks like that's passed in here.

But for `maintenance_tasks`, in the `collection` method, we're not explicitly building an enumerator. So how can we pass in or set the `@batch_size`? I've tried some guesses like `@batch_size`, `@of`, `def batch_size`, and `def of` inside the `MaintenanceTasks::Task`.

I also tried returning a few different classes in the collection method, but none have worked:

- `#find_each(batch_size: 2000)` gets us what we want, but it returns an Enumerator, and `#collection` can't be an Enumerator (it gives the error "#collection must be either an Active Record Relation, ActiveRecord::Batches::BatchEnumerator, Array, or CSV.").
- `#in_batches(of: 2000)` works (it returns an `ActiveRecord::Batches::BatchEnumerator`), but only if you want to process a whole batch at a time rather than individual records.
- `#in_batches(of: 2000).each_record` also returns an Enumerator.
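Since the error message lists `ActiveRecord::Batches::BatchEnumerator` as an accepted collection type, one workaround today is to return `relation.in_batches(of: 2000)` (note that ActiveRecord's keyword is `of:`, not `batch_size:`) and write `process` to take a whole batch and update it in bulk. The resulting call pattern mirrors plain Ruby's `each_slice`, sketched here without Rails so it can be run standalone:

```ruby
# Per-record vs. per-batch processing: the same 5,000 items produce
# very different numbers of `process` calls.
records = (1..5_000).to_a

per_record_calls = 0
records.each { per_record_calls += 1 } # one call per record

per_batch_calls = 0
records.each_slice(2_000) { |_batch| per_batch_calls += 1 } # one call per batch

per_record_calls # => 5000
per_batch_calls  # => 3 (batches of 2000, 2000, 1000)
```

The trade-off is that a per-batch `process` must operate on the whole batch at once (e.g. a bulk update), rather than on individual records.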