Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add index to good_jobs to improve querying candidate jobs #726

Merged
merged 2 commits into from
Oct 22, 2022

Conversation

mitchellhenke
Copy link
Contributor

@mitchellhenke mitchellhenke commented Oct 19, 2022

I think we might be having the same issue as described in #720 and the index appears to help quite a bit.

I tried to follow the index naming scheme, but the name is a bit wordy, so I wasn't sure 🙂

SQL Explain on a `good_jobs` table with hundreds of thousands of unfinished jobs
EXPLAIN ANALYZE SELECT "good_jobs".* FROM "good_jobs" WHERE "good_jobs"."id" IN (WITH "rows" AS MATERIALIZED ( SELECT "good_jobs"."id", "good_jobs"."active_job_id" FROM "good_jobs" WHERE "good_jobs"."finished_at" IS NULL AND ("good_jobs"."scheduled_at" <= '2022-10-19 01:31:28.921876' OR "good_jobs"."scheduled_at" IS NULL) ORDER BY priority DESC NULLS LAST, created_at ASC ) SELECT "rows"."id" FROM "rows" WHERE pg_try_advisory_lock(('x' || substr(md5('good_jobs' || '-' || "rows"."active_job_id"::text), 1, 16))::bit(64)::bigint) LIMIT 1) ORDER BY priority DESC NULLS LAST, created_at ASC;
# Before index
 Sort  (cost=106863.51..106863.52 rows=1 width=511) (actual time=1138.681..1138.684 rows=1 loops=1)
   Sort Key: good_jobs.priority DESC NULLS LAST, good_jobs.created_at
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=106855.47..106863.50 rows=1 width=511) (actual time=1138.661..1138.664 rows=1 loops=1)
         ->  HashAggregate  (cost=106855.05..106855.06 rows=1 width=16) (actual time=1138.620..1138.622 rows=1 loops=1)
               Group Key: rows.id
               Batches: 1  Memory Usage: 24kB
               ->  Limit  (cost=106854.89..106855.04 rows=1 width=16) (actual time=1138.612..1138.613 rows=1 loops=1)
                     CTE rows
                       ->  Sort  (cost=105458.51..106854.89 rows=558553 width=44) (actual time=1138.590..1138.591 rows=1 loops=1)
                             Sort Key: good_jobs_1.priority DESC NULLS LAST, good_jobs_1.created_at
                             Sort Method: external merge  Disk: 31768kB
                             ->  Seq Scan on good_jobs good_jobs_1  (cost=0.00..34955.90 rows=558553 width=44) (actual time=0.056..374.313 rows=558583 loops=1)
                                   Filter: ((finished_at IS NULL) AND ((scheduled_at <= '2022-10-19 01:31:28.921876'::timestamp without time zone) OR (scheduled_at IS NULL)))
                                   Rows Removed by Filter: 488
                     ->  CTE Scan on rows  (cost=0.00..26531.27 rows=186184 width=16) (actual time=1138.610..1138.611 rows=1 loops=1)
                           Filter: pg_try_advisory_lock(((('x'::text || substr(md5(('good_jobs-'::text || (active_job_id)::text)), 1, 16)))::bit(64))::bigint)
         ->  Index Scan using good_jobs_pkey on good_jobs  (cost=0.42..8.44 rows=1 width=511) (actual time=0.035..0.035 rows=1 loops=1)
               Index Cond: (id = rows.id)
 Planning Time: 1.805 ms
 Execution Time: 1156.305 ms
 (21 rows)
# After index
 Sort  (cost=104596.63..104596.63 rows=1 width=508) (actual time=0.094..0.096 rows=1 loops=1)
   Sort Key: good_jobs.priority DESC NULLS LAST, good_jobs.created_at
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=104588.59..104596.62 rows=1 width=508) (actual time=0.082..0.084 rows=1 loops=1)
         ->  HashAggregate  (cost=104588.17..104588.18 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
               Group Key: rows.id
               Batches: 1  Memory Usage: 24kB
               ->  Limit  (cost=104588.01..104588.15 rows=1 width=16) (actual time=0.059..0.060 rows=1 loops=1)
                     CTE rows
                       ->  Index Scan using index_good_jobs_jobs_on_priority_created_at_when_unfinished on good_jobs good_jobs_1  (cost=0.42..104588.01 rows=703427 width=44) (actual time=0.041..0.041 rows=1 loops=1)
                             Filter: ((scheduled_at <= '2022-10-19 01:31:28.921876'::timestamp without time zone) OR (scheduled_at IS NULL))
                     ->  CTE Scan on rows  (cost=0.00..33412.78 rows=234476 width=16) (actual time=0.058..0.058 rows=1 loops=1)
                           Filter: pg_try_advisory_lock(((('x'::text || substr(md5(('good_jobs-'::text || (active_job_id)::text)), 1, 16)))::bit(64))::bigint)
         ->  Index Scan using good_jobs_pkey on good_jobs  (cost=0.42..8.44 rows=1 width=508) (actual time=0.014..0.014 rows=1 loops=1)
               Index Cond: (id = rows.id)
 Planning Time: 0.443 ms
 Execution Time: 0.210 ms
(17 rows)

@bensheldon
Copy link
Owner

@ mitchellhenke thank you! This is great: Index Scan 🎉

I tried to follow the index naming scheme, but the name is a bit wordy, so I wasn't sure 🙂

It's perfect 😄 Naming indexes is one of my least favorite activities.

@mitchellhenke
Copy link
Contributor Author

Thank you for your work in making GoodJob!

I did also experiment with adding scheduled_at, but like your comment here, did not see any significant improvements. I suspect we'll be doing some more load tests today, and will test out this index.

@bensheldon
Copy link
Owner

bensheldon commented Oct 22, 2022

@mitchellhenke another suggestion, if you're still up for profiling, is seeing if Postgres will use multiple indexes with a bitmap union: https://www.postgresql.org/docs/15/indexes-bitmap-scans.html

I am somewhat skeptical because I have yet to see a real query plan that implements this strategy, but I'm imagining that maybe three indexes could be unioned by Postgres:

add_index :good_jobs, [:priority, :created_at], order: { priority: "DESC NULLS LAST", created_at: :asc }, where: "finished_at IS NULL", name: :index_good_jobs_jobs_on_priority_created_at_when_unfinished

add_index :good_jobs, :queue_name, where: "finished_at IS NULL", name: :index_good_jobs_jobs_on_queue_name

# this index already exists, and I haven't ever seen it used in a bitmap union
add_index :good_jobs, :scheduled_at, where: "finished_at IS NULL", name: :index_good_jobs_jobs_on_scheduled_at

@bensheldon bensheldon merged commit 9df954c into bensheldon:main Oct 22, 2022
@bensheldon bensheldon added the enhancement New feature or request label Oct 22, 2022
@mitchellhenke
Copy link
Contributor Author

I haven’t gotten a chance to try those indexes, but one thing did occur to me. Would it be possible to always set scheduled_at to allow dropping the OR scheduled_at IS NULL portion of those queries? Was hoping that might make it more amenable to indexing on that column.

@bensheldon
Copy link
Owner

Would it be possible to always set scheduled_at to allow dropping the OR scheduled_at IS NULL portion of those queries?

I've been reluctant to change the meaning of the scheduled_at column, but I'm open to changing it to be purely for queuing performance, and leaving the user-intention in the ActiveJob params json. I've also considered that for improving how Concurrency Controls work (currently, when there is a concurrency "miss" a new Execution is enqueued, whereas it would be lighter-weight to simply update the scheduled_at on the existing Execution).

An alternative is querying/indexing using a COALESCE statement, which I think would look something like this:

ORDER BY priority DESC NULLS LAST, COALESCE(scheduled_at, created_at) ASC, created_at ASC

I dunno if it's much better than the status quo. That's because ordering by priority (and also querying by WHERE queue_name IN (?)) means that Postgres still has to do filtering of the values as it scans over the index and I think those values are variably "compact" depending on the workload.

I've also been looking at Postgres 11+ feature of index INCLUDE which if I understand it might be a marginal improvement to ensure that the scheduled_at and queue_name filtering are more efficient. Postgres 10 is EOL on Nov 10, 2022.

@coorasse
Copy link

coorasse commented Nov 3, 2022

Adding the index sould be done in a separate migration, right? Otherwise people updating good_job from a previous version won't get them

@bensheldon
Copy link
Owner

@coorasse yep, you'll need to update to the latest version of GoodJob and run bin/rails g good_job:update and the migration with this index will be created. And then you'll have to apply it.

@coorasse
Copy link

coorasse commented Nov 3, 2022

Thanks! I missed this part of the docs.

@bensheldon
Copy link
Owner

bensheldon commented Sep 2, 2023

fyi, as of #928, all jobs records will have scheduled_at column set regardless of whether they are scheduled in the future or not (immediate jobs can be identified where created_at = scheduled_at)

mkrfowler added a commit to mkrfowler/good_job that referenced this pull request Jan 14, 2024
The preexisting index (introduced in bensheldon#726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In bensheldon#883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.
mkrfowler added a commit to mkrfowler/good_job that referenced this pull request Jan 14, 2024
The preexisting index (introduced in bensheldon#726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In bensheldon#883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.

Although `ASC NULLS LAST` is the default (as observable in the generated schema)
it seems appropriate to be explicit in the actual index declaration, although
mostly for consistency with the previous version.
mkrfowler added a commit to mkrfowler/good_job that referenced this pull request Jan 14, 2024
The preexisting index (introduced in bensheldon#726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In bensheldon#883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.

Although `ASC NULLS LAST` is the default (as observable in the generated schema)
it seems appropriate to be explicit in the actual index declaration, although
mostly for consistency with the previous version.
mkrfowler added a commit to mkrfowler/good_job that referenced this pull request Jan 14, 2024
The preexisting index (introduced in bensheldon#726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In bensheldon#883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.

Although `ASC NULLS LAST` is the default (as observable in the generated schema)
it seems appropriate to be explicit in the actual index declaration, although
mostly for consistency with the previous version.
mkrfowler added a commit to mkrfowler/good_job that referenced this pull request Jan 14, 2024
The preexisting index (introduced in bensheldon#726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In bensheldon#883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.

Although `ASC NULLS LAST` is the default (as observable in the generated schema)
it seems appropriate to be explicit in the actual index declaration, although
mostly for consistency with the previous version.
mkrfowler added a commit to mkrfowler/good_job that referenced this pull request Jan 14, 2024
The preexisting index (introduced in bensheldon#726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In bensheldon#883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.

Although `ASC NULLS LAST` is the default (as observable in the generated schema)
it seems appropriate to be explicit in the actual index declaration, although
mostly for consistency with the previous version.
bensheldon pushed a commit that referenced this pull request Jan 19, 2024
…1213)

The preexisting index (introduced in #726) gave a direct answer to the
materialized subquery used to find candidate jobs, subject to the ordering rules
that were in place at that time.  In #883, `GoodJob` deprecated that ordering in
favour of a lower-is-more-important scheme, aligning with Active Job.

When there are a substantial number of completed jobs, the `priority desc` index
does not allow for a straight index scan, and may not actually be used by the
planner.  Aligning the sort orders here allows for the subquery to be satisfied
directly.

Although `ASC NULLS LAST` is the default (as observable in the generated schema)
it seems appropriate to be explicit in the actual index declaration, although
mostly for consistency with the previous version.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants