Add index to `good_jobs` to improve querying candidate jobs #726

mitchellhenke · 2022-10-19T02:23:58Z

I think we might be having the same issue as described in #720 and the index appears to help quite a bit.

I tried to follow the index naming scheme, but the name is a bit wordy, so I wasn't sure 🙂

SQL Explain on a `good_jobs` table with hundreds of thousands of unfinished jobs

EXPLAIN ANALYZE SELECT "good_jobs".* FROM "good_jobs" WHERE "good_jobs"."id" IN (WITH "rows" AS MATERIALIZED ( SELECT "good_jobs"."id", "good_jobs"."active_job_id" FROM "good_jobs" WHERE "good_jobs"."finished_at" IS NULL AND ("good_jobs"."scheduled_at" <= '2022-10-19 01:31:28.921876' OR "good_jobs"."scheduled_at" IS NULL) ORDER BY priority DESC NULLS LAST, created_at ASC ) SELECT "rows"."id" FROM "rows" WHERE pg_try_advisory_lock(('x' || substr(md5('good_jobs' || '-' || "rows"."active_job_id"::text), 1, 16))::bit(64)::bigint) LIMIT 1) ORDER BY priority DESC NULLS LAST, created_at ASC;

# Before index
 Sort  (cost=106863.51..106863.52 rows=1 width=511) (actual time=1138.681..1138.684 rows=1 loops=1)
   Sort Key: good_jobs.priority DESC NULLS LAST, good_jobs.created_at
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=106855.47..106863.50 rows=1 width=511) (actual time=1138.661..1138.664 rows=1 loops=1)
         ->  HashAggregate  (cost=106855.05..106855.06 rows=1 width=16) (actual time=1138.620..1138.622 rows=1 loops=1)
               Group Key: rows.id
               Batches: 1  Memory Usage: 24kB
               ->  Limit  (cost=106854.89..106855.04 rows=1 width=16) (actual time=1138.612..1138.613 rows=1 loops=1)
                     CTE rows
                       ->  Sort  (cost=105458.51..106854.89 rows=558553 width=44) (actual time=1138.590..1138.591 rows=1 loops=1)
                             Sort Key: good_jobs_1.priority DESC NULLS LAST, good_jobs_1.created_at
                             Sort Method: external merge  Disk: 31768kB
                             ->  Seq Scan on good_jobs good_jobs_1  (cost=0.00..34955.90 rows=558553 width=44) (actual time=0.056..374.313 rows=558583 loops=1)
                                   Filter: ((finished_at IS NULL) AND ((scheduled_at <= '2022-10-19 01:31:28.921876'::timestamp without time zone) OR (scheduled_at IS NULL)))
                                   Rows Removed by Filter: 488
                     ->  CTE Scan on rows  (cost=0.00..26531.27 rows=186184 width=16) (actual time=1138.610..1138.611 rows=1 loops=1)
                           Filter: pg_try_advisory_lock(((('x'::text || substr(md5(('good_jobs-'::text || (active_job_id)::text)), 1, 16)))::bit(64))::bigint)
         ->  Index Scan using good_jobs_pkey on good_jobs  (cost=0.42..8.44 rows=1 width=511) (actual time=0.035..0.035 rows=1 loops=1)
               Index Cond: (id = rows.id)
 Planning Time: 1.805 ms
 Execution Time: 1156.305 ms
 (21 rows)

# After index
 Sort  (cost=104596.63..104596.63 rows=1 width=508) (actual time=0.094..0.096 rows=1 loops=1)
   Sort Key: good_jobs.priority DESC NULLS LAST, good_jobs.created_at
   Sort Method: quicksort  Memory: 25kB
   ->  Nested Loop  (cost=104588.59..104596.62 rows=1 width=508) (actual time=0.082..0.084 rows=1 loops=1)
         ->  HashAggregate  (cost=104588.17..104588.18 rows=1 width=16) (actual time=0.064..0.066 rows=1 loops=1)
               Group Key: rows.id
               Batches: 1  Memory Usage: 24kB
               ->  Limit  (cost=104588.01..104588.15 rows=1 width=16) (actual time=0.059..0.060 rows=1 loops=1)
                     CTE rows
                       ->  Index Scan using index_good_jobs_jobs_on_priority_created_at_when_unfinished on good_jobs good_jobs_1  (cost=0.42..104588.01 rows=703427 width=44) (actual time=0.041..0.041 rows=1 loops=1)
                             Filter: ((scheduled_at <= '2022-10-19 01:31:28.921876'::timestamp without time zone) OR (scheduled_at IS NULL))
                     ->  CTE Scan on rows  (cost=0.00..33412.78 rows=234476 width=16) (actual time=0.058..0.058 rows=1 loops=1)
                           Filter: pg_try_advisory_lock(((('x'::text || substr(md5(('good_jobs-'::text || (active_job_id)::text)), 1, 16)))::bit(64))::bigint)
         ->  Index Scan using good_jobs_pkey on good_jobs  (cost=0.42..8.44 rows=1 width=508) (actual time=0.014..0.014 rows=1 loops=1)
               Index Cond: (id = rows.id)
 Planning Time: 0.443 ms
 Execution Time: 0.210 ms
(17 rows)

bensheldon · 2022-10-19T14:55:25Z

@ mitchellhenke thank you! This is great: Index Scan 🎉

I tried to follow the index naming scheme, but the name is a bit wordy, so I wasn't sure 🙂

It's perfect 😄 Naming indexes is one of my least favorite activities.

mitchellhenke · 2022-10-19T15:07:07Z

Thank you for your work in making GoodJob!

I did also experiment with adding scheduled_at, but like your comment here, did not see any significant improvements. I suspect we'll be doing some more load tests today, and will test out this index.

bensheldon · 2022-10-22T00:29:46Z

@mitchellhenke another suggestion, if you're still up for profiling, is seeing if Postgres will use multiple indexes with a bitmap union: https://www.postgresql.org/docs/15/indexes-bitmap-scans.html

I am somewhat skeptical because I have yet to see a real query plan that implements this strategy, but I'm imagining that maybe three indexes could be unioned by Postgres:

add_index :good_jobs, [:priority, :created_at], order: { priority: "DESC NULLS LAST", created_at: :asc }, where: "finished_at IS NULL", name: :index_good_jobs_jobs_on_priority_created_at_when_unfinished

add_index :good_jobs, :queue_name, where: "finished_at IS NULL", name: :index_good_jobs_jobs_on_queue_name

# this index already exists, and I haven't ever seen it used in a bitmap union
add_index :good_jobs, :scheduled_at, where: "finished_at IS NULL", name: :index_good_jobs_jobs_on_scheduled_at

mitchellhenke · 2022-10-23T13:58:42Z

I haven’t gotten a chance to try those indexes, but one thing did occur to me. Would it be possible to always set scheduled_at to allow dropping the OR scheduled_at IS NULL portion of those queries? Was hoping that might make it more amenable to indexing on that column.

bensheldon · 2022-10-24T22:36:52Z

Would it be possible to always set scheduled_at to allow dropping the OR scheduled_at IS NULL portion of those queries?

I've been reluctant to change the meaning of the scheduled_at column, but I'm open to changing it to be purely for queuing performance, and leaving the user-intention in the ActiveJob params json. I've also considered that for improving how Concurrency Controls work (currently, when there is a concurrency "miss" a new Execution is enqueued, whereas it would be lighter-weight to simply update the scheduled_at on the existing Execution).

An alternative is querying/indexing using a COALESCE statement, which I think would look something like this:

ORDER BY priority DESC NULLS LAST, COALESCE(scheduled_at, created_at) ASC, created_at ASC

I dunno if it's much better than the status quo. That's because ordering by priority (and also querying by WHERE queue_name IN (?)) means that Postgres still has to do filtering of the values as it scans over the index and I think those values are variably "compact" depending on the workload.

I've also been looking at Postgres 11+ feature of index INCLUDE which if I understand it might be a marginal improvement to ensure that the scheduled_at and queue_name filtering are more efficient. Postgres 10 is EOL on Nov 10, 2022.

coorasse · 2022-11-03T20:55:55Z

Adding the index sould be done in a separate migration, right? Otherwise people updating good_job from a previous version won't get them

bensheldon · 2022-11-03T21:09:19Z

@coorasse yep, you'll need to update to the latest version of GoodJob and run bin/rails g good_job:update and the migration with this index will be created. And then you'll have to apply it.

coorasse · 2022-11-03T21:11:51Z

Thanks! I missed this part of the docs.

bensheldon · 2023-09-02T17:29:31Z

fyi, as of #928, all jobs records will have scheduled_at column set regardless of whether they are scheduled in the future or not (immediate jobs can be identified where created_at = scheduled_at)

The preexisting index (introduced in bensheldon#726) gave a direct answer to the materialized subquery used to find candidate jobs, subject to the ordering rules that were in place at that time. In bensheldon#883, `GoodJob` deprecated that ordering in favour of a lower-is-more-important scheme, aligning with Active Job. When there are a substantial number of completed jobs, the `priority desc` index does not allow for a straight index scan, and may not actually be used by the planner. Aligning the sort orders here allows for the subquery to be satisfied directly.

The preexisting index (introduced in bensheldon#726) gave a direct answer to the materialized subquery used to find candidate jobs, subject to the ordering rules that were in place at that time. In bensheldon#883, `GoodJob` deprecated that ordering in favour of a lower-is-more-important scheme, aligning with Active Job. When there are a substantial number of completed jobs, the `priority desc` index does not allow for a straight index scan, and may not actually be used by the planner. Aligning the sort orders here allows for the subquery to be satisfied directly. Although `ASC NULLS LAST` is the default (as observable in the generated schema) it seems appropriate to be explicit in the actual index declaration, although mostly for consistency with the previous version.

…1213) The preexisting index (introduced in #726) gave a direct answer to the materialized subquery used to find candidate jobs, subject to the ordering rules that were in place at that time. In #883, `GoodJob` deprecated that ordering in favour of a lower-is-more-important scheme, aligning with Active Job. When there are a substantial number of completed jobs, the `priority desc` index does not allow for a straight index scan, and may not actually be used by the planner. Aligning the sort orders here allows for the subquery to be satisfied directly. Although `ASC NULLS LAST` is the default (as observable in the generated schema) it seems appropriate to be explicit in the actual index declaration, although mostly for consistency with the previous version.

Add index to good_jobs to improve querying candidate jobs

ca3a9e8

mitchellhenke mentioned this pull request Oct 20, 2022

Add configurable limit (queue_select_limit) when querying candidate jobs #727

Merged

Migration should create index concurrently

9a73ab5

bensheldon merged commit 9df954c into bensheldon:main Oct 22, 2022

bensheldon added the enhancement New feature or request label Oct 22, 2022

bensheldon mentioned this pull request Oct 22, 2022

Materialized CTE performance bottleneck #720

Closed

bensheldon mentioned this pull request Jan 30, 2023

Uncommon/alarming increased latency for GoodJob query on production #813

Open

mkrfowler mentioned this pull request Jan 14, 2024

Add an index to better support smaller_number_is_higher_priority #1213

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add index to `good_jobs` to improve querying candidate jobs #726

Add index to `good_jobs` to improve querying candidate jobs #726

mitchellhenke commented Oct 19, 2022 •

edited

Loading

bensheldon commented Oct 19, 2022

mitchellhenke commented Oct 19, 2022

bensheldon commented Oct 22, 2022 •

edited

Loading

mitchellhenke commented Oct 23, 2022

bensheldon commented Oct 24, 2022

coorasse commented Nov 3, 2022

bensheldon commented Nov 3, 2022

coorasse commented Nov 3, 2022

bensheldon commented Sep 2, 2023 •

edited

Loading

Add index to good_jobs to improve querying candidate jobs #726

Add index to good_jobs to improve querying candidate jobs #726

Conversation

mitchellhenke commented Oct 19, 2022 • edited Loading

bensheldon commented Oct 19, 2022

mitchellhenke commented Oct 19, 2022

bensheldon commented Oct 22, 2022 • edited Loading

mitchellhenke commented Oct 23, 2022

bensheldon commented Oct 24, 2022

coorasse commented Nov 3, 2022

bensheldon commented Nov 3, 2022

coorasse commented Nov 3, 2022

bensheldon commented Sep 2, 2023 • edited Loading

Add index to `good_jobs` to improve querying candidate jobs #726

Add index to `good_jobs` to improve querying candidate jobs #726

mitchellhenke commented Oct 19, 2022 •

edited

Loading

bensheldon commented Oct 22, 2022 •

edited

Loading

bensheldon commented Sep 2, 2023 •

edited

Loading