CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written #13026

mateusz-sekara · 2024-04-29T10:22:48Z

According to the Postgres EBNF, columns in the order by clause should be separated with , without opening and closing parenthesis https://www.postgresql.org/docs/13/queries-order.html.
That being said, sorting by (column_a, column_b) is syntactically incorrect. However, for our cases, it doesn't throw any errors but leads to poor execution in which Postgres fetches all matching tuples, loads them to memory, and sorts in memory using hpea/quick sort. The proper approach is to sort by column_a, column_b. Almost all LP queries sort by block_number, log_index. In most of the cases, Postgres has data already properly pre-sorted in the index, so it only needs to sort by log_index if there are conflicts.

However, the most important change here is the performance when sorting by columns versus sorting by tuples. Please check the following examples

Used query, around 40k logs matching (expand for details)

SELECT * FROM evm.logs
WHERE evm_chain_id = 43113
  AND event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'
  AND address = '\x3e0df884042c21e83276abb368b4388a17f78a82'
  AND block_number <= (SELECT greatest(block_number - 1, 0)
                       FROM evm.log_poller_blocks
                       WHERE evm_chain_id = 43113
                       ORDER BY block_number DESC LIMIT 1)
ORDER BY  .... ;

Sorted ascending limit 1 (75ms -> 1ms)

ORDER BY (block_number, log_index) LIMIT 1;

explain (analyze, buffers) SELECT * FROM evm.logs
WHERE evm_chain_id = 43113
  AND event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'
  AND address = '\x3e0df884042c21e83276abb368b4388a17f78a82'
  AND block_number <= (SELECT greatest(block_number - 1, 0)
                       FROM evm.log_poller_blocks
                       WHERE evm_chain_id = 43113
                       ORDER BY block_number DESC LIMIT 1)
ORDER BY (block_number, log_index) LIMIT 1;
                                                                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17180.02..17180.02 rows=1 width=420) (actual time=75.456..75.459 rows=1 loops=1)
   Buffers: shared hit=37678
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.55 rows=1 width=16) (actual time=0.063..0.064 rows=1 loops=1)
           Buffers: shared hit=5
           ->  Index Only Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..78406.39 rows=70136 width=16) (actual time=0.063..0.063 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '43113'::numeric)
                 Heap Fetches: 1
                 Buffers: shared hit=5
   ->  Sort  (cost=17178.47..17190.36 rows=4756 width=420) (actual time=75.455..75.455 rows=1 loops=1)
         Sort Key: (ROW(logs.block_number, logs.log_index))
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=37678
         ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs  (cost=0.56..17154.69 rows=4756 width=420) (actual time=0.101..52.247 rows=39707 loops=1)
               Index Cond: ((evm_chain_id = '43113'::numeric) AND (address = '\x3e0df884042c21e83276abb368b4388a17f78a82'::bytea) AND (event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'::bytea) AND (block_number <= $0))
               Buffers: shared hit=37678
 Planning Time: 0.226 ms
 Execution Time: 75.504 ms

ORDER BY block_number, log_index LIMIT 1;

explain (analyze, buffers) SELECT * FROM evm.logs
WHERE evm_chain_id = 43113
  AND event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'
  AND address = '\x3e0df884042c21e83276abb368b4388a17f78a82'
  AND block_number <= (SELECT greatest(block_number - 1, 0)
                       FROM evm.log_poller_blocks
                       WHERE evm_chain_id = 43113
                       ORDER BY block_number DESC LIMIT 1)
ORDER BY block_number, log_index LIMIT 1;
                                                                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=5.80..9.45 rows=1 width=388) (actual time=0.098..0.099 rows=1 loops=1)
   Buffers: shared hit=10
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.55 rows=1 width=16) (actual time=0.063..0.064 rows=1 loops=1)
           Buffers: shared hit=5
           ->  Index Only Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..78406.39 rows=70136 width=16) (actual time=0.063..0.063 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '43113'::numeric)
                 Heap Fetches: 1
                 Buffers: shared hit=5
   ->  Incremental Sort  (cost=4.26..17365.00 rows=4756 width=388) (actual time=0.097..0.097 rows=1 loops=1)
         Sort Key: logs.block_number, logs.log_index
         Presorted Key: logs.block_number
         Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 25kB  Peak Memory: 25kB
         Buffers: shared hit=10
         ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs  (cost=0.56..17154.69 rows=4756 width=388) (actual time=0.087..0.088 rows=2 loops=1)
               Index Cond: ((evm_chain_id = '43113'::numeric) AND (address = '\x3e0df884042c21e83276abb368b4388a17f78a82'::bytea) AND (event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'::bytea) AND (block_number <= $0))
               Buffers: shared hit=10
 Planning Time: 0.238 ms
 Execution Time: 0.132 ms
(19 rows)

Sorted ascending without limit (130ms -> 53ms)

ORDER BY (block_number, log_index);

explain (analyze, buffers) SELECT * FROM evm.logs
WHERE evm_chain_id = 43113
  AND event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'
  AND address = '\x3e0df884042c21e83276abb368b4388a17f78a82'
  AND block_number <= (SELECT greatest(block_number - 1, 0)
                       FROM evm.log_poller_blocks
                       WHERE evm_chain_id = 43113
                       ORDER BY block_number DESC LIMIT 1)
ORDER BY (block_number, log_index);
                                                                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=17446.73..17458.62 rows=4756 width=420) (actual time=95.494..121.037 rows=39707 loops=1)
   Sort Key: (ROW(logs.block_number, logs.log_index))
   Sort Method: external merge  Disk: 13352kB
   Buffers: shared hit=37678, temp read=1669 written=1674
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.55 rows=1 width=16) (actual time=0.064..0.064 rows=1 loops=1)
           Buffers: shared hit=5
           ->  Index Only Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..78406.39 rows=70136 width=16) (actual time=0.063..0.063 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '43113'::numeric)
                 Heap Fetches: 1
                 Buffers: shared hit=5
   ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs  (cost=0.56..17154.69 rows=4756 width=420) (actual time=0.089..52.717 rows=39707 loops=1)
         Index Cond: ((evm_chain_id = '43113'::numeric) AND (address = '\x3e0df884042c21e83276abb368b4388a17f78a82'::bytea) AND (event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'::bytea) AND (block_number <= $0))
         Buffers: shared hit=37678
 Planning Time: 0.223 ms
 Execution Time: 130.901 ms
(16 rows)

ORDER BY block_number, log_index;

explain (analyze, buffers) SELECT * FROM evm.logs
WHERE evm_chain_id = 43113
  AND event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'
  AND address = '\x3e0df884042c21e83276abb368b4388a17f78a82'
  AND block_number <= (SELECT greatest(block_number - 1, 0)
                       FROM evm.log_poller_blocks
                       WHERE evm_chain_id = 43113
                       ORDER BY block_number DESC LIMIT 1)
ORDER BY block_number, log_index;
                                                                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Incremental Sort  (cost=5.80..17366.55 rows=4756 width=388) (actual time=0.175..51.199 rows=39707 loops=1)
   Sort Key: logs.block_number, logs.log_index
   Presorted Key: logs.block_number
   Full-sort Groups: 1241  Sort Method: quicksort  Average Memory: 41kB  Peak Memory: 41kB
   Buffers: shared hit=37678
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.43..1.55 rows=1 width=16) (actual time=0.064..0.065 rows=1 loops=1)
           Buffers: shared hit=5
           ->  Index Only Scan using idx_evm_log_poller_blocks_order_by_block on log_poller_blocks  (cost=0.43..78406.39 rows=70136 width=16) (actual time=0.063..0.063 rows=1 loops=1)
                 Index Cond: (evm_chain_id = '43113'::numeric)
                 Heap Fetches: 1
                 Buffers: shared hit=5
   ->  Index Scan using idx_evm_logs_ordered_by_block_and_created_at on logs  (cost=0.56..17154.69 rows=4756 width=388) (actual time=0.089..41.732 rows=39707 loops=1)
         Index Cond: ((evm_chain_id = '43113'::numeric) AND (address = '\x3e0df884042c21e83276abb368b4388a17f78a82'::bytea) AND (event_sig = '\xb04e63db38c49950639fa09d29872f21f5d49d614f3a969d8adf3d4b52e41a62'::bytea) AND (block_number <= $0))
         Buffers: shared hit=37678
 Planning Time: 0.235 ms
 Execution Time: 53.346 ms
(17 rows)

…tten

github-actions · 2024-04-29T10:23:07Z

I see you updated files related to core. Please run pnpm changeset in the root directory to add a changeset as well as in the text include at least one of the following tags:

#added For any new functionality added.
#breaking_change For any functionality that requires manual action for the node to boot.
#bugfix For bug fixes.
#changed For any change to the existing functionality.
#db_update For any feature that introduces updates to database schema.
#deprecation_notice For any upcoming deprecation functionality.
#internal For changesets that need to be excluded from the final changelog.
#nops For any feature that is NOP facing and needs to be in the official Release Notes for the release.
#removed For any functionality/config that is removed.
#updated For any functionality that is updated.
#wip For any change that is not ready yet and external communication about it should be held off till it is feature complete.

…tten

0xnogo · 2024-04-29T11:57:51Z

Very nice improvement!
I'm curious if you have tried running the EXPLAIN ANALYZE command on the query both with and without the parentheses. The performance results you've provided already give a good indication of how these changes have helped. EXPLAIN ANALYZE can offer additional insights into what happens during the query execution (inefficient joins, if indexes are being used...).

Edit: just expanding the blockquote. Please omit this message.

0xnogo

LGTM!
Not in the scope of this PR, but I am seeing some SELECT *. It can also potentially be an offender of runtime performance. We could potentially check if we require all the columns.

reductionista

Great find!

reductionista · 2024-04-30T03:21:56Z

@0xnogo

Not in the scope of this PR, but I am seeing some SELECT *. It can also potentially be an offender of runtime performance. We could potentially check if we require all the columns.

We do need all columns. The api methods for LogPoller have to be generic enough to support any possible chainlink product's needs. So they return one data structure representing a Log and one representing a Block. The purpose of the logs and log_poller_blocks table is to allow products to search for a set of logs or blocks based on specific criteria. These data structs are not large, but LogPoller must return them each as a whole not individual fields (columns). These will soon be in entirely different repos on different release schedules, and run as separate executables so LogPoller must stay as general as possible and not make any assumptions about which fields the products may or may not need to use for a specific application. The filters table is to persist filters across node restarts--the only time it reads from that table is to load the entire table back into memory as the node is starting up.

cl-sonarqube-production · 2024-04-30T07:48:03Z

Quality Gate passed

Issues
0 New issues
0 Fixed issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

…es are written (#13026) (#783) Cherry picking smartcontractkit/chainlink#13026

Boosting LogPoller performance by fixing how order by clauses are wri…

fbe1cb7

…tten

mateusz-sekara requested a review from a team as a code owner April 29, 2024 10:22

mateusz-sekara temporarily deployed to sdlc April 29, 2024 10:22 — with GitHub Actions Inactive

Boosting LogPoller performance by fixing how order by clauses are wri…

b48010a

…tten

mateusz-sekara mentioned this pull request Apr 29, 2024

CCIP-2206 Added support for retention in the contract_transmitter.go #12998

Merged

mateusz-sekara temporarily deployed to sdlc April 29, 2024 10:29 — with GitHub Actions Inactive

Merge branch 'develop' into better-queries

45a8e68

mateusz-sekara temporarily deployed to sdlc April 29, 2024 10:29 — with GitHub Actions Inactive

mateusz-sekara requested a review from reductionista April 29, 2024 10:43

mateusz-sekara changed the title ~~Boosting LogPoller performance by fixing how order by clauses are written~~ CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written Apr 29, 2024

mateusz-sekara requested a review from connorwstein April 29, 2024 10:53

jmank88 approved these changes Apr 29, 2024

View reviewed changes

0xnogo approved these changes Apr 29, 2024

View reviewed changes

reductionista approved these changes Apr 30, 2024

View reviewed changes

mateusz-sekara added this pull request to the merge queue Apr 30, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 30, 2024

mateusz-sekara added this pull request to the merge queue Apr 30, 2024

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Apr 30, 2024

Merge branch 'develop' into better-queries

3cabaf9

mateusz-sekara temporarily deployed to sdlc April 30, 2024 07:32 — with GitHub Actions Inactive

mateusz-sekara enabled auto-merge April 30, 2024 07:43

mateusz-sekara added this pull request to the merge queue Apr 30, 2024

Merged via the queue into develop with commit e21be2a Apr 30, 2024
104 checks passed

mateusz-sekara deleted the better-queries branch April 30, 2024 08:08

app-token-issuer-infra-releng bot mentioned this pull request Apr 30, 2024

[DO NOT MERGE] Changeset Release Preview - v2.12.0 #13042

Closed

mateusz-sekara mentioned this pull request Apr 30, 2024

CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written (#13026) smartcontractkit/ccip#783

Merged

mateusz-sekara added a commit to smartcontractkit/ccip that referenced this pull request Apr 30, 2024

CCIP-2207 Boosting LogPoller performance by fixing how order by claus…

f22ed2e

…es are written (#13026) (#783) Cherry picking smartcontractkit/chainlink#13026

This was referenced Apr 30, 2024

[DO NOT MERGE] Changeset Release Preview - v2.12.0 #13051

Closed

[DO NOT MERGE] Changeset Release Preview - v2.12.0 #13060

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written #13026

CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written #13026

mateusz-sekara commented Apr 29, 2024 •

edited

Loading

github-actions bot commented Apr 29, 2024

0xnogo commented Apr 29, 2024 •

edited

Loading

0xnogo left a comment

reductionista left a comment

reductionista commented Apr 30, 2024

cl-sonarqube-production bot commented Apr 30, 2024

CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written #13026

CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written #13026

Conversation

mateusz-sekara commented Apr 29, 2024 • edited Loading

Sorted ascending limit 1 (75ms -> 1ms)

Sorted ascending without limit (130ms -> 53ms)

github-actions bot commented Apr 29, 2024

0xnogo commented Apr 29, 2024 • edited Loading

0xnogo left a comment

Choose a reason for hiding this comment

reductionista left a comment

Choose a reason for hiding this comment

reductionista commented Apr 30, 2024

cl-sonarqube-production bot commented Apr 30, 2024

Quality Gate passed

mateusz-sekara commented Apr 29, 2024 •

edited

Loading

0xnogo commented Apr 29, 2024 •

edited

Loading