-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CCIP-2207 Boosting LogPoller performance by fixing how order by clauses are written #13026
Conversation
I see you updated files related to
|
Very nice improvement! Edit: just expanding the blockquote. Please omit this message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Not in the scope of this PR, but I am seeing some SELECT *
. It can also potentially be an offender of runtime performance. We could potentially check if we require all the columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great find!
We do need all columns. The api methods for LogPoller have to be generic enough to support any possible chainlink product's needs. So they return one data structure representing a Log and one representing a Block. The purpose of the logs and log_poller_blocks table is to allow products to search for a set of logs or blocks based on specific criteria. These data structs are not large, but LogPoller must return them each as a whole not individual fields (columns). These will soon be in entirely different repos on different release schedules, and run as separate executables so LogPoller must stay as general as possible and not make any assumptions about which fields the products may or may not need to use for a specific application. The filters table is to persist filters across node restarts--the only time it reads from that table is to load the entire table back into memory as the node is starting up. |
Quality Gate passedIssues Measures |
…es are written (#13026) (#783) Cherry picking smartcontractkit/chainlink#13026
According to the Postgres EBNF, columns in the
order by
clause should be separated with,
without opening and closing parenthesis https://www.postgresql.org/docs/13/queries-order.html.That being said, sorting by
(column_a, column_b)
is syntactically incorrect. However, for our cases, it doesn't throw any errors but leads to poor execution in which Postgres fetches all matching tuples, loads them to memory, and sorts in memory using hpea/quick sort. The proper approach is to sort bycolumn_a, column_b
. Almost all LP queries sort byblock_number, log_index
. In most of the cases, Postgres has data already properly pre-sorted in the index, so it only needs to sort bylog_index
if there are conflicts.However, the most important change here is the performance when sorting by columns versus sorting by tuples. Please check the following examples
Used query, around 40k logs matching (expand for details)
Sorted ascending limit 1 (75ms -> 1ms)
ORDER BY (block_number, log_index) LIMIT 1;
ORDER BY block_number, log_index LIMIT 1;
Sorted ascending without limit (130ms -> 53ms)
ORDER BY (block_number, log_index);
ORDER BY block_number, log_index;