-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(jobsdb): latest job status query optimization #2693
Conversation
af21d1d
to
c0dfd30
Compare
Codecov ReportBase: 45.43% // Head: 45.39% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #2693 +/- ##
==========================================
- Coverage 45.43% 45.39% -0.04%
==========================================
Files 290 290
Lines 48101 48066 -35
==========================================
- Hits 21853 21821 -32
+ Misses 24861 24857 -4
- Partials 1387 1388 +1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
dab1167
to
4d0241a
Compare
a6aba26
to
aba00db
Compare
aba00db
to
beaeb0b
Compare
de53cee
to
d414a37
Compare
jd.dsListLock.WithLock(func(l lock.LockToken) { | ||
switch jd.ownerType { | ||
case Write, ReadWrite: | ||
jd.setupDatabaseTables(l, jd.clearAll) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: all gateways will need to be upgraded to the latest version so that the necessary schema migration will be performed on gw
datasets.
3af7c40
to
78368c5
Compare
78368c5
to
89981a2
Compare
1c8fb20
to
329e635
Compare
Description
When
jobsdb
queries the database for jobs, it needs to get the latest job status for each job. The current way of achieving this is through aSELECT MAX(id) GROUP BY job_id
query in the job status table.However, a specialized index
(job_id asc, id desc)
along with usingSELECT DISTINCT ON (job_id)
yields significant performance improvements over the current strategy (25% - 100% faster), see some example query plans: old vs new. See also the relevant thread.To simplify queries even further, a new database view is introduced for every job status table (e.g. table
rt_job_status_1
has viewv_last_rt_job_status_1
) abstracting away the actual query that provides this information.Note: This pull request shall be reviewed along with its counterpart in
rudder-migrator
: #32Bonus!
Added an easy-to-use function
unionjobsdb(prefix text, num int)
for quickly querying multiple jobsdb datasets for jobs along with their latest status. You no longer have to keep track of constantly changing jobsdb indexes...E.g. to query the first 5
rt
tables for Amplitude jobs, all you need to do now is:Notion Ticket
Link
Security