sql: Django introspection query performance regression (still) #100871
Comments
Hello, I am Blathers. I am here to help you get the issue triaged. Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am Blathers, a bot for CockroachDB. My owner is dev-inf.
I think it's a GA-blocker until we triage further, cc @mgartner
I believe we had previously narrowed this down to …
Unless changes have been made to the descriptor cache since 22.2, this might be due to some difference between how UDFs interact with the cache and how the internal executor does it. Either way, the fix might be to improve the behavior with many uncached descriptor IDs. I loaded a database with 15,000 tables, similar to the example here. Running the query from this issue on that database took about 1.4s on my machine. I tried marking the builtin as volatile so it wouldn't be inlined, and that increased the time to ~3s. Running the query on 22.2.2 ranged from ~600ms to ~900ms on my machine. Fetching uncached descriptors in … to pessimistically scan for all IDs at once if too many were uncached.
Nice findings, @DrewKimball! Let me check with others on the schema team about your patch.
@DrewKimball I think we can use your patch. Do you plan to push it, or would you like our team to take over?
I'm not sure I'd want to push it as-is; I didn't do any testing to figure out whether this was a good threshold to transition to …
@DrewKimball I think we can get away with dropping it a bit lower, to about ~500 descriptors, and treat those as reads that need every single descriptor. I believe this is a reasonable solution in the short term; longer term, we need some type of caching to fix these scenarios.
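For illustration, here is a minimal Go sketch of the fallback being discussed: if the number of uncached descriptor IDs exceeds a cutoff, switch from per-ID point lookups to a single pessimistic scan of all descriptors. The names (`chooseFetchPlan`, `pessimisticScanThreshold`) and the 500-ID cutoff are assumptions taken from this thread, not the actual CockroachDB implementation.

```go
package main

import "fmt"

// pessimisticScanThreshold is a hypothetical cutoff echoing the ~500
// descriptors suggested above; the value in the real patch may differ.
const pessimisticScanThreshold = 500

// fetchPlan describes how a catalog read would be issued: individual point
// lookups (Gets) for a handful of IDs, or one full scan when so many
// descriptors are uncached that per-ID lookups dominate the query time.
type fetchPlan int

const (
	planPointLookups fetchPlan = iota
	planFullScan
)

// chooseFetchPlan sketches the decision under discussion: if too many
// descriptor IDs miss the cache, fall back to scanning all descriptors at
// once instead of issuing one Get per ID.
func chooseFetchPlan(uncachedIDs []int64) fetchPlan {
	if len(uncachedIDs) > pessimisticScanThreshold {
		return planFullScan
	}
	return planPointLookups
}

func main() {
	few := make([]int64, 20)     // typical schema: point lookups are fine
	many := make([]int64, 15000) // e.g. the 15,000-table reproduction above
	fmt.Println(chooseFetchPlan(few) == planPointLookups) // true
	fmt.Println(chooseFetchPlan(many) == planFullScan)    // true
}
```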
Previously, the catalog would fetch descriptors via point lookups using Get when scanning large batches of descriptors. This was later extended to also look up ZoneConfigs and comments in a similar way. Recently, we started seeing a regression on the Django test suite involving the pg_catalog tables, which tend to read a large number of descriptors, likely linked to the extra overhead of fetching both comments and zone configs in 23.1. To address this, this patch starts using scans over runs of descriptor IDs for batch fetches, which reduces the overall cost of fetching a large number of descriptors.

Fixes: cockroachdb#100871

Release note: None
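To make the "scans over runs of descriptor IDs" idea concrete, here is a minimal sketch: sort the requested IDs and coalesce adjacent ones into runs, so each run can be fetched with a single range scan instead of one Get per ID. The `idRun` and `runsOfIDs` names are hypothetical and not taken from the CockroachDB codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// idRun is an inclusive range [start, end] of consecutive descriptor IDs
// that can be fetched with a single range scan instead of per-ID Gets.
type idRun struct {
	start, end int64
}

// runsOfIDs sorts the requested descriptor IDs and coalesces consecutive
// IDs into runs, so a batch of N IDs turns into a small number of scans
// rather than N point lookups.
func runsOfIDs(ids []int64) []idRun {
	if len(ids) == 0 {
		return nil
	}
	sorted := append([]int64(nil), ids...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	runs := []idRun{{start: sorted[0], end: sorted[0]}}
	for _, id := range sorted[1:] {
		last := &runs[len(runs)-1]
		if id == last.end || id == last.end+1 {
			last.end = id // extend the current run (also absorbs duplicates)
			continue
		}
		runs = append(runs, idRun{start: id, end: id})
	}
	return runs
}

func main() {
	// IDs 100-104 and 200-201 collapse into two scans instead of seven Gets.
	fmt.Println(runsOfIDs([]int64{104, 100, 101, 103, 102, 200, 201}))
	// Output: [{100 104} {200 201}]
}
```

Coalescing only helps when the IDs are dense; in the degenerate case of widely scattered IDs it falls back to one scan per ID, which behaves like the original point lookups.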
101937: catalog: utilize scans for a large number of descriptors r=fqazi a=fqazi

Previously, the catalog would fetch descriptors via point lookups using Get when scanning large batches of descriptors. This was later extended to also look up ZoneConfigs and comments in a similar way. Recently, we started seeing a regression on the Django test suite involving the pg_catalog tables, which tend to read a large number of descriptors, likely linked to the extra overhead of fetching both comments and zone configs in 23.1. To address this, this patch starts using scans over runs of descriptor IDs for batch fetches, which reduces the overall cost of fetching a large number of descriptors. Fixes: #100871 Release note: None

101943: build,release: provide way to inject build tag override, use in MAPB r=rail a=rickystewart

When we build nightly builds, we build them in "release" configuration. Because of this, the build tag reported by these binaries from `cockroach version` has been identical to that of an actual release binary, which is very confusing. Here we update the script that builds these nightlies (`make-and-publish-build-artifacts.sh`) to inject an appropriate, identifiable build tag. It is probably wrong that we are building these nightlies as if they were "releases"; we'll follow up with work to fix this and refine the build tags further. Closes #100532. Epic: None Release note (build change): Update reported `Build Tag` for nightly (non-release) builds

101944: sem/tree: remove TODO r=mgartner a=mgartner

This commit removes a TODO that was addressed by #96045. Epic: None Release note: None

101952: ui: fix linegraph render for large clusters r=zachlite a=zachlite

We were feeding `Math.min` and `Math.max` very large arrays (length ~10e7). These functions take variadic arguments, and the runtime can't handle that many arguments; as a result we blow the stack, causing uncaught range errors. Instead, we'll use lodash min and max, which are not variadic. Resolves #101377 Epic: None Release note: None

Reproduction and fix demonstrated on the command line (screenshot): https://user-images.githubusercontent.com/5423191/233486016-fd740a98-9a0d-4fef-a6d8-67e9cb4e318e.png

Co-authored-by: Faizan Qazi <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
Co-authored-by: Marcus Gartner <[email protected]>
Co-authored-by: Zach Lite <[email protected]>
Describe the problem
A regression is causing the following query (which Django's test suite runs quite often) to take 1-2 seconds (instead of milliseconds).
To Reproduce
Environment:
Build Time: 2023/04/06 20:05:50
Additional context
The same issue was reported in #93955 where performance was improved a little.
Related PRs:
The full Django test suite runtime is up from ~40 minutes on 22.2.x to ~1.5 hours (though I'm not sure it's entirely attributable to this query).
Jira issue: CRDB-26663