-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significant Increase in Querying Time Due to Sqlparse #19567
Comments
Added a couple of reviewers (the folks from the Slack thread) as reviewers on the PR. Hopefully we can get this resolved! Thanks for the Issue and the Contribution! |
Thanks! The ticket and PR have been open for quite a while. The PR definitely needs to be revisited to see if it is still viable. There has also been a semi-recent release of the |
I would close this as stale (we're trying to clean house) but maybe we can get that PR across the finish line. Are you able to validate whether things have improved, as of Superset 3.x? |
I have not tried to validate in Superset 3.x, but could try and do so. |
That would be fantastic, thanks in advance! |
I can confirm that this issue still persists with the current main branch (commit 744f68d). I will work on bringing the PR up to date and re-request review. |
A SIP has been passed to replace Sqlparse, and the work is in progress. We can close this if you'd like, but for the moment, I'll leave it open as long as the work remains in progress. |
When increasing the number of keys in
IN
clauses, the runtime of the query is sgnificantly increased. This means that queries with large numbers of keys are running for a very long time (several seconds for 200 keys), even though a direct query on the underlying database will terminate within milliseconds.See discussion in Slack: https://apache-superset.slack.com/archives/C014LS99C1K/p1633448327074000
How to reproduce the bug
Precondition: Load an example dataset
IN
query with several keys, e.g. `name IN ("Liam", "James", "Noah", "Wyatt", "Gabriel", "Lucas", "Ethan", "Alexander", "Joseph", "Benjamin")IN
query with at hundrets of keys, e.g.,name IN ("Liam","James", "Noah", "Wyatt", "Gabriel", "Lucas", "Ethan", "Alexander", "Joseph", "Benjamin", "William", "Logan", "Mason", "Jack", "John", "Asher", "Elijah", "Daniel", "Henry", "Jacob", "Jaxon", "Michael", "Oliver", "Hunter", "David", "Levi", "Matthew", "Landon", "Aiden", "Isaac", "Jackson", "Caleb", "Ryan", "Elias", "Connor", "Evan", "Joshua", "Samuel", "Christian", "Jayden", "Jeremiah", "Cooper", "Eli", "Robert", "Ryder", "Christopher", "Colton", "Josiah", "Andrew", "Austin", "Carson", "Jaxson", "Jonathan", "Luke", "Malachi", "Nathan", "Owen", "Blake", "Lincoln", "Ezra", "Gavin", "Thomas", "Dylan", "Grayson", "Kai", "Ryker", "Zachary", "Anthony", "Isaiah", "Jase", "Jason", "Micah", "Sebastian", "Silas", "Titus", "Bentley", "Brody", "Cameron", "Carter", "Chase", "Gideon", "Jace", "Sawyer", "Tristan", "Tyler", "Weston", "Adam", "Charles", "Everett", "Wesley", "Xander", "Brandon", "Brayden", "Nathaniel", "Theodore", "Xavier", "Ashton", "Avery", "Dominic", "Easton", "Finn", "George", "Hudson", "Ian", "Jasper", "Kayden", "Marshall", "Max", "Maxwell", "Miles", "Orion", "Richard", "Timothy", "Abel", "Drake", "Garrett", "Jameson", "Jayce", "Joel", "Kenneth", "Maximus", "Nicholas", "Parker", "Travis", "Cody", "Dean", "Declan", "Elliot", "Ezekiel", "Karter", "Nolan", "Patrick", "Riley", "Seth", "Solomon", "Steven", "Victor", "Waylon", "Aaron", "August", "Bradley", "Braxton", "Bryce", "Calvin", "Camden", "Cayden", "Charlie", "Cole", "Damian", "Dawson", "Eric", "Greyson", "Jake", "Jeffrey", "Jesse", "Jonah", "Julian", "Kaiden", "Killian", "Kingston", "Maddox", "Matthias", "Maverick", "Odin", "Paul", "Peter", "Roman", "Trevor", "Zane", "Alex", "Archer", "Caden", "Collin", "Colt", "Edward", "Gage", "Gunner", "Harrison", "Ivan", "Jax", "Leo", "Lukas", "Marcus", "Paxton", "Soren", "Sullivan", "Tanner", "Trenton", "Troy", "Tucker", "Vincent", "Walter", "Warren", "Adrian", "Augustus", "Axel", "Beckett", "Cade", "Clayton", "Dante")
(see a list of baby names hereExpected results
Computation time is roughly similar, within a few milliseconds.
Actual results
In my local setup, the difference is:
Short
IN
query: 0.34 secLong
IN
query: 1.62 secEven longer
IN
query: 6.07 secUpon adding more clauses, the runtime increases in quadratic time.
Screenshots
Short
data:image/s3,"s3://crabby-images/85106/85106c312a66e589be97a68f977e97a618524c80" alt="image"
IN
query (20 keys):Long
data:image/s3,"s3://crabby-images/83509/83509a62a37a62b00ebb8df868d0fde510ecad90" alt="image"
IN
query (200 keys):Even longer
data:image/s3,"s3://crabby-images/16552/165527c4a7b702a38c64a065287981efbafaf8aa" alt="image"
IN
query (500 keys):Environment
Checklist
Make sure to follow these steps before submitting your issue - thank you!
Additional context
The source of this quadratic runtime of the query is caused by both the
sqlparse.parse
andsqlparse.format
functions called in numerous places (/models/core.py
,db_engine_specs/base.py
,connectors/sqla/models.py
, andcommon/query_object.py
)The text was updated successfully, but these errors were encountered: