Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant Increase in Querying Time Due to Sqlparse #19567

Open
dvchristianbors opened this issue Apr 6, 2022 · 7 comments · May be fixed by #19572
Open

Significant Increase in Querying Time Due to Sqlparse #19567

dvchristianbors opened this issue Apr 6, 2022 · 7 comments · May be fixed by #19572
Assignees
Labels
#bug Bug report

Comments

@dvchristianbors
Copy link

When increasing the number of keys in IN clauses, the runtime of the query is sgnificantly increased. This means that queries with large numbers of keys are running for a very long time (several seconds for 200 keys), even though a direct query on the underlying database will terminate within milliseconds.

See discussion in Slack: https://apache-superset.slack.com/archives/C014LS99C1K/p1633448327074000

How to reproduce the bug

Precondition: Load an example dataset

  1. Go to 'Data' and open "Explore" your example dataset (e.g., birth_names from the example data)
  2. Click on 'Query Mode' > 'Raw Records' and add at least one column
  3. In Filters, click the Plus-button to add a new Filter - Custom SQL
  4. Enter a short IN query with several keys, e.g. `name IN ("Liam", "James", "Noah", "Wyatt", "Gabriel", "Lucas", "Ethan", "Alexander", "Joseph", "Benjamin")
  5. Enter a long IN query with at hundrets of keys, e.g., name IN ("Liam","James", "Noah", "Wyatt", "Gabriel", "Lucas", "Ethan", "Alexander", "Joseph", "Benjamin", "William", "Logan", "Mason", "Jack", "John", "Asher", "Elijah", "Daniel", "Henry", "Jacob", "Jaxon", "Michael", "Oliver", "Hunter", "David", "Levi", "Matthew", "Landon", "Aiden", "Isaac", "Jackson", "Caleb", "Ryan", "Elias", "Connor", "Evan", "Joshua", "Samuel", "Christian", "Jayden", "Jeremiah", "Cooper", "Eli", "Robert", "Ryder", "Christopher", "Colton", "Josiah", "Andrew", "Austin", "Carson", "Jaxson", "Jonathan", "Luke", "Malachi", "Nathan", "Owen", "Blake", "Lincoln", "Ezra", "Gavin", "Thomas", "Dylan", "Grayson", "Kai", "Ryker", "Zachary", "Anthony", "Isaiah", "Jase", "Jason", "Micah", "Sebastian", "Silas", "Titus", "Bentley", "Brody", "Cameron", "Carter", "Chase", "Gideon", "Jace", "Sawyer", "Tristan", "Tyler", "Weston", "Adam", "Charles", "Everett", "Wesley", "Xander", "Brandon", "Brayden", "Nathaniel", "Theodore", "Xavier", "Ashton", "Avery", "Dominic", "Easton", "Finn", "George", "Hudson", "Ian", "Jasper", "Kayden", "Marshall", "Max", "Maxwell", "Miles", "Orion", "Richard", "Timothy", "Abel", "Drake", "Garrett", "Jameson", "Jayce", "Joel", "Kenneth", "Maximus", "Nicholas", "Parker", "Travis", "Cody", "Dean", "Declan", "Elliot", "Ezekiel", "Karter", "Nolan", "Patrick", "Riley", "Seth", "Solomon", "Steven", "Victor", "Waylon", "Aaron", "August", "Bradley", "Braxton", "Bryce", "Calvin", "Camden", "Cayden", "Charlie", "Cole", "Damian", "Dawson", "Eric", "Greyson", "Jake", "Jeffrey", "Jesse", "Jonah", "Julian", "Kaiden", "Killian", "Kingston", "Maddox", "Matthias", "Maverick", "Odin", "Paul", "Peter", "Roman", "Trevor", "Zane", "Alex", "Archer", "Caden", "Collin", "Colt", "Edward", "Gage", "Gunner", "Harrison", "Ivan", "Jax", "Leo", "Lukas", "Marcus", "Paxton", "Soren", "Sullivan", "Tanner", "Trenton", "Troy", "Tucker", "Vincent", "Walter", "Warren", "Adrian", "Augustus", "Axel", "Beckett", "Cade", "Clayton", "Dante") (see a list of baby names here
  6. Compare the computation times. Even though an in clause would return the data almost instantly

Expected results

Computation time is roughly similar, within a few milliseconds.

Actual results

In my local setup, the difference is:
Short IN query: 0.34 sec
Long IN query: 1.62 sec
Even longer IN query: 6.07 sec

Upon adding more clauses, the runtime increases in quadratic time.

Screenshots

Short IN query (20 keys):
image

Long IN query (200 keys):
image

Even longer IN query (500 keys):
image

Environment

  • browser type and version: Tested on Chrome: Version 100.0.4896.60 (Official Build) (64-bit)
  • superset version: master branch commit 03d3eaa
  • python version: 3.8.10
  • node.js version: v12.22.9
  • any feature flags active: None

Checklist

Make sure to follow these steps before submitting your issue - thank you!

Additional context

The source of this quadratic runtime of the query is caused by both the sqlparse.parse and sqlparse.format functions called in numerous places (/models/core.py, db_engine_specs/base.py, connectors/sqla/models.py, and common/query_object.py)

@rusackas
Copy link
Member

Added a couple of reviewers (the folks from the Slack thread) as reviewers on the PR. Hopefully we can get this resolved! Thanks for the Issue and the Contribution!

@dvchristianbors
Copy link
Author

Added a couple of reviewers (the folks from the Slack thread) as reviewers on the PR. Hopefully we can get this resolved! Thanks for the Issue and the Contribution!

Thanks! The ticket and PR have been open for quite a while. The PR definitely needs to be revisited to see if it is still viable. There has also been a semi-recent release of the sqlparse package (0.4.3 in Sep 2022), so it could also be evaluated if the performance issues have been resolved.

@rusackas
Copy link
Member

I would close this as stale (we're trying to clean house) but maybe we can get that PR across the finish line. Are you able to validate whether things have improved, as of Superset 3.x?

@dvchristianbors
Copy link
Author

I have not tried to validate in Superset 3.x, but could try and do so.

@rusackas
Copy link
Member

That would be fantastic, thanks in advance!

@dvchristianbors
Copy link
Author

I can confirm that this issue still persists with the current main branch (commit 744f68d). I will work on bringing the PR up to date and re-request review.

@rusackas
Copy link
Member

rusackas commented Jul 9, 2024

A SIP has been passed to replace Sqlparse, and the work is in progress. We can close this if you'd like, but for the moment, I'll leave it open as long as the work remains in progress.

#26786

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
#bug Bug report
Projects
None yet
4 participants