Server side pagination for charts (tables) #7804
Comments
Trying to understand your use case better: are you trying to download those 25M rows to your laptop, ultimately to do notebooky-type things? Comb through every record? Slice and dice the result set? Currently, when operating in async mode, SQL Lab (on a Celery worker) will dump the larger result set to a caching backend (we typically use S3) to allow for CSV exports. Two limits are applied: one to what the database server will return, one to what the browser will get. On the user side, one approach could be to
Alternatively, maybe the web server could page when reading the CSV out of S3.
To elaborate: we build an interactive dashboard for investigators. Those investigators are not able (knowledge-wise) to create their own dashboards/queries for this particular investigation. We have two tables: one with 25M transactions and another with 6M unique account numbers. A user needs to be able to filter the account number table to find a specific account number. Selecting it should then also filter the transaction table, which can hold a lot of transactions (easily > 70K). We have two options. First, a filter box to select the account number; this, however, seems to have a hard-coded limit much lower than the 6M we need, which means missing account numbers when a user searches. Second, we can create another table with a search box that emits filter events. We can control the limit here, but 6M accounts will be sent to the browser, as filtering happens client-side even if we use 'pagination'. If we set the limit to something the browser can handle, the search will miss account numbers. On your comments: I think you are missing the mark, because you seem to be talking about exports. We literally just need to be able to filter a large table on the front end without blowing up the browser, and to do so efficiently, as multiple users will use the same tables. If needed we can have a video chat to show the issue. Fixing this would really help Superset adoption (at least with us ;-) ) as it enables some really big dashboards.
I might be overlooking something, but adding support for pagination in the backend should be fairly simple: adding an
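As an illustration of the kind of backend change being discussed, a limit/offset pair on the query object would ultimately translate to a LIMIT/OFFSET clause in the generated SQL. A rough sketch, assuming a simple string-building layer (function and parameter names are invented, not Superset's actual API):

```python
def build_paginated_sql(base_sql: str, row_limit: int, row_offset: int = 0) -> str:
    """Wrap a base query with LIMIT/OFFSET to fetch a single page of rows.

    Hypothetical helper for illustration only; a real implementation would
    go through the SQL-generation layer, not string concatenation.
    """
    sql = f"{base_sql} LIMIT {int(row_limit)}"
    if row_offset:
        sql += f" OFFSET {int(row_offset)}"
    return sql

# Page 3 with 50 rows per page -> offset 100
paged = build_paginated_sql("SELECT * FROM transactions", 50, 100)
```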
I have been looking into this, and think I have it sorted on the backend side. However, I decided not to put in a PR yet, as the final backend design will most likely change slightly based on what works best in the frontend, so it needs to be developed in parallel with the frontend (I was thinking about starting with the table plugin as a POC: https://github.com/apache-superset/superset-ui-plugins/tree/master/packages/superset-ui-legacy-plugin-chart-table ). So once the plugins repo opens up to community PRs, I will be happy to work on this.
While adding pagination to the table chart (especially ungrouped ones) should definitely be done, I think the table chart's search functionality would need to hit the backend, or be part of the query, for this to solve the use case brought up. I'm fairly certain the table chart's search is local (i.e., it only searches over the data held in the client). Something this issue highlights is that the filter box runs one query to fetch its data (with an upper limit) and just runs a local search over that data. A much better design would hit the backend while searching, to fetch additional rows lost to the initial limit. This could mean a slow search experience, depending on the cardinality/database conditions, so an option to enable/disable this behavior would probably make sense.
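A hedged sketch of what backend-driven search could look like: the search term is pushed into a parameterized query instead of being filtered client-side, so only matching rows (up to a limit) ever reach the browser. SQLite stands in for the real database here, and the table/column names are invented:

```python
import sqlite3

# Toy dataset standing in for the 6M account numbers in the use case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)",
    [("ACC-%04d" % i,) for i in range(1, 5001)],
)

def search_accounts(conn, term: str, limit: int = 100):
    """Return at most `limit` account numbers containing the search term.

    The filtering happens in the database, so the client never receives
    the full table, only the (bounded) matches.
    """
    return [row[0] for row in conn.execute(
        "SELECT account_number FROM accounts "
        "WHERE account_number LIKE ? ORDER BY account_number LIMIT ?",
        (f"%{term}%", limit),
    )]

matches = search_accounts(conn, "123")
```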
Hi All, I think the backend request is best handled by simply adding the
Hi All,
As suggested by @villebro, I applied an offset parameter to query_obj by adding an offset/limit widget to the table chart. We are able to render a limited, specific set of records based on a statically provided offset value (i.e. we have to provide the offset value every time), but I could not reach @anandhurajan's scenario (https://datatables.net/examples/data_sources/server_side), where the offset value is created dynamically. Our scenario is to derive the offset dynamically from the page number and the number of entries shown; for that, explore_json (or some other API) should be called to fetch the data by running the SQL query on the backend. Do you have any idea how to compute dynamic offset values (page_number * entries) on the frontend, just like client-side pagination does in the table view?
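The dynamic offset being asked about is just arithmetic on the page number and page size. A minimal illustrative sketch, assuming 1-based page numbers (the function name is invented):

```python
def page_offset(page_number: int, page_size: int) -> int:
    """Compute the row offset for a given page (1-based page numbers).

    Page 1 starts at offset 0, page 2 at `page_size`, and so on.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    return (page_number - 1) * page_size
```

The frontend would send `page_offset(page, entries)` along with the page size as the limit, and the backend would plug both into the generated query.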
Hi All, Any plans on this?
I'll probably be adding backend support for pagination to the new
Hi All, Any update on this? |
@villebro do you have any idea when we're adding it to the table? |
Can you please share your approach? I'm trying to get this feature on 0.36. Thanks in advance.
Some notes here:
Ultimately we could have "advanced pagination" options that let the user choose between server- and client-side pagination options. I think that when people say "I want to browse through millions of rows" it's pointing to other problems. It very rarely makes sense for anyone to comb through millions of individual rows.
Hi, Any solution for this?
limit/offset is not an option for pagination: it is terribly inefficient, as all the previous pages must be computed each time. Please rely on primary keys if available. Having a server-side cache would give a coherent result (avoiding changes while navigating through the pages) and pave the way to fixing big CSV exports (#10953).
@mistercrunch @frafra The typical way to solve this is to rely on a primary key or an index and remember where you left off (in MariaDB terms, seeking past the last-seen key rather than using OFFSET). So what could be possible is to enable server-side pagination for tables that have a primary key or an index, and disable it for tables that don't.
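A minimal sketch of this keyset ("seek") approach, using SQLite in place of MariaDB and invented table/column names: instead of OFFSET, remember the last primary key seen and fetch the rows after it, so each page is an index seek rather than a scan of all preceding rows:

```python
import sqlite3

# Toy table with an integer primary key, standing in for the transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 101)],
)

def fetch_page(conn, last_id: int, page_size: int):
    """Return the next page of rows whose primary key is greater than last_id.

    Keyset pagination: the WHERE clause lets the database seek directly to
    the start of the page via the primary-key index, unlike LIMIT/OFFSET,
    which must walk past all earlier rows on every request.
    """
    return conn.execute(
        "SELECT id, amount FROM transactions WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = fetch_page(conn, 0, 10)              # first 10 rows
page2 = fetch_page(conn, page1[-1][0], 10)   # next 10, seeking past page1
```

One limitation worth noting: keyset pagination gives "next/previous page" navigation cheaply, but jumping to an arbitrary page number still requires either OFFSET or a precomputed key map.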
Added this request to Superset roadmap, will get to it Q2-Q3. |
FYI @bolkedebruin this is currently being implemented here |
This was fully implemented recently:
Hi, I'm still facing the issue with server-side pagination: I'm not able to navigate to different pages. I have attached a video recording (Screencast 2021-07-20 00_24_48.mp4).
Is your feature request related to a problem? Please describe.
As an analyst, I need to pull in the entire result set to do my work. There is no way to either categorize or aggregate the information beforehand. If I pull the entire result set (25M rows) I crash the browser, and if I enforce a limit then the subsequent filters don't work anymore.
It balloons memory usage on the server, the caches, and the web client. This makes it close to impossible to use Superset for this use case.
Describe the solution you'd like
Pagination should happen on the server side rather than on the client side. This means the client pulls in a limited set of data, and when it runs out of data it reaches out to the server to request more.
Describe alternatives you've considered
SQL Lab does not cut it, as the end users are not able to integrate it with their workflow.
Additional context
This was previously mentioned (paraphrased) in #1433 and #4037 without resolution.
@mistercrunch @kristw @john-bodley