Server side pagination for charts (tables) #7804
Comments
Trying to understand your use case better: are you trying to download those 25M rows to your laptop, ultimately to do notebooky-type things? Comb through every record? Slice and dice the result set? Currently, when operating in async mode, SQL Lab (on a Celery worker) will dump the larger result set to a caching backend (we typically use S3) to allow for CSV exports. Two limits are applied: one to what the database server will return, one to what the browser will get. On the user side, one approach could be to
Alternatively, maybe the web server could page when reading the CSV out of S3.
To elaborate: we build an interactive dashboard for investigators. Those investigators are not able (knowledge-wise) to create their own dashboards/queries for this particular investigation. We have two tables: one with 25M transactions and another with 6M unique account numbers. A user needs to be able to filter the account number table to find a specific account number. Selecting it should then also filter the transaction table, which can hold a lot of transactions (easily > 70K). We have two options. First, a filter box to select the account number; this, however, seems to have a hard-coded limit much lower than the 6M we need, which means missing account numbers when a user searches. Second, we can create another table with a search box that emits filter events. We can control the limit here, but 6M accounts will be sent to the browser, as filtering happens client-side even if we use 'pagination'. If we set the limit to something the browser can handle, the search will miss account numbers. On your comments: I think you are missing the mark, because you seem to be talking about exports. We literally just need to be able to filter a large table on the front end without blowing up the browser, and to do so efficiently, as multiple users will use the same tables. If needed we can have a video chat to show the issue. Fixing this would really help Superset adoption (at least with us ;-) ) as it enables some really big dashboards.
I might be overlooking something, but adding support for pagination in the backend should be fairly simple: adding an
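As an illustration of the kind of backend change being discussed, a limit/offset pair on the query object would ultimately translate to a LIMIT/OFFSET clause in the generated SQL. A rough sketch, assuming a simple string-building layer (function and parameter names are invented, not Superset's actual API):

```python
def build_paginated_sql(base_sql: str, row_limit: int, row_offset: int = 0) -> str:
    """Wrap a base query with LIMIT/OFFSET to fetch a single page of rows.

    Hypothetical helper for illustration only; a real implementation would
    go through the SQL-generation layer, not string concatenation.
    """
    sql = f"{base_sql} LIMIT {int(row_limit)}"
    if row_offset:
        sql += f" OFFSET {int(row_offset)}"
    return sql

# Page 3 with 50 rows per page -> offset 100
paged = build_paginated_sql("SELECT * FROM transactions", 50, 100)
```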
I have been looking into this, and think I have it sorted on the backend side. However, I decided not to put in a PR yet, as the final backend design will most likely change slightly based on what works best in the frontend, so it needs to be developed in parallel with the frontend (I was thinking about starting with the table plugin as a POC: https://github.com/apache-superset/superset-ui-plugins/tree/master/packages/superset-ui-legacy-plugin-chart-table ). So once the plugins repo opens up to community PRs, I will be happy to work on this.
While adding pagination to the table chart (especially ungrouped ones) should definitely be done, I think the table chart's search functionality would need to hit the backend, or be part of the query, for this to solve the use case brought up. I'm fairly certain the table chart's search is local (i.e., it only searches over the data held in the client). Something this issue highlights is that the filter box runs one query to fetch its data (with an upper limit) and just runs a local search over that data. A much better design would hit the backend while searching, to fetch additional rows lost to the initial limit. This could mean a slow search experience, depending on the cardinality/database conditions, so an option to enable/disable this behavior would probably make sense.
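A hedged sketch of what backend-driven search could look like: the search term is pushed into a parameterized query instead of being filtered client-side, so only matching rows (up to a limit) ever reach the browser. SQLite stands in for the real database here, and the table/column names are invented:

```python
import sqlite3

# Toy dataset standing in for the 6M account numbers in the use case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (account_number TEXT)")
conn.executemany(
    "INSERT INTO accounts VALUES (?)",
    [("ACC-%04d" % i,) for i in range(1, 5001)],
)

def search_accounts(conn, term: str, limit: int = 100):
    """Return at most `limit` account numbers containing the search term.

    The filtering happens in the database, so the client never receives
    the full table, only the (bounded) matches.
    """
    return [row[0] for row in conn.execute(
        "SELECT account_number FROM accounts "
        "WHERE account_number LIKE ? ORDER BY account_number LIMIT ?",
        (f"%{term}%", limit),
    )]

matches = search_accounts(conn, "123")
```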
Hi All, I think the backend request is best handled by simply adding the
Hi All,
As suggested by @villebro, I applied an offset parameter to query_obj by adding an offset/limit widget to the table chart. We are able to render a limited, specific set of records based on a statically provided offset value (i.e. we have to provide the offset value every time), but I could not reach @anandhurajan's scenario (https://datatables.net/examples/data_sources/server_side), where the offset value is created dynamically. Our scenario is to derive the offset dynamically from the page number and the number of entries shown; for that, explore_json (or some other API) should be called to fetch the data by running the SQL query on the backend. Do you have any idea how to compute dynamic offset values (page_number * entries) on the frontend, just like client-side pagination does in the table view?
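The dynamic offset being asked about is just arithmetic on the page number and page size. A minimal illustrative sketch, assuming 1-based page numbers (the function name is invented):

```python
def page_offset(page_number: int, page_size: int) -> int:
    """Compute the row offset for a given page (1-based page numbers).

    Page 1 starts at offset 0, page 2 at `page_size`, and so on.
    """
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    return (page_number - 1) * page_size
```

The frontend would send `page_offset(page, entries)` along with the page size as the limit, and the backend would plug both into the generated query.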
Hi All, Any plans on this?
I'll probably be adding backend support for pagination to the new
Hi All, Any update on this? |
@villebro do you have any idea when we're adding it to the table? |
Can you please share your approach? I'm trying to get this feature on 0.36. Thanks in advance.
Some notes here:
Ultimately we could have "advanced pagination" options that let the user choose between server- and client-side pagination options. I think that when people say "I want to browse through millions of rows" it's pointing to other problems. It very rarely makes sense for anyone to comb through millions of individual rows.
Hi, Any solution for this?
limit/offset is not an option for pagination: it is terribly inefficient, as all the previous pages must be computed each time. Please rely on primary keys if available. Having a server-side cache would give a coherent result (avoiding changes while navigating through the pages) and pave the way to fixing big CSV exports (#10953).
@mistercrunch @frafra The typical way to solve this is to rely on a primary key or an index and remember where you left off (in MariaDB terms, seeking past the last-seen key rather than using OFFSET). So what could be possible is to enable server-side pagination for tables that have a primary key or an index, and disable it for tables that don't.
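A minimal sketch of this keyset ("seek") approach, using SQLite in place of MariaDB and invented table/column names: instead of OFFSET, remember the last primary key seen and fetch the rows after it, so each page is an index seek rather than a scan of all preceding rows:

```python
import sqlite3

# Toy table with an integer primary key, standing in for the transactions table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1, 101)],
)

def fetch_page(conn, last_id: int, page_size: int):
    """Return the next page of rows whose primary key is greater than last_id.

    Keyset pagination: the WHERE clause lets the database seek directly to
    the start of the page via the primary-key index, unlike LIMIT/OFFSET,
    which must walk past all earlier rows on every request.
    """
    return conn.execute(
        "SELECT id, amount FROM transactions WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

page1 = fetch_page(conn, 0, 10)              # first 10 rows
page2 = fetch_page(conn, page1[-1][0], 10)   # next 10, seeking past page1
```

One limitation worth noting: keyset pagination gives "next/previous page" navigation cheaply, but jumping to an arbitrary page number still requires either OFFSET or a precomputed key map.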
Added this request to Superset roadmap, will get to it Q2-Q3. |
FYI @bolkedebruin this is currently being implemented here |
This was fully implemented recently:
Hi, I'm still facing the issue with server-side pagination: I'm not able to navigate to different pages. I have attached a video recording (Screencast 2021-07-20 00_24_48.mp4).
Is your feature request related to a problem? Please describe.
As an analyst, I need to pull in the entire result set to do my work. There is no way to either categorize or aggregate the information beforehand. If I pull the entire result set (25M rows) I crash the browser, and if I enforce a limit then the subsequent filters don't work anymore.
It balloons memory usage on the server, the caches, and the web client. This makes it close to impossible to use Superset for this use case.
Describe the solution you'd like
Pagination should happen on the server side rather than on the client side. This means the client pulls in a limited set of data, and when it runs out of data it reaches out to the server to request more.
Describe alternatives you've considered
SQL Lab does not cut it, as the end users are not able to integrate it with their workflow.
Additional context
This was previously mentioned (paraphrased) in #1433 and #4037 without resolution.
@mistercrunch @kristw @john-bodley