[Expression][Perf] Take advantage of the new column JSON format #33290

Closed
monfera opened this issue Mar 15, 2019 · 4 comments
Labels
enhancement New value added to drive a business result Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. loe:large Large Level of Effort performance Team:Visualizations Visualization editors, elastic-charts and infrastructure

Comments

@monfera
Contributor

monfera commented Mar 15, 2019

The column JSON format is provided by the solution for elastic/elasticsearch#37702

Instead of the often used row format:

{ rows: [ {alpha: 3, beta: 2}, {alpha: 5, beta: 6}, {alpha: 6, beta: 4}, ...] }

or

{ rows: [ [3, 2], [5, 6], [6, 4], ...], columns: ['alpha', 'beta'] }

the column oriented format:

{ columns: { alpha: [3,5,6,7,4,2,....], beta: [2,6,4,3,2,1,...] } }

has these benefits:

  1. It yields a smaller payload, as the count of noise (delimiter) characters per record equals the column count (it's 3x as much for row arrays, and a lot more for row objects, which repeat every key)
  2. It should compress better, because like data (e.g. all integers) comes sequentially, unbroken by type/domain changes and the noise (row) delimiters; sorted, low-cardinality columns should compress even better
  3. Many operators (here, in the pipeline expression language) can be implemented more efficiently with a columnar format
  4. Many 3rd party renderers (e.g. Plotly, Highcharts), and performance-oriented renderers in general, work with column data; transposing large arrays is expensive, blocks the main thread and generates garbage that adds to jank (frame drops)
  5. It leads to a nice internal format: the array of arrays needs a prop per column for the respective column names anyway, so why not put the data vector there in the first place
  6. (...future) This format lends itself well to advanced compression techniques common in the industry, such as delta encoding and run-length encoding, assuming eventual server-side support for these
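To make benefits 1 and 4 concrete, here is a minimal TypeScript sketch that transposes the row-object shape into the column shape and compares the serialized sizes. The type and function names (`RowObjects`, `ColumnTable`, `toColumns`) are hypothetical, chosen for illustration only; the issue itself only shows the JSON shapes.

```typescript
// Hypothetical type names; the issue only shows the JSON shapes.
type RowObjects = { rows: Array<Record<string, number>> };
type ColumnTable = { columns: Record<string, number[]> };

// Transpose row objects into one array per column.
function toColumns(table: RowObjects): ColumnTable {
  const columns: Record<string, number[]> = {};
  for (const row of table.rows) {
    for (const [name, value] of Object.entries(row)) {
      // Own-property check so names like "toString" are handled correctly.
      if (!Object.prototype.hasOwnProperty.call(columns, name)) {
        columns[name] = [];
      }
      columns[name].push(value);
    }
  }
  return { columns };
}

const rowFormat: RowObjects = {
  rows: [
    { alpha: 3, beta: 2 },
    { alpha: 5, beta: 6 },
    { alpha: 6, beta: 4 },
  ],
};

const columnFormat = toColumns(rowFormat);

// Compare uncompressed serialized sizes of the two layouts.
const rowBytes = JSON.stringify(rowFormat).length;
const columnBytes = JSON.stringify(columnFormat).length;
console.log(columnFormat, rowBytes, columnBytes);
```

This transposition is the work a column-based renderer would force on the client when the server only returns rows; querying in column format avoids it altogether.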

To take advantage of the column oriented format:

  1. The data source functions should be able to query in the column format, via an argument (e.g. format="column")
  2. An automatic type conversion from column to row oriented format should be done for functions which only take one of the formats
  3. At least the functions that benefit most from a column format should be changed so that they can take the columnar format directly, without transposing
  4. Apply a heuristic that, in the absence of a format specifier arg (row vs column) in the query, loops through the subsequent steps of the expression and decides which format to use based on the first processing node it encounters that can take only one of the formats. Alternatively, execute the pipeline such that it switches to the other format as infrequently as possible, while preferring the "natural" orientation for each function
  5. (...future) A proper pipeline optimizer should decide, based on the expressions of interest, which format to query in and how to represent data across the subsequent data processing steps. This could involve loop fusion and reorderings, e.g. pushing down selections, pulling up join-like operations etc., based on equational semantics as in relational algebra
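The first variant of the heuristic in Item 4 could look roughly like the sketch below. Everything here is an assumption for illustration: the `PipelineFn` descriptor, the `accepts` field, the function names in the sample pipeline, and the choice of "row" as the fallback when no step constrains the format. It is not the actual interpreter API.

```typescript
type Format = "row" | "column";

// Hypothetical descriptor: which input formats a pipeline function accepts.
interface PipelineFn {
  name: string;
  accepts: Format[];
}

// Returns the format required by the first step that accepts only one
// format, or the fallback when every step accepts both.
function chooseQueryFormat(steps: PipelineFn[], fallback: Format = "row"): Format {
  for (const step of steps) {
    const [only] = step.accepts;
    if (step.accepts.length === 1 && only !== undefined) {
      return only;
    }
  }
  return fallback;
}

// Hypothetical pipeline: the first step is format-agnostic, the second is not.
const pipeline: PipelineFn[] = [
  { name: "filterRows", accepts: ["row", "column"] },
  { name: "columnRenderer", accepts: ["column"] },
  { name: "legacyTable", accepts: ["row"] },
];

console.log(chooseQueryFormat(pipeline)); // "column"
```

The alternative mentioned in Item 4, minimizing the number of format switches across the whole pipeline, would need to look at all steps rather than stop at the first constrained one.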

While the list gets daunting toward the end, there are lots of benefits to be had from Item 1 alone: it lets us pipe column data directly from a query into a column-based renderer. Item 2 lets everything keep working without breaking anything, even if the user starts with a columnar query but some of the remaining node functions don't natively handle the column format. Item 3 is still straightforward and may speed up existing use cases.
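As a rough sketch of Items 2 and 3, the snippet below wraps a row-only function so that it also accepts columnar input (converting column to row only when needed), next to an operator that works on the columnar format directly. All names (`RowArrayTable`, `ColumnarTable`, `toRowArrays`, `acceptEitherFormat`, `columnSum`) are hypothetical, and the conversion assumes all columns have the same length.

```typescript
// Hypothetical shapes mirroring the two array-based layouts in the issue.
type RowArrayTable = { rows: number[][]; columns: string[] };
type ColumnarTable = { columns: Record<string, number[]> };

function isColumnar(t: RowArrayTable | ColumnarTable): t is ColumnarTable {
  return !Array.isArray(t.columns);
}

// Item 2: transpose a columnar table for functions that only accept rows.
// Assumes every column has the same length.
function toRowArrays(table: ColumnarTable): RowArrayTable {
  const names = Object.keys(table.columns);
  const vectors = names.map((name) => table.columns[name]);
  const length = vectors.length > 0 ? vectors[0].length : 0;
  const rows = Array.from({ length }, (_, i) => vectors.map((v) => v[i]));
  return { rows, columns: names };
}

// Wrap a row-only function so callers may pass either format.
function acceptEitherFormat<R>(rowOnly: (t: RowArrayTable) => R) {
  return (t: RowArrayTable | ColumnarTable): R =>
    rowOnly(isColumnar(t) ? toRowArrays(t) : t);
}

// Item 3: a column-native operator reads one array, with no transposing.
function columnSum(table: ColumnarTable, name: string): number {
  const values = table.columns[name];
  if (values === undefined) {
    throw new Error(`unknown column: ${name}`);
  }
  return values.reduce((acc, v) => acc + v, 0);
}

const columnar: ColumnarTable = { columns: { alpha: [3, 5, 6], beta: [2, 6, 4] } };
const countRows = acceptEitherFormat((t) => t.rows.length);

console.log(countRows(columnar)); // 3
console.log(columnSum(columnar, "alpha")); // 14
```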

@monfera monfera added performance enhancement New value added to drive a business result Team:Presentation Presentation Team for Dashboard, Input Controls, and Canvas labels Mar 15, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-canvas

@costin
Member

costin commented Mar 15, 2019

Re the first 6 - ES and SQL already support CBOR and SMILE, which are cross-platform and which I would expect to provide excellent compression.

@cqliu1 cqliu1 added the loe:large Large Level of Effort label Mar 15, 2019
@spalger spalger added v7.2.1 and removed v7.2.0 labels Jun 25, 2019
@nickpeihl nickpeihl added the impact:low Addressing this issue will have a low level of impact on the quality/strength of our product. label Mar 9, 2023
@cqliu1 cqliu1 added Feature:ExpressionLanguage Interpreter expression language (aka canvas pipeline) and removed Team:Presentation Presentation Team for Dashboard, Input Controls, and Canvas Feature:Canvas labels Apr 6, 2023
@botelastic botelastic bot added the needs-team Issues missing a team label label Apr 6, 2023
@cqliu1 cqliu1 changed the title [Canvas][Perf] Take advantage of the new column JSON format [Expression][Perf] Take advantage of the new column JSON format Apr 6, 2023
@cqliu1 cqliu1 added the Team:Visualizations Visualization editors, elastic-charts and infrastructure label Apr 6, 2023
@botelastic botelastic bot removed the needs-team Issues missing a team label label Apr 6, 2023
@elasticmachine
Contributor

Pinging @elastic/kibana-visualizations @elastic/kibana-visualizations-external (Team:Visualizations)

@stratoula
Contributor

This is a request for Canvas, which supports SQL, but there are no plans for adding more features to Canvas, so I am closing this. We can always reopen if this becomes relevant again.
