-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cumulative sum expression function #80129
Conversation
Pinging @elastic/kibana-app-arch (Team:AppArch) |
Pinging @elastic/kibana-app (Team:KibanaApp) |
newRow[column] = Number(newRow[column]) + accumulatorValue; | ||
accumulators[bucketIdentifier] = newRow[column]; | ||
} else { | ||
newRow[column] = accumulatorValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thinking about this again, it does seem a bit weird that the comulative sum would overwrite the current value, or at least that being the only option.
could we add another argument, name
which would be the name of the new column ? (what map does)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and i guess (what do you think @lukeelmers ) we can still replace the column if no name is provided ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it but didn't implement eventually because you can put that together quite easily yourself using mapColumn
like in the example in the description (trying to keep the function as single-purpose as possible). I'm fine with adding it if you think it's the right thing though, in that case what about input
and output
arguments (name
/column
isn't really descriptive anymore)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about it again, what about the following API?
name
or unnamed arg are the target column and a required argument.from
is an optional argument and references the column to calculate the cumulative sum for - if it's not set, it defaults toname
Then you can do these:
demodata | cumulative_sum "price"
demodata | cumulative_sum "price" by="state"
demodata | cumulative_sum "cumulative_price" from="price" by="state"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured
Count
andCumulative sum of count
- we need to use the original column, and create a copy. -
There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column
For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ to not overwriting the column (which it sounds like we can't do anyway)
There are no existing expression functions which are able to create a column with the fieldFormatter params
As an aside: I didn't realize mapColumn
doesn't copy the full meta, but perhaps it should be provided as an option there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we ever want to change formatter (looking at the issue linked above)? i think that's not a good idea. I think the resulting column should keep the same format information as original column had (why would doing a sum/derivative/ some other calculation affect the format ?)
also thinking about it again, seems overriding existing column:
- can't be done for various reasons where we need source column later
- could be an edge case, but we can also achieve this by just dropping original column later, so i think we should leave the option to replace the column out, which makes target column id/name required parameters (if name is not provided we can use name=id)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would we ever want to change formatter (looking at the issue linked above)?
I'm not sure about the use case for changing the formatter either, but it does feel like if we are making any expression functions that copy columns (currently just mapColumn
) we should at least be preserving the full meta, or as much of it as makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely see your point @lukeelmers, but mapColumn
is not necessarily mapping just a single column, it's acting on the whole row. Maybe a separate function would be justified. I will create a separate issue for discussing this (IMHO it doesn't block moving forward with this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ Yeah it shouldn't block moving forward with this, but point taken on it acting on a whole row.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't done a full review of this, but wanted to leave my API feedback first.
src/plugins/expressions/common/expression_functions/specs/cumulative_sum.ts
Show resolved
Hide resolved
newRow[column] = Number(newRow[column]) + accumulatorValue; | ||
accumulators[bucketIdentifier] = newRow[column]; | ||
} else { | ||
newRow[column] = accumulatorValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
We can't replace the column in Lens because the same inner column could be used for multiple metrics. Imagine that the user has configured
Count
andCumulative sum of count
- we need to use the original column, and create a copy. -
There are no existing expression functions which are able to create a column with the fieldFormatter params, or that can "copy" an existing column
For both these reasons, my preference is to use the API we last discussed here: #61775 (comment)
Thanks for pointing me to the derivative API discussion @wylieconlon - I will implement that, it looks like it nicely covers all cases. |
Adjusted the behavior:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the tests here very difficult to verify, so I submitted a PR to try to simplify them. Also did some performance testing using the expression debug mode using Canvas, and a dataset of 10k rows in Canvas. Baseline results with 10k rows was that this function executed in 16ms without grouping, and 19ms with a single grouping. Compared to mapColumn
with a fixed string value on the same 10k rows, which executes in 3800ms, this is a huge improvement.
}, | ||
{ inputColumnId: 'val', outputColumnId: 'output' } | ||
) | ||
).toMatchInlineSnapshot(` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found these snapshot-based tests very difficult to glance through and to match up the input vs output expected values. In particular, the snapshots are asserting some extra information that I don't think we require tests for, like testing that the previous rows and columns are kept.
What do you think about changing the test style to the shortest-possible code?
expect(result.columns).toContainEqual({
id: 'output',
name: 'output',
meta: { type: 'number' },
});
expect(result.rows.map((row) => row.output)).toEqual([5, 12, 15, 17]);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I was thinking the same thing, would be a bit easier to read if we maybe had just one snapshot test, but asserted the specific values on the others.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s a great idea @wylieconlon , feel free to push directly to this PR, if it’s not blocking any other efforts I can also clean it up on Friday.
src/plugins/expressions/common/expression_functions/specs/cumulative_sum.ts
Outdated
Show resolved
Hide resolved
import { ExecutionContext } from '../../../execution/types'; | ||
import { Datatable } from '../../../expression_types/specs/datatable'; | ||
|
||
describe('interpreter/functions#cumulative_sum', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found a missing test case where the initial values are null, by reading through the elasticsearch test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
}, | ||
{ inputColumnId: 'val', outputColumnId: 'output' } | ||
) | ||
).toMatchInlineSnapshot(` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ I was thinking the same thing, would be a bit easier to read if we maybe had just one snapshot test, but asserted the specific values on the others.
src/plugins/expressions/common/expression_functions/specs/cumulative_sum.ts
Show resolved
Hide resolved
@elasticmachine merge upstream |
Thanks for the review and for the test improvements! I cleaned up the rest of the comments. |
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code updates LGTM, pending a green build
@elasticmachine merge upstream |
💚 Build SucceededMetrics [docs]@kbn/optimizer bundle module count
distributable file count
page load bundle size
History
To update your PR or re-run it, just comment with: |
@wylieconlon Do you want to have another look at this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…lout-for-warm-and-cold-tier * 'master' of github.com:elastic/kibana: (126 commits) Add cumulative sum expression function (elastic#80129) [APM] Fix link to trace (elastic#80993) Provide url rewritten in onPreRouting interceptor (elastic#80810) limit renovate to npm packages Fix bug in logs UI link (elastic#80943) [Monitoring] Fix bug with setup mode appearing on pages it shouldn't (elastic#80343) [Security Solution][Detection Engine] Fixes false positives caused by empty records in threat list docs test (elastic#81080) Fixed alerts ui test timeout issue, related to the multiple server calls for delete all alerts, by reducing the number of alerts to the two and increasing retry timeout. (elastic#81067) [APM] Fix service map highlighted edge on node select (elastic#80791) Fix typo in toast, slight copy adjustment. (elastic#80843) [Security Solution] reduce optimizer limits (elastic#80997) [maps] 7.10 documentation updates (elastic#79917) [Workplace Search] Fix Group Prioritization route and clean up design (elastic#80903) [Enterprise Search] Added reusable HiddenText component to Credentials (elastic#80033) Upgrade EUI to v29.5.0 (elastic#80753) [Maps] Fix layer-flash when changing style (elastic#80948) [Security Solution] [Detections] Disable edit button when user does not have actions privileges w/ rule + actions (elastic#80220) [Enterprise Search] Handle loading state on Credentials page (elastic#80035) [Monitoring] Fix cluster listing page in how it handles global state (elastic#78979) ...
Summary
Related to #61776
This PR adds a "cumulative sum" expression function. It's intended to be used as part of the upcoming Lens operation, but can also be used within Canvas as a standalone function.
Example:
This PR doesn't add public documentation as part of the Canvas function reference - we have to decide whether we want to advertise the function like this.
The exact behavior (especially around edge cases) is described as a comment of the function implementation - this can serve as a basis for the public documentation.
https://github.com/elastic/kibana/pull/80129/files#diff-8b017bbbf9f7d6623f47dbc16f422007R40
Checklist