-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL: SUM of multiple 0 values returns NULL #45251
Comments
Pinging @elastic/es-search |
Pinging @elastic/es-analytics-geo |
I think this is only fixable if we start serializing counts (or at least an |
indeed this looks like it. The issue was raised since folks in Canvas bumped into this one and it took a while for all involved to figure out what's happening since the |
As a stop-gap solution in the mean time, could SQL embed a Not saying we shouldn't improve the internal agg representations, just that it'd be a 7.x+ improvement and older versions would still be stuck. :( |
Is there anything one can do to work around this? |
Answering my own question. I think this should work:
A bit overhead, but for my use case it's not so much of a change. I had
Since COUNT(0)/COUNT(0) cancel itself out, I have now:
|
Neither solution is ideal as it requires either two aggs or an approximation. |
@elastic/es-ql |
This is how PostgreSQL is behaving regarding SUM, for the following data:
For the query
Obviously, for the query that is explicitly looking at 0 values
|
I'm sold. I think it'd be pretty breaking for us to return |
@astefan and @nik9000 The given queries have (I think) nothing to do with the problem. SELECT SUM(val) FROM test WHERE val=123; summing no row of course should give null SELECT SUM(val) FROM test WHERE val=0; summing 0s of course should give 0 But try this: INSERT INTO public.test (val, gender) VALUES (NULL, 'M');
SELECT SUM(val) FROM test WHERE gender='M'; In elastic search this'll give NULL as one row contains NULL as a val while postgres will give 10000. |
@Skeeve the ticket is about how Elasticsearch
Not sure where you got that 10000 from. CREATE TABLE test (
val int,
gender varchar(40)
);
INSERT INTO test (val, gender) VALUES (NULL, 'M');
SELECT * FROM test;
SELECT val FROM test WHERE gender='M';
SELECT SUM(val) FROM test WHERE gender='M'; returns:
|
@costin Of course I was working with the previous example. You misse: INSERT INTO public.test (val, gender) VALUES (0, 'M');
INSERT INTO public.test (val, gender) VALUES (0, 'F');
INSERT INTO public.test (val, gender) VALUES (0, 'M');
INSERT INTO public.test (val, gender) VALUES (0, 'M');
INSERT INTO public.test (val, gender) VALUES (0, 'M');
INSERT INTO public.test (val, gender) VALUES (0, 'F');
INSERT INTO public.test (val, gender) VALUES (10000, 'F');
INSERT INTO public.test (val, gender) VALUES (10000, 'M');
INSERT INTO public.test (val, gender) VALUES (10000, 'F');
INSERT INTO public.test (val, gender) VALUES (10000, NULL); Issue is, afair: elasticsearch wrongly gets "null" as a sum when at leas one row has null. UPDATE: I think I mixed it up a bit… I seem to remember: A sum of Zeros in ES gave null… Sorry. |
I can't reproduce this - the issue is the opposite, |
mark |
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for elastic#45251 .
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values (all nulls) to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for the issue #45251 . Once the results of the `sum` aggregation can differentiate between `SUM(all nulls)` and `SUM(all zeroes`) the optimizer rule introduced in this commit needs to be removed.
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values (all nulls) to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for the issue elastic#45251 . Once the results of the `sum` aggregation can differentiate between `SUM(all nulls)` and `SUM(all zeroes`) the optimizer rule introduced in this commit needs to be removed. (cherry-picked from b74792a)
Previously the SUM(all zeroes) was `NULL`, but after this change the SUM SQL function call is automatically upgraded into a `stats` aggregation instead of a `sum` aggregation. The `stats` aggregation only results in `NULL` if the there were no rows, no values (all nulls) to aggregate, which is the expected behaviour across different SQL implementations. This is a workaround for the issue #45251 . Once the results of the `sum` aggregation can differentiate between `SUM(all nulls)` and `SUM(all zeroes`) the optimizer rule introduced in this commit needs to be removed. (cherry-picked from b74792a)
I think this has been addressed by #65796 and is fixed since 7.11. Given the data:
The query
Also, the queries with individual filters for |
@Luegg That's true but the issue is still open because it's currently addressed with a workaround on SQL side: https://github.com/elastic/elasticsearch/pull/65796/files#diff-f89f100e88759827c9f9cbceed83a4efb475521a3e42d489584917807df11190R987 but the ideal solution would be that this is fixed in the Aggs. |
To avoid confusion in the future, let's open an issue for the aggregation team (if one doesn't already exist), link it to this ticket for future reference and close this ticket down. |
+1 |
I've found the pretty recent #71582 covering the issue on the aggregation side. Since the change requires a migration path it seems to not be of high priority. Another possible alternative besides changing the |
Since the SQL `SUM` function behaves as expected, elastic#45251 can be closed. As soon as elastic#71582 is resolved, we can go back to using the `sum` aggregation instead of `stats`.
Given:
SELECT SUM(val) FROM test WHERE val IS NOT NULL GROUP BY name
val
values are 0 for a certainname
valueThe result of the above query will return NULL, whereas the result should be 0. This boils down to
AggregationInspectionHelper.hasValue(InternalSum)
that will returntrue
/false
if the sum is 0: https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/search/aggregations/support/AggregationInspectionHelper.java#L160We use these methods to decide if the result of aggregations has a value or not.
Relates to #36020. CC @polyfractal @costin
The text was updated successfully, but these errors were encountered: