Invalid plan for ORDER BY expression that refers to output column #4698

martint · 2016-03-03T01:28:16Z

The following query produces an incorrect plan. It resolves a in the ORDER BY clause with respect to the FROM clause, but it should be using the output column from the SELECT clause, instead.

SELECT 10/a AS a 
FROM (VALUES 1,2,3) t(a) 
ORDER BY 10/a

This is the resulting plan:

presto> explain select 10/a AS a FROM (VALUES 1,2,3) t(a) order by 10/a;
                    Query Plan                    
--------------------------------------------------
 - Output[a] => [expr:bigint]                     
         a := expr                                
     - Sort[expr ASC_NULLS_LAST] => [expr:bigint] 
         - Project => [expr:bigint]               
                 expr := (10 / "field")           
             - Values => [field:bigint]           
                     (1)                          
                     (2)                          
                     (3)

From the SQL spec, section "6.6 ":

i) If IC [identifier chain] is contained in an <order by clause> simply contained in a <query expression> QE, and the <select list> simply contained in QE directly contains a <derived column> DC whose explicit or implicit <column name> is equivalent to IC, then PIC1 [partial identifier chain] is a candidate basis, the scope of PIC1 is QE, and the referent of PIC1 is the column referenced by DC.

The text was updated successfully, but these errors were encountered:

Until now all the subqueries were converted (to some form of join) before applying filtering, aggregation, window or order by clauses, independently on where the subquery was placed. To enable subqueries for queries with aggregation above behaviour had to be changed. Depending on where the subquery is placed, it had to be applied at different processing moment in logical planner. Subquery in filter and group by clause is applied before aggregation. Subquery in having, project and order by clause is applied after aggregation. Hovewer if the same subquery expression is used in project, group by and/or having clause, then subquery have to be applied before aggregation and places where the subquery was used are replaced by the symbol of the subquery application.

kbajda · 2016-03-03T15:28:20Z

Btw. this is not directly related but we ran into this several time when porting SQL from other DB engines to Presto:

SELECT a*10 AS b 
FROM (VALUES 1,2,3) t(a) 
WHERE b>15;

Query 20160303_152228_00012_su828 failed: line 1:49: Column '"b"' cannot be resolved
SELECT a*10 AS b FROM (VALUES 1,2,3) t(a) WHERE b>15

It looks like you expect ORDER BY to handle aliases defined in SELECT but WHERE does not even see them.

martint · 2016-03-03T19:40:33Z

It looks like you expect ORDER BY to handle aliases defined in SELECT but WHERE does not even see them.

Yes, that's per the ANSI spec. The scope of the ORDER BY clause is the columns defined in the SELECT clause, but the scope of the WHERE clause is the columns derived from the FROM clause.

If the WHERE clause could refer to columns in the SELECT clause, the behavior of the following query would be undefined:

SELECT row_number() OVER () rn
FROM t
WHERE rn > 5

kbajda · 2016-03-03T20:30:32Z

That's fair. Obviously you can use a subquery to rename columns and use them later in WHERE. We just observed so many SQL engines (including commercial ones) leaning towards user convenience. I bet they just throw a parsing error when it is unclear/undefined as in your example.

kokosing · 2016-03-04T09:29:51Z

@martint I would like to fix this as it is related to correlated subqueries work (see https://docs.google.com/document/d/18HN7peS2eR8lZsErqcmnoWyMEPb6p4OQeidH1JP_EkA/edit#heading=h.sjurgitot2kp).

martint · 2016-03-15T01:06:04Z

@kokosing go for it!

kokosing · 2016-05-18T06:14:16Z

Strange thing. I checked the PSQL and it looks they have the same bug. I have run the following query:

SELECT a * -1 as a, b 
FROM (VALUES (-1,4), (0, 3), (1, 2)) t(a,b) 
ORDER BY a * -1 
LIMIT 1;

and it returned -1, 2. Which means that multiplication by -1 was applied only once on original column t.a.

Did I (we?) miss something, or PSQL has a bug which seems unrealistic to me.

kokosing · 2016-05-18T06:31:21Z

I am getting lost.

I have checked following queries in PSQL:

SELECT a * -1 as a, b 
FROM (VALUES (-1,4), (0, 3), (1, 2)) t(a,b) 
ORDER BY a 
LIMIT 1;

returns -1, -2, while query

SELECT a * -1 as a, b 
FROM (VALUES (-1,4), (0, 3), (1, 2)) t(a,b) 
ORDER BY a * 1
LIMIT 1;

returns 1, 4.

I checked the explain and it looks like when expression in ORDER BY contains only output column alias (a) then output column is used, but when a is used in an expression then table column is used. I will check that in MYSQL.

I am not a native English speaker so I could misunderstood the section copied from SQL specification. But from what I understood ORDER BY expression can access columns returned by SELECT list (projection), that would mean that output column should be used all the time for above queries.

kokosing · 2016-05-18T06:43:56Z

@martint The same thing with MYSQL. So it looks like your query was processed properly (at least according to MYSQL and PSQL), that would mean that is not a bug, that is a feature ;)

kokosing · 2016-05-18T07:04:41Z

In case I name output column uniquely then:

MYSQL works as I would expect. It is using output column reference in all the ORDER BY expressions
PGSQL is able to run ORDER BY when there is only a reference to output column. When output column is used in expression (like c * -1) then it raises that column c is unrecognized.

In regards what can do MYSQL and PGSQL they are consistent to each other.

I wonder which path we should take. Should we follow MYSQL or SQL spec?

kokosing · 2016-05-25T08:40:40Z

@martint ping. Can you please comment what you think is expected behavior for presto in regards of this bug, considering MYSQL and PSQL behavior?

martint · 2016-05-25T18:15:15Z

I'm re-reading that part of the spec. I could've have misinterpreted or missed something.

Also, I'm looking at the relevant parts in the "SQL:1999 - Understanding Relational Language Components" book. In the section about ORDER BY (in the context of cursors) it says:

[...] if your ORDER BY clauses use expressions -- even if the expressions reference only those columns that otherwise appear in the select list [...]

which would seem to imply that names should be resolved against the SELECT list, not the source table.

kokosing · 2016-05-30T11:37:32Z

Ok. Sounds good to me. However we need to be aware of that Presto will behave differently than MYSQL and PSQL. I am going to fix this on top of #5302 .

Fixes prestodb#4698.

kokosing · 2016-06-06T10:57:47Z

@martint I think I know why PSQL and MYSQL do not support this. I spent on this a week and this exploded in my hands. Here is my WIP code: https://github.com/Teradata/presto/tree/feature_gko_scopes_fix_order_by_bug

To fix the bug I wanted ORDER BY to use query output relation. This did not work as ORDER BY can access source relation columns as well. So I added source relation columns (which were not overridden by output relation) as hidden columns to output relation.This change required a lot of changes of the parser and the planner. Then I reached the problem is with aggregation. Currently all aggregation functions are collected from different parts of the query: select list, having, group by, order by. These aggregation can only use source relation columns. With my changes I would need to handle aggregations from ORDER BY separately and when output relation columns is for aggregation raise an exception. This would make below query to fail:

SELECT max(a) AS a FROM t ORDER BY min(a);

Handling this requires more changes. I am not sure if it is worth doing this as it is only about ORDER BY clause.

SQL standard specifies that ORDER BY expressions can reference projection outputs. For instance following query works: SELECT x as y FROM (VALUES(1) t(x)) ORDER BY y*-1; This commit adds support for referencing projection outputs from ORDER BY expressions. This fixes prestodb#4698

martint added the bug label Mar 3, 2016

martint assigned kokosing Apr 6, 2016

kokosing assigned martint and unassigned kokosing May 25, 2016

kokosing added a commit to Teradata/presto that referenced this issue Jun 6, 2016

Fix ORDER BY expression to use output columns

a323988

Fixes prestodb#4698.

kokosing mentioned this issue Dec 5, 2016

Support for complex ORDER BY expressions referencing projection output #6766

Closed

martint closed this as completed in ec2e897 Dec 12, 2016

martint added this to the 0.161 milestone Dec 12, 2016

martint mentioned this issue Dec 13, 2016

(intentional?) variable scope regression in ORDER BY #6867

Closed

ktmud mentioned this issue Mar 11, 2021

fix(query): order by adhoc metrics should trigger group by apache/superset#13434

Merged

6 tasks

ktmud mentioned this issue Mar 23, 2021

fix(sqla-query): order by aggregations in Presto and Hive apache/superset#13739

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid plan for ORDER BY expression that refers to output column #4698

Invalid plan for ORDER BY expression that refers to output column #4698

martint commented Mar 3, 2016

kbajda commented Mar 3, 2016

martint commented Mar 3, 2016

kbajda commented Mar 3, 2016

kokosing commented Mar 4, 2016

martint commented Mar 15, 2016

kokosing commented May 18, 2016

kokosing commented May 18, 2016

kokosing commented May 18, 2016

kokosing commented May 18, 2016

kokosing commented May 25, 2016

martint commented May 25, 2016

kokosing commented May 30, 2016

kokosing commented Jun 6, 2016

Invalid plan for ORDER BY expression that refers to output column #4698

Invalid plan for ORDER BY expression that refers to output column #4698

Comments

martint commented Mar 3, 2016

kbajda commented Mar 3, 2016

martint commented Mar 3, 2016

kbajda commented Mar 3, 2016

kokosing commented Mar 4, 2016

martint commented Mar 15, 2016

kokosing commented May 18, 2016

kokosing commented May 18, 2016

kokosing commented May 18, 2016

kokosing commented May 18, 2016

kokosing commented May 25, 2016

martint commented May 25, 2016

kokosing commented May 30, 2016

kokosing commented Jun 6, 2016