-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't ignore failed optimizer rules #4615
Comments
Make sense to me. It's a good discussion point. |
I think the original "ignore optimizer rule if error" was added (maybe by @andygrove or @avantgardnerio ) because at that time there was no way to distinguish between "Optimizer didn't error but could not handle the query" and "optimizer hit a real error" Now that |
Related #4619 |
The It could be good if we could extract the
If we could do this, one different idea of the return type of the optimizer is that I think the optimizer should return an
|
On application of failing during an optimization may also be to detect malformed plans early. E.g. while working on #4370 I've figured that we could reject malformed regex patterns that are available during planning as literal early. I'm wondering if this should then be done by failing the optimization (and by fixing this very issue) or by inserting "terminal"/"always error" expressions into the plan that are later picked up by the physical planner. |
I agree this would be a good change @liukun4515 made a similar observation / suggestion #3582
I recommend the inserting an "always error" expr node. The rationale being that you could have branches that are never actually hit in the query A trivial example is like CASE
WHEN FALSE THEN 1 / 0
WHEN TRUE THEN 5.0
END |
I would personally rather have an InternalError rather than a panic |
That's correct. Now that we have the |
My suggestion is to fail as early as possible if we have detected some errors in the logical plan.
Why do we cover up the error, instead of exposing them? Besides, we should not return a compile error at runtime, even if the branch won't be hit. No static typed language would do this (I guess SQL is not a dynamic typed language 🤔). |
I tried to remove the config |
Well I think the rationale / question is if SQL expression semantics allow / require eager evaluation or if delayed evaluation A more realistic example might be to use the CASE to explicitly guard against a divide by zero CASE
WHEN num <> 0 THEN value / num
ELSE 0.0
END So in that case evaluating |
These are the variants for runtime behavior, not related to compile time checking I think. But if the invalid expression can not be removed at compile time, a |
The "when to check errors" is highly dependent on the database. I've worked with MS cloud SQL and Exasol before and in some cases, type errors (e.g. due to incompatible operators or functions that only accept a certain type) were only raised if at least a single row was passed throw it (post joins and where selections). These errors happened at query execution (not planning). From a user's perspective this was highly confusing. I recommend we take a look at the SQL standard and if it allows for it we bail out of invalid queries as early as possible, no matter if there's any data to query. |
I believe that the SQL standard says it is "implementation defined":
...
|
We recently ran into this when implementing gap-filling rule
https://github.com/influxdata/influxdb_iox/issues/7330 In this case the rule is really performing "lowering" (compiler jargon) in the sense that we need to get rid of this UDF for the query to be valid. Any error that occurs should be fatal for the query, so we can return an appropriate message to the user. So I just want to voice that |
I updated the description with a workaround (to set |
FWIW @HaoYang670 's ticket #4685 describes the work needed to make sure all queries pass with |
I hope to look into this sometime over the next two weeks |
@jackwener has a PR up to fix this ❤ #6265 |
Describe the bug
Some logical optimizer rules like type coercion are actually essential for correct query execution, but are currently silently ignored when they failed.
To Reproduce
TBD
Expected behavior
Don't skip failed rules (this probably requires us to remove the non-failing
OptimizerRule::optimize
method).Workaround
This behavior is controlled by a config option (that defaults to
true
meaning to silently skip failed errors).You can see this in the datafusion cli:
Datafusion can be configured to fail hard if an optimizer rule fails like
Additional context
See this discussion.
The text was updated successfully, but these errors were encountered: