Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for non-fallthrough null operations in map/flatMap/exists #1302

Merged
merged 1 commit into from
Feb 10, 2019

Conversation

deusaquilus
Copy link
Collaborator

@deusaquilus deusaquilus commented Jan 21, 2019

Fixes for:
#1053

Required prior to:
#1295

Problem

As outlined in #1053, Quill's Option[T].map, and flatMap operations are fundementally broken because they rely on ANSI null-fallthrough behavior (i.e. null || 'foo' is null) which does not work for all cases e.g:

// optionStr:Option[String]
optionStr.map(str => if (str == "value") "foo" else "bar").getOrElse("baz")

will become:

select case when 
(case when str = 'value' then 'foo' else 'bar' end) is not null then
(case when str = 'value' then 'foo' else 'bar' end) 
else 'baz' end

Which will never be 'baz' since when str is null the conditional does not "fall through" and become null like the string concatonation calse.

Oracle makes this problem from tolerable to intractable because it's concat operation is not ansi compliant. This is to say, null || 'foo' is foo as opposed to null. That means that if you translate

optionStr.map(str => str + "-suffix").getOrElse("otherwise")

Into:

select case when 
(str || '-suffix') is not null then
(str || '-suffix') 
else 'otherwise' end

Then str will never be the value 'otherwise' since in oracle null || '-suffix' becomes '-suffix'. Unlike for conditionals where this kind of behavior is annoying but bearable, simple concatonation breaking fundemental assumptions of map and flatMap is too inconsistent to allow.

These issues combined, force us to introduce explicit null checking into Quill SQL statements whnever map, flatMap or exists are used.

Solution

Add conditional checking to FlattenOptionOperation. Also due to issues outlined in #1053, a new AST for table-specific optional map and flatMap has to be introduced. Also, a SimplifyNullChecks normalization was introduced in order to reduce overly complex null checks that can occour.

Checklist

  • Unit test all changes
  • Update README.md if applicable
  • Add [WIP] to the pull request title if it's work in progress
  • Squash commits that aren't meaningful changes
  • Run sbt scalariformFormat test:scalariformFormat to make sure that the source files are formatted

@getquill/maintainers

@deusaquilus
Copy link
Collaborator Author

deusaquilus commented Jan 22, 2019

It's interesting to think about the potential issues with AST replacement and simplification. Say that I have a statement like this:

query[Holder].map(h => h.value.map(_ + "foo").map(_ + "bar"))

Let's have a look at the particular statement:

h.value.map(_ + "foo")

Which should become:

-- AST: if(h.value != null) h.value + "foo" else null
CASE WHEN h.value IS NOT NULL THEN h.value + 'foo' ELSE null

(Note that SQL concat is typically not + but I'm using it for clarity)

Then when you add the additional .map statement, this entire clause (at least naively) has to be wrapped with another if(... != null) ... else null. That means that as a result of the nesting, you get a behemoth like this:

-- AST: if(
-- if(h.value != null) h.value + "foo" else null != null) 
-- if(h.value != null) h.value + "foo" else null 
-- + "bar" 
-- else null
CASE WHEN 
CASE WHEN h.value IS NOT NULL THEN h.value || 'foo' ELSE null END IS NOT NULL THEN 
CASE WHEN h.value IS NOT NULL THEN h.value || 'foo' ELSE null END 
+ 'bar'
ELSE null END 

It's merely the same pattern recursed. This is also how Slick behaves.

Now, I introduced the "Right Hand Rule" in SimplifyNullChecks which reduces it to a slighly more manageable:

SELECT 
CASE WHEN h.value IS NOT NULL AND (h.value || 'foo') IS NOT NULL THEN 
CASE WHEN h.value IS NOT NULL THEN h.value || 'foo' ELSE null END 
+ 'bar' 
ELSE null END 
FROM Holder h

Unfortunately however, the repetition of CASE WHEN .... still occours.
It is important to mention that "Right Hand Rule", "Left Hand Rule", and "Center Rule" can easily be proven. On the other hand to do a transformation like this:

if(
if(h.value != null) h.value + "foo" else null != null) 
/*arbitrary ast*/ if(h.value != null) h.value + "foo" else null /*arbitrary ast*/
else null
// (transform to) =>
if(
if(h.value != null) h.value + "foo" else null != null) 
h.value + "foo"
else null

... is very, very difficult to pull-off since multiple layers of nesting, binary operators, etc... can all be holding values of if(h.value != null) h.value + "foo" else null != null) and potentially with multiple repetitions (i.e. since map(r => ...) can reuse r multiple times).

Due to all this, I will not implement "if statement simplification" beyond what I have done already unless there is significant user demand because of the potential complexity of going any further, and the potential brittleness that can result as a consequence.

Just verified that Slick behaves exactly the same way as this, which seems to be all the more reason to move forward with this approach.

Edit: Comment about Slick

@deusaquilus deusaquilus force-pushed the option_fix branch 2 times, most recently from 0780d34 to 27f033b Compare January 25, 2019 05:07
@deusaquilus deusaquilus changed the title [WIP] Fix for non-fallthrough null operations in map/flatMap/exists Fix for non-fallthrough null operations in map/flatMap/exists Jan 27, 2019
@deusaquilus
Copy link
Collaborator Author

deusaquilus commented Jan 27, 2019

Hey @fwbrasil. This is ready to review now. The build is failing because of #1307 which is an unrelated build issue. #1310 is the same as this commit with some additional tests.

I changed the way Option[T].map/flatMap/exists/forall work so that they are completely safe and cross-database compatiable. However, since their new behavior can make queries slow (especially since Option[T].map/flatMap get expressed as CASE IF ...), I decided to add a Option[T].mapUnchecked/flatMapUnchecked/existsUnchecked/forallUnchecked which still have the old optional behavior in case you really need it for performance.

@@ -0,0 +1,74 @@
package io.getquill.ast

import io.getquill.ast.{ BinaryOperation => B }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to avoid these aliases, they make the code harder to follow

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@deusaquilus deusaquilus force-pushed the option_fix branch 4 times, most recently from 835a2b1 to 9161729 Compare January 29, 2019 06:12
.gitignore Outdated
@@ -12,8 +12,11 @@ Bug.scala
*.gz
quill-jdbc/io/
quill-sql/io/
/io/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these changes necessary?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah. Not really. I can take out.

case OptionIsEmpty(ast) => stmt"${ast.token}.isEmpty"
case OptionNonEmpty(ast) => stmt"${ast.token}.nonEmpty"
case OptionIsDefined(ast) => stmt"${ast.token}.isDefined"
case UncheckedOptionFlatMap(ast, alias, body) => stmt"${ast.token}.flatMapUnchecked((${alias.token}) => ${body.token})"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should have the *Unchecked methods. A better way of doing this would be allowing users to use option.get, but I think we should avoid it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned with potential performance issues that might be introduced from forcing our users to have case statements in their where clauses whereever there are nullable fields. Say that they just want to do:

case class Person(name:String, age:Option[Int])
query[Person].filter(p => p.age.map(_ + 1).exists(_ > 18))

The way it is now it's just:

select ... from Person p where p.age + 1 > 18

Now with this change it would be:

select ... from Person p where 
((case when p.age is not null then p.age + 1 else null) is not null) &&
((case when p.age is not null then p.age + 1) > 18)

This is significantly more complex for a query analyzer to optimize and the query itself is much less readable (Btw, slick currently does something like this!).
Thus, in this particular query, using the old way (i.e. the way it is now) does not pose any risk of non-ANSI behavior in any database and as a user, in certain places I would definitely prefer it.
This is why I introduced the Unchecked* methods.
So that our users can control what happens to their queries.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'd need only option.get to achieve that, no?

Copy link
Collaborator Author

@deusaquilus deusaquilus Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work but wouldn't users expect a NoSuchElementException to be thrown in many of those cases? Say you have something like this:

case class Person(name:Option[String])
query[Person].map(p => Some(p.name.get + " M.D."))

Normally, if p.name is None, we would expect a NoSuchElementException to be thrown but in our case, the value returned from the query would be None. I think this is pretty inconsistent and might confuse users.

I think we forbade Option.get for a good reason, it's too hard to guarentee consistency with the regular scala API.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get is as unsafe as the *Unsafe methods, I'd prefer to have it instead of adding new methods to the DSL. Maybe we could call it unsafeGet?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, much better! I wonder if we should add it for now, though. I'd wait for users to push for it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, well if we do it in the valueParser like we do Some.apply, Option.apply etc... it's practically trivial. Just do:

case q"$o.orNull" if is[Option[Any]](o) => astParser(o)

We don't need additional AST elements to make it work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The AST should match the original code as much as possible. It's the case that orNull has this behavior for SQL but it might not be the case in other target languages. That's why we flatten the options in the sql module

Copy link
Collaborator Author

@deusaquilus deusaquilus Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. If really you want to go this direction, let's do it right. I suggest we replace this:

    case q"scala.None"                   => NullValue
    case q"scala.Option.empty[$t]"       => NullValue
    case q"scala.Some.apply[$t]($v)"     => astParser(v)
    case q"scala.Option.apply[$t]($v)"   => astParser(v)

With this:

    case q"scala.None"                   => OptionNone
    case q"scala.Option.empty[$t]"       => OptionNone // or maybe even have OptionEmpty?
    case q"scala.Some.apply[$t]($v)"     => OptionSome(astParser(v))
    case q"scala.Option.apply[$t]($v)"   => OptionApply(astParser(v))
    case q"$o.orNull[$t]($v)" if is[Option[Any]](o) => OptionOrNull(astParser(o))

That way in the future, when we want a typed-AST, we have a decent chance of actually making optionals work properly. Wdyt?

I have an active issue for this actually: #1059. I think we should either solve this whole issue of option container-types here and knock out #1059, or we should just let orNull be treated the same way as Some and Option.apply. I don't think we should just make a container for orNull, that's a very disjointed approach.

Copy link
Collaborator Author

@deusaquilus deusaquilus Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot! orNull does not work for Int, Double, etc... How about let's use orNullValue in situations where orNull won't work.

}
}

object Exist {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wdyt about renaming these to IsNotNullCheck/isNullCheck/etc

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NP. Will do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, how about just IsNotNull/isNotNull? Also Exist would become IsNull/isNull.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be confusing as well. If I see IsNull(ast), it gives me the impression that it checks if the ast is null or the AST contains a null literal constant

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The only other thing I can think of is IsNotNullPhrase so it sounds more like a "kind of statement" instead of a "kind of action". Or do you like IsNotNullCheck more?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer IsNotNullCheck but don't feel strongly about it. Few free to choose

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel that strongly about it either. We can go with IsNotNullCheck

@@ -13,6 +13,13 @@ private[dsl] trait QueryDsl {
@compileTimeOnly(NonQuotedException.message)
def querySchema[T](entity: String, columns: (T => (Any, String))*): EntityQuery[T] = NonQuotedException()

implicit class NullableColumnExtensions[A](o: Option[A]) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we keep these methods, they should be @compileTimeOnly and throw NonQuotedException()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. Will do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're all removed

import io.getquill.norm.BetaReduction

object Implicits {
implicit class AstOpsExt(body: Ast) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid these implicits and use the objects directly? I'm afraid people will see the method available and use it even though it doesn't work for all ast nodes. Also, we could restrict the objects to support only BinaryOperation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we take them out, we would have statements like this:

apply(BinaryOperation((IsNotNullCheck(ast), BinaryOpertor.`||`, BinaryOperation((IsNotNullCheck(ast), BinaryOperator.`&&`, reduction)))): Ast)

That's really hard for me to parse unless I break it down into multiple lines and even then. At some point I might want to reduce some even more sophisticated simplification rules for null check statements which would make these even more complex.

I'm good with restricting to BinaryOperation though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting removing only the implicits and calling the objects in this file directly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can certainly remove reduce, that makes sense, especially because BetaReduction is a noun.

What about the +||+ etc.. methods though? Then it would turn into:

apply(`+||+`((IsNotNullCheck(ast), `+&&+`((IsNotNullCheck(ast), reduction)))): Ast)

That's still pretty confusing to me so I'd rather leave them. Wdyt?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, let's keep them 👍

Copy link
Collaborator Author

@deusaquilus deusaquilus Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, will do.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make +||+ and +&&+ limited to BinaryOperation because of this:

      case OptionExists(ast, alias, body) =>
        val reduction = BetaReduction(body, alias -> ast)
        apply((IsNotNullCheck(ast) +&&+ reduction): Ast)

Since the AST is untyped, there's no way to know that the AST inside OptionExists must yield a BinaryOperation instance.

import io.getquill.ast._
import io.getquill.ast.Implicits._

/**
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for this documentation! :)

* rules as well. Note how we are force to null-check both `h.value` as well as `(h.value || 'foo')` because
* a user may use `Option[T].flatMap` and explicitly transform a particular value to `null`.
*/
object SimplifyNullChecks extends StatelessTransformer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a side note: we're fixing bugs and evolving the SQL generation in quill-sql. Unfortunately quill-cassandra doesn't share this code, so we probably should review the modules and try to share more code in the future.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is in quill-core I had in mind to add it to CqlNormalize as well. Will do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I assumed it was in quill-sql

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added SimplifyNullChecks to CqlNormalize

ast match {

// Center rule
case IfExist(
Copy link
Collaborator

@fwbrasil fwbrasil Feb 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that these transformations could be implemented in a generic way using on If nodes? We could move them to quill-core if it's possible

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is already in quill-core :)

case q"$o.flatMap[$t]({($alias) => $body})" if is[Option[Any]](o) =>
OptionFlatMap(astParser(o), identParser(alias), astParser(body))
if (isOptionRowType(o) || isOptionEmbedded(o))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you clarify what this condition does?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, if it's a row type (e.g. case class, tuple, or embedded), always use Unchecked since since you can't null-check these types in sql. I'll add a comment.

@deusaquilus deusaquilus force-pushed the option_fix branch 3 times, most recently from f4e561b to 982bc47 Compare February 6, 2019 08:54
* this is needed in order to be able to detect if there are conditionals inside of optional statements
* since they could be nested in the body's subtree.
*/
class SubtreeSearch(val state: Boolean) extends StatefulTransformer[Boolean] {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, looks like I've re-implemented the wheel here. Will change.

@deusaquilus deusaquilus force-pushed the option_fix branch 2 times, most recently from a30814c to ea35c4b Compare February 8, 2019 19:11
@deusaquilus
Copy link
Collaborator Author

Whoops, I forgot to change orNullValue to getOrNull. Will do that

@deusaquilus
Copy link
Collaborator Author

Done

Copy link
Collaborator

@fwbrasil fwbrasil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Feel free to merge once the build is fixed 👍

Thanks for addressing my concerns

@deusaquilus deusaquilus force-pushed the option_fix branch 6 times, most recently from a12613c to 701d7ea Compare February 10, 2019 20:47
@deusaquilus deusaquilus merged commit bfd9151 into zio:master Feb 10, 2019
@deusaquilus
Copy link
Collaborator Author

Merged. Added some documentation too.

@deusaquilus
Copy link
Collaborator Author

I occours to me that in cases where .map, .flatMap, .exists, or .foreach don't have any of the following:

  • If clauses (i.e. no case statements inside)
  • || operations (technically this is for oracle it's on the level of the parser where context can't easily be overridden
  • Infixes (i.e. even outside of Oracle's non-ANSI CONCAT, UDFs can definitely break the null fallthrough)
    When none of these things occour, we can revert to our previous null-fallthrough behavior.
    I think this is a good idea because it's fairly simple to implement using CollectAST and would simplify many, many queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants