Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overconstrained GADT bounds in presence of bad bounds #11545

Open
mario-bucev opened this issue Feb 26, 2021 · 19 comments · Fixed by #11549 or #12000
Open

Overconstrained GADT bounds in presence of bad bounds #11545

mario-bucev opened this issue Feb 26, 2021 · 19 comments · Fixed by #11549 or #12000
Assignees
Labels
area:gadt itype:bug itype:soundness Soundness bug (it lets us compile code that crashes at runtime with a ClassCastException)
Milestone

Comments

@mario-bucev
Copy link
Contributor

mario-bucev commented Feb 26, 2021

Compiler version

3.0.0-RC1 (Scastie)

Minimized code

@main def test: Unit = {
  trait S[A]
  trait Inv[A]

  class P[X] extends S[Inv[X] & Inv[String]]

  def patmat[A, Y](s: S[Inv[A] & Y]): A = s match {
    case p: P[x] =>
      "Hello"
  }
  
  val got: Int = patmat[Int, Inv[String]](new P) // ClassCastException: String cannot be cast to Integer
}

Output

Click to expand
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
	at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)
	at main$package$.test(main.scala:11)
	at test.main(main.scala:1)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at sbt.Run.invokeMain(Run.scala:115)
	at sbt.Run.execute$1(Run.scala:79)
	at sbt.Run.$anonfun$runWithLoader$4(Run.scala:92)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at sbt.util.InterfaceUtil$$anon$1.get(InterfaceUtil.scala:10)
	at sbt.TrapExit$App.run(TrapExit.scala:257)
	at java.lang.Thread.run(Thread.java:748)

Expectation

The code should be rejected, but it is unclear what part is problematic:

  1. The instantiation of new P due to introducing the bad bounds S[Inv[Int] & Inv[String]]
  2. The class declaration of P with the extends S[Inv[X] & Inv[String]] clause, which may lead to bad bounds if X != String
  3. Deducing that x = A = String

I think that the problem comes from 1 (and not necessarily from 3). I believe, however, that this is not sufficient, as it seems that bad bounds can still be crafted.

Translating the above into pDOT shows that we can allow 3. if we require a witness value for Inv[A] & Y. Then, the above would not cause a ClassCastException since providing a value for Inv[Int] & Inv[String] is not possible.

@LPTK
Copy link
Contributor

LPTK commented Feb 27, 2021

The culprit is clearly 3. In Scala/pDOT, you can't escape type members with members that themselves end up having bad bounds (as you say, bad bounds can still be crafted). So we have to live with bad bounds.

In the pattern match, we know Inv[x] & Inv[String] =:= Inv[A] & Y. It is utterly wrong to decompose that constraint into Inv[x] =:= Inv[A] and Inv[String] =:= Y, which is what Dotty seems to be doing. The latter implies the former, so it's okay for approximating type inference, but the former does not imply the latter.

In DOT-like terms, we have Inv[x | String .. x & String] =:= Inv[A] & Y.
So we do have Inv[x | String .. x & String] <: Inv[A] and thus A <: x | String and x & String <: A.
We also have Inv[A] & Y <: Inv[x | String .. x & String], from which we can't deduce anything (think that Y could be bottom).
None of these implies String <: A, which is needed to make the pattern match type check.

@LPTK
Copy link
Contributor

LPTK commented Mar 1, 2021

@liufengyun the issue is not truly solved, as you can see in this slight variaton:

class test0 {
  type AA
  trait S[A]
  trait Inv[A]

  class P[X] extends S[Inv[X] & AA]

}

@main def test: Unit =
  new test0:
    type AA = Inv[String]
    
    def patmat[A, Y](s: S[Inv[A] & Y]): A = s match {
      case p: P[x] =>
        "Hello"
    }
    
    val got: Int = patmat[Int, AA](new P) // ClassCastException: String cannot be cast to Integer

Should it be reopened?

I'm aware that @odersky said in #11549 there were still holes to be fixed by the community, but this particular problem is never going to be fixed completely. The theory of DOT shows we can't avoid bad bounds, so we have to be sound in their presence, and GADT reasoning is currently not.

@liufengyun
Copy link
Contributor

@LPTK In the example, it seems to me type inference (GADT reasoning) is doing the right thing:

P[x] <:< S[Inv[A] & Y]
=> S[Inv[x] & String] <: S[Inv[A] & Y]
=> Inv[x] & Inv[String] =:= Inv[A] & Y
=> Inv[A] <:< (Inv[x] & Inv[String])
=> Inv[A] <:< Inv[x] and Inv[A] <:< Inv[String]
=> x =:= A =:= String

If I understand correctly, handling bad bounds in type inference is more convoluted and has performance penalties.

Another possibility is to add the inferred GADT bounds that are valid to the method type parameters (if they are valid in the method scope). This needs to propagate F-bounds outward and do the join. I'm not sure how practical it is.

/cc: @abgruszecki

@LPTK
Copy link
Contributor

LPTK commented Mar 1, 2021

@liufengyun

...
=> Inv[x] & Inv[String] =:= Inv[A] & Y
=> Inv[A] <:< (Inv[x] & Inv[String])
...

How do you justify that step? Similar to what I've said above, Inv[A] & Y <: Inv[x] & Inv[String] does not imply Inv[A] <: Inv[x] & Inv[String]. You can't just remove an intersected term on the left.

This is not about handling bad bounds in type inference, this is about the soundness of type checking GADTs. I wonder why you think the compiler is doing the right thing, despite the fact that executing the program leads to a ClassCastException 🙂

@liufengyun
Copy link
Contributor

How do you justify that step? Similar to what I've said above, Inv[A] & Y <: Inv[x] & Inv[String] does not imply Inv[A] <: Inv[x] & Inv[String]. You can't just remove an intersected term on the left.

Indeed it does not hold generally. I read your previous comment but mistakenly thought there's a way out that only resorts to subtyping.

Nevertheless, given that Inv is invariant, the step seems to be the only possibility to make =:= hold, thus a valid reasoning step for this particular case.

@LPTK
Copy link
Contributor

LPTK commented Mar 1, 2021

given that Inv is invariant, the step seems to be the only possibility to make =:= hold

That's not true. Again, referring to my message above, taking Y = Nothing is enough to make Inv[A] & Y <: Inv[x] & Inv[String] hold. If you want to justify a subtyping relationship, you have to make it according to the rules of the type system!

PS: To specifically answer your argument about =:= (and not just <:<) – there is no rule AFAIK to say that Inv[x] & Inv[String] =:= Inv[A] & Nothing does not hold. In fact, if x and String happen to be different, then Inv[x] & Inv[String] is indeed functionally equivalent to Nothing.

@liufengyun
Copy link
Contributor

there is no rule AFAIK to say that Inv[x] & Inv[String] =:= Inv[A] & Nothing does not hold.

If there is no rule says that they hold, they don't hold ;-)

@LPTK
Copy link
Contributor

LPTK commented Mar 1, 2021

You have it backwards. The burden of proof is on you to show that it cannot possibly hold, otherwise this GADT reasoning is unsound, as we can see from the ClassCastException this program raises.

@abgruszecki
Copy link
Contributor

I'm reopening this. @LPTK correctly observed that we're still unsound. The correct solution seems to be to check whether we're in GADT constraint inference mode before (approximately) branching on &.

@abgruszecki abgruszecki reopened this Mar 12, 2021
@abgruszecki abgruszecki self-assigned this Mar 12, 2021
@odersky
Copy link
Contributor

odersky commented Apr 6, 2021

The theory of DOT shows we can't avoid bad bounds, so we have to be sound in their presence, and GADT reasoning is currently not.

That's not quite true. DOT says that good bounds are only guaranteed for members of values. It puts the burden of guaranteeing good bounds on when we construct values. So that's what goes wrong here. Type parameters of classes are treated like member types. Non-variant type parameters are treated like type aliases and we need to have guarantee good bounds for them.

Meanwhile #11995 shows that my fix caused important code out there to fail. So that does not seem to work either.

It looks like an important and urgent problem and it would be good if the people discussing here could invest themselves. How can we let real world code compile and still ensure soundness? If you have some time any contribution to this issue and #11995 would be much appreciated.

@LPTK
Copy link
Contributor

LPTK commented Apr 6, 2021

That's not quite true. DOT says that good bounds are only guaranteed for members of values.

Yes, of course. I meant that we can't avoid bad bounds in general, more specifically in the members of types for which we don't have values around. My point stands that the problem is never going to be solved by filling holes which are only special cases of a fundamentally unavoidable property.

How can we let real world code compile and still ensure soundness?

By removing the ad-hoc "fixes" and confronting the reality that we can't infer String <: A from Inv[x] & Inv[String] =:= Inv[A] & Y, because that's just not sound. The constraint solver used for GADT reasoning needs to be fixed so it always remains conservative in the conclusions it draws from inferred constraints. Unlike @abgruszecki and the students he supervises, I am not (and have no plans to become) familiar with that part of the code at all, so the several people currently working on it should take it up themselves.

@mario-bucev
Copy link
Contributor Author

Correct me if I'm wrong, but I think that this issue should not be closed by PR #12000 (GitHub seems to have linked it because of the sentence "need to check to fix #11545")

@abgruszecki
Copy link
Contributor

This was closed accidentally.

Looking at the issue, I think what is happening is that we take a wrong step in TypeComparer. I'll have to investigate where exactly. I'd guess that we do something that is correct as far as type inference goes, but gives wrong results for inferring GADT constraints. Specifically, the GADT logic should run the following type comparison : Inv[A] & Y =:= Inv[X] & Inv[String], and we simply shouldn't infer from this comparison that A = X = String.

@abgruszecki abgruszecki reopened this Apr 7, 2021
@abgruszecki abgruszecki added this to the 3.0.0 milestone Apr 12, 2021
@abgruszecki abgruszecki linked a pull request Apr 13, 2021 that will close this issue
@odersky odersky removed this from the 3.0.0-RC3 milestone Apr 14, 2021
@odersky
Copy link
Contributor

odersky commented Apr 14, 2021

I think whatever we do, this should not be part of 3.0.0. We had two previous soundness patches that we had to fix or revert for RC3. Restricting the range of programs that are compiled needs to be well considered, and we need time for the community to experiment.

@LPTK
Copy link
Contributor

LPTK commented Apr 14, 2021

I don't think this is a language design issue that the community needs to experiment with. Instead, this is a clear-cut constraint solving bug. Here is what I gather the issue is:

  1. Some constraints can't be solved directly, and need to be approximated.

  2. To approximate a constraint C0 in type inference, we need to find a constrain C1 that implies C0. It's fine if we pick something too strong, as that just means we will infer a type that is not as general.

  3. To approximate a constraint C0 in GADT reasoning, we need to find a constrain C1 that is implied by C0. It's fine if we pick a constraint that is too weak, as that just means we will not leverage all the subtyping information present in the currently considered pattern match.

What Dotty currently does is to approximate a GADT constraint in the wrong direction, as if it was a type inference constraint.

Looking at @abgruszecki's PR, it is clear that the culprit is the use of baseType on the LHS of a constraint: baseType returns one possible base type of a given type, so that for instance Inv[A] & B <: C is approximated to Inv[A] <: C, because Inv[A] is considered a base type of Inv[A] & B.
In essence, the reasoning reduces the constraint (C0) Inv[A] & B <: C to just (C1) Inv[A] <: C, which is perfectly sound for type inference, but as we've seen is not sound for GADT reasoning: while (C1) implies (C0), the reverse is not true! For instance, if B is later instantiated to Nothing, it is clear that (C0)+(B := Nothing), i.e., Nothing <: C, does NOT imply (C1).

@LPTK
Copy link
Contributor

LPTK commented Apr 14, 2021

As a meta-remark: I'm very suspicious about the use of the type comparer to deal with GADT constraints. Every place where constraints are approximated during type comparison is a potential soundness hole, unless it has specific logic to avoid the approximation or do it in the other direction in case we are in GADT mode. I'm pretty sure everything would be a lot simpler and, more importantly, less bug-prone, if the GADT constraint solving algorithm was completely separate. In particular, there are many complexities and heuristics needed for type inference that are unnecessary or plain unsound when used for GADT reasoning.

@smarter
Copy link
Member

smarter commented Apr 14, 2021

unless it has specific logic to avoid the approximation or do it in the other direction in case we are in GADT mode

We do have that logic, isSubType has three parameters: tp1: Type, tp2: Type, a: ApproxState, the ApproxState is set when we do an approximation and the GADT logic checks that. Of course it's hard to guarantee that this is always done correctly.

@dwijnand
Copy link
Member

the ApproxState is set when we do an approximation and the GADT logic checks that

Also Mode.GadtConstraintInference and TypeComparer#useNecessaryEither, at least, I think.

@dwijnand
Copy link
Member

There's also

  def testSubType(tp1: Type, tp2: Type): CompareResult =
    GADTused = false
    if !topLevelSubType(tp1, tp2) then CompareResult.Fail
    else if GADTused then CompareResult.OKwithGADTUsed
    else CompareResult.OK

which seems excitedly on point for what one might want in order to use the subtyping algorithm, biased towards GADT reasoning (testSubType(..) != CompareResult.Fail) but, at least for the usage I was trying to make of it, it gave the wrong answer. 😞

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment