Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unsound compilation of local classes in conjunction with exceptions #22051

Closed
cpitclaudel opened this issue Nov 29, 2024 · 18 comments · Fixed by #22099
Closed

Unsound compilation of local classes in conjunction with exceptions #22051

cpitclaudel opened this issue Nov 29, 2024 · 18 comments · Fixed by #22099
Assignees
Labels
area:desugar Desugaring happens after parsing but before typing, see desugar.scala itype:bug itype:soundness Soundness bug (it lets us compile code that crashes at runtime with a ClassCastException)
Milestone

Comments

@cpitclaudel
Copy link

Compiler version

Scala CLI version: 1.5.4
Scala version (default): 3.5.2

Minimized code

def boundary[T](body: (T => RuntimeException) => T): T =
  case class Break(value: T) extends RuntimeException
  try body(Break.apply)
  catch case Break(t) => t

@main def main =
  boundary[Int]: EInt =>
    val v: String = boundary[String]: EString =>
      throw EInt(3)
    v.length

Output

Exception in thread "main" java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
	at crash$package$.main$$anonfun$1(crash.scala:8)
	at crash$package$.boundary(crash.scala:3)
	at crash$package$.main(crash.scala:7)
	at main.main(crash.scala:6)

Expectation

The program should either be rejected at compilation, or not crash. The problem seems to be that there's only one Break class at runtime, but the typechecker thinks that Break must contain a T.

@cpitclaudel cpitclaudel added itype:bug stat:needs triage Every issue needs to have an "area" and "itype" label labels Nov 29, 2024
@dwijnand
Copy link
Member

What's EString and EInt?

@cpitclaudel
Copy link
Author

cpitclaudel commented Nov 29, 2024

They're just local names for the constructor of the exception type that's defined inside boundary. You could name them anything:

def boundary[T](body: (T => RuntimeException) => T): T =
  case class Break(value: T) extends RuntimeException
  try body(Break.apply)
  catch case Break(t) => t

@main def main =
  boundary[Int]: foo =>
    val v: String = boundary[String]: bar =>
      throw foo(3)
    v.length

@noti0na1
Copy link
Member

We can manually create a tag for each boundary execution:

def boundary[T](body: (T => RuntimeException) => T): T =
  val tag: AnyRef = new Object
  case class Break(value: T, tag: AnyRef) extends RuntimeException
  try body(Break.apply(_, tag))
  catch case Break(t, _: tag.type) => t

def test() =
  boundary[Int]: EInt =>
    val v: String = boundary[String]: EString =>
      throw EInt(3)
    v.length

test()

@bracevac
Copy link
Contributor

@noti0na1's solution is correct. The problem is erasure and how classes are handled. Break being defined in the body of method boundary does not mean there will be a unique Break class & instance per invocation of boundary. The class will be lifted out of boundary instead. It becomes clear when we inspect the program after one of the very late compiler phases, e.g., scalac -Xprint:repeatableAnnotations <filename>.scala.

So I am not sure whether there is anything to fix here.

@noti0na1
Copy link
Member

noti0na1 commented Nov 29, 2024

Well, I believe Java has the same behaviour, so I'm not sure whether we should "fix" this or not?

import java.util.function.Function;

public class BoundaryExample {
    public static <T> T boundary(Function<Function<T, RuntimeException>, T> body) {
        class Break extends RuntimeException {
            final T value;
            Break(T value) {
                this.value = value;
            }
        }
        try {
            return body.apply(value -> new Break(value));
        } catch (Break e) {
             return e.value;
        }
    }
    public static void main(String[] args) {
        Integer result = boundary(EInt -> {
            String v = boundary(EString -> {
                throw EInt.apply(3);
            });
            return v.length();
        });
        System.out.println(result);
    }
}
Exception in thread "main" java.lang.ClassCastException: class java.lang.Integer cannot be cast to class java.lang.String (java.lang.Integer and java.lang.String are in module java.base of loader 'bootstrap')
	at BoundaryExample.lambda$main$2(main.java:19)
	at BoundaryExample.boundary(main.java:12)
	at BoundaryExample.main(main.java:18) (exit status 1)

@sjrd
Copy link
Member

sjrd commented Nov 29, 2024

Gosh, if Java has the same issue, you might be able to get a paper out of it. 😯

That said in Scala we should try to emit a warning on the extractor, the same way we do for path-dependent classes for which we cannot statically enforce the right prefix.

@noti0na1
Copy link
Member

Some interesting find:

  • Kotlin and Swift disallows nested classes inside generic classes/functions that inherit from Throwable/Error to violate type-safety issue;
  • Go, python and js can produce correct behaviours (returning 3 at the end).

@cpitclaudel
Copy link
Author

[@noti0na1] We can manually create a tag for each boundary execution

Indeed! But to be clear, I'm not asking for help with implementing boundary/break (I'm happy with the implementation in the standard library).

[@bracevac] So I am not sure whether there is anything to fix here.

I'm too new to Scala to know what the policy is on unsoundness, but as a user it was surprising to me to have a well-typed program go wrong.

[@noti0na1] Well, I believe Java has the same behaviour

:'(

@bracevac
Copy link
Contributor

[@bracevac] So I am not sure whether there is anything to fix here.

I'm too new to Scala to know what the policy is on unsoundness, but as a user it was surprising to me to have a well-typed program go wrong.

The problem is the well-known limitation of the JVM that generics are erased, i.e., there will only be Break values carrying Objects as payload, and the inner boundary[String] invocation will catch the Break value with the integer and perform a cast at run time which will fail. In general, the best we can do is emit a warning at compile time, e.g., if we lift Break out of the function body like so:

case class Break[T](value: T) extends RuntimeException

def boundary[T](body: (T => RuntimeException) => T): T =
  try body(Break.apply)
  catch case Break[T](t) => t

// rest stays the same

We'll get the warning

-- [E092] Pattern Match Unchecked Warning: boundary3.scala:5:13 ----------------
5 |  catch case Break[T](t) => t
  |             ^
  |the type test for Break[T] cannot be checked at runtime because its type arguments can't be determined from Throwable
  |
  | longer explanation available when compiling with `-explain`
1 warning found

and perhaps we should emit it as well when Break's definition resides in the body of boundary.

As a side remark, the definition of boundary does not make sense unless the expectation is that each boundary invocation basically generates a fresh private copy of the Break class. We'd have to do runtime class generation or a more intricate compilation scheme that implicitly adds the call-specific branding as in @noti0na1's solution...

@sjrd
Copy link
Member

sjrd commented Nov 30, 2024

The problem is the well-known limitation of the JVM that generics are erased, i.e., there will only be Break values carrying Objects as payload, and the inner boundary[String] invocation will catch the Break value with the integer and perform a cast at run time which will fail.

This is the explanation of what's happening, but it's no excuse to produce a CCE out of a well-typed program without user-level cast. If we can't do the type test at run-time as specified, then we must not let it go through the typechecker. At least not without an "unchecked" warning somewhere.

Other cases of "I would need to do a type test that I cannot correctly implement" either emit compile errors or unchecked warnings. The fact that this instance goes through without any message is definitely a soundness bug, which we should fix one way or another.

As a side remark, the definition of boundary does not make sense unless the expectation is that each boundary invocation basically generates a fresh private copy of the Break class. We'd have to do runtime class generation or a more intricate compilation scheme that implicitly adds the call-specific branding as in @noti0na1's solution...

There are ways to do branding without run-time class generation (which is not really possible anyway). We can add a hidden field of type Object, instantiate a new Object for every logical incarnation of the class, and store it its instances. We can then do the type test by testing that the hidden field contains exactly the object we expect.

@noti0na1
Copy link
Member

noti0na1 commented Nov 30, 2024

One simple approach we can do is to wrap every local class into an object during typer. For example:

def boundary[T](body: (T => RuntimeException) => T): T =
  case class Break(value: T) extends RuntimeException
  try body(Break.apply)
  catch case Break(t) => t

// becomes

def boundary[T](body: (T => RuntimeException) => T): T =
  object local:
    case class Break(value: T) extends RuntimeException
  try body(local.Break.apply)
  catch case local.Break(t) => t

The pattern match case case local.Break(t) will check the prefix correctly. We don't even need to modify any later phase.

For optimization, we can also reuse the module object if we make some effort.

However, will existing code break because of this change?

@sjrd
Copy link
Member

sjrd commented Nov 30, 2024

No code will break, but the run-time cost is heavy. My suggestion is less heavy but still a large price to pay. TBH my recommendation is to emit an unchecked warning here, not to actually fix the type test.

@noti0na1
Copy link
Member

No code will break,

If the user relies on the unsound behaviour, then it will break. 😂

but the run-time cost is heavy. My suggestion is less heavy but still a large price to pay. TBH my recommendation is to emit an unchecked warning here, not to actually fix the type test.

I agree we should emit a warning. But I don't think the runtime cost will become more heavy, since we are already generate the module object, the extra work cost is only checking the outer reference.

@cpitclaudel
Copy link
Author

cpitclaudel commented Dec 1, 2024

[@bracevac] As a side remark, the definition of boundary does not make sense unless the expectation is that each boundary invocation basically generates a fresh private copy of the Break class.

Right, that was my expectation when I wrote that code. This is the model that the other languages that I use follow, I think. Here's OCaml:

let boundary (type t) (body : (t -> exn) -> t) : t =
  let module BreakModule = struct exception Break of t end in
  try body (fun x -> BreakModule.Break x)
  with BreakModule.Break t -> t

let _ = 
  let result =
  boundary (fun (eint : int -> exn) ->
      let v = boundary (fun (estr : string -> exn) ->
          raise (eint 3))
      in String.length v)
  in Printf.printf "Result: %d\n" result

Another note is that the ClassCastException only happens if I bind v; otherwise it silently runs the inner boundary's handler without attempting the cast:

boundary[Int]: EInt =>
  boundary[String]: EString =>
    throw EInt(3) // Caught by the boundary[string]
  0 // returns 0

@noti0na1 noti0na1 added itype:soundness Soundness bug (it lets us compile code that crashes at runtime with a ClassCastException) area:desugar Desugaring happens after parsing but before typing, see desugar.scala and removed stat:needs triage Every issue needs to have an "area" and "itype" label labels Dec 1, 2024
@cpitclaudel
Copy link
Author

Look what @zaclegarssure found: https://2024.splashcon.org/details/unsound-2024-papers/3/Java-Method-Local-Inner-Classes-are-Unsound

@bracevac
Copy link
Contributor

bracevac commented Dec 2, 2024

[@cpitclaudel] Another note is that the ClassCastException only happens if I bind v; otherwise it silently runs the inner boundary's handler without attempting the cast:

boundary[Int]: EInt =>
  boundary[String]: EString =>
    throw EInt(3) // Caught by the boundary[string]
  0 // returns 0

Yes, that's consistent with erasure. The erasure of boundary[String] will return an Object, and as long as you only call the methods of Object on it (or just not use it at all), there is no need to cast. In your initial example, you ascribe String to the binding which necessitates the cast.

We can also see this with the example from the slides translated to Scala:

class Foo[X](val x: X)

def broken[X](other: AnyRef, x: X): Foo[X] =
  case class MyFoo() extends Foo[X](x)
  other match
    case foo:MyFoo => foo // <- 3.5.2 compiler gives a warning, good!
    case _ => MyFoo()

@main
def test = 
  val r1: Foo[Int] = broken[Int]("anything", 5)
  val r2: Foo[String] = broken[String](r1, "Hi")
  println(r1.x)
  println(r2.x) // <- note: prints 5, NO classcast exception
  //println(r2.x.length) <- classcast exception

Note that the line println(r2.x) does not yield an exception as claimed on the slides! Why should it? We have an erased Foo class that carries an Object, and toString can be called on it.

As yet another side note, given 30+ years on research on the formal metatheory programming languages, including those with control effects like exceptions, I cannot fully agree that a well-typed program resulting in an exception is necessarily "unsound", since type safety theorems for those kinds of languages explicitly permits those. I would be more convinced if the outcome is a genuine crash of the JVM. But given the lack of a fully formal spec of the entirety of Scala/Java that would permit a type safety proof, it’s a weak foundation to rest the argument on. So I guess we have to remain vague on what we consider "unsoundness" here. Happy to discuss further offline.

But yeah, a warning should be the minimum, and I would very much like local classes to be "generative" in the sense we discussed.

@sjrd
Copy link
Member

sjrd commented Dec 2, 2024

In Scala our definition of unsoundness is, and has always been: a well-typed program that does not contain any user-written asInstanceOf, nor any @unchecked annotation, must not cause a ClassCastException.

Unsoundness for crashing the JVM is soundness of the JVM type system. That's a different level. No matter what Scala or Java the languages do, the JVM must not crash, because it validates its bytecode at its type system level. In that type system, a CCE is not considered a violation of soundness, obviously.

@noti0na1 noti0na1 self-assigned this Dec 3, 2024
@noti0na1
Copy link
Member

noti0na1 commented Dec 3, 2024

It seems we did have a warning for pattern matching on local classes, but it is suspended by #6551 when the pattern is an unapply.

I'm working on a fix to bring the warning back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:desugar Desugaring happens after parsing but before typing, see desugar.scala itype:bug itype:soundness Soundness bug (it lets us compile code that crashes at runtime with a ClassCastException)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants