Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect overload selected for Array(Array(T)) vs Array(T) #8973

Open
asterite opened this issue Mar 29, 2020 · 23 comments
Open

Incorrect overload selected for Array(Array(T)) vs Array(T) #8973

asterite opened this issue Mar 29, 2020 · 23 comments
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:generics topic:compiler:semantic

Comments

@asterite
Copy link
Member

def foo(a : Array(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = [5].as(Array(Array(Int32)) | Array(Int32))
foo(z)

Prints "two" but should print "one". I'm still not exactly sure why this happens.

@asterite asterite added kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:semantic labels Mar 29, 2020
@asterite
Copy link
Member Author

Without the dependency on Array:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = Gen(Int32).new.as(Gen(Gen(Int32)) | Gen(Int32))

foo(z)

@stevensonmt
Copy link

Not sure if it's relevant, but the bug appears inconsistent.

def foo(a : Array(T)) forall T
  a.each_with_index do |a, i|
    p "first call for #{a} of class #{a.class}"
    foo(a)
  end
end

def foo(a : T) forall T
  p "second call for #{a} of class #{a.class}"
  p a
end  

foo([1, [2, [[3]], [4, [[5]]]]])

"first call for 1 of class Int32"
"second call for 1 of class Int32"
1
"first call for [2, [[3]], [4, [[5]]]] of class Array(Array(Array(Array(Int32)) | Int32) | Array(Array(Int32)) | Int32)"
"first call for 2 of class Int32"
"second call for 2 of class Int32"
2
"first call for [[3]] of class Array(Array(Int32))"
"second call for [[3]] of class Array(Array(Int32))"
[[3]]
"first call for [4, [[5]]] of class Array(Array(Array(Int32)) | Int32)"
"first call for 4 of class Int32"
"second call for 4 of class Int32"
4
"first call for [[5]] of class Array(Array(Int32))"
"first call for [5] of class Array(Int32)"
"first call for 5 of class Int32"
"second call for 5 of class Int32"
5

@stevensonmt
Copy link

I'm not enough of a mathematician to do this correctly, but bear with me.
So numbers belong to the set N.
[[3]] and [[5]] belong to the same set or type A for N ⊂ A.
[2, [[3]]] and [4, [[5]]] would also belong to the same set or type B for which (A | N) ⊂ B.
So the array [2, [[3]], [4, [[5]]] is a collection C which is B(N | B). When the compiler sees this pattern it determines that N | B is the unit type T. When calling the first method on C it will call the second method on any B that is not B(B). Because [[3]] is A and is not B, the second method is called. Because [4, [[5]]] is B, the first method is called. For this instance the unit type T is no longer N | B, but rather N | A. Because A is defined as containing N but not itself, N is the unit type T for the first method called on A.

If any of that makes sense and is factual, the compiler's actions are surprising but correct. The expected output would only be possible if the compiler waits to define the unit type T until all possible recursion is done.

@asterite
Copy link
Member Author

It might be a correct guess but there's no recursion going on.

It works wrong because we have a bug in our code.

@stevensonmt
Copy link

Recursion might be the wrong term for the algorithm process, but it does seem to have something to do with the recursive union type.
B(A|B) triggers it when C(A|B) does not.

@jgaskins
Copy link
Contributor

Doing some old-fashioned puts debugging in the compiler and discovered this when I ran it against @stevensonmt's example code:

Crystal::Matches(
 @cover=true,
 @matches=
  [#<Crystal::Match:0x10cf32330
    @arg_types=[Array(Array(Int32))],
    @context=
     #<Crystal::MatchContext:0x10cf52900
      @def_free_vars=["T"],
      @defining_type=main,
      @free_vars={"T" => Array(Int32)},
      @instantiated_type=main,
      @strict=false>,
    @def=
     def foo(a : Array(T)) forall T
  a.each_with_index do |a, i|
    p("first call for #{a} of class #{a.class}")
    foo(a)
  end
end,
    @named_arg_types=nil>,
   #<Crystal::Match:0x10cf32300
    @arg_types=[(Array(Array(Int32)) | Int32)],
    @context=
     #<Crystal::MatchContext:0x10cf52800
      @def_free_vars=["T"],
      @defining_type=main,
      @free_vars={"T" => (Array(Array(Int32)) | Int32)},
      @instantiated_type=main,
      @strict=false>,
    @def=
     def foo(a : T) forall T
  p("second call for #{a} of class #{a.class}")
  p(a)
end,
    @named_arg_types=nil>],
 @owner=main,
 @success=true)

Crystal::Type#lookup_matches_without_parents is correctly matching Array(Array(Int32)) (the [[3]]) to the first method but also lumping it in with Int32 in a second match. I haven't figured out how that match is used yet, but it seems to be choosing the second match somewhere.

@faustinoaq
Copy link
Contributor

Interesting bug, if I try this: https://carc.in/#/r/8t59

def foo(a : Array(T)) forall T
  a.each_with_index do |a, i|
    p "first call for #{a} of class #{a.class}"
    foo(a)
  end
end
 
def foo(a : Array(Array(Int32)))
  a.each_with_index do |a, i|
    p "first call for #{a} of class #{a.class}"
    foo(a)
  end
end
 
def foo(a : T) forall T
  p "second call for #{a} of class #{a.class}"
  p a
end  
 
foo([1, [2, [[3]], [4, [[5]]]]])

Then I get this:

"first call for 1 of class Int32"
"second call for 1 of class Int32"
1
"first call for [2, [[3]], [4, [[5]]]] of class Array(Array(Array(Array(Int32)) | Int32) | Array(Array(Int32)) | Int32)"
"first call for 2 of class Int32"
"second call for 2 of class Int32"
2
"first call for [[3]] of class Array(Array(Int32))"
"first call for [3] of class Array(Int32)"
"first call for 3 of class Int32"
"second call for 3 of class Int32"
3
"first call for [4, [[5]]] of class Array(Array(Array(Int32)) | Int32)"
"first call for 4 of class Int32"
"second call for 4 of class Int32"
4
"first call for [[5]] of class Array(Array(Int32))"
"first call for [5] of class Array(Int32)"
"first call for 5 of class Int32"
"second call for 5 of class Int32"
5

@jgaskins
Copy link
Contributor

Looks like the bug is somewhere in this file. Haven't found it yet. I'll look some more after work today.

@asterite
Copy link
Member Author

The problem is that this is a bug where the problem is not the code. We need to define how we want it to work first.

Here's the code again:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  puts "one"
end

def foo(a)
  puts "two"
end

z = Gen(Int32).new.as(Gen(Gen(Int32)) | Gen(Int32))

foo(z)

The type of z is a union of:

  • Gen(Gen(Int32))
  • Gen(Int32)

When the compiler tries to match that union type against Gen(T) it will do this:

  • Take the first type in the union: Gen(Gen(Int32)). It matches against Gen(T). T becomes Gen(Int32).
  • The the second type in the union: Gen(Int32). Try to match it against Gen(T). However, we said T is Gen(Int32). So it's matching against Gen(Gen(Int32)). No match.

The only match is the first one, Gen(Gen(Int32)), where T is Gen(Int32).

Then the next overload is considered, and the entire union matches.

That's why we are seeing that behavior.

How it should work? Should T become the union of those types? But if T is Gen(Int32) | Int32 it would mean we had Gen(Gen(Int32) | Int32) in the first place, which is not the case.

We basically need to dispatch the union type to two different instantiations of the same overload. But the code never does that in any place of the compiler.

So this is a huge change, if we want to implement it.

Or... other ideas of how this should match?

@jgaskins
Copy link
Contributor

This makes me question whether I understand forall. My mental model is that it instantiates a new overload for every single type it was called with, including splitting unions. For example:

def foo(a : T) forall T
end

foo [1, "b"].first

In this example, it seems that there would be 2 overloads for foo — one for Int32 and one for String. Is that correct or is there just one for Int32 | String?

@RX14
Copy link
Member

RX14 commented Mar 31, 2020

@jgaskins there is just one, for the union type.

It works exactly the same way without forall. forall doesn't change which instantiations occur, it just increases the expressiveness of type restrictions.

@asterite
Copy link
Member Author

Exactly. forall just lets you talk about that T and use it in code later on. But it doesn't change how things match.

That said, in my mind forall should also expand union types and provide one overload for each, but that's not the case. And maybe we shouldn't do it. Should we do it for each subclass of a type hierarchy too?

Another thing to consider:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  p T
end

foo(Gen(Int32).new || Gen(Char).new)

Currently fails:

Error: no overload matches 'foo' with type (Gen(Char) | Gen(Int32))

Overloads are:
 - foo(a : Gen(T))
Couldn't find overloads for these types:
 - foo(a : Gen(Int32))

It's because Gen(Char) matches, T becomes Char, then Gen(Int32) doesn't match and there's no overload to cover that.

The entire situation is very confusing.

My advice is to avoid using forall if not needed. In the example in the forums the T wasn't needed at all.

@stevensonmt
Copy link

stevensonmt commented Mar 31, 2020

The type of z is a union of:

  • Gen(Gen(Int32))
  • Gen(Int32)

When the compiler tries to match that union type against Gen(T) it will do this:

  • Take the first type in the union: Gen(Gen(Int32)). It matches against Gen(T). T becomes Gen(Int32).
  • The the second type in the union: Gen(Int32). Try to match it against Gen(T). However, we said T is Gen(Int32). So it's matching against Gen(Gen(Int32)). No match.

The only match is the first one, Gen(Gen(Int32)), where T is Gen(Int32).

Then the next overload is considered, and the entire union matches.

That's why we are seeing that behavior.

How it should work? Should T become the union of those types? But if T is Gen(Int32) | Int32 it would mean we had Gen(Gen(Int32) | Int32) in the first place, which is not the case.

We basically need to dispatch the union type to two different instantiations of the same overload. But the code never does that in any place of the compiler.

So this is a huge change, if we want to implement it.

Or... other ideas of how this should match?

Thanks for the clear delineation of the problem. I never expected to uncover such a fundamental bug. How important is this corner case?

@asterite
Copy link
Member Author

asterite commented Mar 31, 2020

How important is this corner case?

There's lucky framework, amber, athena and thousand of shards out there. We have a compiler. There's the standard library with a lot of features.

It's a bug but it's not preventing people from creating wonderful things. I think this is a very low priority bug. It's pretty rare to have to match against Array(T) and wanting to get that T. It's a bit leaky. And if you really need to do that, you can do it by doing typeof(array.first) or something like that too. So there are workarounds.

@jgaskins
Copy link
Contributor

It works exactly the same way without forall. forall doesn't change which instantiations occur, it just increases the expressiveness of type restrictions.

Exactly. forall just lets you talk about that T and use it in code later on. But it doesn't change how things match.

Ahh, okay, similar to the idea of &block vs yield (purposely ignoring the block inlining for yield for this analogy), where they both match the exact same way but the &block version gives you a reference to the block itself but there's only one instantiation of the &block overload?

@HertzDevil
Copy link
Contributor

Another thing to consider:

class Gen(T)
end

def foo(a : Gen(T)) forall T
  p T
end

foo(Gen(Int32).new || Gen(Char).new)

Currently fails:

Error: no overload matches 'foo' with type (Gen(Char) | Gen(Int32))

Overloads are:
 - foo(a : Gen(T))
Couldn't find overloads for these types:
 - foo(a : Gen(Int32))

If we dispatch here, the same can be said about a plain free var restriction:

def foo(x : T) forall T
  p T
end

foo(1 || 'a') # => Int32
foo('a' || 1) # => Char

as dynamic dispatch would generate separate overloads for T = Int32 and T = Char. IMO this is undesirable as it leads to semantic consequences like the following:

def foo(x : T) forall T
  [x]
end

# without dynamic dispatch:
foo(1 || 'a').class   # => Array(Char | Int32)
foo('a' || 1).class   # => Array(Char | Int32)
typeof(foo(1 || 'a')) # => Array(Char | Int32)
typeof(foo('a' || 1)) # => Array(Char | Int32)

# with dynamic dispatch:
foo(1 || 'a').class   # => Array(Int32)
foo('a' || 1).class   # => Array(Char)
typeof(foo(1 || 'a')) # => (Array(Char) | Array(Int32))
typeof(foo('a' || 1)) # => (Array(Char) | Array(Int32))

So I am inclined towards not allowing dynamic dispatch here; when one writes Gen(T) forall T, the static type of the argument must never be a union. Instead there is the following way to perform dynamic dispatch on a static union type:

class Gen(T)
end

def foo_impl(a : Gen(T)) forall T
  p T
end

def foo(x : T) forall T
  {% begin %}
    case x
      {% for t in T.union_types %}
      in {{ t }} then foo_impl(x)
      {% end %}
    end
  {% end %}
end

foo(Gen(Int32).new || Gen(Char).new) # => Int32
foo(Gen(Char).new || Gen(Int32).new) # => Char

foo(1) # Error: no overload matches 'foo_impl' with type Int32

Perhaps we could extract this idiom into a pseudo-method. A regular method probably wouldn't work; if we replace foo_impl(x) with yield x, the static type of the block argument would still be a union of Gens.

@asterite
Copy link
Member Author

Great analysis, thank you! And I agree with you.

@j8r
Copy link
Contributor

j8r commented Jan 30, 2021

This issue is probably the same as mine, which does not involve generics:

record Child, str : String = "bar"
record Parent, str : String = "foo", nested : Child = Child.new

def serialize(obj : T, de_unionize = false) forall T
  String.build do |io|
    each_ivar obj do |ivar|
      io << (de_unionize ? de_unionize(ivar) : serialize(ivar))
    end
  end
end

def serialize(str : String)
  str
end

def each_ivar(object : O, &) forall O
  {% for ivar in O.instance_vars %}
    yield object.@{{ivar}}
  {% end %}
end

def de_unionize(object : U) forall U
  {% for u in U.union_types %}
    return serialize object if object.is_a? {{u}}
  {% end %}
end

p serialize Parent.new                    # => "foo"
p serialize Parent.new, de_unionize: true # => "foobar"

As @HertzDevil suggested, I did have to implement a method to de-unionize.

@HertzDevil
Copy link
Contributor

So if Gen isn't equivalent to Gen(T) forall T in type restrictions, which one should Gen(_) be equivalent to?

@HertzDevil
Copy link
Contributor

There is also the following:

def foo(x : Array(Array(T))) forall T
  puts "1, #{T}"
end

def foo(x : Array(T)) forall T
  puts "2, #{T}"
end

def foo(x : T) forall T
  puts "3, #{T}"
end

foo([[1]] || [1]) # => 1, Int32
foo([1] || [[1]]) # => 3, (Array(Array(Int32)) | Array(Int32))

If we follow the no dispatches approach, both calls should print 3, because Array(Array(T)) forall T and Array(T) forall T never match the static union type. But if the first overload doesn't involve a free var:

def foo(x : Array(Array(Int32)))
  puts "1, #{Int32}"
end

def foo(x : Array(T)) forall T
  puts "2, #{T}"
end

foo([[1]] || [1]) # Error: no overload matches 'foo' with type (Array(Array(Int32)) | Array(Int32))
foo([1] || [[1]]) # Error: no overload matches 'foo' with type (Array(Array(Int32)) | Array(Int32))

then the calls should print 1 and 2 respectively; that is, we still want to perform union dispatch in the same way the following is supported:

def foo(x : Array(Array(Int32)) | Array(T)) forall T
  puts "#{x}, #{T}"
end

foo([1] || [[1]]) # [1], Int32
foo([[1]] || [1]) # [[1]], Int32

@HertzDevil
Copy link
Contributor

HertzDevil commented Sep 1, 2021

We could perform dispatch over free variables only when the usual unification fails. More specifically, suppose we have:

def foo(x : Array(T), y : Array(T)) forall T
end

If a type like Array(Int32) | Array(String) is used then unification will fail. In this case, we expand each restriction to a fictitious Union of copies of the original restriction, with the free variables in each copy renamed:

def foo(
  x : Array(T_11) | Array(T_12) | Array(T_13) | ... ,
  y : Array(T_21) | Array(T_22) | Array(T_23) | ...
) forall T_ij...
end

These Unions are matched against the argument types and every T_*j produces a different overload to be dispatched, with the additional requirement that if the same free variable is used in multiple parameters, only the common matches that appear on all such parameters (i.e. T_1* & T_2* & T_3* & ...) count as successful substitutions:

# T_1* = {Int32}
# T_2* = {Int32}
# T = {Int32}
foo([1], [1]) # note that normal unification should also succeed here

# T_1* = {Int32, String}
# T_2* = {Int32, String}
# T = {Int32, String}
# missing overloads for `foo([1], [""])` and `foo([""], [1])`
foo([1] || [""], [1] || [""])

# T_1* = {Int32, Char}
# T_2* = {Int32, String}
# T = {Int32}
# missing overloads for `foo([1], [""])`, `foo(['a'], [1])`, and `foo(['a'], [""])`
foo([1] || ['a'], [1] || [""])

# T_1* = {Int32 | String}
# T_2* = {String}
# T = ø
# missing overloads for everything
foo([1 || ""], [""])

def bar(x : Array(T), y : Array(U)) forall T, U
end

# T_1* = {Int32, String}
# U_1* = {Int32, String}
# T × U = {(Int32, Int32), (Int32, String), (String, Int32), (String, String)}
bar([1] || [""], [1] || [""])

# T_1* = {Int32 | String}
# U_1* = {String}
# T × U = {((Int32 | String), String)}
bar([1 || ""], [""])

At this stage any bare free variable restrictions expand union types to their power sets: (a similar treatment could be done for Tuple(T) and NamedTuple(a: T) provided they don't appear within other generics)

def foo(x : T, y : Array(T)) forall T
end

# T_1* = {NoReturn, Int32}
# T_2* = {Int32 | String}
# T = ø
# missing overloads for everything
foo(1, [1 || ""])

# T_1* = {NoReturn, Int32, String, Int32 | String}
# T_2* = {Int32 | String}
# T = {Int32 | String}
foo(1 || "", [1 || ""])

# T_1* = {NoReturn, Int32, String, Int32 | String}
# T_2* = {Int32}
# T = {Int32}
# missing overloads for `foo("", [1])`
foo(1 || "", [1])

These unions and power sets are only needed for a formal definition; the compiler probably won't create them.

I would like to add that the following should unify normally for any pairs of argument types (it currently doesn't):

def foo(x : T, y : T) forall T
  T
end

foo(1, "") # => Int32 | String

The standard library, to my knowledge, doesn't do this outside the Atomic primitives, but these also have Pointer(T) restrictions so matching would remain unaffected.

@straight-shoota
Copy link
Member

Hm, I'm wondering if this would actually be correct.

I would interpret def foo(x : T, y : T) forall T that both arguments are supposed to have identical types. And not that the compiler should construct a union type to fit different types.

On the other hand, you can also just cast both arguments to Int32 | String explicitly. Maybe having the compiler do that implicitly wouldn't be too bad. But it still seems unintuitive.

What's the point in restricting both arguments to the same type then, if any pair of types would be acceptable?

@HertzDevil
Copy link
Contributor

Revisiting this issue and you're probably right that T should restrict exactly to the argument type. In that case we have:

def foo(x : T, y : Array(T)) forall T
end

# T_1* = {Int32}
# T_2* = {Int32 | String}
# T = ø
# missing overloads for everything
foo(1, [1 || ""])

# T_1* = {Int32 | String}
# T_2* = {Int32 | String}
# T = {Int32 | String}
foo(1 || "", [1 || ""])

# T_1* = {Int32 | String}
# T_2* = {Int32}
# T = ø
# missing overloads for everything
foo(1 || "", [1])

# T_1* = {Int32 | String}
# T_2* = {Int32, String}
# T = ø
# missing overloads for everything
foo(1 || "", [1] || [""])

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug A bug in the code. Does not apply to documentation, specs, etc. topic:compiler:generics topic:compiler:semantic
Projects
None yet
Development

No branches or pull requests

8 participants