Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Require constructors and convert to return objects of stated type? #42372

Open
timholy opened this issue Sep 24, 2021 · 9 comments
Open

Require constructors and convert to return objects of stated type? #42372

timholy opened this issue Sep 24, 2021 · 9 comments
Labels
compiler:inference Type inference types and dispatch Types, subtyping and method dispatch
Milestone

Comments

@timholy
Copy link
Member

timholy commented Sep 24, 2021

(This is a Julia 2.0 proposal, as it is breaking. It is surely not novel, but I don't see another such issue open.) We currently treat outer constructors and convert method definitions as any other generic function, but there's an argument to be made that they should be special-cased by requiring that they always return an object of the stated type. Motivations:

  1. At present, it's essentially a bug to write fallback methods like this:
foo(x::Int) = 1
foo(x) = foo(Int(x))

Why? Because someone might define

struct Foo end
Int(f::Foo) = f

in which case you get

julia> foo(Foo())
ERROR: StackOverflowError:
Stacktrace:
 [1] foo(x::Foo) (repeats 79984 times)
   @ Main ./REPL[3]:1

and StackOverflowErrors are to be avoided: they can take forever to resolve in some cases, they can at least in principle trash your session, etc.

The only good way to write such a fallback method is

foo(x) = foo(Int(x)::Int)

but I will bet that the vast majority of developers do not know this or bother with it, and probably everyone finds it ugly and annoying.

Some might complain about idempotent operators and involutions, for example one might be tempted to define

struct Not
    x
end
Not(x::Not) = x.x   # just strip the `Not` wrapper rather than returning `Not(Not(5))`

But a solution is to separately introduce not from Not, and allow this behavior for not but not Not (Not must always wrap in another Not layer).

  1. In addition to surprising the developer, failing to enforce this is much harder on inference and the compiler, sometimes with dramatically increased risk of invalidation. Consider the case of Expr constructor sometimes returns a Symbol, Bool, etc julia-vscode/CSTParser.jl#308. For the implementation in https://github.com/julia-vscode/CSTParser.jl/blob/778212a77d2977b94a531240877eacff00700111/src/conversion.jl#L133-L203 it is impossible to predict the outcome. For certain package combinations with the SciML ecosystem, this constructor alone is responsible for thousands of invalidations. Such problems can only be detected by devoted and sophisticated sleuths, but designing the language to be bullet-proof against this solves it neatly with very little cost (when you want to return something different, just check before you call the constructor, not after).
@timholy timholy added this to the 2.0 milestone Sep 24, 2021
@Seelengrab
Copy link
Contributor

Seelengrab commented Sep 24, 2021

How would this work for e.g.

struct Parametrized{T}
   field::Vector{T}
end
Parametrized(x::Vector) = foo!(x) # returns a vector

? Would that also be restricted?

I'm thinking of this because there's a lazy shuffle in a package of mine, which tries to shuffle iterators around. For Vectors though, the specialized version already just calls shuffle!, avoiding having to allocate new memory, preserving shape etc. I guess I could change the interface though, as it's internal anyway.

@timholy
Copy link
Member Author

timholy commented Sep 24, 2021

Yes, under this proposal that would be a bad constructor. You could define parametrized(x) which might or might not return a Parametrized (good design suggests it should be "equivalent" to a Parametrized for all expected uses), but under this proposal Parametrized must return a Parametrized.

@JeffBezanson
Copy link
Member

Very good idea. There are legitimate uses of this anti-pattern, but I think they're limited to things like compatibility shims. It's also important to mention that the front-end is already allowed to assume this property of convert; i.e. whenever we generate a call to it we also put a typeassert on it. We (I?) had to do that because otherwise lots of type declarations the user took pains to write would simply have no effect, which would be really frustrating. So making this rule universal would be simpler.

@tkf
Copy link
Member

tkf commented Sep 24, 2021

It'd be great if we can remove or rephrase this in the manual

Constructors that don't return instances of their own type

In very rare cases it might make sense for the constructor T(x) to return an object not of type T.

-- https://docs.julialang.org/en/v1.6.3/manual/conversion-and-promotion/#Constructors-that-don't-return-instances-of-their-own-type

I don't think it's OK in any cases for new APIs. Keeping !(T(x) isa T) for compatibility is good, but I think it'd be sensible to recommend removing such API in the next breaking release in the existing packages.

This is an actionable item within the 1.x time frame.


There are legitimate uses of this anti-pattern, but I think they're limited to things like compatibility shims.

I'm curious what other legitimate uses. My belief has been there's practically none. Can't you do a compatibility shim with a factory function? I guess you'd need a type for T{...}(...) call signature but I think you can use some combination of Union and abstract type for compatibility. Maybe there's some corner cases that there's no way around this. But I'd hypothesize that you'd need to start with a very badly designed API.

@timholy
Copy link
Member Author

timholy commented Jul 25, 2022

I've finally discovered a case where I plan to exploit the current behavior; I still think we should change this for Julia 2.0, but the counterexample is interesting and worth knowing about. In an attempt to resolve #44538 (comment) and what to do about that PR in general, I am planning to "hamstring" Static.jl and see what breaks (as a way of figuring out where we may be getting constprop failures, and thus how badly we actually need StaticInt):

diff --git a/src/Static.jl b/src/Static.jl
index 82ab248..0762c33 100644
--- a/src/Static.jl
+++ b/src/Static.jl
@@ -35,10 +35,10 @@ A statically sized `Int`.
 Use `StaticInt(N)` instead of `Val(N)` when you want it to behave like a number.
 """
 struct StaticInt{N} <: StaticInteger{N}
-    StaticInt{N}() where {N} = new{N::Int}()
-    StaticInt(N::Int) = new{N}()
+    StaticInt{N}() where {N} = N # new{N::Int}()
+    StaticInt(N::Int) = N # new{N}()
     StaticInt(@nospecialize N::StaticInt) = N
-    StaticInt(::Val{N}) where {N} = StaticInt(N)
+    StaticInt(::Val{N}) where {N} = N # StaticInt(N)
 end

As you can see, this causes the StaticInt constructor to return an Int which will force us to rely on constprop.

I am thus only using this for investigation/debugging; I would never want to release code like this. So I'm still in favor of this proposal, but one wonders if one would like to be able to start Julia in a sloppier mode. Of course keeping such rarely-used functionality working would be its own challenge, so I am very unsure about whether that's really practical.

Anyway, food for thought if/when we do get around to considering a Julia 2.0 release.

@Seelengrab
Copy link
Contributor

Oh, that's a good find! In my endless optimism I'd like to interpret that as an argument for having the staticness/require-a-compile-const be a seperate feature from any specific type, but maybe that's just my friendliness towards Zig comptime leaking :)

@Seelengrab
Copy link
Contributor

Seelengrab commented Jul 3, 2024

You could define parametrized(x) which might or might not return a Parametrized (good design suggests it should be "equivalent" to a Parametrized for all expected uses), but under this proposal Parametrized must return a Parametrized.

I think I've found a usecase where guaranteeing not to return a Parametrized is required. Basically, not returning an object of that type allows a form of RAII to exist in Julia. I've written a bit about this on my blog, but the core of the matter is that having a struct like

# define a struct to act as a handle for our resource
struct CommunicationBus
    mutex::Semaphore
    # override the default constructors & require a Semaphore
    function CommunicationBus(f)
        # acquire the relevant mutex before we can construct an instance
        # ignoring error handling for the example etc.
        acquire(BUS_MUTEX)
        bus = new(BUS_MUTEX)
        try
            # pass the instance into our function, guaranteeing that
            # f has exclusive access to the specific resource
            f(bus)
        finally
            # release the mutex, no matter how we happen to exit f
            # and as soon as we do exit f
            release(bus.mutex)
        end
        nothing
    end
end

which explicitly doesn't return a CommunicationBus allows treating the call f(bus) as having exclusive access to a resource represented by that struct. If the constructor were required to return a CommunicationBus, we wouldn't be guaranteed to have exclusive access to the bus anymore, since the instance may have leaked outside of the callee of f, and something else would be able to misuse the object while we hold the mutex.

This would then be used like

function foo()
     CommunicationBus() do bus
          # communicate over the bus object, we have exclusive access!
          write(bus, 0x00)
     end
end

The advantages of this over e.g. a finalizer is that the mutex is guaranteed to be released as soon as possible (and not deferred to whenever GC happens to run), as well as not having to incur a heap allocation due to not being mutable.

One possibility if constructors were required to return an object of their stated type could be to rely on not providing an actual constructor at all. More or less changing the above struct to

function acquire_bus end

struct CommunicationBus
    mutex::Base.Semaphore
    function acquire_bus(f)
        acquire(BUS_MUTEX)
        bus = new(BUS_MUTEX)
        try
            f(bus)
        finally
            release(bus.mutex)
        end
        nothing
    end
end

so that CommunicationBus can only be instantiated through acquire_bus. This also doesn't currently work, since the method for acquire_bus defined in the struct is not accessible from the outside, as far as I can tell. If this RAII-like pattern is worth preserving under this proposal, it would be good to find a way to make acquire_bus callable 🤔

@N5N3
Copy link
Member

N5N3 commented Jul 3, 2024

This also doesn't currently work, since the method for acquire_bus defined in the struct is not accessible from the outside

You just need global acquire_bus for this case, see reinterpret and Base.ReinterpretArray for example.

@Seelengrab
Copy link
Contributor

Right, that's the trick! It's not really a pretty solution, but it's definitely a usable workaround. For completeness:

struct _RAIIBus
    mutex::Base.Semaphore

    global CommunicationBus
    function CommmunicationBus(f)
        acquire(BUS_MUTEX)
        bus = new(BUS_MUTEX)
        try
            f(bus)
        finally
            release(bus.mutex)
        end
        nothing
    end
end

It's such a small difference though that the benefits of requiring the actual constructor to return a proper object vastly outnumber the upsides of keeping this around for RAII!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:inference Type inference types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests

6 participants