Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

Closed
timholy opened this issue Jun 23, 2020 · 2 comments · Fixed by #37193
Closed

Julep: attempting to measure (& test?) "absolute" risk of invalidation #36393

timholy opened this issue Jun 23, 2020 · 2 comments · Fixed by #37193
Labels
compiler:inference Type inference compiler:latency Compiler latency

Comments

@timholy
Copy link
Member

timholy commented Jun 23, 2020

Short version: as protection against invalidation (and possibly "code quality" in general), perhaps we should consider some kind of nanosoldier test on the count of backedges of poorly-inferred MethodInstances. Also an update on progress (& regressions) from 1.4 to master.

There are two things that bother me about my recent work on improving inferrability of Base, Pkg, REPL, etc.:

  • it's all package-specific: I care about PkgX, so I search for and stomp on invalidations that it triggers, but that may or may not help users of PkgY. Worse, is there any risk it could make things worse for PkgY?
  • it's easy to imagine regressions. The fixes I've been making only rarely have a consequence that can be caught by @inferred, as many methods return a known type even if their internals are not well-inferred.

For these reasons, I've begun to wonder whether we can & should add some tests that look for "problematic" MethodInstances and tally their number of backedges. The goal would be to develop an absolute measure of our vulnerability to invalidation: at-risk MethodInstances that have the most backedges can cause the most invalidation. If we can reliably measure our vulnerability, then we could also introduce some "hold the line" tests, which prevent our vulnerability from increasing above a certain level. I recognize that there are serious downsides in adding such tests---someone wants to add cool feature X but it's blocked because it's not easy to do that inferrably---but there are also quality-concern risks with not having such tests. So, I thought I'd begin by at least trying to come up with some kind of overall measure of our vulnerability.

At the bottom of this I present a couple of scripts that seem useful. What they try to do is pull out MethodInstances that seem possibly problematic, in the sense that they fit patterns that I've seen while studying invalidations. Overwhelmingly, they involve abstract types, but many methods with abstract types seem reasonably OK, so I've tried to accommodate that in some specific ways. I should say right from the outset that I'm not at all confident that I've made the selection well, but at least it does pull out a lot of MethodInstances that have featured heavily in my invalidation work (as well as many that haven't). For more detail you should really just open up the code below.

The output of the analysis is a scatter plot, comparing where we were on Julia 1.4.2 vs where we are now on master. Each dot represents one possibly-problematic MethodInstance signature. Magenta is for exported names, green for non-exported names:

method_scatter_log

You can see that overall there seems to have been some improvement, at least if you restrict yourself to exported names: there's a strong tendency for magenta dots to be near or below the diagonal. Until I saw this, I had no idea how far we were towards "done," or even whether it's possible to ask that question in a reasonable way. While I've put a lot of time into hunting these invalidations down, overall this result makes me think that efforts to improve our code-quality are worth it, and that we're mostly not living in a world where #35844 is our only hope (CC @c42f). Time will tell...

It's worth noting that the scripts below generate an interactive plot, where you can click on individual dots and it will print the corresponding signature to the REPL. So running it locally can be an informative exercise. How informative it actually is comes down to how well I've selected "problematic" MethodInstances, of course, so take all this with a big grain of salt. (Quite a few of the dots on the plot are nothing to worry about, e.g., setindex!(::Vector{Any}, ::Any, ::Int) is perfectly fine. Others, like notify(::Base.GenericCondition{Base.Threads.SpinLock},::Any,::Bool,::Bool) seem to be at low risk for being extended by packages.)

Some cases where we've made a lot of progress are in various ==, !=, and isequal methods, a small handful of convert methods, some obscure yet surprisingly-important methods like +(::Ptr{Nothing},::Integer), and several string-generation and manipuluation methods (e.g., repr, thisind, lastindex, isempty(::AbstractString), getindex(::String, ::Any), ^(::String, ::Integer), Base.isidentifier, and Base._show_default). Interestingly, there are also a handful of significant regressions since 1.4, mostly in path-handling (we've had a huge increase in backedges for basename(::AbstractString), isfile(::Any)) but also a small number of others like in(::Any, ::Tuple{Symbol}), +(::Int, ::Any, ::Any), and Base.split_sign(::Integer). It is possible that some of these are consequences of inference fixes (e.g., basename(::AbstractString) seems better than basename(::Any)), but these would be worth digging into.

Based on this result I'm beginning to think that it probably would be worth introducing some "hold the line" tests in Base. But I'd be very curious about other folks' thoughts on this matter.

This script saves the data to a log file. Run once on each Julia version you want to analyze:

using MethodAnalysis

# Analyze MethodInstance signatures and select those that seem at risk for being invalidated.
function atrisktype(@nospecialize(typ))
    # signatures like `convert(Vector, a)`, `foo(::Vararg{Synbol,N}) where N` do not seem to pose a problem
    isa(typ, TypeVar) && return false
    # isbits parameters are not a problem
    isa(typ, Type) || return false
    if isa(typ, UnionAll)
        typ = Base.unwrap_unionall(typ)
    end
    # Exclude signatures with Union{}
    typ === Union{} && return false
    isa(typ, Union) && return atrisktype(typ.a) | atrisktype(typ.b)
    # Type{T}: signatures like `convert(::Type{AbstractString}, ::String)` are not problematic, so mark Type as OK
    typ <: Type && return false
    if typ <: Tuple && length(typ.parameters) >= 1
        p1 = typ.parameters[1]
        # Constructor calls are not themselves a problem (any `convert`s they trigger might be, but those are covered)
        isa(p1, Type) && p1 <: Type && return false
        # convert(::Type{T}, ::S) where S<:T is not problematic
        if p1 === typeof(Base.convert) || p1 === typeof(Core.convert) || p1 === typeof(Core.Compiler.convert)
            p2, p3 = typ.parameters[2], typ.parameters[3]
            if isa(p2, Type)
                p2 = Base.unwrap_unionall(p2)
                if isa(p2, DataType) && length(p2.parameters) === 1
                    T = p2.parameters[1]
                    isa(p3, Type) && isa(T, Type) && p3 <: T && return false
                end
            end
        # `getindex`, `length`, etc are OK for various Tuple{T1,T2,...}
        elseif (p1 === typeof(Base.getindex) || p1 === typeof(Core.Compiler.getindex)) ||
               (p1 === typeof(Base.length)  || p1 === typeof(Core.Compiler.length)) ||
               (p1 === typeof(Base.isempty) || p1 === typeof(Core.Compiler.isempty)) ||
               (p1 === typeof(Base.iterate) || p1 === typeof(Core.iterate) || p1 === typeof(Core.Compiler.iterate))
            p2 = typ.parameters[2]
            if isa(p2, Type)
                p2 = Base.unwrap_unionall(p2)
                p2 <: Tuple && return false
            end
        # show(io::IO, x) is OK as long as typeof(x) is safe
        elseif p1 === typeof(Base.show)
            atrisktype(typ.parameters[2]) && return true
            length(typ.parameters) == 3 && atrisktype(typ.parameters[3]) && return true
            return false
        end
    end
    # Standard DataTypes
    isconcretetype(typ) && return false
    # ::Function args are excluded
    typ === Function && return false
    !isempty(typ.parameters) && (any(atrisktype, typ.parameters) || return false)
    return true
end

# A few tests
@assert  atrisktype(Tuple{typeof(==),Any,Any})
@assert  atrisktype(Tuple{typeof(==),Symbol,Any})
@assert  atrisktype(Tuple{typeof(==),Any,Symbol})
@assert !atrisktype(Tuple{typeof(==),Symbol,Symbol})
@assert !atrisktype(Tuple{typeof(convert),Type{Any},Any})
@assert !atrisktype(Tuple{typeof(convert),Type{AbstractString},AbstractString})
@assert !atrisktype(Tuple{typeof(convert),Type{AbstractString},String})
@assert  atrisktype(Tuple{typeof(convert),Type{String},AbstractString})
@assert !atrisktype(Tuple{typeof(map),Function,Vector{Any}})
@assert !atrisktype(Tuple{typeof(getindex),Dict{Union{String,Int},Any},Union{String,Int}})
@assert  atrisktype(Tuple{typeof(getindex),Dict{Union{String,Int},Any},Any})
@assert !atrisktype(Tuple{Type{BoundsError},Any,Any})
@assert  atrisktype(Tuple{typeof(sin),Any})
@assert !atrisktype(Tuple{typeof(length),Tuple{Any,Any}})

function collect_mis(m::Method)
    list = Core.MethodInstance[]
    visit(m) do item
        if isa(item, Core.MethodInstance)
            push!(list, item)
            return false
        end
        return true
    end
    return list
end

const mis = Dict{Method,Vector{Core.MethodInstance}}()
visit() do item
    if item isa Method
        m = item
        mis[m] = collect_mis(m)
        return false
    end
    return true
end

# Count # of backedges for MethodInstances with abstract types
const becounter = Dict{Core.MethodInstance,Int}()
visit() do item
    if item isa Core.MethodInstance
        if atrisktype(item.specTypes)
            becounter[item] = length(all_backedges(item))
        end
        return false
    end
    return true
end

prs = sort!(collect(becounter); by=last)

open("/tmp/methdata_$VERSION.log", "w") do io
    for (mi, c) in prs
        c == 0 && continue
        println(io, mi.specTypes=>c)
    end
end

This script generates the interactive plot. You'll probably have to modify some file names.

using PyPlot

function parseline(line)
    m = match(r"(.*) => (.*)", line)
    sigstr, count = m.captures[1], parse(Int, m.captures[2])
    sig = try
        ex = Meta.parse(sigstr)
        eval(ex)
    catch
        return nothing
    end
    return sig, count
end

function parsedata(filename)
    lines = readlines(filename)
    sigcount = IdDict{Any,Int}()
    for line in lines
        ret = parseline(line)
        ret === nothing && continue
        sig, count = ret
        sigcount[sig] = count
    end
    return sigcount
end

function split_comparable(sigc1, sigc2)
    c1, c2, sigs = Int[], Int[], Any[]
    for (sig, c) in sigc1
        push!(sigs, sig)
        push!(c1, sigc1[sig])
        push!(c2, get(sigc2, sig, 0))
    end
    for (sig, c) in sigc2
        if !haskey(sigc1, sig)
            push!(sigs, sig)
            push!(c1, 0)
            push!(c2, c)
        end
    end
    return sigs, c1, c2
end

sigc16 = parsedata("/tmp/methdata_$VERSION.log")
sigc14 = parsedata("/tmp/methdata_1.4.3-pre.0.log")

sigs, c1, c2 = split_comparable(sigc14, sigc16)
mx1, mx2 = maximum(c1), maximum(c2)
isexported(sig) = (ft = Base.unwrap_unionall(sig).parameters[1]; isdefined(Main, ft.name.mt.name))
colors = [isexported(sig) ? "magenta" : "green" for sig in sigs]

function on_click(event)
    x, y = event.xdata, event.ydata
    normsqrdist(pr) = ((pr[1]-x)/mx1)^2 + ((pr[2]-y)/mx2)^2
    idx = argmin(normsqrdist.(zip(c1, c2)))
    println(sigs[idx])
end
begin
    fig, ax = plt.subplots()
    ax.scatter(c1 .+ 1, c2 .+ 1, c=colors)  # + 1 for the log-scaling
    ax.set_xlabel("# backedges + 1, 1.4")
    ax.set_ylabel("# backedges + 1, 1.6")
    ax.set_xscale("log")
    ax.set_yscale("log")
    fig.canvas.callbacks.connect("button_press_event", on_click)
    fig
end

Note: in the plotting script, if you copy/paste the begin...end block to the REPL and execute it, you can use the zoom features of matplotlib. For reasons I don't understand, they don't show up when you just include the script.

@timholy timholy added compiler:inference Type inference compiler:latency Compiler latency labels Jun 23, 2020
@timholy
Copy link
Member Author

timholy commented Jun 30, 2020

Here's a version of the plot above that combines all MethodInstances for a single Method:

image

It's a bit less cluttered and likely closer to something we'd want to test: shifting something from ==(::String, ::Any) to ==(::String, ::AbstractString) is still probably worth considering as progress, or neutral at worst, and by aggregating all instances this kind of change would be considered neutral.

I hadn't commented on the ones at the left edge of the plot, which may be particularly concerning. Omitting ones in Core.Compiler, the Method signatures with 10 or more backedges are:

Tuple{typeof(startswith),AbstractString,Union{AbstractChar, Tuple{Vararg{AbstractChar,N} where N}, Set{var"#s69"} where var"#s69"<:AbstractChar, AbstractArray{var"#s70",1} where var"#s70"<:AbstractChar}}
Tuple{typeof(sort!),AbstractArray{T,1} where T,Integer,Integer,Base.Sort.InsertionSortAlg,Base.Order.Ordering}
Tuple{Base.var"##pipeline#593",Any,Any,Any,Bool,typeof(pipeline),Base.AbstractCmd}
Tuple{typeof(Base.StringVector),Integer}
Tuple{typeof(Base.cconvert),Type{var"#s4"} where var"#s4"<:Ptr,Any}
Tuple{typeof(sort!),AbstractArray{T,1} where T,Integer,Integer,Base.Sort.MergeSortAlg,Base.Order.Ordering}
Tuple{typeof(ismutable),Any}
Tuple{Base.var"##string#334",Integer,Integer,typeof(string),Integer}
Tuple{typeof(Base.Meta.isexpr),Any,Any}
Tuple{typeof(Base.Meta.isexpr),Any,Symbol,Int64}
Tuple{typeof(getproperty),Base.AbstractPipe,Symbol}

Other than cconvert and maybe startswith, none of these seem likely to be specialized for anything that might overlap those signatures. Conversely, the ones along the bottom edge (which have all poorly-inferred specializations eliminated) are:

Tuple{typeof(similar),AbstractArray,Type{T},Vararg{Union{Integer, AbstractUnitRange},N} where N} where T
Tuple{typeof(similar),Type{T},Vararg{Union{Integer, AbstractUnitRange},N} where N} where T<:AbstractArray
Tuple{typeof(+),Ptr,Integer}
Tuple{typeof(isempty),AbstractString}
Tuple{typeof(lastindex),AbstractString}
Tuple{typeof(pointer_from_objref),Any}
Tuple{typeof(convert),Type{SubString{S}},AbstractString} where S<:AbstractString
Tuple{typeof(get!),Dict{K,V},Any,Any} where V where K
Tuple{typeof(get!),Union{Function, Type},Dict{K,V},Any} where V where K
Tuple{typeof(merge),NamedTuple,Any}
Tuple{typeof(Base.ensure_indexable),Tuple{Any,Vararg{Any,N} where N}}
Tuple{typeof(to_indices),Any,Tuple}
Tuple{typeof(iterate),Dict,Any}
Tuple{typeof(length),Dict}

and several others. A lot of these are more likely to be specialized.

Overall this paints a picture of considerable progress, a few regressions, and lots still to address.

timholy added a commit to timholy/MethodAnalysis.jl that referenced this issue Jun 30, 2020
@timholy
Copy link
Member Author

timholy commented Aug 24, 2020

Time for an update! In the data below, by 1.6 I'm including all my still-unmerged PRs, but if most of that gets merged then this will be closer to the real truth anyway...

First, the scatter plot against 1.5:

scatter

Very few are above the diagonal; here's a list of some I clicked on and an assessment of what would cause them to be invalidated:

Signature Potential invalidation trigger risk
Tuple{typeof(getproperty),Base.LibuvStream,Symbol} new getproperty methods for LibuvStream subtypes low (this would essentially be piracy)
Tuple{typeof(isequal),Char,AbstractChar} new isequal methods for new subtypes of AbstractChar moderate
Tuple{Base.var"##pipeline#585",Nothing,Pipe,Nothing,Bool,typeof(pipeline),Base.AbstractCmd} new pipeline methods for new AbstractCmds low
Tuple{typeof(readuntil),Base.AbstractPipe,Char} new readuntil methods for (new) AbstractPipe subtypes low
Tuple{typeof(isopen),Base.LibuvStream} new isopen methods for (new) LibuvStream subtypes low
Tuple{typeof(Base.CoreLogging.env_override_minlevel),Symbol,Any} meths for new logger types low to moderate (this is not typically specialized or part of the documented extension API)
Tuple{typeof(haskey),Dict{Char,Any},Any} new haskey specializations for Dict low (this would be piracy)
Tuple{typeof(replace),AbstractString,Pair{Regex,String}} new replace specializations for new strings high
Tuple{typeof(print),IO,Any,Char} low (people should specialize show, not print)

But what may tell the story even more directly is a comparison of histograms. This tallies the number of backedges for each MethodInstance with an "at risk" signature, meaning one that my analysis suggests might be vulnerable to invalidation:

hist

This clearly reveals a lot of progress---problematic signatures on 1.6 are likely to have no backedges (the signatures, it should be said, are merged from 1.5 and 1.6, so the ones in the "zero bin" are probably not even MethodInstances on 1.6). Here are the most problematic remaining cases:

                                                                                                            
                                                         MethodInstance for string(::String, ::Any, ::String) => (nmethods = 19, nbackedges = 275)
                                                  MethodInstance for string(::String, ::Any, ::String, ::Any) => (nmethods = 19, nbackedges = 278)
                                                    MethodInstance for iterate(::Base.Iterators.Pairs, ::Any) => (nmethods = 223, nbackedges = 304)
                       MethodInstance for string(::String, ::Type{Pkg.TOML.Table}, ::String, ::Any, ::String) => (nmethods = 19, nbackedges = 317)
                                         MethodInstance for string(::String, ::Type, ::Vararg{Any,N} where N) => (nmethods = 19, nbackedges = 323)
                                           MethodInstance for setindex!(::IdDict{Symbol,Int64}, ::Any, ::Any) => (nmethods = 89, nbackedges = 363)
                                   MethodInstance for abspath(::AbstractString, ::String, ::String, ::String) => (nmethods = 3, nbackedges = 366)
                                  MethodInstance for joinpath(::AbstractString, ::String, ::String, ::String) => (nmethods = 2, nbackedges = 367)
                                                MethodInstance for convert(::Type{T} where T<:Tuple, ::Tuple) => (nmethods = 184, nbackedges = 409)
                                                          MethodInstance for get(::IOBuffer, ::Symbol, ::Any) => (nmethods = 25, nbackedges = 417)
                                                 MethodInstance for convert(::Type{String}, ::AbstractString) => (nmethods = 184, nbackedges = 426)
                                                                 MethodInstance for isempty(::AbstractString) => (nmethods = 41, nbackedges = 530)
                                                             MethodInstance for *(::AbstractString, ::String) => (nmethods = 369, nbackedges = 545)
                                                        MethodInstance for string(::AbstractString, ::String) => (nmethods = 19, nbackedges = 550)
                                                               MethodInstance for lastindex(::AbstractString) => (nmethods = 13, nbackedges = 596)
                                                        MethodInstance for thisind(::AbstractString, ::Int64) => (nmethods = 4, nbackedges = 599)
                                                        MethodInstance for rem(::AbstractChar, ::Type{UInt8}) => (nmethods = 145, nbackedges = 772)
                                                           MethodInstance for isequal(::Char, ::AbstractChar) => (nmethods = 22, nbackedges = 773)
                                                                MethodInstance for <=(::AbstractChar, ::Char) => (nmethods = 56, nbackedges = 826)
                                                                 MethodInstance for <(::AbstractChar, ::Char) => (nmethods = 75, nbackedges = 827)
                                                            MethodInstance for isless(::AbstractChar, ::Char) => (nmethods = 41, nbackedges = 828)
                                                                MethodInstance for ==(::Char, ::AbstractChar) => (nmethods = 157, nbackedges = 830)
                                                                MethodInstance for ==(::AbstractChar, ::Char) => (nmethods = 157, nbackedges = 845)
             MethodInstance for notify(::Base.GenericCondition{Base.Threads.SpinLock}, ::Any, ::Bool, ::Bool) => (nmethods = 4, nbackedges = 1572)

It's striking how much of this is string-processing, an area I've not really focused on because I don't tend to load a lot of string-focused packages. While I've made a few contributions to this area already, on balance I suspect this will be a good area for others to contribute to reducing our vulnerability.

As fair warning: once my remaining PRs are merged, I plan to contribute a test that is along the lines of capturing the progress in that histogram so that this progress gets "locked in."

timholy added a commit that referenced this issue Aug 25, 2020
timholy added a commit that referenced this issue Nov 6, 2020
timholy added a commit that referenced this issue Nov 8, 2020
timholy added a commit that referenced this issue Nov 8, 2020
timholy added a commit that referenced this issue Nov 8, 2020
timholy added a commit that referenced this issue Nov 8, 2020
timholy added a commit that referenced this issue Nov 9, 2020
timholy added a commit that referenced this issue Nov 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:inference Type inference compiler:latency Compiler latency
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant