Refactor @avx to rewrite loops into a @generated function. #14
We will need to ensure that, whatever approach we take, the compiler can compile it all away so that it does not introduce any runtime overhead.
There are 3 basic steps:
Once you have the
All we need for a new "frontend" is
When taking the generated function approach (like in broadcasting), we can subdivide it into two parts:
You can see an example of
The functions they call to add operations to the
Regarding

```julia
julia> dotq = :(for i ∈ eachindex(a,b)
       s += a[i]*b[i]
       end)
:(for i = eachindex(a, b)
      #= REPL[46]:2 =#
      s += a[i] * b[i]
  end)

julia> dump(dotq)
Expr
  head: Symbol for
  args: Array{Any}((2,))
    1: Expr
      head: Symbol =
      args: Array{Any}((2,))
        1: Symbol i
        2: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol eachindex
            2: Symbol a
            3: Symbol b
    2: Expr
      head: Symbol block
      args: Array{Any}((2,))
        1: LineNumberNode
          line: Int64 2
          file: Symbol REPL[46]
        2: Expr
          head: Symbol +=
          args: Array{Any}((2,))
            1: Symbol s
            2: Expr
              head: Symbol call
              args: Array{Any}((3,))
                1: Symbol *
                2: Expr
                3: Expr
```

Perhaps something like:

```julia
struct For{T<:Tuple} end
struct Block{T<:Tuple} end
struct Equal{T<:Tuple} end
struct Call{T<:Tuple} end
struct ArrayRef{T<:Tuple} end
# implement macro that transforms `dotq` into:
looptype = For{Tuple{Equal{Tuple{:i,Call{Tuple{:eachindex,:a,:b}}()}}(),Block{Tuple{Equal{Tuple{:s,Call{Tuple{:muladd,ArrayRef{Tuple{:a,:i}}(),ArrayRef{Tuple{:b,:i}}(),:s}}()}}()}}()}}()
# implement a generated function that can transform the above into
lsdot = LoopVectorization.LoopSet(dotq);
```

The generated function would be called on `looptype`, and also receive all variables referenced (in this case, just
We would define a canonical ordering of these variables (perhaps

Would you like to give it a go, or wait until I get around to it? A couple parting notes:
Okay, so I played around with this today and here's what I've got so far:

```julia
struct Ex{T, Tup} end
function to_type!(ex::Expr, vars::Vector{Symbol}, ivars::Vector{Symbol})
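# symbols assigned inside the expression (the LHS of `=`) are recorded in `ivars` so they aren't treated as outside variables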
if ex.head == :(=) && ex.args[1] isa Symbol
push!(ivars, ex.args[1])
end
Ex{ex.head, Tuple{to_type!.(ex.args, Ref(vars), Ref(ivars))...}}
end
function to_type!(x, vars::Vector{Symbol}, ivars::Vector{Symbol})
x
end
to_type!(::LineNumberNode, vars::Vector{Symbol}, ivars::Vector{Symbol}) = nothing
function to_type!(x::Symbol, vars::Vector{Symbol}, ivars::Vector{Symbol})
x ∉ vars && x ∉ ivars && push!(vars, x)
x
end
function to_expr(ex::Type{Ex{Head, Tup}}) where {Head, Tup}
Expr(Head, (to_expr(x) for x in Tup.parameters)...)
end
to_expr(x) = x
nt(keys, vals) = NamedTuple{keys, typeof(vals)}(vals)
macro avx_parse(ex)
vars = Symbol[]
ivars = Symbol[]
type_ex = to_type!(ex, vars, ivars)
tvars = Tuple(vars)
qtivars = QuoteNode(Tuple(ivars))
quote
$(QuoteNode(type_ex)), nt($(QuoteNode(tvars)), (tuple($(vars...)))), $qtivars
end |> esc
end julia> a = rand(5); b = randn(5);
julia> ex_t, vars, ivars = @avx_parse begin
s = 0.0
for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
end;
julia> ex_t
Ex{:block,Tuple{nothing,Ex{:(=),Tuple{:s,0.0}},nothing,Ex{:for,Tuple{Ex{:(=),Tuple{:i,Ex{:call,Tuple{:eachindex,:a,:b}}}},Ex{:block,Tuple{nothing,Ex{:+=,Tuple{:s,Ex{:call,Tuple{:*,Ex{:ref,Tuple{:a,:i}},Ex{:ref,Tuple{:b,:i}}}}}}}}}}}}
julia> vars
(eachindex = eachindex, a = [0.864891273566464, 0.016615952609847495, 0.24528618468148156, 0.9045614698725608, 0.24968365320302022], b = [0.10359624603037468, 0.7595643433938144, -0.8973765174693641, 0.20134830886960015, 0.49462379671835727], * = *)
julia> ivars
(:s, :i)
julia> to_expr(ex_t)
quote
nothing
s = 0.0
nothing
for i = eachindex(a, b)
nothing
s += a[i] * b[i]
end
end
```

Currently, I'm just converting `LineNumberNode`s to `nothing`.

As a high level summary, the above code can take (I believe) a completely arbitrary expression, convert it into a type, and then output a named tuple matching variables from the expression, excluding the variables which are assigned within the expression (such as `s` and `i`).

My plan is then that this macro would pass its three outputs to a generated function which would call
The functions such as

What are your thoughts on this approach so far? Any questions or concerns?
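As a side note, here is what the little `nt` helper above does (my own usage example, not from the original comment): it pairs a tuple of names with a tuple of values to build a `NamedTuple`, which is how the macro associates each captured variable's name with its runtime value.

```julia
julia> nt((:a, :s), ([1.0, 2.0], 0.0))
(a = [1.0, 2.0], s = 0.0)
```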
Okay, so I played around a bit more and implemented a very stupid early version of it:

```julia
using MacroTools: prewalk
using LoopVectorization: LoopVectorization, LoopSet, lower
#----------------------------------------------------------------------------------------------------
struct Ex{T, Tup} end
function to_type(ex::Expr)
Ex{ex.head, Tuple{to_type.(ex.args)...}}
end
to_type(x) = x
to_type(::LineNumberNode) = nothing
#----------------------------------------------------------------------------------------------------
to_expr(ex::Type{Ex{Head, Tup}}) where {Head, Tup} = Expr(Head, (to_expr(x) for x in Tup.parameters)...)
to_expr(x) = x
#----------------------------------------------------------------------------------------------------
function find_vars_and_gensym!(ex::Expr, vars::Dict{Symbol, Symbol}, ivars::Vector{Symbol})
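# names that are assigned to, and names that appear in call position, go into `ivars` and are left un-gensym'd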
if ex.head == :(=) && ex.args[1] isa Symbol
push!(ivars, ex.args[1])
elseif ex.head == :call
push!(ivars, ex.args[1])
end
ex
end
function find_vars_and_gensym!(x::Symbol, vars::Dict{Symbol, Symbol}, ivars::Vector{Symbol})
if (x ∉ keys(vars)) && (x ∉ ivars)
gx = gensym(x)
push!(vars, x => gx)
gx
else
x
end
end
find_vars_and_gensym!(x, vars::Dict{Symbol, Symbol}, ivars::Vector{Symbol}) = x
#----------------------------------------------------------------------------------------------------
nt(keys, vals) = NamedTuple{keys, typeof(vals)}(vals)
macro _avx(ex)
D = Dict{Symbol, Symbol}()
ivars = Symbol[]
gex = prewalk(x -> find_vars_and_gensym!(x, D, ivars), ex)
type_ex = to_type(gex)
tvars = Tuple(keys(D))
tgvars = Tuple(values(D))
quote
kwargs = nt($(QuoteNode(tgvars)), $(Expr(:tuple, tvars...)))
$(Expr(:tuple, tvars...)) = _avx($(QuoteNode(type_ex)), kwargs)
end |> esc
end
@generated function _avx(::Type{ex_t}, var_nt::NamedTuple{keys, var_types}) where {ex_t <: Ex, keys, var_types}
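# `ex_t` carries the loop expression encoded in the type domain; `var_nt` maps each gensym'd name to the runtime value it had at the call site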
ex = to_expr(ex_t)
var_defs = Expr(:block)
for k in keys
push!(var_defs.args, :($k = var_nt[$(QuoteNode(k))]))
end
quote
$var_defs
$(lower(LoopSet(ex)))
$(Expr(:tuple, keys...))
end
end
```

It technically does work, though something is causing a big performance bug:

```julia
function mydot(a, b)
s = 0.0
for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
s
end
function mydotavx(a, b)
s = 0.0
@_avx for i ∈ eachindex(a,b)
s += a[i]*b[i]
end
s
end
a = rand(10); b = rand(10);
@btime mydot($a, $b) # 8.521 ns (0 allocations: 0 bytes)
@btime mydotavx($a, $b) # 2.321 μs (30 allocations: 1.03 KiB)
@show mydotavx(a, b) ≈ mydot(a, b) # true
```

I think the problem is that I'm causing arrays to be reallocated a bunch, since the performance cost is scaling linearly with the size of `a` and `b`.

The reason I need the above weird convoluted design is that unlike a macro, variables defined outside a generated function can't be referenced by the code it returns, so I have to convert

```julia
@_avx for i ∈ eachindex(a,b)
    s += a[i]*b[i]
end
```

into

```julia
a, b, s = _avx(#= a type encoding the expression =#, (_a=a, _b=b, _s=s))
```

and the generated function then defines those variables internally, runs the lowered loops, and returns them.

Yikes. I'm really feeling like I'm shoving a square peg in a round hole. After being quite gung-ho about the generated function approach, I'm starting to have concerns.
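The scoping constraint behind that rewrite can be shown in isolation. This is a self-contained toy of mine (the function names are made up, not part of the package): the expression a `@generated` function returns is evaluated in that function's own scope, so it can never read or write the caller's locals; they have to be passed in and handed back explicitly, which is exactly what the gensym/NamedTuple dance above is doing.

```julia
@generated function touch_callers_s()
    :(s += 1)   # this `s` is a local of touch_callers_s, not the caller's `s`
end

function caller()
    s = 0.0
    touch_callers_s()   # throws an UndefVarError: the generated code never sees this `s`
    s
end
```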
Thanks for this. I'm preoccupied with unfortunate personal affairs until Sunday, so it will be a little while until I can give this a serious look (although I'm still trying to fix the other recent issue as well at the moment).

I assume the
I would only return the

30 is a lot of allocations though. More than I'd expect from just constructing a tuple with two arrays in it. What happens if you inline `_avx`?
No problem, take your time.
Actually, it doesn't. Strangely,

```julia
julia> @code_warntype mydotavx(a, b)
Variables
#self#::Core.Compiler.Const(mydotavx, false)
a@_2::Array{Float64,1}
b@_3::Array{Float64,1}
kwargs::NamedTuple{(Symbol("##a#402"), Symbol("##b#403"), Symbol("##s#404")),Tuple{Array{Float64,1},Array{Float64,1},Float64}}
@_5::Int64
s::Any
a@_7::Array{Float64,1}
b@_8::Array{Float64,1}
Body::Any
1 ─ (b@_8 = b@_3)
│ (a@_7 = a@_2)
│ (s = Main.zero(Main.Float64))
│ %4 = Core.tuple(a@_7, b@_8, s::Core.Compiler.Const(0.0, false))::Core.Compiler.PartialStruct(Tuple{Array{Float64,1},Array{Float64,1},Float64}, Any[Array{Float64,1}, Array{Float64,1}, Core.Compiler.Const(0.0, false)])
│ (kwargs = Main.nt($(QuoteNode((Symbol("##a#402"), Symbol("##b#403"), Symbol("##s#404")))), %4))
│ %6 = Main._avx($(QuoteNode(Ex{:for,Tuple{Ex{:(=),Tuple{:i,Ex{:call,Tuple{:eachindex,Symbol("##a#402"),Symbol("##b#403")}}}},Ex{:block,Tuple{nothing,Ex{:+=,Tuple{Symbol("##s#404"),Ex{:call,Tuple{:*,Ex{:ref,Tuple{:a,:i}},Ex{:ref,Tuple{:b,:i}}}}}}}}}})), kwargs)::Tuple{Array{Float64,1},Array{Float64,1},Any}
│ %7 = Base.indexed_iterate(%6, 1)::Core.Compiler.PartialStruct(Tuple{Array{Float64,1},Int64}, Any[Array{Float64,1}, Core.Compiler.Const(2, false)])
│ (a@_7 = Core.getfield(%7, 1))
│ (@_5 = Core.getfield(%7, 2))
│ %10 = Base.indexed_iterate(%6, 2, @_5::Core.Compiler.Const(2, false))::Core.Compiler.PartialStruct(Tuple{Array{Float64,1},Int64}, Any[Array{Float64,1}, Core.Compiler.Const(3, false)])
│ (b@_8 = Core.getfield(%10, 1))
│ (@_5 = Core.getfield(%10, 2))
│ %13 = Base.indexed_iterate(%6, 3, @_5::Core.Compiler.Const(3, false))::Core.Compiler.PartialStruct(Tuple{Any,Int64}, Any[Any, Core.Compiler.Const(4, false)])
│ (s = Core.getfield(%13, 1))
└── return s
```

If I make it put in type assertions on all the variables before they're returned, the code_warntype for
Hm, so that will require some analysis of the loop prior to the generated function then so that we know what variables need to be splatted into. I guess determining which bindings change doesn't require type info so I guess it'd be okay to hoist that analysis to the macro.

I think the discussion here is starting to go beyond an issue. I'll open a draft PR.
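A minimal sketch of that purely syntactic analysis (my own illustration, not LoopVectorization code): walk the expression and collect every name that appears on the left of an assignment or updating assignment; no type information is needed, so the macro can do it before anything reaches the generated function.

```julia
const ASSIGN_HEADS = (:(=), :(+=), :(-=), :(*=), :(/=))

function modified_vars!(mods::Set{Symbol}, ex::Expr)
    # record plain symbols that are (re)bound by this expression
    if ex.head in ASSIGN_HEADS && ex.args[1] isa Symbol
        push!(mods, ex.args[1])
    end
    foreach(arg -> modified_vars!(mods, arg), ex.args)
    mods
end
modified_vars!(mods::Set{Symbol}, x) = mods  # non-Expr leaves: nothing to do

loopexpr = :(for i ∈ eachindex(a, b); s += a[i] * b[i]; end)
modified_vars!(Set{Symbol}(), loopexpr)  # contains :i and :s, the only bindings the loop changes
```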
I rewrote everything, but I don't have the impression that it actually helped compile times at all.
I have added support for specialization on adjoint/transpose, as well as using size information.

```julia
using StaticArrays, LoopVectorization
LoopVectorization.maybestaticsize(::MArray{<:Tuple{M,Vararg}}, ::Val{1}) where {M} = LoopVectorization.Static{M}()
LoopVectorization.maybestaticsize(::MArray{<:Tuple{M,N,Vararg}}, ::Val{2}) where {M,N} = LoopVectorization.Static{N}()
```

If you defined loop bounds as
The static size parameters must be placed on loop bounds. If you define variables that the macro can't see, it naturally won't be able to pick them up.

I will close this, so that future issues on this theme can focus more on specific features / ways to take advantage of this.
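For illustration, a hypothetical usage sketch (mine, not from the thread) of what those overloads are meant to enable, assuming `@avx` rewrites `size(a, 1)` appearing in a loop bound into a `maybestaticsize` call, as the `Val{1}`/`Val{2}` signatures suggest:

```julia
using StaticArrays, LoopVectorization

function mydot_static(a, b)
    s = zero(eltype(a))
    @avx for i ∈ 1:size(a, 1)   # the static size has to show up in the loop bounds
        s += a[i] * b[i]
    end
    s
end

a = @MVector rand(16); b = @MVector rand(16);
mydot_static(a, b)  # with the overloads above, the length 16 should be known at compile time
```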
@chriselrod mentioned on Discourse that the fact that `@avx` is a macro imposes some codegen difficulties since macros don't see type information, and mentioned generated functions as an alternative. I agree that the best solution would be to gut all the complicated parts out of `@avx` and put them into `@generated function avx`, and then make `@avx` just rewrite the input loops into appropriate data structures to be passed to the `avx` function.

This would cleanly separate the two distinct functionalities which are currently conflated: syntax transformations and complicated, heavily datatype-dependent code generation. Furthermore, it'd make it much easier to expose an interface for other packages to write methods and make their types compatible with `@avx`.

This won't be news to Chris, but I figured it'd be good to have an issue here for the benefit of other readers.

I'd be happy to help with the generated functions and macro rewrite if you're looking for help, though it will take me a while to go through and digest the code. If you could point out places where the syntax transformations occur, that would accelerate the process.
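To make the proposed division of labor concrete, here is a deliberately tiny toy of my own (unrelated to LoopVectorization's internals): the macro performs a purely syntactic rewrite into an ordinary function call, and the `@generated` function, which does see the argument's type, emits code specialized to it.

```julia
macro double_all(x)
    :(double_all($(esc(x))))   # syntax-only rewrite; no type information available here
end

@generated function double_all(x::T) where {T}
    # type-dependent code generation: an unrolled tuple of doubled fields
    Expr(:tuple, (:(2 * getfield(x, $i)) for i in 1:fieldcount(T))...)
end

@double_all((a = 1, b = 2.5))  # == (2, 5.0)
```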