Skip to content

Commit

Permalink
Add top-level compile time to at-time report.
Browse files Browse the repository at this point in the history
As discussed in the comments for JuliaLang#37678, the `% compilation time` metric
added to `@time` is not an accurate measurement of the total compilation
time. It is in fact excluding any top-level compilation time, which may
in fact be quite expensive if the body of your block is fully type
stable and inferrable.

This change adds a final print to the report for the top-level
compilation time that was spent compiling your block. For example:

```julia
julia> @eval function f(a)
           $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:100]...) end)
           $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:100]...) end)
           return a
       end
f (generic function with 1 method)

julia> @time f(2)
  0.000000 seconds + 0.474445 seconds top-level compilation time
-2.53151813758841e205

julia> x = 2
2

julia> @time f(x)  # Inferring with a non-const is slower, but does not happen at top-level, since x is a global whose type cannot be known until runtime.
  1.212882 seconds (129.51 k allocations: 4.811 MiB, 100.00% compilation time)
-2.53151813758841e205

julia> @time f(x)  # But now, subsequent invocations have no compilation time since it's already compiled.
  0.000006 seconds (1 allocation: 16 bytes)
-2.53151813758841e205
```
Without this change, the `@time f(2)` call, would have reported
`0.0 seconds` with no other context.

The following example shows both types of compilation time reported:
```julia
julia> @time peakflops()
  0.803850 seconds (2.26 M allocations: 181.699 MiB, 4.90% gc time, 84.81% compilation time) + 0.017840 seconds top-level compilation time
1.5840888414049457e11
```

----
This change is fairly complex. Here's why:

The trouble with the `@time` macro is that the user's code is compiled
as _part of_ compiling the enclosing block, so the naive implementation
of this change would report _every invocation_ as having some
compilation time.

To address this, we use a heuristic: we record the time to compile an
_empty block_ that simply times an empty statement, and subtract that
empty block compilation time from the time to compile the user's block,
to get the true user compilation time. We also subtract the inner,
dynamic compilation time (the number used to report the percentage), to
get the top-level compilation time. Finally, we only report the
top-level compilation time if it is suffiently larger than the empty
block's compilation time by a ratio threshold, to prevent reporting
compilation time due to noisy variance.

It's annoyingly tricky, but we think it works, and achieves the best of
all the above worlds regarding reporting these metrics. :)

The user won't be fooled by `@time` reporting a much shorter time than
the actual wall time because we actually spent 5 min inferring the code
(this has happened to me in real life, and confused me for days). And
the user _also_ won't be confused to see top-level compilation time
where they don't expect it. And the total runtime reported remains only
the time to execute their function, which is what they likely expect.
It's complex, but we think this is a nice balance. :)

Co-Authored-By: Valentin Churavy <[email protected]>
  • Loading branch information
NHDaly and vchuravy committed Nov 28, 2020
1 parent 1f70c06 commit 194e4b4
Showing 1 changed file with 44 additions and 9 deletions.
53 changes: 44 additions & 9 deletions base/timing.jl
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,7 @@ function format_bytes(bytes) # also used by InteractiveUtils
end
end

function time_print(elapsedtime, bytes=0, gctime=0, allocs=0, compile_time=0)
function time_print(elapsedtime, bytes=0, gctime=0, allocs=0, compile_time=0, outer_compile_time=0)
timestr = Ryu.writefixed(Float64(elapsedtime/1e9), 6)
length(timestr) < 10 && print(" "^(10 - length(timestr)))
print(timestr, " seconds")
Expand Down Expand Up @@ -142,6 +142,10 @@ function time_print(elapsedtime, bytes=0, gctime=0, allocs=0, compile_time=0)
print(Ryu.writefixed(Float64(100*compile_time/elapsedtime), 2), "% compilation time")
end
parens && print(")")
if outer_compile_time > 0
print(" + ")
print(Ryu.writefixed(Float64(outer_compile_time/1e9), 6), " seconds top-level compilation time")
end
nothing
end

Expand Down Expand Up @@ -195,21 +199,52 @@ julia> @time begin
"""
macro time(ex)
quote
# First, measure how long it takes to compile the @time block without a body
local empty_body_compiletime = cumulative_compile_time_ns()
$(_time_expr(nothing))
empty_body_compiletime = cumulative_compile_time_ns() - empty_body_compiletime

# Then, run the real code, and measure the compilation time there, and subtract out
# the time for just the overhead.
local outer_compiletime = cumulative_compile_time_ns()
elapsedtime, inner_compiletime, gcdiff, val = $(_time_expr(ex))
outer_compiletime = cumulative_compile_time_ns() - outer_compiletime

# Subtract out the inner (dynamic dispatch) compilation time to get toplevel + overhead.
outer_compiletime = outer_compiletime - inner_compiletime
# Threshold to display compilation time (because computers have hella variance)
overhead_ratio = outer_compiletime / empty_body_compiletime
if overhead_ratio > 2.0 # three standard deviations of noise in our experiments
# Subtract out the overhead, so we reflect only the true compiletime for user code.
toplevel_compiletime = outer_compiletime - empty_body_compiletime
else
toplevel_compiletime = 0
end

time_print(elapsedtime, gcdiff.allocd, gcdiff.total_time,
gc_alloc_count(gcdiff), inner_compiletime, toplevel_compiletime)

println()
val
end
end

function _time_expr(ex)
Expr(:toplevel, quote
while false; end # compiler heuristic: compile this block (alter this if the heuristic changes)
local stats = gc_num()
local compile_elapsedtime = cumulative_compile_time_ns()
local compiletime = cumulative_compile_time_ns()
local elapsedtime = time_ns()
local val = $(esc(ex))
elapsedtime = time_ns() - elapsedtime
compile_elapsedtime = cumulative_compile_time_ns() - compile_elapsedtime
local diff = GC_Diff(gc_num(), stats)
time_print(elapsedtime, diff.allocd, diff.total_time,
gc_alloc_count(diff), compile_elapsedtime)
println()
val
end
compiletime = cumulative_compile_time_ns() - compiletime
local gcdiff = GC_Diff(gc_num(), stats)
elapsedtime, compiletime, gcdiff, val
end)
end



"""
@timev
Expand Down

0 comments on commit 194e4b4

Please sign in to comment.