Add top-level compile time to at-time report.

As discussed in the comments for JuliaLang#37678, the `% compilation time` metric added to `@time` is not an accurate measurement of the total compilation time. It is in fact excluding any top-level compilation time, which may in fact be quite expensive if the body of your block is fully type stable and inferrable. This change adds a final print to the report for the top-level compilation time that was spent compiling your block. For example: ```julia julia> @eval function f(a) $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:100]...) end) $(quote $([:(a = a + a * 10 + (a > 0 ? $i : 0.2)) for i in 1:100]...) end) return a end f (generic function with 1 method) julia> @time f(2) 0.000000 seconds + 0.474445 seconds top-level compilation time -2.53151813758841e205 julia> x = 2 2 julia> @time f(x) # Inferring with a non-const is slower, but does not happen at top-level, since x is a global whose type cannot be known until runtime. 1.212882 seconds (129.51 k allocations: 4.811 MiB, 100.00% compilation time) -2.53151813758841e205 julia> @time f(x) # But now, subsequent invocations have no compilation time since it's already compiled. 0.000006 seconds (1 allocation: 16 bytes) -2.53151813758841e205 ``` Without this change, the `@time f(2)` call, would have reported `0.0 seconds` with no other context. The following example shows both types of compilation time reported: ```julia julia> @time peakflops() 0.803850 seconds (2.26 M allocations: 181.699 MiB, 4.90% gc time, 84.81% compilation time) + 0.017840 seconds top-level compilation time 1.5840888414049457e11 ``` ---- This change is fairly complex. Here's why: The trouble with the `@time` macro is that the user's code is compiled as _part of_ compiling the enclosing block, so the naive implementation of this change would report _every invocation_ as having some compilation time. To address this, we use a heuristic: we record the time to compile an _empty block_ that simply times an empty statement, and subtract that empty block compilation time from the time to compile the user's block, to get the true user compilation time. We also subtract the inner, dynamic compilation time (the number used to report the percentage), to get the top-level compilation time. Finally, we only report the top-level compilation time if it is suffiently larger than the empty block's compilation time by a ratio threshold, to prevent reporting compilation time due to noisy variance. It's annoyingly tricky, but we think it works, and achieves the best of all the above worlds regarding reporting these metrics. :) The user won't be fooled by `@time` reporting a much shorter time than the actual wall time because we actually spent 5 min inferring the code (this has happened to me in real life, and confused me for days). And the user _also_ won't be confused to see top-level compilation time where they don't expect it. And the total runtime reported remains only the time to execute their function, which is what they likely expect. It's complex, but we think this is a nice balance. :) Co-Authored-By: Valentin Churavy <[email protected]>
NHDaly · Nov 28, 2020 · 194e4b4 · 194e4b4
1 parent 1f70c06
commit 194e4b4
Showing 1 changed file with 44 additions and 9 deletions.
diff --git a/base/timing.jl b/base/timing.jl
@@ -114,7 +114,7 @@ function format_bytes(bytes) # also used by InteractiveUtils
     end
 end
 
-function time_print(elapsedtime, bytes=0, gctime=0, allocs=0, compile_time=0)
+function time_print(elapsedtime, bytes=0, gctime=0, allocs=0, compile_time=0, outer_compile_time=0)
     timestr = Ryu.writefixed(Float64(elapsedtime/1e9), 6)
     length(timestr) < 10 && print(" "^(10 - length(timestr)))
     print(timestr, " seconds")
@@ -142,6 +142,10 @@ function time_print(elapsedtime, bytes=0, gctime=0, allocs=0, compile_time=0)
         print(Ryu.writefixed(Float64(100*compile_time/elapsedtime), 2), "% compilation time")
     end
     parens && print(")")
+    if outer_compile_time > 0
+        print(" + ")
+        print(Ryu.writefixed(Float64(outer_compile_time/1e9), 6), " seconds top-level compilation time")
+    end
     nothing
 end
 
@@ -195,21 +199,52 @@ julia> @time begin
 """
 macro time(ex)
     quote
+        # First, measure how long it takes to compile the @time block without a body
+        local empty_body_compiletime = cumulative_compile_time_ns()
+        $(_time_expr(nothing))
+        empty_body_compiletime = cumulative_compile_time_ns() - empty_body_compiletime
+
+        # Then, run the real code, and measure the compilation time there, and subtract out
+        # the time for just the overhead.
+        local outer_compiletime = cumulative_compile_time_ns()
+        elapsedtime, inner_compiletime, gcdiff, val = $(_time_expr(ex))
+        outer_compiletime = cumulative_compile_time_ns() - outer_compiletime
+
+        # Subtract out the inner (dynamic dispatch) compilation time to get toplevel + overhead.
+        outer_compiletime = outer_compiletime - inner_compiletime
+        # Threshold to display compilation time (because computers have hella variance)
+        overhead_ratio = outer_compiletime / empty_body_compiletime
+        if overhead_ratio > 2.0  # three standard deviations of noise in our experiments
+            # Subtract out the overhead, so we reflect only the true compiletime for user code.
+            toplevel_compiletime = outer_compiletime - empty_body_compiletime
+        else
+            toplevel_compiletime = 0
+        end
+
+        time_print(elapsedtime, gcdiff.allocd, gcdiff.total_time,
+                   gc_alloc_count(gcdiff), inner_compiletime, toplevel_compiletime)
+
+        println()
+        val
+    end
+end
+
+function _time_expr(ex)
+    Expr(:toplevel, quote
         while false; end # compiler heuristic: compile this block (alter this if the heuristic changes)
         local stats = gc_num()
-        local compile_elapsedtime = cumulative_compile_time_ns()
+        local compiletime = cumulative_compile_time_ns()
         local elapsedtime = time_ns()
         local val = $(esc(ex))
         elapsedtime = time_ns() - elapsedtime
-        compile_elapsedtime = cumulative_compile_time_ns() - compile_elapsedtime
-        local diff = GC_Diff(gc_num(), stats)
-        time_print(elapsedtime, diff.allocd, diff.total_time,
-                   gc_alloc_count(diff), compile_elapsedtime)
-        println()
-        val
-    end
+        compiletime = cumulative_compile_time_ns() - compiletime
+        local gcdiff = GC_Diff(gc_num(), stats)
+        elapsedtime, compiletime, gcdiff, val
+    end)
 end
 
+
+
 """
     @timev