-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Inference] limit inference timing recording to NativeInterpreter
only
#49391
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGTM
Alternatively we can do something like: diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index 1eec73d043..eb88a2081e 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -205,8 +205,7 @@ __set_measure_typeinf(onoff::Bool) = __measure_typeinf__[] = onoff
const __measure_typeinf__ = fill(false)
# Wrapper around _typeinf that optionally records the exclusive time for each invocation.
-function typeinf(interp::AbstractInterpreter, frame::InferenceState)
- interp = switch_from_irinterp(interp)
+function typeinf(interp::NativeInterpreter, frame::InferenceState)
if __measure_typeinf__[]
Timings.enter_new_timer(frame)
v = _typeinf(interp, frame)
@@ -216,6 +215,8 @@ function typeinf(interp::AbstractInterpreter, frame::InferenceState)
return _typeinf(interp, frame)
end
end
+# disable recording timings for external `AbstractInterpreter`s
+typeinf(interp::AbstractInterpreter, frame::InferenceState) = _typeinf(interp, frame)
function finish!(interp::AbstractInterpreter, caller::InferenceResult)
# If we didn't transform the src for caching, we may have to transform
@@ -242,6 +243,7 @@ function finish!(interp::AbstractInterpreter, caller::InferenceResult)
end
function _typeinf(interp::AbstractInterpreter, frame::InferenceState)
+ interp = switch_from_irinterp(interp)
typeinf_nocycle(interp, frame) || return false # frame is now part of a higher cycle
# with no active ip's, frame is done
frames = frame.callers_in_cycle ? |
I thought about that, but I actually want to be able to see how much time it took to infer CUDA code. |
GPUCompiler can implement its own measurement mechanism for that? I'm not opposed to the current approach, but it seems somewhat risky to leave |
The major reason would be that it would be great if SnoopCompile could be extended to custom abstract interpreters. |
I don't know why that would be the case? I would say it is correct? You might have arbitrary interleavings? |
It's possible that there are type unstable parts within some external |
Yeah, I agree. But would it be cleaner if external const gpu_timings = CC.Timings.Timing[]
function CC.typeinf(interp::GPUInterpreter, frame::InferenceState)
[ push timing into gpu_timings ]
@invoke CC.typeinf(interp::AbstractInterpreter, frame::InferenceState)
end
macro snoopi_deep_gpu(ex) _snoopi_deep_gpu(ex) end
function _snoopi_deep_gpu(cmd::Expr)
return quote
start_gpu_deep_timing()
try
$(esc(cmd))
finally
stop_gpu_deep_timing()
end
CC.Timings.InferenceTimingNode(gpu_timings[1])
end
end |
You would want to do both at the same time though...
…On Tue, Apr 18, 2023, 11:56 Shuhei Kadowaki ***@***.***> wrote:
The major reason would be that it would be great if SnoopCompile could be
extended to custom abstract interpreters.
For me that's the next frontier in TTFX. Oceananigans has 10+ minutes of
compilation time.
Yeah, I agree. But would it be cleaner if external AbstractInterpreter
uses SnoopCompile utilities like this?
const gpu_timings = CC.Timings.Timing[]
function CC.typeinf(interp::GPUInterpreter, frame::InferenceState)
[ push timing into gpu_timings ]
@invoke CC.typeinf(interp::AbstractInterpreter, frame::InferenceState)end
macro snoopi_deep_gpu(ex) _snoopi_deep_gpu(ex) end
function _snoopi_deep_gpu(cmd::Expr)
return quote
start_gpu_deep_timing()
try
$(esc(cmd))
finally
stop_gpu_deep_timing()
end
CC.Timings.InferenceTimingNode(gpu_timings[1])
end end
—
Reply to this email directly, view it on GitHub
<#49391 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AABDO2WTCOU2UCQKTAQCSTTXB22R3ANCNFSM6AAAAAAXBT4PGE>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
c25de89
to
aff54d5
Compare
Leaving a comment from slack for future reference: Shuhei Kadowaki E.g. julia/base/compiler/typeinfer.jl Lines 165 to 166 in a34261f
_timings[end] , and currently we may get an inference graph like:
, which seems to be a bit tricky to handle? If we isolate
and
|
Allows consumers to separate native inference from inference coming from a custom AbstractInterpreter. Co-authored-by: Shuhei Kadowaki <[email protected]>
This reverts commit 062fa0a.
aff54d5
to
c2411ce
Compare
NativeInterpreter
NativeInterpreter
NativeInterpreter
only
…nly (#49391) The logic of `Core.Compiler.Timings` assumes that the whole recorded inference graph is constructed by the same interpreter, thus we should limit the inference timing recording to `NativeInterpreter` only. External `AbstractInterpreter` can implement its own recording logic, likely by reusing existing `Core.Compiler.Timings` utilities, in a way that does not interfere with the recording for native compilation pipeline. --------- Co-authored-by: Shuhei Kadowaki <[email protected]> (cherry picked from commit 3db036e)
…reters (#54069) Partially reverts #49391 PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (timholy/SnoopCompile.jl#338) After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline. Motivated by JuliaGPU/GPUCompiler.jl#567 which demonstrated that a foreign code instance would not be cached without PrecompileTools.
…reters (#54069) Partially reverts #49391 PrecompileTools uses the timing infrastructure to snoop on the inference process. The reason for #49391 was that this could lead to accidental pollution of the caches with foreign results (timholy/SnoopCompile.jl#338) After #52233 and especially #53336 we now filter results by cache owner and don't try to cache foreign code using the native pipeline. Motivated by JuliaGPU/GPUCompiler.jl#567 which demonstrated that a foreign code instance would not be cached without PrecompileTools. (cherry picked from commit c0611e8)
Allows consumers to separate native inference from inference coming
from a custom AbstractInterpreter.
x-ref: timholy/SnoopCompile.jl#338
I choose
typeof(interp)
since the interp may be arbitrarily heavy and we might not want to hold on to them.