Remove LogDensityProblemsAD #806
base: release-0.35
Conversation
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff              @@
##           release-0.35     #806      +/-   ##
================================================
- Coverage        85.78%   85.60%   -0.19%
================================================
  Files               36       36
  Lines             4207     4195      -12
================================================
- Hits              3609     3591      -18
- Misses             598      604       +6
```

View full report in Codecov by Sentry.
Force-pushed from 5aafaf0 to 5be363f.
Coveralls: Pull Request Test Coverage Report for Build 13243689609.
Force-pushed from 567e087 to 8e22c05.
Benchmarks

Code

New version -- run on this PR:

```julia
using Test, DynamicPPL, ADTypes, LogDensityProblems, Chairmarks, StatsBase, Random
import ForwardDiff
import ReverseDiff
import Mooncake

cmarks = []
for m in DynamicPPL.TestUtils.DEMO_MODELS
    vi = DynamicPPL.VarInfo(Xoshiro(468), m)
    f = DynamicPPL.LogDensityFunction(m, vi)
    θ = convert(Vector{Float64}, vi[:])
    for adtype in [AutoForwardDiff(), AutoReverseDiff(; compile=false), AutoReverseDiff(; compile=true), AutoMooncake(; config=nothing)]
        t = @be LogDensityProblems.logdensity_and_gradient($f, $θ, $adtype) evals=1
        push!(cmarks, (string(m.f) * "," * string(adtype), t))
    end
end
for cmark in cmarks
    println("$(cmark[1]),$(median(cmark[2]).time)")
end
```

Old version -- run on master branch:

```julia
using Test, DynamicPPL, ADTypes, LogDensityProblems, LogDensityProblemsAD, Chairmarks, StatsBase, Random
import ForwardDiff
import ReverseDiff
import Mooncake
import DifferentiationInterface

cmarks = []
for m in DynamicPPL.TestUtils.DEMO_MODELS
    vi = DynamicPPL.VarInfo(Xoshiro(468), m)
    f = DynamicPPL.LogDensityFunction(m, vi)
    θ = convert(Vector{Float64}, vi[:])
    for adtype in [AutoForwardDiff(), AutoReverseDiff(; compile=false), AutoReverseDiff(; compile=true), AutoMooncake(; config=nothing)]
        t = @be LogDensityProblems.logdensity_and_gradient(DynamicPPL._make_ad_gradient($adtype, $f), $θ) evals=1
        push!(cmarks, (string(m.f) * "," * string(adtype), t))
    end
end
for cmark in cmarks
    println("$(cmark[1]),$(median(cmark[2]).time)")
end
```

Summary

ForwardDiff and Mooncake have pretty much identical performance to before. This PR makes non-compiled ReverseDiff 1.2x slower, and compiled ReverseDiff 2x faster. It is unclear why. This is generally true across all demo models.

Results
```julia
# By default, the AD backend to use is inferred from the context, which would
# typically be a SamplingContext which contains a sampler.
function LogDensityProblems.logdensity_and_gradient(
    f::LogDensityFunction, θ::AbstractVector
)
    adtype = getadtype(getsampler(getcontext(f)))
    return LogDensityProblems.logdensity_and_gradient(f, θ, adtype)
end

function LogDensityProblemsAD.ADgradient(ad::ADTypes.AutoReverseDiff, f::LogDensityFunction)
    return _make_ad_gradient(ad, f)
end
```

```julia
# Extra method allowing one to manually specify the AD backend to use, thus
# overriding the default AD backend inferred from the sampler.
function LogDensityProblems.logdensity_and_gradient(
    f::LogDensityFunction, θ::AbstractVector, adtype::ADTypes.AbstractADType
)
    # Ensure we concretise the elements of the params.
    θ = map(identity, θ) # TODO: Is this needed?
    prep = DI.prepare_gradient(_flipped_logdensity, adtype, θ, DI.Constant(f))
    return DI.value_and_gradient(_flipped_logdensity, prep, adtype, θ, DI.Constant(f))
end
```
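As an aside on the pattern in the hunk above: DifferentiationInterface splits gradient computation into a one-off preparation step and a reusable evaluation step. Below is a minimal, self-contained sketch of that split; the toy function `f` and the constant `2.0` are made up for illustration, and only the `DI.prepare_gradient` / `DI.value_and_gradient` / `DI.Constant` calls mirror the diff.

```julia
import DifferentiationInterface as DI
using ADTypes: AutoForwardDiff
import ForwardDiff

# Toy objective: takes the input vector plus a constant "context" argument,
# mirroring how `_flipped_logdensity` receives the LogDensityFunction via DI.Constant.
f(x, c) = c * sum(abs2, x)

adtype = AutoForwardDiff()
x = [1.0, 2.0, 3.0]

# Preparation is done once for a given function / backend / input shape...
prep = DI.prepare_gradient(f, adtype, x, DI.Constant(2.0))

# ...and can then be reused for every subsequent evaluation.
val, grad = DI.value_and_gradient(f, prep, adtype, x, DI.Constant(2.0))
```

Because the prep is tied to the function, backend, and input shape, rebuilding it on every call (as the review comment below points out) repeats work that only needs to happen once.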
I guess I'm not fully convinced with what I've done so far. The code here means that every time we call `logdensity_and_gradient` we run `DI.prepare_gradient` again, which seems to be inefficient.

I guess the way to get around this would be to call `DI.prepare_gradient` once and store that prep inside the `LogDensityFunction`, or (IMO better) create a new type `LogDensityFunctionPrepped` that wraps the `LogDensityFunction` plus the prep, and then call `logdensity_and_gradient` on that object. That would mean that we've basically reinvented / inlined the `LogDensityProblemsAD.ADgradient` type 😄

I'm not opposed to doing that if that's the best way forward, but I wonder if you concur with / see any holes in my assessment @willtebbutt.
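For concreteness, here is a rough sketch of what that `LogDensityFunctionPrepped` idea could look like, assuming the same `DI` alias and `_flipped_logdensity` helper as in the diff above. The type, fields, and constructor are hypothetical, not code from this PR.

```julia
# Hypothetical sketch only: this type does not exist in DynamicPPL.
struct LogDensityFunctionPrepped{F,A,P}
    ldf::F      # the wrapped LogDensityFunction
    adtype::A   # the AD backend the preparation was built for
    prep::P     # DI gradient preparation, computed once at construction
end

function LogDensityFunctionPrepped(
    ldf::LogDensityFunction, adtype::ADTypes.AbstractADType, θ::AbstractVector
)
    # Prepare once, up front, for this function / backend / parameter shape.
    prep = DI.prepare_gradient(_flipped_logdensity, adtype, θ, DI.Constant(ldf))
    return LogDensityFunctionPrepped(ldf, adtype, prep)
end

function LogDensityProblems.logdensity_and_gradient(
    f::LogDensityFunctionPrepped, θ::AbstractVector
)
    # Reuse the stored prep instead of re-running DI.prepare_gradient on every call.
    return DI.value_and_gradient(
        _flipped_logdensity, f.prep, f.adtype, θ, DI.Constant(f.ldf)
    )
end
```

The constructor needs a representative `θ` because DI's preparation is specific to the input's type and shape; storing the prep inside `LogDensityFunction` itself would face the same constraint, just without the extra wrapper type.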
~~Tests are not passing yet - am on it~~ Tests pass! 🎉

~~Now need to benchmark performance before vs after this PR.~~ Benchmarks are in the comment below.
Requires TuringLang/AbstractMCMC.jl#158 to be merged first.
Personally I still have some concerns over this - I'm not sure if `prep` should be moved into the `LogDensityFunction` struct.