Sampling with HMC can hang if AD is bugged #2389

Closed
penelopeysm opened this issue Nov 4, 2024 · 3 comments · Fixed by #2392

penelopeysm commented Nov 4, 2024

Minimal working example

using Turing

@model function model1()
    σ ~ InverseGamma(2, 3)
    V ~ truncated(Normal(0, σ), 0, Inf)
end

sample(model1(), NUTS(), 100)

Description

Sampling hangs indefinitely.

This isn't because of a bug in Turing; it's actually a bug in ForwardDiff, which returns NaNs when calculating the gradient. (The same is true of all other currently supported AD backends; see https://discourse.julialang.org/t/help-cant-get-turing-to-work-on-a-simple-model/122107/2)

Because the gradient always contains NaNs, isfinite(z) never returns true, and this block goes into an infinite loop:

Turing.jl/src/mcmc/hmc.jl

Lines 176 to 194 in 397d1a7

# If no initial parameters are provided, resample until the log probability
# and its gradient are finite.
if initial_params === nothing
    init_attempt_count = 1
    while !isfinite(z)
        if init_attempt_count == 10
            @warn "failed to find valid initial parameters in $(init_attempt_count) tries; consider providing explicit initial parameters using the `initial_params` keyword"
        end
        # NOTE: This will sample in the unconstrained space.
        vi = last(DynamicPPL.evaluate!!(model, rng, vi, SampleFromUniform()))
        theta = vi[spl]
        hamiltonian = AHMC.Hamiltonian(metric, logπ, ∂logπ∂θ)
        z = AHMC.phasepoint(rng, theta, hamiltonian)
        init_attempt_count += 1
    end
end

It would probably make sense to error after a sufficiently large number of attempts (I'm not sure about the exact number, but 1000 seems reasonable). Alternatively, or additionally, we could check for NaNs and error immediately if logp or its gradient contains any.
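
For illustration, the bounded-retry idea could look roughly like the sketch below. This is not Turing's code: find_valid_init, propose, isvalid, warn_after and max_attempts are hypothetical names, and the proposal function is a stand-in that always returns NaNs, mimicking the broken gradient. It uses only Base Julia.

```julia
# Hypothetical helper, not part of Turing: call `propose()` until `isvalid`
# accepts the result. Warn once after `warn_after` failures and throw an
# error after `max_attempts` failures instead of looping forever.
function find_valid_init(propose, isvalid; warn_after=10, max_attempts=1000)
    for attempt in 1:max_attempts
        candidate = propose()
        isvalid(candidate) && return candidate
        if attempt == warn_after
            @warn "failed to find valid initial parameters in $attempt tries; consider providing explicit initial parameters"
        end
    end
    error("failed to find valid initial parameters in $max_attempts tries")
end

# Stand-in for a gradient that always contains NaNs.
always_nan() = [NaN, NaN]

# A candidate is valid only if every entry is finite (NaN and Inf both fail).
all_finite(x) = all(isfinite, x)

try
    find_valid_init(always_nan, all_finite; max_attempts=50)
catch err
    println("gave up: ", err.msg)
end
```

With a cap like this, the model above would fail with an error message rather than hang. Whether 1000 is the right cap, and whether NaNs should short-circuit the loop earlier, is left open here.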

Julia version info

versioninfo()
julia> versioninfo()
Julia Version 1.11.1
Commit 8f5b7ca12ad (2024-10-16 10:53 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, apple-m1)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

Manifest

The relevant parts are:

(ppl) pkg> st
Status `~/ppl/Project.toml`
  [f6369f11] ForwardDiff v0.10.37
  [fce5fe82] Turing v0.35.1

I did paste the whole thing here because I'm the one who made this issue template and I had better abide by it 😄

]st --manifest
(ppl) pkg> st --manifest
Status `~/ppl/Manifest.toml`
  [47edcb42] ADTypes v1.9.0
  [621f4979] AbstractFFTs v1.5.0
  [80f14c24] AbstractMCMC v5.6.0
  [7a57a42e] AbstractPPL v0.9.0
  [1520ce14] AbstractTrees v0.4.5
  [7d9f7c33] Accessors v0.1.38
  [79e6a3ab] Adapt v4.1.1
  [0bf59076] AdvancedHMC v0.6.3
  [5b7e9947] AdvancedMH v0.8.4
  [576499cb] AdvancedPS v0.6.0
  [b5ca4192] AdvancedVI v0.2.10
  [66dad0bd] AliasTables v1.1.3
  [dce04be8] ArgCheck v2.3.0
  [4fba245c] ArrayInterface v7.16.0
  [a9b6321e] Atomix v0.1.0
  [13072b0f] AxisAlgorithms v1.1.0
  [39de3d68] AxisArrays v0.4.7
  [198e06fe] BangBang v0.4.3
  [9718e550] Baselet v0.1.1
⌅ [76274a88] Bijectors v0.13.18
  [fa961155] CEnum v0.5.0
  [082447d4] ChainRules v1.71.0
  [d360d2e6] ChainRulesCore v1.25.0
  [9e997f8a] ChangesOfVariables v0.1.9
  [861a8166] Combinatorics v1.0.2
  [38540f10] CommonSolve v0.2.4
  [bbf7d656] CommonSubexpressions v0.3.1
  [34da2185] Compat v4.16.0
  [a33af91c] CompositionsBase v0.1.2
  [88cd18e8] ConsoleProgressMonitor v0.1.2
  [187b0558] ConstructionBase v1.5.8
  [a8cc5b0e] Crayons v4.1.1
  [9a962f9c] DataAPI v1.16.0
  [864edb3b] DataStructures v0.18.20
  [e2d170a0] DataValueInterfaces v1.0.0
  [244e2a9f] DefineSingletons v0.1.2
  [8bb1440f] DelimitedFiles v1.9.1
  [b429d917] DensityInterface v0.4.0
  [163ba53b] DiffResults v1.1.0
  [b552c78f] DiffRules v1.15.1
  [a0c0ee7d] DifferentiationInterface v0.6.18
  [31c24e10] Distributions v0.25.112
  [ced4e74d] DistributionsAD v0.6.57
  [ffbed154] DocStringExtensions v0.9.3
  [366bfd00] DynamicPPL v0.30.3
  [cad2338a] EllipticalSliceSampling v2.0.0
  [4e289a0a] EnumX v1.0.4
  [e2ba6199] ExprTools v0.1.10
⌅ [6b7a57c9] Expronicon v0.8.5
  [7a1cc6ca] FFTW v1.8.0
  [9aa1b823] FastClosures v0.3.2
  [1a297f60] FillArrays v1.13.0
  [6a86dc24] FiniteDiff v2.26.0
  [f6369f11] ForwardDiff v0.10.37
  [069b7b12] FunctionWrappers v1.1.3
  [77dc65aa] FunctionWrappersWrappers v0.1.3
⌅ [d9f16b24] Functors v0.4.12
⌅ [46192b85] GPUArraysCore v0.1.6
  [34004b35] HypergeometricFunctions v0.3.24
  [22cec73e] InitialValues v0.3.1
  [505f98c9] InplaceOps v0.3.0
  [a98d9a8b] Interpolations v0.15.1
  [8197267c] IntervalSets v0.7.10
  [3587e190] InverseFunctions v0.1.17
  [41ab1584] InvertedIndices v1.3.0
  [92d709cd] IrrationalConstants v0.2.2
  [c8e1da08] IterTools v1.10.0
  [82899510] IteratorInterfaceExtensions v1.0.0
  [692b3bcd] JLLWrappers v1.6.1
  [682c06a0] JSON v0.21.4
  [63c18a36] KernelAbstractions v0.9.29
  [5ab0869b] KernelDensity v0.6.9
  [5be7bae1] LBFGSB v0.4.1
  [929cbde3] LLVM v9.1.3
  [8ac3fa9e] LRUCache v1.6.1
  [b964fa9f] LaTeXStrings v1.4.0
  [1d6d02ad] LeftChildRightSiblingTrees v0.2.0
  [6f1fad26] Libtask v0.8.8
  [d3d80556] LineSearches v7.3.0
  [6fdf6af0] LogDensityProblems v2.1.2
  [996a588d] LogDensityProblemsAD v1.12.0
  [2ab3a3ac] LogExpFunctions v0.3.28
  [e6f89c97] LoggingExtras v1.1.0
  [c7f686f2] MCMCChains v6.0.6
  [be115224] MCMCDiagnosticTools v0.3.10
  [e80e1ace] MLJModelInterface v1.11.0
  [d8e11817] MLStyle v0.4.17
  [1914dd2f] MacroTools v0.5.13
  [dbb5928d] MappedArrays v0.4.2
  [128add7d] MicroCollections v0.2.0
  [e1d29d7a] Missings v1.2.0
  [d41bc354] NLSolversBase v7.8.3
  [872c559c] NNlib v0.9.24
  [77ba4419] NaNMath v1.0.2
  [86f7a689] NamedArrays v0.10.3
  [c020b1a1] NaturalSort v1.0.0
  [6fe1bfb0] OffsetArrays v1.14.1
  [429524aa] Optim v1.9.4
  [3bd65402] Optimisers v0.3.3
  [7f7a1694] Optimization v4.0.5
  [bca83a33] OptimizationBase v2.3.0
  [36348300] OptimizationOptimJL v0.4.1
  [bac558e1] OrderedCollections v1.6.3
  [90014a1f] PDMats v0.11.31
  [d96e819e] Parameters v0.12.3
  [69de0a69] Parsers v2.8.1
  [85a6dd25] PositiveFactorizations v0.2.4
  [aea7be01] PrecompileTools v1.2.1
  [21216c6a] Preferences v1.4.3
  [08abe8d2] PrettyTables v2.4.0
  [33c8b6b6] ProgressLogging v0.1.4
  [92933f4c] ProgressMeter v1.10.2
  [43287f4e] PtrArrays v1.2.1
  [1fd47b50] QuadGK v2.11.1
  [74087812] Random123 v1.7.0
  [e6cf234a] RandomNumbers v1.6.0
  [b3c3ace0] RangeArrays v0.3.2
  [c84ed2f1] Ratios v0.4.5
  [c1ae055f] RealDot v0.1.0
  [3cdcf5f2] RecipesBase v1.3.4
  [731186ca] RecursiveArrayTools v3.27.3
  [189a3867] Reexport v1.2.2
  [ae029012] Requires v1.3.0
  [79098fc4] Rmath v0.8.0
  [f2b01f46] Roots v2.2.1
  [7e49a35a] RuntimeGeneratedFunctions v0.5.13
⌅ [26aad666] SSMProblems v0.1.1
  [0bca4576] SciMLBase v2.58.0
  [c0aeaf25] SciMLOperators v0.3.12
  [53ae85a6] SciMLStructures v1.5.0
  [30f210dd] ScientificTypesBase v3.0.0
  [efcf1570] Setfield v1.1.1
  [ce78b400] SimpleUnPack v1.1.0
  [a2af1166] SortingAlgorithms v1.2.1
  [9f842d2f] SparseConnectivityTracer v0.6.8
  [dc90abb0] SparseInverseSubset v0.1.2
  [0a514795] SparseMatrixColorings v0.4.8
  [276daf66] SpecialFunctions v2.4.0
  [171d559e] SplittablesBase v0.1.15
  [90137ffa] StaticArrays v1.9.8
  [1e83bf80] StaticArraysCore v1.4.3
  [64bff920] StatisticalTraits v3.4.0
  [10745b16] Statistics v1.11.1
  [82ae8749] StatsAPI v1.7.0
  [2913bbd2] StatsBase v0.34.3
  [4c63d2b9] StatsFuns v1.3.2
  [892a3eda] StringManipulation v0.4.0
  [09ab397b] StructArrays v0.6.18
  [2efcf032] SymbolicIndexingInterface v0.3.34
  [3783bdb8] TableTraits v1.0.1
  [bd369af6] Tables v1.12.0
  [5d786b92] TerminalLoggers v0.1.7
  [9f7883ad] Tracker v0.2.35
  [28d57a85] Transducers v0.4.84
  [fce5fe82] Turing v0.35.1
  [3a884ed6] UnPack v1.0.2
  [013be700] UnsafeAtomics v0.2.1
  [d80eeb9a] UnsafeAtomicsLLVM v0.2.1
  [efce3f68] WoodburyMatrices v1.0.0
  [700de1a5] ZygoteRules v0.2.5
  [f5851436] FFTW_jll v3.3.10+1
  [1d5cc7b8] IntelOpenMP_jll v2024.2.1+0
  [dad2f222] LLVMExtra_jll v0.0.34+0
  [81d17ec3] L_BFGS_B_jll v3.0.1+0
  [856f044c] MKL_jll v2024.2.0+0
  [efe28fd5] OpenSpecFun_jll v0.5.5+0
  [f50d1b31] Rmath_jll v0.5.1+0
  [1317d2d5] oneTBB_jll v2021.12.0+0
  [0dad84c5] ArgTools v1.1.2
  [56f22d72] Artifacts v1.11.0
  [2a0f44e3] Base64 v1.11.0
  [ade2ca70] Dates v1.11.0
  [8ba89e20] Distributed v1.11.0
  [f43a241f] Downloads v1.6.0
  [7b1f6079] FileWatching v1.11.0
  [9fa8497b] Future v1.11.0
  [b77e0a4c] InteractiveUtils v1.11.0
  [4af54fe1] LazyArtifacts v1.11.0
  [b27032c2] LibCURL v0.6.4
  [76f85450] LibGit2 v1.11.0
  [8f399da3] Libdl v1.11.0
  [37e2e46d] LinearAlgebra v1.11.0
  [56ddb016] Logging v1.11.0
  [d6f4376e] Markdown v1.11.0
  [a63ad114] Mmap v1.11.0
  [ca575930] NetworkOptions v1.2.0
  [44cfe95a] Pkg v1.11.0
  [de0858da] Printf v1.11.0
  [9a3f8284] Random v1.11.0
  [ea8e919c] SHA v0.7.0
  [9e88b42a] Serialization v1.11.0
  [1a1011a3] SharedArrays v1.11.0
  [6462fe0b] Sockets v1.11.0
  [2f01184e] SparseArrays v1.11.0
  [4607b0f0] SuiteSparse
  [fa267f1f] TOML v1.0.3
  [a4e569a6] Tar v1.10.0
  [8dfed614] Test v1.11.0
  [cf7118a7] UUIDs v1.11.0
  [4ec0a83e] Unicode v1.11.0
  [e66e0078] CompilerSupportLibraries_jll v1.1.1+0
  [deac9b47] LibCURL_jll v8.6.0+0
  [e37daf67] LibGit2_jll v1.7.2+0
  [29816b5a] LibSSH2_jll v1.11.0+1
  [c8ffd9c3] MbedTLS_jll v2.28.6+0
  [14a3606d] MozillaCACerts_jll v2023.12.12
  [4536629a] OpenBLAS_jll v0.3.27+1
  [05823500] OpenLibm_jll v0.8.1+2
  [bea87d4a] SuiteSparse_jll v7.7.0+0
  [83775a58] Zlib_jll v1.2.13+1
  [8e850b90] libblastrampoline_jll v5.11.0+0
  [8e850ede] nghttp2_jll v1.59.0+0
  [3f19e933] p7zip_jll v17.4.0+2
penelopeysm added the bug label Nov 4, 2024
@devmotion
Member

Minor comment (of course doesn't apply to the general problem here): The problem in this specific example is the line

    V ~ truncated(Normal(0, σ), 0, Inf)

One should always use

    V ~ truncated(Normal(0, σ); lower=0)

or

    V ~ truncated(Normal(0, σ), 0, nothing)

Since the latter is less descriptive, I'd only use it if keyword arguments are problematic (e.g. in broadcasting).

@penelopeysm
Member Author

@devmotion Indeed, I found that out too when experimenting 😄 While you're here: I hadn't gotten round to reporting the actual NaN gradients:

using DynamicPPL: @model, LogDensityFunction
using Distributions
using LogDensityProblems: logdensity_and_gradient
using LogDensityProblemsAD: ADgradient

@model function model1()
    σ ~ InverseGamma(2, 3)
    V ~ truncated(Normal(0, σ), 0, Inf)
end

import ForwardDiff
ℓ = ADgradient(:ForwardDiff, LogDensityFunction(model1()))
logdensity_and_gradient(ℓ, [1.0, 2.0])
# --> (-3.0285667753085077, [NaN, NaN])

Would you consider this a bug in the AD backends (i.e. we can attempt to minimise the issue and report upstream), or improper usage of Distributions (i.e. maybe the user should be told that they shouldn't do that)?

@devmotion
Member

Mainly improper usage of Distributions, I would say. Introducing Inf in calculations makes it very likely that you'll get NaN derivatives, regardless of the AD system.

In the ForwardDiff case, sometimes you can actually get around this problem by switching to NaN-safe mode (https://juliadiff.org/ForwardDiff.jl/dev/user/advanced/#Fixing-NaN/Inf-Issues). But IMO this is only the second-best alternative and doesn't address all Inf/NaN issues.
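
To make the point about Inf concrete, here is a small Base-only sketch of one way an infinite bound can turn into a NaN derivative. It is an assumption about a plausible mechanism, not a trace of what ForwardDiff or Distributions actually compute: the helper names stdnormpdf and dcdf_dsigma are made up, and the derivative is written out by hand with the chain rule.

```julia
# Standard normal density, written with Base functions only.
stdnormpdf(z) = exp(-z^2 / 2) / sqrt(2π)

# Hand-written chain-rule derivative of Φ((b - μ) / σ) with respect to σ,
# where Φ is the standard normal CDF:
#   d/dσ Φ(z) = ϕ(z) * dz/dσ,  with z = (b - μ) / σ and dz/dσ = -(b - μ) / σ^2
function dcdf_dsigma(b, μ, σ)
    z = (b - μ) / σ
    dz_dσ = -(b - μ) / σ^2
    return stdnormpdf(z) * dz_dσ
end

# Finite upper bound: every intermediate value is finite.
println(dcdf_dsigma(1.0, 0.0, 2.0))

# Infinite upper bound: the density is 0.0 and dz/dσ is -Inf,
# and 0.0 * -Inf is NaN in IEEE floating-point arithmetic.
println(isnan(dcdf_dsigma(Inf, 0.0, 2.0)))
```

Mathematically the derivative at an infinite bound is zero, but the floating-point product of 0.0 and -Inf is NaN, which is why passing no bound at all (lower=0 only) avoids the problem.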
