
Segmentation fault on a linear regression model (arm & x86 chips) #500

Closed
freddycct opened this issue Oct 10, 2022 · 6 comments · Fixed by #509
Labels
gc garbage collection

Comments

@freddycct

There are two main issues here.

  1. Segmentation fault
  2. autodiff produces incorrect results with the mapreduce version of the loss. This is less of an issue because we can avoid using mapreduce.

This is the MWE:

using Lux
using Enzyme
using Random
using Optimisers

function lossMapReduce(model, X, Y, ps, st)
	# this version gives wrong results under autodiff (issue 2)
	mapreduce(+, zip(X, Y)) do (x,y)
		yhat = Lux.apply(model, x, ps, st)[1]
		(yhat[1] - y)^2
	end
end

function loss(model, X, Y, ps, st)
	# this works, but eventually triggers a segmentation fault (issue 1)
	ll = 0.0f0
	for (x, y) in zip(X, Y)
		yhat = Lux.apply(model, x, ps, st)[1]
		ll += (yhat[1] - y)^2
	end
	return ll
end

function generateToyData(rng, K, N)
	W = randn(rng, Float32, K) # this is the "real" parameters
	b = randn(rng, Float32)    # bias
	
	X = map(x -> rand(rng, Float32, K), 1:N) # features
	Y = map(x -> W' * x + b, X) # ground truth
	return X, Y, W, b
end

function main()
	rng = Random.default_rng()
	Random.seed!(rng, 0)
	
	# Lux-specific code
	model = Dense(16, 1)
	ps, st = Lux.setup(rng, model) # ps is the parameters, st is the state

	# generate some toy data
	X, Y, W, b = generateToyData(rng, 16, 1000)

	# setup optimisers
	optRule = Optimisers.Adam()
	optState = Optimisers.setup(optRule, ps)  # optimiser state based on model parameters

	# println("ps = ", ps)
	totalLoss = loss(model, X, Y, ps, st)
	println("0/100: loss = $(totalLoss)")

	totalEpochs1 = 200
	totalEpochs2 = totalEpochs1 + 10

	# this causes a segmentation fault after some 50+ epochs
	for epoch=1:totalEpochs1
		# zero the cache
		grads = Lux.fmap(zero, ps)

		# calculate gradients
		autodiff(Reverse, loss, Active, Const(model), Const(X), Const(Y), Duplicated(ps, grads), Const(st))

		# gradient update using adam optimizer
		optState, ps = Optimisers.update!(optState, ps, grads)

		totalLoss = loss(model, X, Y, ps, st)
		println("$(epoch)/$(totalEpochs1): loss = $(totalLoss)")
	end

	# this uses a different loss function (observe that the loss doesn't reduce)
	for epoch=totalEpochs1+1:totalEpochs2
		# zero the cache
		grads = Lux.fmap(zero, ps)

		# calculate gradients
		autodiff(Reverse, lossMapReduce, Active, Const(model), Const(X), Const(Y), Duplicated(ps, grads), Const(st))

		# gradient update using adam optimizer
		optState, ps = Optimisers.update!(optState, ps, grads)

		totalLoss = lossMapReduce(model, X, Y, ps, st)
		println("$(epoch)/$(totalEpochs2): lossMapReduce = $(totalLoss)")
	end
end

main()

Here are the error messages:
On Apple Silicon

signal (11): Segmentation fault: 11
in expression starting at /Users/freddy/enzyme_demo.jl:80
gc_setmark_pool_ at /Users/freddy/apps/julia/src/gc.c:0 [inlined]
gc_setmark_pool at /Users/freddy/apps/julia/src/gc.c:827 [inlined]
gc_setmark at /Users/freddy/apps/julia/src/gc.c:834 [inlined]
gc_mark_loop at /Users/freddy/apps/julia/src/gc.c:2771
_jl_gc_collect at /Users/freddy/apps/julia/src/gc.c:3098
ijl_gc_collect at /Users/freddy/apps/julia/src/gc.c:3327
maybe_collect at /Users/freddy/apps/julia/src/gc.c:903 [inlined]
jl_gc_pool_alloc_inner at /Users/freddy/apps/julia/src/gc.c:1247 [inlined]
jl_gc_pool_alloc_noinline at /Users/freddy/apps/julia/src/gc.c:1306
jl_gc_alloc_ at /Users/freddy/apps/julia/src/./julia_internal.h:369 [inlined]
ijl_box_int64 at /Users/freddy/apps/julia/src/datatype.c:1181
Allocations: 72478261 (Pool: 72422195; Big: 56066); GC: 34

On x86

signal (11): Segmentation fault
in expression starting at /home/freddy/enzyme_demo.jl:86
page_metadata at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.h:450 [inlined]
gc_setmark_pool at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:827 [inlined]
gc_setmark at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:834 [inlined]
gc_mark_loop at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:2771
_jl_gc_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:3098
ijl_gc_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:3327
maybe_collect at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:903 [inlined]
jl_gc_pool_alloc_inner at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:1247 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:1306 [inlined]
jl_gc_alloc_ at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/julia_internal.h:369 [inlined]
jl_gc_alloc at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/gc.c:3372
_new_array_ at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/array.c:134 [inlined]
_new_array at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/array.c:198 [inlined]
ijl_alloc_array_1d at /cache/build/default-amdci4-6/julialang/julia-release-1-dot-8/src/array.c:436
unknown function (ip: 0x7f9ecb4b1ff3)
Allocations: 70127916 (Pool: 70074613; Big: 53303); GC: 32
Segmentation fault
@wsmoses wsmoses added the gc garbage collection label Oct 10, 2022
@vchuravy
Member

I am trying to reproduce this, but for future reference it would be ideal if the MWE were smaller and used fewer external packages.

@freddycct
Author

This is the smallest I can get. I hope it helps.

using Enzyme

function loss(X, Y, ps, bs)
	ll = 0.0f0
	for (x, y) in zip(X, Y)
		yhat = ps * x .+ bs
		ll += (yhat[1] - y)^2
	end
	return ll
end

function main()
	ps = randn(Float32, (1, 5))
	bs = randn(Float32)

	X = map(x->rand(Float32, 5), 1:1000)
	Y = map(x->rand(Float32), 1:1000)

	grads = zero(ps)
	for epoch=1:1000
		println("$(epoch)")
		fill!(grads, 0)
		autodiff(Reverse, loss, Const(X), Const(Y), Duplicated(ps, grads), Active(bs))
	end
end

main()

@vchuravy
Member

Thanks, that was very helpful.

(rr) p jl_(vt)
Enzyme.Compiler.EnzymeTape{1024, NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5")), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7")), Tuple{Core.LLVMPtr{Float32, 0}, Core.LLVMPtr{Float32, 0}, UInt64, UInt8, UInt32, Core.LLVMPtr{Float32, 0}, Core.LLVMPtr{Float32, 0}}}, Any, Any, UInt64, Bool}}}
(rr) p vt->size
$9 = 8

That's clearly off, and it should have hit an assert to begin with.

@vchuravy
Member

NT = NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5")), Tuple{NamedTuple{(Symbol("1"), Symbol("2"), Symbol("3"), Symbol("4"), Symbol("5"), Symbol("6"), Symbol("7")), Tuple{Core.LLVMPtr{Float32, 0}, Core.LLVMPtr{Float32, 0}, UInt64, UInt8, UInt32, Core.LLVMPtr{Float32, 0}, Core.LLVMPtr{Float32, 0}}}, Any, Any, UInt64, Bool}}

julia> sizeof(NT)
80

julia> sizeof(NTuple{80, NT})
6400

julia> sizeof(Ref{NTuple{80, NT}}())
6400

But EnzymeTape is off...

@vchuravy
Member

Even worse, the allocation site thinks sz should be 81920:

#2  0x00007f866ccc0dc2 in jl_gc_alloc (ptls=0x56286f6cbe70, sz=81920, ty=0x7f864c6355f0)

So no one agrees on what the size of this type should be.
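For what it's worth, the three reported sizes are reconcilable arithmetically: the tape type is EnzymeTape{1024, NT} and sizeof(NT) is 80, so the allocator's 81920 is exactly 1024 * 80, while vt->size == 8 is merely pointer-sized. A small illustrative sketch (constants copied from the observations above, nothing measured fresh):

```julia
# Reconciling the sizes reported in this thread (illustrative arithmetic only).
nt_size  = 80      # sizeof(NT) from the REPL session above
tape_len = 1024    # the N parameter in EnzymeTape{1024, NT}
alloc_sz = 81920   # sz passed to jl_gc_alloc at the crash site

# The allocation site sizes the full inline tape...
@assert tape_len * nt_size == alloc_sz

# ...but vt->size == 8 is pointer-sized, so the GC's view of the object
# disagrees with what was actually allocated, which would explain the
# crash inside the GC mark loop.
println(tape_len * nt_size)
```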

@freddycct
Author

Related issue: #510
