Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

segmentation fault on 1.0 (and 1.1 and 0.7) #28536

Closed
kafisatz opened this issue Aug 9, 2018 · 14 comments · Fixed by #29015
Closed

segmentation fault on 1.0 (and 1.1 and 0.7) #28536

kafisatz opened this issue Aug 9, 2018 · 14 comments · Fixed by #29015
Labels
bug Indicates an unexpected problem or unintended behavior compiler:codegen Generation of LLVM IR and native code GC Garbage collector

Comments

@kafisatz
Copy link

kafisatz commented Aug 9, 2018

My tests fail due to a segfault.

I am not sure if I am doing 'something unsafe', but I am pretty sure the tests were almost identical on 0.7beta where things went fine.

https://discourse.julialang.org/t/segmentation-fault-on-0-7/13065/2

any ideas?

the error occurs on 0.7, 1.0 and 1.1 too
https://travis-ci.org/kafisatz/DecisionTrees.jl/builds/415879916

@mauro3
Copy link
Contributor

mauro3 commented Aug 9, 2018

Can you provide a minimal working example? (Ideally without depending on your package) Thanks!

The link to the segfault: https://travis-ci.org/kafisatz/DecisionTrees.jl/jobs/412500251#L974

@kafisatz
Copy link
Author

I cannot, as I am not sure how I would approach that.
It seems to me it is line 43 of this function https://github.com/kafisatz/DecisionTrees.jl/blob/master/test/smoketests.jl#L43
that is causing the issue though.
I.e. when the function dtm is called, the segfault occurs (prinln commands implied that at least)

@jacobadenbaum
Copy link

jacobadenbaum commented Aug 10, 2018

I don't know if this is the same segfault issue and I don't mean to hijack this issue (tell me if I should open a separate issue instead, I just didn't want to file a duplicate) but I had something similar crop up for me as well and the return before julia crashes looks almost the same. I have a MWE(-ish). Sorry it's a bit long, but I've pared it down quite a bit from the original code and I've tried a few different ways to make it simpler, but if I do the segfault disappears.

using Formatting, DataStructures

########################################################################
#################### Formatted Numbers #################################
########################################################################
abstract type FormattedNumber{T} end

mutable struct FNum{T} <: FormattedNumber{T}
    val::T
    star::Int
    format::String
end
function FormattedNumber(val::T, format::String=default_fmt(T)) where T
    return FNum(val, 0, format)
end
default_fmt(::Type{Float64}) = "{:.2f}"
FormattedNumber(x::FormattedNumber) = x

Base.show(io::IO, x::FNum) = print(io, format(x.format, x.val))
value(x::FormattedNumber)   = format(x.format, x.val)
se(x::FNum)                 = ""

########################################################################
#################### Table Index #######################################
########################################################################
Idx{N}      = NTuple{N, Int}
Name{N}     = NTuple{N, Symbol}
Printable   = Union{String, Symbol}

struct TableIndex{N}
    idx::Idx{N}
    name::Name{N}
end

TableIndex(idx::Integer, name::Printable) = begin
    TableIndex(tuple(idx), tuple(Symbol(name)))
end
TableIndex(name::Printable) = TableIndex(1, name)

TableDict{N, T} = OrderedDict{TableIndex{N}, T} where T <: FormattedNumber

########################################################################
#################### Table Col #########################################
########################################################################
mutable struct TableCol{N,M}
    header::TableIndex{M}
    data::TableDict{N, FormattedNumber}
end

function TableCol(header::Printable, kv::AbstractDict)
    pairs = collect(TableIndex(i, key)=>FormattedNumber(value)
                    for (i, (key, value)) in enumerate(kv))
    TableCol(header,
             OrderedDict{TableIndex{1}, FormattedNumber}(pairs))
end

function TableCol(header::Printable, kv::TableDict{N,T}) where
    {N,T<:FormattedNumber}
    return TableCol(TableIndex(header),
                    convert(TableDict{N, FormattedNumber}, kv))
end

########################################################################
#################### Get Values ########################################
########################################################################
function get_vals(x::FormattedNumber)
    val     = value(x)
    seval   = se(x)
    star    = "*"^x.star
    return val, seval, star
end

function get_vals(col::TableCol, x::TableIndex, backup="")
    if  x in keys(col.data)
        val, seval, star = get_vals(col.data[x])
    else
        val     = backup
        seval   = ""
        star    = ""
    end
    return  val, seval, star
end

########################################################################
##################### The Segfaulting Bug ##############################
########################################################################

col =  TableCol("Test", OrderedDict(Symbol(:key, i)=>randn() for i=1:10))
key =  TableIndex{1}((1,), (:key1,))
get_vals(col, key)

A couple things that are weird here:

  1. The bug seems quite fragile. Like, if I change the definition of get_vals(::TableCol, ::TableIndex) to have the return statement inside the if-else clause, it goes away:
function get_vals(col::TableCol, x::TableIndex, backup="")
    if  x in keys(col.data)
        val, seval, star = get_vals(col.data[x])
        return  val, seval, star
    else
        val     = backup
        seval   = ""
        star    = ""
        return  val, seval, star
    end
end
  1. This code actually won't run on 1.0 (since Formatting isn't fully upgraded yet), but while the first version of get_vals segfaults even on 1.0, the second version reaches the error. So this looks like its segfaulting at the compile stage, rather than at runtime...

@kafisatz
Copy link
Author

kafisatz commented Aug 11, 2018

The more visibility this issue gets (even if yours may not be identical), the better ->please tag along.

Thanks for the MWE. Interesting observations...

I do not know anything about segfaults, but I note that Keno recently fixed an issue related to garbage collection. #28445 I am not sure if this is related though.

@kafisatz kafisatz changed the title segmentation fault on 0.7 segmentation fault on 1.0 (and 1.1 and 0.7) Aug 14, 2018
@JeffBezanson
Copy link
Member

With LLVM assertions enabled, DecisionTrees tests print:

Use of %693 does not have a corresponding definition on every path:
896r %1029:vr128 = PSHUFDri %693, 78; VR128:%1029,%693
LLVM ERROR: Use not jointly dominated by defs.

@JeffBezanson
Copy link
Member

I get a very similar error from the case in #28536 (comment) .

@kafisatz
Copy link
Author

@JeffBezanson Thank you for looking into this. However, it is not clear to me how I can further dig into this. What does %693 refer to? I am not sure if (or how) I can enable LLVM assertions on windows and I do not currently have a Linux machine handy. Can I add a few lines to the Travis script to enable these assertions and get an LLVM debug log?

@vtjnash vtjnash added the upstream The issue is with an upstream dependency, e.g. LLVM label Aug 14, 2018
@vtjnash
Copy link
Member

vtjnash commented Aug 14, 2018

That output form is from LLVM's MIR passes. Since the IR domination was valid according to the IR Validation passes, it seems likely this is an upstream bug, so marking as such.

@vtjnash vtjnash added the bug Indicates an unexpected problem or unintended behavior label Aug 14, 2018
kafisatz pushed a commit to kafisatz/DecisionTrees.jl that referenced this issue Aug 15, 2018
kafisatz pushed a commit to kafisatz/DecisionTrees.jl that referenced this issue Aug 15, 2018
@kafisatz
Copy link
Author

I was able to fix the segfault in my code (see this commit kafisatz/DecisionTrees.jl@6e35c57 )
I had a begin end block which I have now removed (possibly it led to some scoping issues...?)

@JeffBezanson
Copy link
Member

Since it's probably an LLVM bug, that change is not necessarily connected in a meaningful way. Glad you have a workaround though.

@Keno
Copy link
Member

Keno commented Aug 29, 2018

Let me try to reduce the upstream error, so we can file a reproducer with them.

@Keno
Copy link
Member

Keno commented Aug 29, 2018

I do actually see a verifier error here, so this may be on us yet:

Instruction does not dominate all uses!
  %4118 = phi <2 x %jl_value_t addrspace(10)*> [ %1744, %L2956 ], [ %3341, %6621 ]
  %96 = extractelement <2 x %jl_value_t addrspace(10)*> %4118, i32 0
Instruction does not dominate all uses!
  %4118 = phi <2 x %jl_value_t addrspace(10)*> [ %1744, %L2956 ], [ %3341, %6621 ]
  %98 = extractelement <2 x %jl_value_t addrspace(10)*> %4118, i32 1

@Keno
Copy link
Member

Keno commented Aug 29, 2018

Bugpoint reduced test case: https://gist.github.com/Keno/1b1ba537a77d11bcf8cf991edf203edb

@Keno
Copy link
Member

Keno commented Aug 29, 2018

Looks to be a bug in gcframe lowering.

@Keno Keno added compiler:codegen Generation of LLVM IR and native code GC Garbage collector and removed upstream The issue is with an upstream dependency, e.g. LLVM labels Aug 29, 2018
Keno added a commit that referenced this issue Sep 2, 2018
Support for vectors of tracked pointer was incomplete in the GC placement
pass. Try to fix as many cases as possible and add some tests. A refactor
to make all of this nicer (vectors weren't originally part of the implementation
might be good), but for now, let's get it correct first.

Fixes #28536
Keno added a commit that referenced this issue Sep 3, 2018
Support for vectors of tracked pointer was incomplete in the GC placement
pass. Try to fix as many cases as possible and add some tests. A refactor
to make all of this nicer (vectors weren't originally part of the implementation
might be good), but for now, let's get it correct first.

Fixes #28536
KristofferC pushed a commit that referenced this issue Sep 3, 2018
Support for vectors of tracked pointer was incomplete in the GC placement
pass. Try to fix as many cases as possible and add some tests. A refactor
to make all of this nicer (vectors weren't originally part of the implementation
might be good), but for now, let's get it correct first.

Fixes #28536

(cherry picked from commit b1dac9f)
KristofferC pushed a commit that referenced this issue Sep 8, 2018
Support for vectors of tracked pointer was incomplete in the GC placement
pass. Try to fix as many cases as possible and add some tests. A refactor
to make all of this nicer (vectors weren't originally part of the implementation
might be good), but for now, let's get it correct first.

Fixes #28536

(cherry picked from commit b1dac9f)
KristofferC pushed a commit that referenced this issue Sep 8, 2018
Support for vectors of tracked pointer was incomplete in the GC placement
pass. Try to fix as many cases as possible and add some tests. A refactor
to make all of this nicer (vectors weren't originally part of the implementation
might be good), but for now, let's get it correct first.

Fixes #28536

(cherry picked from commit b1dac9f)
KristofferC pushed a commit that referenced this issue Feb 11, 2019
Support for vectors of tracked pointer was incomplete in the GC placement
pass. Try to fix as many cases as possible and add some tests. A refactor
to make all of this nicer (vectors weren't originally part of the implementation
might be good), but for now, let's get it correct first.

Fixes #28536

(cherry picked from commit b1dac9f)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior compiler:codegen Generation of LLVM IR and native code GC Garbage collector
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants