Error messages, stack traces, Line Numbers... #10595
Comments
By the way, 5587ca3 passes all tests on my machine. |
There's a slight chance that the triangle example only happens on windows 10, so I tried to compile the newest master on my windows 8.1 machine. When I start the 5587ca3 build there, it just silently fails. Also, I think I've seen all of these already on 0.3, besides the depwarn, triangle example and the jload stuff. |
When #10591 is fixed, it could be interesting to try with LLVM 3.6 and see whether things are better. |
Hi Simon, mostly just asking for clarification. I suspect there is still a dearth of developers using Windows (although it's been nice having @tkelman championing Julia there). Are you also working on another OS? Also, do you usually build your own julia executables, or do you sometimes also use precompiled versions? If you build your own, when was the last time you did a clean build from scratch (or at least, rebuilt most of the
Although it's good to make others aware of your overall user experience, I would suggest opening issues for specific items that don't already have an issue. I do understand that it's often challenging to find a minimal example (I've been there), but until there are more people using and hacking on Julia, it's unlikely there will be many other people who will be able to help much. (Now, if you can make it to JuliaCon and propose a hacking session to address these things...) |
I'm quite aware that this is not perfect for an issue... But this has built up over a year and I never opened an issue because of this. So I decided that it's better to have an unspecific issue to at least have somewhere to talk about this, than no issue. (The triangle example is quite specific though, maybe I should open an extra issue for that.) What counts for you as building from scratch? I always do
WARNING: float32(x) is deprecated, use Float32(x) instead.
in float32 at deprecated.jl:29
in anonymous at no file:161
in include_from_node1 at loading.jl:129
So this doesn't help. I'll try on windows 8.1. I also use ubuntu. |
Now that #10591 is fixed I was able to completely recompile Julia from scratch on Windows 8:
julia GLPlot\test\runtests.jl:
[...]
WARNING: float32(x) is deprecated, use Float32(x) instead.
in float32 at deprecated.jl:29
in anonymous at no file:161
in include_from_node1 at loading.jl:129
[...]
Julia Version 0.4.0-dev+3955
Commit 0d8cec3 (2015-03-21 16:58 UTC)
Platform Info:
System: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3 |
If you're using source builds, and LLVM 3.3, are you deleting sys.dll? That's probably the biggest single contributor to these. The other problem is very very few people understand how to adjust the backtrace code, and even fewer for the Windows-specific parts of it - pretty much just Jameson and Keno, neither of whom actually use Windows. If the internals were commented better, maybe the slightly larger number of people who at least know how to build things and fix small bugs could figure more of it out. Until this happens, this is going to be a moving target. Jameson in particular has done a huge amount of work on the Windows backtrace code, across different versions of LLVM and Julia. Without more test cases for problematic code, changes here (which are virtually always made with good intentions, trying to improve matters) can certainly cause regressions in other places. We could use a more comprehensive test suite for backtraces, especially across different platforms. If a lot of the code you're working with relies on packages and large binary dependencies, then the best thing you can do is set up AppVeyor for your packages and keep a close eye out for any backtrace regressions (perhaps by setting up regularly scheduled daily builds, whether or not you've made any changes to the package code). |
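One way to start on the backtrace test cases mentioned above is a small self-checking script. The sketch below is only an illustration, assuming a current Julia 1.x release rather than the 0.4-dev builds discussed here; `thrower` and `frames_of` are hypothetical names, not part of Julia's test suite.

```julia
# Hypothetical function that fails at a known place; @noinline keeps it
# as a separate frame so it is easy to find in the trace.
@noinline function thrower()
    error("boom")
end

# Run `f` and return the stack frames captured when it throws.
# Returns an empty vector if `f` does not throw.
function frames_of(f)
    try
        f()
    catch
        return stacktrace(catch_backtrace())
    end
    return Base.StackTraces.StackFrame[]
end

frames = frames_of(thrower)

# Locate the frame that belongs to `thrower`.
idx = findfirst(fr -> fr.func === :thrower, frames)

if idx === nothing
    println("regression: `thrower` is missing from the backtrace")
else
    fr = frames[idx]
    println("found ", fr.func, " at ", fr.file, ":", fr.line)
end
```

Run on each platform (for example from an AppVeyor job), a check like this would presumably flag a regression when the frame goes missing or its line no longer points at the `error` call.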
The julia> stagedfunction exprerror(x::Int)
:(x + y)
end
julia> exprerror(5)
ERROR: UndefVarError: y not defined
in anonymous at no file |
I have no doubt about this! I didn't want to sound aggressive here, I just tried to describe the state. As I don't have real data and just subjective, accumulated memories for this, it probably sounds more like a rant ;) @timholy |
Okay, I renamed sys.dll into sys.dll.backup, now I get this:
WARNING: float32(x) is deprecated, use Float32(x) instead.
in depwarn at deprecated.jl:40
in float32 at deprecated.jl:29
in anonymous at no file:161
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in reload_path at loading.jl:153
in _require at loading.jl:68
in require at loading.jl:54
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in reload_path at loading.jl:153
in _require at loading.jl:68
in require at loading.jl:54
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in reload_path at loading.jl:153
in _require at loading.jl:68
in require at loading.jl:54
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in reload_path at loading.jl:153
in _require at loading.jl:68
in require at loading.jl:54
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in reload_path at loading.jl:153
in _require at loading.jl:68
in require at loading.jl:51
in include at boot.jl:250
in include_from_node1 at loading.jl:129
in process_options at client.jl:318
in _start at client.jl:402 |
to clarify that, llvm 3.3 didn't know how to emit unwind tables for windows so we generally can't get backtraces (except when we guess and get lucky) for functions in sys.dll. llvm 3.5 included the necessary support, so it will be really nice to be able to change to the newer version of llvm. |
Regarding issues with wrong line numbers in backtraces, I don't have anything to contribute to the Julia core, but in case someone else views this thread, they may be interested in the following work-around for Julia users: |
actually not yet fixed by #14623, running |
i've looked into this more and realized that llvm emits corrupt dwarf debug information for WinCoff. this is a known, old bug in clang as well. |
from my reading of the dwarf standard and observing gdb and llvm-dwarfdump's representations of this data, i believe the following patch will fix https://llvm.org/bugs/show_bug.cgi?id=15393
diff -ru llvm-3.7.1.src/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp llvm-3.7.1/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp
--- llvm-3.7.1.src/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp 2015-06-23 05:49:53.000000000 -0400
+++ llvm-3.7.1/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp 2016-02-10 01:38:42.470693900 -0500
@@ -55,6 +55,7 @@
case X86::reloc_riprel_4byte_movq_load:
return COFF::IMAGE_REL_AMD64_REL32;
case FK_Data_4:
+ return COFF::IMAGE_REL_AMD64_SECREL;
case X86::reloc_signed_4byte:
if (Modifier == MCSymbolRefExpr::VK_COFF_IMGREL32)
return COFF::IMAGE_REL_AMD64_ADDR32NB;
diff --git a/src/codegen.cpp b/src/codegen.cpp
index 6107f1a..c3cc6ff 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -939,7 +939,7 @@ static Value *getModuleFlag(Module *m, StringRef Key)
static void jl_setup_module(Module *m)
{
if (!getModuleFlag(m,"Dwarf Version"))
- m->addModuleFlag(llvm::Module::Warning, "Dwarf Version",2);
+ m->addModuleFlag(llvm::Module::Warning, "Dwarf Version",4);
#ifdef LLVM34
if (!getModuleFlag(m,"Debug Info Version"))
m->addModuleFlag(llvm::Module::Error, "Debug Info Version",
this gives proper line numbers for precompiled code. next step is to figure out why llvm still hasn't been emitting our unwind tables... |
sorry, llvm was emitting unwind tables just fine. i just did a sloppy job with the patch. |
diff -rpu llvm-3.7.1.src/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp llvm-3.7.1/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp
--- llvm-3.7.1.src/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp 2015-06-23 05:49:53.000000000 -0400
+++ llvm-3.7.1/lib/Target/X86/MCTargetDesc/X86WinCOFFObjectWriter.cpp 2016-02-10 05:06:57.700825100 -0500
@@ -55,6 +55,9 @@ unsigned X86WinCOFFObjectWriter::getRelo
case X86::reloc_riprel_4byte_movq_load:
return COFF::IMAGE_REL_AMD64_REL32;
case FK_Data_4:
+ if (Modifier == MCSymbolRefExpr::VK_COFF_IMGREL32)
+ return COFF::IMAGE_REL_AMD64_ADDR32NB;
+ return COFF::IMAGE_REL_AMD64_SECREL;
case X86::reloc_signed_4byte:
if (Modifier == MCSymbolRefExpr::VK_COFF_IMGREL32)
return COFF::IMAGE_REL_AMD64_ADDR32NB; |
Yay! It's still missing the type signature, but the backtraces aren't |
Okay I feel like I'm still fighting with a lot of line number issues in 0.5, 0.4 linux and windows alike.
macro test(x)
if x.head == :(=)
return :()
elseif x.head == :(.)e # errors here
return :()
end
return :()
end
begin
@test a = b
@test a.b
end
0.4:
Note that this is the line number after the actual begin block.
it's a bit better, but still not that great. Especially, since even without the block, it still doesn't report the line number in the macro, and the macro can be arbitrarily deep in the stack trace but it will only report one line number...
and
|
that's unrelated: it happens because the error doesn't get thrown directly, but instead gets unwound through flisp and is thrown directly by eval:
|
unrelated to what? |
sorry, i meant that it's worth opening a new issue. this one is targeted specifically at windows recording errors and that additional observation would get lost. |
I opened this issue to be a collection of different issues, but I can surely open a new one :) |
it's easier to track via labels where each issue is separate. a meta issue is really hard to close, since it's unclear what it needs different from the sum of the individual issues. |
makes sense! |
Here is another example:
function sumit{T}(arr::AbstractArray{T})
val = zero(T)
for i=1:length(arr)
val += val[i] # this should have been arr. Oops!
end
return val
end
rng = 1:10000
rng_arr = collect(rng)
@time sumit(rng)
@time sumit(rng_arr)
println("final results:")
@time sumit(rng)
@time sumit(rng_arr)
And the error I get is:
ERROR: LoadError: BoundsError
in sumit at /tmp/tmp.jl:8 # this looks correct
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:304
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading /tmp/tmp.jl, in expression starting on line 155 # line 155???
version:
|
it's picking up the line number for
this might be fixed by #14949? |
@vtjnash what's the status/plan for fixing this the rest of the way for 0.5? Windows users will greatly appreciate better startup time, ideally without having to lose type signatures from backtraces if possible. |
Since this isn't release blocking (it should be fixable without breaking anything), I was considering fixing this whenever 0.5-rc1 is announced. |
Key word should - all things need testing, the longer this is on master before release the better. |
fixed |
Still no type info on win32. No more fixed there than it was in February.
|
One of your recent PR's is also making win64 bootstrap take 12+ GB of memory while building sys.o, so something's very broken. Will have to bisect to find which. |
Apologies, the massive memory usage was caused by something I had locally on a branch that I thought was harmless but apparently wasn't. Looks like win64 is fixed but win32 is not. |
I would care, but nobody uses win32 |
Not true actually, it's about a third of the win64 downloads. |
hm, it looks like |
#17251 covers the remaining issue, I believe |
so that JuliaLang#10595 does not regress, and JuliaLang#17251 is tracked
I have the feeling all these three things get worse and worse.
I've actually caught myself ignoring the error messages and just right away look for any error in the code myself. So it seems that my subconscious has decided that it's not worth looking at the stack trace...
It's always hard to report the errors, because I almost never manage to create a minimal example (and normally don't want to waste more time, after searching for a bug without a debugger, without correct line numbers and without the correct error message).
My gut feeling reports me the following statistic:
LoadError: LoadError: LoadError:, then one page of jload(something), jload(something), jload(something). This is from my memory, as I wasn't able to reproduce it right now.
... in anonymous without any further stack trace
Also there is this file https://gist.github.com/SimonDanisch/7e9694b05e00c79f716c#file-triangle-jl, which runs at commit 6d68792, but not with 2 days old master and not with 5587ca3.
It instead got me into a great Odyssey, where every single error was silent. I've fixed 2 so far, and now I don't want to further waste my time on this.
First was removed by turning Dict(:a => ..., :b => ..., ..., ...) into dict[:a] = ..., dict[:b] = ... in GLWindow/reactglfw.jl in createwindow.
The next I didn't really find, but it seemed that it was in createcontextinfo in GLAbstraction/src/GLInit.jl, so I just commented it out (I tried removing all deprecation warnings first).
The one I gave up on was at indexbuffer(GLuint[0,1,2]).
Related error was on 6d68792, where a call to TemplateProgram with the wrong arguments simply silently failed.
There is something really wrong here :D
I'm not sure if this happens because I aggregated a lot of weird code, or if it is a lot worse on windows, but it's getting really frustrating!
Probably related: