Can't Compile on CentOS 7.2 #1331

Closed
fribeiro1 opened this issue Oct 17, 2016 · 28 comments

Comments

@fribeiro1
Contributor

fribeiro1 commented Oct 17, 2016

All compilations with 0.5.1-2115.33eefb1 are failing with the "Illegal instruction (core dumped)" message on CentOS 7.2 running under VirtualBox 5.0.26 r108824.

Please advise.

@Theodus
Contributor

Theodus commented Oct 17, 2016

@fribeiro1 What version of LLVM are you using? Did you build from source or install a prebuilt binary?

@fribeiro1
Contributor Author

@Theodus I've installed from the latest ponyc-release package.

@Praetonus
Member

Can you give the program you're trying to compile and the output from the compiler?

@fribeiro1
Contributor Author

@Praetonus It is actually failing for every program with the same "Illegal instruction (core dumped)" message.

@Praetonus
Member

Can you get a debugger stack trace and disassembly of the crash location? Also, what is your CPU?

@fribeiro1
Contributor Author

@Praetonus The requested information follows below:

Stack Trace

[root@localhost Desktop]# strace ponyc
execve("/usr/bin/ponyc", ["ponyc"], [/* 43 vars */]) = 0
brk(0) = 0x35a2000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b57000
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=74093, ...}) = 0
mmap(NULL, 74093, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f2a46b44000
close(3) = 0
open("/lib64/libz.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p!\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=90632, ...}) = 0
mmap(NULL, 2183688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a46721000
mprotect(0x7f2a46736000, 2093056, PROT_NONE) = 0
mmap(0x7f2a46935000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f2a46935000
close(3) = 0
open("/lib64/libtinfo.so.5", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\316\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=174520, ...}) = 0
mmap(NULL, 2268928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a464f7000
mprotect(0x7f2a4651c000, 2097152, PROT_NONE) = 0
mmap(0x7f2a4671c000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x25000) = 0x7f2a4671c000
close(3) = 0
open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240l\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=142304, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b43000
mmap(NULL, 2208864, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a462db000
mprotect(0x7f2a462f1000, 2097152, PROT_NONE) = 0
mmap(0x7f2a464f1000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x16000) = 0x7f2a464f1000
mmap(0x7f2a464f3000, 13408, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2a464f3000
close(3) = 0
open("/lib64/libdl.so.2", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\16\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=19520, ...}) = 0
mmap(NULL, 2109744, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a460d7000
mprotect(0x7f2a460da000, 2093056, PROT_NONE) = 0
mmap(0x7f2a462d9000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x7f2a462d9000
close(3) = 0
open("/lib64/libstdc++.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\20\265\5\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=995840, ...}) = 0
mmap(NULL, 3175456, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a45dcf000
mprotect(0x7f2a45eb8000, 2097152, PROT_NONE) = 0
mmap(0x7f2a460b8000, 40960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xe9000) = 0x7f2a460b8000
mmap(0x7f2a460c2000, 82976, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2a460c2000
close(3) = 0
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260T\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1141560, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b42000
mmap(NULL, 3150168, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a45acd000
mprotect(0x7f2a45bce000, 2093056, PROT_NONE) = 0
mmap(0x7f2a45dcd000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x100000) = 0x7f2a45dcd000
close(3) = 0
open("/lib64/libgcc_s.so.1", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360*\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=88720, ...}) = 0
mmap(NULL, 2184192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a458b7000
mprotect(0x7f2a458cc000, 2093056, PROT_NONE) = 0
mmap(0x7f2a45acb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f2a45acb000
close(3) = 0
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \34\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=2112384, ...}) = 0
mmap(NULL, 3936832, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f2a454f5000
mprotect(0x7f2a456ac000, 2097152, PROT_NONE) = 0
mmap(0x7f2a458ac000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1b7000) = 0x7f2a458ac000
mmap(0x7f2a458b2000, 16960, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f2a458b2000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b41000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b40000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b3e000
arch_prctl(ARCH_SET_FS, 0x7f2a46b3e980) = 0
mprotect(0x7f2a458ac000, 16384, PROT_READ) = 0
mprotect(0x7f2a45acb000, 4096, PROT_READ) = 0
mprotect(0x7f2a45dcd000, 4096, PROT_READ) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b3d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2a46b3c000
mprotect(0x7f2a460b8000, 32768, PROT_READ) = 0
mprotect(0x7f2a462d9000, 4096, PROT_READ) = 0
mprotect(0x7f2a464f1000, 4096, PROT_READ) = 0
mprotect(0x7f2a4671c000, 16384, PROT_READ) = 0
mprotect(0x7f2a46935000, 4096, PROT_READ) = 0
mprotect(0x208e000, 2781184, PROT_READ) = 0
mprotect(0x7f2a46b58000, 4096, PROT_READ) = 0
munmap(0x7f2a46b44000, 74093) = 0
set_tid_address(0x7f2a46b3ec50) = 3713
set_robust_list(0x7f2a46b3ec60, 24) = 0
rt_sigaction(SIGRTMIN, {0x7f2a462e1780, [], SA_RESTORER|SA_SIGINFO, 0x7f2a462ea100}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f2a462e1810, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f2a462ea100}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0
brk(0) = 0x35a2000
brk(0x35c3000) = 0x35c3000
brk(0) = 0x35c3000
--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x4d64d9} ---
+++ killed by SIGILL (core dumped) +++
Illegal instruction (core dumped)

CPU Info

[root@localhost Desktop]# more /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz
stepping : 9
microcode : 0x19
cpu MHz : 2691.264
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx rdrand hypervisor lahf_lm
bogomips : 5382.52
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 58
model name : Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz
stepping : 9
microcode : 0x19
cpu MHz : 2691.264
cache size : 3072 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx rdrand hypervisor lahf_lm
bogomips : 5382.52
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

Let me know if you need more info.
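
For anyone triaging a similar report, the informative part of the trace is its final signal line. The sketch below pulls the signal name, si_code, and faulting address out of that line; the helper name parse_signal_line is made up for illustration and is not part of strace or ponyc. The address it yields (0x4d64d9) is far below the 0x7f2a... range where the shared libraries are mapped, which suggests the faulting instruction is in the ponyc binary itself.

```python
import re

# Matches strace signal lines such as:
# --- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x4d64d9} ---
SIGNAL_LINE = re.compile(r"^--- (?P<signal>SIG\w+) \{(?P<fields>[^}]*)\} ---$")


def parse_signal_line(line):
    """Return (signal_name, fields) for an strace signal line, else None.

    Hypothetical helper written for this thread, not an strace API.
    """
    match = SIGNAL_LINE.match(line.strip())
    if match is None:
        return None
    fields = {}
    for item in match.group("fields").split(", "):
        key, _, value = item.partition("=")
        fields[key] = value
    return match.group("signal"), fields


line = "--- SIGILL {si_signo=SIGILL, si_code=ILL_ILLOPN, si_addr=0x4d64d9} ---"
signal, fields = parse_signal_line(line)
fault_address = int(fields["si_addr"], 16)
print(signal, fields["si_code"], hex(fault_address))
```

The address is what a debugger or disassembler would need in order to show which instruction the CPU rejected.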

@jemc
Member

jemc commented Oct 18, 2016

@fribeiro1 - can you check your VirtualBox version against this note in the README?

@jemc
Member

jemc commented Oct 18, 2016

(Hopefully this is just a dup of #693 and #441, and upgrading VirtualBox will fix the issue.)

@Praetonus
Member

I think it's the same issue as #1176. Your processor supports AVX instructions and LLVM 3.8 incorrectly assumes that it also supports AVX-512. This bug is fixed in LLVM 3.9, but we have a few issues to resolve before we can distribute pre-built packages with LLVM 3.9. Until then, I'd recommend building from source with LLVM 3.9.

@fribeiro1
Contributor Author

fribeiro1 commented Oct 18, 2016

@jemc Unfortunately, I am already using VirtualBox 5.0.26 r108824.

@dipinhora
Contributor

This is most likely due to the Travis CI build server CPU having AVX2 support in addition to AVX, while the Intel(R) Core(TM) i5-3340M CPU @ 2.70GHz supports only AVX and not AVX2.

@SeanTAllen
Member

Would it make sense to turn off AVX2 in the default builds? I'm unsure of its impact.

@fribeiro1
Contributor Author

@SeanTAllen In the meantime, I will try to build it from source instead.

@SeanTAllen
Member

For now, we have decided to document the limitation. You need an AVX2-compatible CPU to use the pre-built packages; otherwise you need to build from source. This should be added to the install instructions in the README.

@jemc
Member

jemc commented Oct 26, 2016

You can check if your CPU supports AVX2 using grep avx2 /proc/cpuinfo. This should probably be in the README section about limitations of the prebuilt binaries.
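
The same check can be scripted when it needs to run inside an installer or a test. This is a minimal sketch, assuming the usual /proc/cpuinfo layout of "flags : ..." lines; the function names are illustrative. It matches whole flag tokens, so a query for avx is not satisfied by avx2 (a plain substring search would be). The sample text is the flags line from the CPU info posted earlier in this issue.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags found on 'flags' lines of /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports(cpuinfo_text, feature):
    """True if `feature` appears as a whole token in the flags."""
    return feature in cpu_flags(cpuinfo_text)


# Flags line reported for the i5-3340M guest in this issue.
SAMPLE = (
    "flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov "
    "pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc "
    "rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 cx16 sse4_1 "
    "sse4_2 popcnt aes xsave avx rdrand hypervisor lahf_lm\n"
)

print("avx: ", supports(SAMPLE, "avx"))    # True
print("avx2:", supports(SAMPLE, "avx2"))   # False

# On a Linux machine, the real file can be passed instead:
# supports(open("/proc/cpuinfo").read(), "avx2")
```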

@fribeiro1
Contributor Author

fribeiro1 commented Oct 26, 2016

Just submitted #1369 to make the change.

@dipinhora
Contributor

dipinhora commented Oct 26, 2016

Please keep in mind that Travis runs on GCE (https://docs.travis-ci.com/user/trusty-ci-environment#Overview) and GCE machines can have any of several CPUs. From https://cloud.google.com/compute/docs/machine-types: "For the n1 series of machine types, a virtual CPU is implemented as a single hardware hyper-thread on a 2.6 GHz Intel Xeon E5 (Sandy Bridge), 2.5 GHz Intel Xeon E5 v2 (Ivy Bridge), 2.3 GHz Intel Xeon E5 v3 (Haswell), or 2.2 GHz Intel Xeon E5 v4 (Broadwell)."

Newer CPUs will likely have features that the older ones lack, so labeling only AVX2 as a requirement for the pre-built packages in the documentation may not be sufficient.

Also, some of the CPUs available in GCE don't have AVX2 so the pre-built packages would work for folks without an AVX2 compatible CPU if the CI ran on the appropriate CPU.
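
To make the point about the mixed GCE fleet concrete, here is a small sketch. The AVX/AVX2 columns reflect the four Intel generations named in the quote (AVX2 first appeared with Haswell). The can_run rule is a simplifying assumption: it supposes a host-tuned build actually uses every feature of the build machine, and it looks at one feature only.

```python
# AVX / AVX2 availability for the four generations listed in the GCE docs.
GCE_N1_CPUS = {
    "Sandy Bridge": {"avx": True, "avx2": False},
    "Ivy Bridge": {"avx": True, "avx2": False},
    "Haswell": {"avx": True, "avx2": True},
    "Broadwell": {"avx": True, "avx2": True},
}


def hosts_without(feature):
    """Generations in the table that lack `feature`."""
    return [name for name, feats in GCE_N1_CPUS.items() if not feats[feature]]


def can_run(built_on, runs_on, feature="avx2"):
    """Assumed rule: a build breaks only if the build host had `feature`
    and the machine running the binary does not."""
    return not (GCE_N1_CPUS[built_on][feature] and not GCE_N1_CPUS[runs_on][feature])


print(hosts_without("avx2"))              # ['Sandy Bridge', 'Ivy Bridge']
print(can_run("Haswell", "Ivy Bridge"))   # False
print(can_run("Ivy Bridge", "Haswell"))   # True
```

Under that assumption, whether a given package works on an Ivy Bridge machine depends on which generation the CI job happened to land on.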

@fribeiro1
Contributor Author

fribeiro1 commented Oct 26, 2016

@dipinhora I'd rather warn about AVX2 now and update as we learn about other required features than not publish that information.

@Praetonus
Member

@dipinhora We could disable AVX2 with -mno-avx2 in the package builds. Do you think it would work?

@dipinhora
Contributor

@fribeiro1 My intent wasn't to discourage the addition into the documentation.

@Praetonus I think the appropriate solution is to decide on the oldest system architecture that we want to support, target it by adding arch=<ARCH> to the make flags, and add a note to the documentation accordingly. For example, we could target Ivy Bridge by adding arch=ivybridge. The build would then use only CPU features available on Ivy Bridge CPUs, even if the build machine had additional features that Ivy Bridge CPUs lack (like AVX2, AVX512, etc.). It would also mean that systems with older CPUs wouldn't work, because we may rely on features that they don't support. This is similar to Linux distributions building binaries for a generic i686 target long after newer CPUs came about, due to the need for compatibility with older systems.

NOTE: The main downside of this is that the Pony Runtime will also be built for the same "safe" target, while any executables compiled with Pony will detect the user's system CPU and take advantage of their newer processor. In an ideal case, we would be able to build the Pony Runtime to take advantage of the user's system CPU, but that would require compiling it on their machine instead of relying on the Pony Runtime compiled on the CI server when compiling executables with Pony.
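
The baseline idea can be pictured as a set comparison: a binary runs on a machine only if every feature it was built to use is present there. In this sketch, USER_CPU is a subset of the flags reported in this issue, while BASELINE and HOST_TUNED are illustrative feature sets chosen for the example, not the actual output of any arch= setting.

```python
def missing_features(required, available):
    """Features a binary needs that the target CPU does not report."""
    return sorted(set(required) - set(available))


# Instruction-set flags taken from the /proc/cpuinfo output in this issue.
USER_CPU = {"sse", "sse2", "pni", "ssse3", "sse4_1", "sse4_2",
            "popcnt", "aes", "avx", "rdrand"}

# Illustrative assumptions: a conservative baseline, and a build tuned
# for a newer CI host that also has AVX2.
BASELINE = {"sse4_2", "popcnt", "avx"}
HOST_TUNED = BASELINE | {"avx2"}

print(missing_features(BASELINE, USER_CPU))    # []
print(missing_features(HOST_TUNED, USER_CPU))  # ['avx2']
```

An empty result means the package would be safe for that CPU; anything else is a feature that could trigger an illegal instruction.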

@Praetonus
Member

@dipinhora That makes sense. For the runtime problem, we could build with Clang on Linux, which would enable us to ship the LLVM bitcode version of the runtime in the packages. The downside is that we wouldn't be testing the compiler against GCC anymore (currently we're doing GCC on Linux and Clang on OSX).

@ponylang/core Any input on what we should set as our baseline Travis architecture? If we decide to ship the LLVM bitcode runtime in the packages, generic i686 looks like the best solution.

@dipinhora
Contributor

@Praetonus I had the same thought re: the runtime bitcode and found the existing support for it in the compiler. We'd likely have to make that automatic, so that if the runtime bitcode file exists it is used automagically, as opposed to requiring the user to pass the option to ponyc.

I also found the following on Wikipedia (https://en.wikipedia.org/wiki/LLVM#Overview_and_description): "LLVM can accept the IR from the GNU Compiler Collection (GCC) toolchain, allowing it to be used with a wide array of extant compilers written for that project." That would allow us to keep using GCC to compile on Linux (after some testing to confirm that the LLVM in Pony is able to correctly use the GCC IR), but I haven't been able to confirm whether that support still exists or has been bitrotted/deprecated.

Additionally, IIRC from the ARM cross-compiling work, some decisions that affect how Pony behaves are made at C compile time via IFDEFs and might need revisiting to make them more dynamic. My memory on that front is fuzzy, though, and most of these may already have been addressed by the cross-compiling work mentioned.

In terms of the baseline target, it's probably a good idea to create two targets, one for 32 bit (i686) and another for 64 bit (x86_64) with Travis only building the 64 bit one by default (as it currently does).

@Praetonus
Member

The main issue with using the runtime bitcode by default is that we're running the optimisation pipeline after merging the program and the runtime modules together to get some very good whole-program-optimisations, and that takes a lot of time. But we could certainly supply a --runtimebc-no-wpo flag and use that by default if the bitcode file is found.

I've never worked with GCC's IR but I've heard that it's a pain to work with. I'll look into what can be done.

@dipinhora
Contributor

I'm not sure I follow. Are you saying we would want to default to --runtimebc-no-wpo for the shorter compile times because of the skipped whole-program-optimizations? Isn't that contrary to the second bullet point of the Pony Philosophy about performance?

Also, I have a strong suspicion that note in Wikipedia may have been linked to DragonEgg (http://dragonegg.llvm.org/) which is deprecated at this point. Your suggestion about using clang on linux is likely the best option available for generic compiler packages that still support the highest performance code generation by using the runtime llvm bitcode. This would actually be a performance improvement for anyone with a CPU with features the CI server CPU didn't have.

@Praetonus
Member

The performance of a program built with the runtime bitcode without WPO would be the same as that of a program linked with the static library (or a bit better, because of the possible CPU feature differences). There would certainly be a difference between WPO and non-WPO, but that difference would be on the call boundaries between the program and the runtime. Each of the components would still be fully optimised separately in a release build.

I'd say that the large majority of the program builds out there are development builds, where full performance isn't needed, but where compilation times matter. If somebody is doing a production build, they can ask for full WPO.

Actually, we might even want to tune the default behaviour based on the type of build. Or even based on flags when building the compiler itself. I think I'll write an RFC to clarify all of that.

Regarding Clang and GCC, I'm currently considering whether pushing ponyc towards a tighter integration with Clang would be a good thing. That would enable us to do some nice things with FFI and the runtime, but the available features of ponyc would depend on the availability of Clang and its libraries. But since LLVM and Clang are a package deal and we already depend on LLVM, maybe that would be feasible. Well, it also deserves an RFC.

@dipinhora
Contributor

dipinhora commented Oct 27, 2016

NOTE: When I mention release build or debug build below I'm referring to whether -d is passed to ponyc and not to whether ponyc was built with config=debug or config=release.

I think enabling WPO by default for release builds and disabling it by default for debug builds makes a lot of sense and follows current standards where debug builds disable optimization passes that are enabled for release builds.

Default behavior already depends on the type of build: debug builds generate debug info and disable optimization passes, while release builds do the opposite. There are likely other differences too, but I'm not aware of all of them.

I can't comment on the Clang coupling much but will gladly lurk and read the RFC. 8*D

@jemc
Member

jemc commented Oct 27, 2016

"I think enabling WPO by default for release builds and disabling it by default for debug builds makes a lot of sense"

Agreed. When I'm iteratively developing and want fast compile times, I always compile my Pony programs with --debug. I would omit --debug for a production-ready build, where I care more about program performance and less about compile times.
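
As a rough sketch of the default being discussed (not how ponyc is implemented; the names LinkPlan and plan_link are invented for illustration), the policy could be written as: use the runtime bitcode whenever it is available, and run whole-program optimisation over it only for non-debug builds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkPlan:
    use_runtime_bitcode: bool
    whole_program_optimisation: bool


def plan_link(debug, runtime_bitcode_available):
    """Hypothetical policy from this thread, not ponyc's actual behaviour.

    debug: whether the user passed --debug / -d to the compiler.
    runtime_bitcode_available: whether a runtime bitcode file was found.
    """
    if not runtime_bitcode_available:
        # Fall back to linking the precompiled static runtime library.
        return LinkPlan(use_runtime_bitcode=False, whole_program_optimisation=False)
    return LinkPlan(use_runtime_bitcode=True, whole_program_optimisation=not debug)


print(plan_link(debug=True, runtime_bitcode_available=True))
print(plan_link(debug=False, runtime_bitcode_available=True))
```

An explicit flag such as the proposed --runtimebc-no-wpo would then only be needed to override the default in either direction.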

@SeanTAllen
Member

I believe this should be fixed by #1663. A new version of Pony, 0.11.1, will be released soon. Can you please give 0.11.1 a try after it is released and see if your issues are fixed? In the meantime, I'm going to close this issue under the assumption that it's been fixed.
