Compiler hang #9980
Comments
I've also noticed similar hangs and slowdowns lately. Those happen both on local macOS and CI (Debian Linux) and started happening before 10.3 was released. Unfortunately I don't have anything that can help in pinpointing the issue but CI logs like
|
Thanks @dmorneau. We have recently heard similar reports. Can you check if there is any process waiting on the code_server/code:purge call? If so, what is the callsite? @lukaszsamson, do you remember if this happened with v1.9? Or did it start with v1.10? Tracking this down would help us a lot. |
@josevalim IIRC it started happening 2-3 weeks ago, and all of our build pipelines have been running on 1.10 for over a month. The first CI failure from our servers is from 19 days ago. If I had to guess I'd blame it on OTP 22.3.2, released 21 days ago. |
@lukaszsamson can you please use OTP 22.2 on CI and see if it addresses the issue? |
If your CI doesn't allow you to choose a particular OTP version, but you can use a custom Docker image, you can try one of these: https://hub.docker.com/r/hexpm/elixir/tags?name=1.10.3-erlang-22.2.8&page=1 |
@josevalim Yes, it's the second screenshot I pasted. It looks like this is the call: https://github.com/elixir-lang/elixir/blob/v1.10.3/lib/elixir/src/elixir_compiler.erl#L77 |
Thanks! It seems there was a bugfix in the code purger on 22.3.0 to 22.3.1, so I would recommend those running into it to revert to 22.2 and see if the issue persists. |
We can, but it's not going to be a drop-in replacement, as we currently use circleci/elixir images, which do not tag OTP releases |
As @wojtekmach said you can use these docker images which do tag the OTP version: https://hub.docker.com/r/hexpm/elixir/tags?name=1.10.3-erlang-22.2.8&page=1 |
On our side we have seen the problem happening when we switched from |
We use CircleCI images as well and we're unable to use the Hex images due to them lacking git (and possibly other things) |
If anyone runs into this issue, please do:
I have opened up an issue with Erlang/OTP here. @dmorneau, if it is an option, please consider sending the crash dump privately to them. |
Just upgraded to Erlang 22.3.3, the hang happened still (and on Elixir 1.10.3), and finally got an Will I be able to upload it to the Erlang issue tracker if I sign up there? |
@dimitarvp yes, you should. If not, feel free to email me (email is on my profile), and I can make sure it is delivered. |
@josevalim I was able to get a dump on macos with
will that be helpful? In the OTP bug only linux dumps were mentioned. |
@lukaszsamson send it there just in case it can help someone. :) |
OTP 22.3.4 has been released with some fixes that may (or may not) affect this. If folks can upgrade to latest 22.3.4 and see if the issues persist or not, it would be very welcome. Thank you! |
I have made a first test: on my computer (Mac), I don't remember strictly seeing the slow-down/hang, but currently the compilation seems fast with OTP 22.3.4. I'll be able to make a second test once 22.3.4 is available through https://packages.erlang-solutions.com/erlang-solutions_2.0_all.deb (currently it is apparently not), because this is what I use in Docker and where the slow compilation occurred for me. I will try again later! |
As mentioned in the linked Erlang ticket: I checked with a big-ish project of mine and I have seen no hangs or slowdowns during compilation with 22.3.4. |
Didn't want to be the bringer of evil news but I still experience hangs on CI with 22.3.4 and elixir 1.10.3 on a large project. It is kinda random when it triggers and it is difficult to re-run with ssh the build job so that I could get any dumps. |
Same here, unfortunately - still seeing compiler timeouts after 10 minutes in CI occasionally. Here's the info for our build that just timed out:
It appears to be random for us as well, as I can retry the build on the same git SHA and get a build that compiles just fine. |
We are also seeing this behavior with the same random behavior from the same SHAs. We're on 1.10.2, as a data point. I had to cancel compile steps that had been running for 1.5h. Rerunning with same inputs succeeded in 1-2m. Please let me know if I can provide data to help diagnose. |
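The manual cancel-and-rerun workflow described in the reports above could be automated in a CI step. The following is only a minimal sketch of that workaround, not a fix for the underlying hang: the function name `retry_with_timeout` is hypothetical, and it assumes the `timeout` utility from GNU coreutils is available (it exits with status 124 when the time limit is hit).

```shell
# Hypothetical helper: run a command under a time limit and retry it
# only when the limit was hit. Assumes GNU coreutils `timeout`.
# Usage: retry_with_timeout <seconds> <max_attempts> <command> [args...]
retry_with_timeout() {
  limit=$1
  attempts=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if timeout "$limit" "$@"; then
      return 0
    else
      status=$?
    fi
    # Any status other than 124 is a real failure: report it, do not retry.
    if [ "$status" -ne 124 ]; then
      return "$status"
    fi
    echo "attempt $i timed out after ${limit}s" >&2
    i=$((i + 1))
  done
  return 124
}
```

For example, `retry_with_timeout 600 3 mix compile` would give each attempt ten minutes and retry up to three times. Genuine compile errors are returned immediately, so the wrapper only masks the nondeterministic hang.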
As others have reported 22.3.4 does not fix the issue. I captured another dump and attached to the erlang bug report. |
The OTP team has provided a modified version of Erlang's source: https://bugs.erlang.org/browse/ERL-1236?focusedCommentId=17936&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17936 If someone could run it on Debian, it would be very much appreciated. You can use something like |
Just FYI, I'm also seeing this issue on Ubuntu (compiles that hang). Env info:
Seems to have started appearing after I updated Erlang recently. Also, not sure if this can result in problems such as this but Elixir seems to be compiled against OTP22 while I now have OTP23 installed. As others have stated, the behavior seems nondeterministic (ugh). Killing the process and rerunning the same command again has succeeded most of the time so far. If I can help at all with additional info or diagnostics, please let me know! |
This should be fixed on Erlang 22.3.4.1 released today. A fix for 23.0 is coming out soon. Thanks everyone for helping! |
Hello, We are trying to switch from Erlang/OTP v22.3 to OTP v23.0.3 in our project. _@ubuntu:/mnt/src/agilis_fw/common_core.fw.x86# elixir --version Elixir 1.10.3 (compiled with Erlang/OTP 23)_ We encounter the hang, followed by an erl_crash.dump, on each and every module doing: import OurApp.Gettext I think it might be related to the issue above. I can share the crash dump: erl_crash.dump.gz.part-ab.gz The ^^ is a single gzip archive split with the split -b 9m command to circumvent GitHub's restrictions. With my best wishes |
We switched to Erlang 22.3.4.1. We still observe Erlang compiler hangs in about 5-10% of our CI runs. We have almost 800K of Erlang in our project spread over 8 files (ASN.1 codecs generated by asn1ct). It seems that the compiler hangs until the job is killed about 40 minutes later. Opened this: https://bugs.erlang.org/browse/ERL-1433 |
I wonder how many people reporting it as being hung are just impatient. It does take some time to complete. I came here thinking that my compilation was inactive, only for it to complete 5 minutes later. |
In my case I definitely saw hangs (on the order of a half hour or more). I've tried to reduce the number of cross-dependencies in my app, eliminate |
After switching to OTP 22.3.4.11, I stopped seeing compiler hangs from the Erlang side. Since my main problem is some ridiculously large Erlang files (that's how asn1ct works), I first suspected the compiler hang was still in place, only to find that Erlang compilation times had increased by at least 50% between OTP20 and OTP22. In the end I saw no more compiler hangs, and build times have been fairly deterministic. What I ended up doing is replacing the default compiler wrapper for Erlang in mix with one that compiles my large Erlang files in parallel. By using more than one core the problem became manageable (= total build time back to OTP20 average or less). |
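The comment above does not include the replacement mix compiler, so the following is not that implementation. It is a rough shell sketch of the same idea (one `erlc` process per file, several at a time), assuming an `xargs` that supports `-0` and `-P` (GNU and BSD do). The function name `compile_erl_parallel` and the `ERLC` override variable are hypothetical.

```shell
# Hypothetical helper: compile Erlang source files in parallel, one
# erlc invocation per file. Set ERLC to use a different compiler binary.
# Usage: compile_erl_parallel <jobs> <output_dir> <file.erl>...
compile_erl_parallel() {
  njobs=$1
  outdir=$2
  shift 2
  # NUL-delimit the file names so paths containing spaces survive xargs.
  # -n 1 passes one file per invocation; -P runs up to $njobs at once.
  printf '%s\0' "$@" | xargs -0 -n 1 -P "$njobs" "${ERLC:-erlc}" -o "$outdir"
}
```

For example, `compile_erl_parallel 4 ebin src/*.erl` would run up to four `erlc` processes, writing `.beam` files into an existing `ebin` directory. This sketch ignores compile-order dependencies between modules (behaviours, parse transforms), which a real mix compiler would need to respect.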
A couple notes: on Elixir v1.11 we have improved the tooling so
In both cases Elixir v1.11 is a requirement. :) |
Environment
Behavior
While doing a clean build of an app (via mix test) after upgrading to Elixir 1.10.3 (from 1.10.2), I hit a hang in the compiler. It printed the "taking longer than 15s..." warning for many files, then stopped making any progress.
After 10 to 15 min I took a crash dump using SIGUSR1. At that point, the only process with messages in its queue was code_server (MsgQ=7). The code server is inside do_purge, waiting on a reply from erts_code_purge, which is itself inside a do_purge function, purging a module called elixir_compiler_18.
The request looks like it came from another process that was compiling some LiveComponent part of the app:
I can't share the crash dump, but is there something I could check that might help?
This didn't reproduce after I killed the compiler and tried again.
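For anyone who wants to repeat the check described above (finding which processes have messages waiting in their queue) without opening the dump in a viewer, here is a minimal sketch. It assumes the plain-text erl_crash.dump layout, where each process section starts with a `=proc:<pid>` line and contains `Name:` and `Message queue length:` fields. The function name `busy_procs` is hypothetical.

```shell
# Hypothetical helper: list processes in an erl_crash.dump whose
# message queue is non-empty, printed as "<pid> <name> <queue length>".
# Assumes "=proc:<pid>" section headers and the "Name:" and
# "Message queue length:" fields of the text crash dump format.
busy_procs() {
  awk '
    function emit() {
      if (pid != "" && len > 0) print pid, name, len
      pid = ""
    }
    /^=/      { emit() }                 # any section header ends the previous one
    /^=proc:/ { pid = substr($0, 7); name = "-"; len = 0 }
    pid != "" && /^Name: /                 { name = substr($0, 7) }
    pid != "" && /^Message queue length: / { len = $4 + 0 }
    END       { emit() }
  ' "$1"
}
```

Running `busy_procs erl_crash.dump` on a dump like the one described here would be expected to show a line for code_server with a queue length of 7. Unregistered processes are printed with `-` in place of a name.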