
Compiler hang #10248

Closed
metelik opened this issue Aug 4, 2020 · 11 comments

Comments

@metelik

metelik commented Aug 4, 2020

Hello,

We are trying to switch from Erlang OTP v22.3 to OTP v23.0 in our project. I saw issue #9980 being logged and picked 23.0.3, where the number of 'hanging' modules decreased, but:

Compiling lib/common_core/schema.ex (it's taking more than 15s)
eheap_alloc: Cannot allocate 6801972448 bytes of memory (of type "heap").

Crash dump is being written to: erl_crash.dump...done

Environment:
@ubuntu:/mnt/src/agilis_fw/common_core.fw.x86# elixir --version
Erlang/OTP 23 [erts-11.0.3] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]
Elixir 1.10.3 (compiled with Erlang/OTP 23)

OS: Linux ubuntu 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Furthermore, we encounter the hang, followed by the erl_crash.dump, in each and every module doing the:

import OurApp.Gettext

Another series of crash dumps occurs for the modules doing the following:
use Absinthe.Schema
use Absinthe.Relay.Schema, :classic

I can share a crashdump:

erl_crash.dump.gz.part-ab.gz
erl_crash.dump.gz.part-ac.gz
erl_crash.dump.gz.part-aa.gz

The above is a single gzip archive, split with the split -b 9m command to circumvent GitHub's attachment size restrictions.
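For anyone reassembling the attachment, the split/rejoin round trip looks roughly like this (a generated dummy file stands in for the real crash dump, and smaller chunks are used so the example stays quick):

```shell
# Stand-in for the real erl_crash.dump.gz (the issue used split -b 9m)
head -c 200000 /dev/urandom | gzip -c > erl_crash.dump.gz

# Split into 64 KB pieces: produces erl_crash.dump.gz.part-aa, -ab, ...
split -b 65536 erl_crash.dump.gz erl_crash.dump.gz.part-

# Rejoin: concatenate the pieces in lexical order, then decompress
cat erl_crash.dump.gz.part-* > rejoined.gz
gunzip -c rejoined.gz > erl_crash.dump
```

Because split generates its suffixes in lexical order, a plain `cat` of the glob restores the original byte stream.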

With my best wishes
Tomasz Motyl

@josevalim
Member

This is an issue with Gettext. In a nutshell, you have so much data that embedding it all in a single module is inefficient. There is one option you can use, though: :one_module_per_locale. There is more info here: https://hexdocs.pm/gettext/Gettext.html#module-backend-configuration

If that doesn't work, try using gettext master, which has further optimizations around this area. Thanks!
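For reference, the :one_module_per_locale option mentioned above is set on the Gettext backend module itself. A minimal sketch, assuming the backend is the OurApp.Gettext module imported earlier in this thread and that the OTP app is named :our_app:

```elixir
defmodule OurApp.Gettext do
  # :one_module_per_locale splits the generated translation code into one
  # module per locale, so no single module (and no single compiler pass)
  # has to hold every locale's translations at once.
  use Gettext, otp_app: :our_app, one_module_per_locale: true
end
```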

@metelik
Author

metelik commented Aug 4, 2020

Thanks a million, Jose, for your prompt response. I shall give your hint above a try. We do not have that many gettext entries, though:
root@ubuntu:/home/src/agilis_fw/common_core.fw.gettext# egrep -rn "gettext\(" ./lib/ | wc -l
222

I am not that familiar with the compiler's internals. You say 'too much data' - how come? What is the limit? I do not quite see how 222 gettext entries would exhaust the resources during compilation.

How about the second part:

use Absinthe.Schema
use Absinthe.Relay.Schema, :classic

?

As I see it, efficiency is one thing we could improve. Should the compiler crash, though, with some 16 GB of RAM available? Why did this start occurring with OTP v23 when it never crashed before?

With my best wishes
Tom

@josevalim josevalim reopened this Aug 4, 2020
@josevalim
Member

Oh, I missed the Absinthe ones when reading. You have 16GB of RAM but the error message says:

eheap_alloc: Cannot allocate 6801972448 bytes of memory (of type "heap").

So you are trying to allocate ~7 GB on top of what is already in use. The question, therefore, is what is trying to allocate so much memory and how we can break that apart. Gettext can be a culprit, which is why I jumped to that conclusion when I saw it (sorry!), but if you have only 222 entries, it certainly won't be an issue.

However, this is most likely not an issue with the Elixir or Erlang compilers, but rather with library or app code. For example, the compiler on OTP 23 may be faster, which allows it to compile more things in parallel, which in turn makes this error more likely. Without a mechanism to reproduce the high memory consumption, it is unlikely we can improve anything. :) So if you can isolate it further, it would help us a ton!

@metelik
Author

metelik commented Aug 10, 2020

Hello Jose,

Just an update. I added :one_module_per_locale to the project's gettext config and threw an additional 4 GB of RAM (12 GB -> 16 GB) into the development virtual machine I run the Elixir/Erlang compilation on. The issue appears to be alleviated. The vital question, though, is why it requires so much memory to compile a Phoenix app (v1.3 - we plan to upgrade, but one issue at a time). The app uses gettext and the Absinthe backend, whose schema comprises two rather big modules, ~1 MB each.
I can provide the erl_crash.dump from when the VM had only 12 GB of memory. Would you have a hint on what other information I could supply?
I am not sure whether this is really a bug. The only thing that is a bit disconcerting is the amount of memory the compiler requires.

With my best wishes
Tomasz Motyl

@josevalim
Member

josevalim commented Aug 10, 2020

Hi @metelik! As I said above, Gettext is most likely not the issue since you have so few translations.

The only way to move forward here is to isolate the issue. It is most likely not an Elixir issue, but rather a library that is generating too much meta code during compilation - code which becomes extremely large during later compilation passes. So, in a nutshell: yeah, it is not supposed to happen, but we are likely not the culprit. :)

@juanperi

juanperi commented Aug 10, 2020

A better indicator of the number of translations, to see whether gettext might be the culprit, would be:
egrep -rn "^msgid" ./priv/gettext/**/*.po | wc -l

This will take into account not only the source strings but also the number of locales you are translating to.

PS: The previous command works only once the translations are already in place, since it checks the PO files containing them.
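A throwaway demonstration of that count (locale names and msgids invented): one source string translated into two locales yields a count of 2, because each locale's PO file repeats the msgid.

```shell
# Two locales, one translated string each (msgids repeat per locale)
mkdir -p priv/gettext/en/LC_MESSAGES priv/gettext/pl/LC_MESSAGES
printf 'msgid "hello"\nmsgstr "hello"\n' > priv/gettext/en/LC_MESSAGES/default.po
printf 'msgid "hello"\nmsgstr "czesc"\n' > priv/gettext/pl/LC_MESSAGES/default.po

# Same count as above, with a glob that also works in plain sh
egrep -rn "^msgid" priv/gettext/*/LC_MESSAGES/*.po | wc -l   # prints 2
```

So the count scales with source strings times locales, which is why it can dwarf the number of gettext( call sites in the source.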

@metelik
Author

metelik commented Aug 10, 2020

@epilgrim I scanned for all *.po's in the project, including dependencies. There is no gettext dir in the priv directory; the files are scattered under './_build/lces2/lces2_dev/rel/firmware/lib/' after my adding :one_module_per_locale:
@ubuntu:/home/src/agilis_fw/firmware.fw# egrep -rn "^msgid" $(find . -name "*.po") | wc -l
334418

@josevalim Yes, I totally get it. I tried ripping nearly everything but gettext and the Absinthe schema (the suspected culprits) out of the project and guess what... the problem disappeared ;) Like a typical heisenbug.

Regarding "but rather a library that is generating too much meta code during compilation":
Is there a simple yet clever way to see precisely what the compiler is doing, or just the standard 'halving' approach?

@juanperi

334K strings to translate are a lot more than 222 entries :). As @josevalim said before, try one_module_per_locale and the gettext master branch, as there is a PR with a mitigation for the memory usage when compiling many locales: elixir-gettext/gettext#262
Now, if you deleted everything else except gettext and it worked, then I don't know.

@metelik
Author

metelik commented Aug 10, 2020

334K strings to translate are a lot more than 222 entries :). As @josevalim said before, try one_module_per_locale and the gettext master branch, as there is a PR with a mitigation for the memory usage when compiling many locales: elixir-gettext/gettext#262
Now, if you deleted everything else except gettext and it worked, then I don't know.

I know - shock! I initially egrepped for occurrences of gettext( in OUR code only (without deps). I shall give the master branch a try as a next step. At least it's a starting point. Thanks a million, folks. I shall come back with more data, hopefully.

@josevalim
Member

Oh, 300k is definitely the root cause. :) I have released a new gettext version, so make sure you are on v0.18.1. If the issue persists, feel free to e-mail me your priv/gettext directory, and I would love to see where I can optimize it further. :)

@josevalim
Member

And thanks for the help @epilgrim!
