Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta #36799

Open
urschrei opened this issue Sep 28, 2016 · 100 comments
Assignees
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-x86_32 Target: x86 processors, 32 bit (like i686-*) P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@urschrei
Copy link
Contributor

urschrei commented Sep 28, 2016

I'm trying to build a cdylib (https://github.com/urschrei/lonlat_bng) which requires the https://github.com/urschrei/ostn15_phf crate.

building on AppVeyor, on i686-pc-windows-gnu, using the latest beta, is failing with an OOM error:

Details

The ostn15_phf crate is essentially just a very big static map, built using PHF (the generated, uncompiled map is around 42.9mb)

The build passed when running cargo test, using rustc 1.12.0-beta.3 (341bfe43c 2016-09-16):
https://ci.appveyor.com/project/urschrei/lonlat-bng/build/105/job/3y1llt6luqs3phs3

It's now failing when running cargo test, using rustc 1.12.0-beta.6 (d3eb33ef8 2016-09-23):
https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/27pgrkx2cnn2gw50

The failure occurs when compiling ostn15_phf with fatal runtime error: out of memory

@sfackler sfackler added regression-from-stable-to-beta Performance or correctness regression from stable to beta. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 28, 2016
@urschrei urschrei changed the title Compilation of a crate using a large static map fails on latest i686 Beta Compilation of a crate using a large static map fails on latest i686-pc-windows-gnu Beta Sep 28, 2016
@sfackler
Copy link
Member

cc @brson just want to make sure you're aware of this.

@pnkfelix pnkfelix added regression-from-stable-to-stable Performance or correctness regression from one stable version to another. P-high High priority and removed regression-from-stable-to-beta Performance or correctness regression from stable to beta. labels Sep 29, 2016
@nikomatsakis
Copy link
Contributor

Related to #36926 perhaps? ("1.12.0: High memory usage when linking in release mode with debug info")

@nikomatsakis
Copy link
Contributor

@urschrei have you tried this on platforms other than windows, out of curiosity?

@nikomatsakis
Copy link
Contributor

I see. It seems to work on x86_64-pc-windows-gnu, but not i686.

@urschrei
Copy link
Contributor Author

urschrei commented Oct 6, 2016

I haven't tried to build i686 on Linux or OSX, but I easily could…

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Oct 6, 2016

Well, I just did a run on my linux box. The memory usage is certainly through the roof: https://gist.github.com/nikomatsakis/ea771dd69f12ebc5d3d5848fa59fb43a

This is using nightly (rustc 1.13.0-nightly (a059cb2f3 2016-09-27)). The peak is around 4GB.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Oct 6, 2016

So @alexcrichton has executed a massif run: https://gist.github.com/alexcrichton/d20d685dd7475b1801a2ccac6ba15b08

The peak result Measurement 58 shows something like this:

Percentage Area
36% HIR
9% MIR
14% type/region arenas
5% region maps
2% constants

The peak result (measurement 48) looks pretty similar. More memory used by MIR:

Percentage Area
26% HIR
14% MIR
5% type/region arenas
4% region maps
3% constants

These numbers are just based on a kind of quick scan of the gist.

@nikomatsakis
Copy link
Contributor

It seems clear we need to revisit our handling of statics and constants in a big way. But then this has been clear for a long time. =) I'm wondering if we can find some kind of "quick fix" here.

I also haven't compared with historical records -- but most of that memory we see above, we would have been allocating before too, so I'm guessing this is a case of being pushed just over the threshold on i686, versus a sudden spike.

@nikomatsakis
Copy link
Contributor

Sizes of some types from MIR (gathered from play):

statement:      192
statement-kind: 176
lvalue:         16
rvalue:         152
local-decl:     48

@nikomatsakis
Copy link
Contributor

I am feeling torn here. It seems the best we can do short-term is to make some small changes to MIR/HIR and try to bring the peak down below 4GB. The long term fix would be to revisit the overall representation particularly around constants and see what we can do to bring the size down. One thing I was wondering about (which is probably answerable from @alexcrichton's measurements) is what percentage of memory is being used just in the side-tables vs the HIR itself.

In any case, it seems like 152 bytes for a mir rvalue is really quite big.

@nikomatsakis
Copy link
Contributor

Looking again at the massif results, it looks like MIR is taking more memory than I initially thought. One place that seems to use quite a bit is the list of scopes.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Oct 12, 2016

Better numbers:

percent pass
35.09% HIR
34.96% MIR
11.77% mk_region, node_type_insert
6.24% region_maps
2.42% consts
1.39% spans

@brson
Copy link
Contributor

brson commented Oct 20, 2016

Just pinging @nikomatsakis to keep this P-high bug on his radar.

@nikomatsakis
Copy link
Contributor

I've made basically no progress here, I'm afraid. I think the most likely path forward in short term is to try and reduce memory usage in various data structures. Not very satisfying though. I'm not sure if we can make up the gap that way.

@nikomatsakis
Copy link
Contributor

Discussed in compiler meeting. Conclusion: miri would be the proper fix, but maybe we can shrink MIR a bit in the short term, perhaps enough to push us over the edge. I personally probably don't have time for this just now (have some other regr to examine). Hence re-assigning to @pnkfelix -- @arielb1 maybe also interested in doing something?

@pnkfelix
Copy link
Member

cc @nnethercote in case he might have input on ways to attack this

@nnethercote
Copy link
Contributor

nnethercote commented Oct 28, 2016

Massif is slow and the output isn't the easy to read, but it's the ideal tool for tackling this. The peak snapshot in @alexcrichton's data is number 48, and I've made a nicer, cut-down version of it here: https://gist.github.com/nnethercote/935db34ff2da854df8a69fa28c978497

You can see that ~30% of the memory usage comes from lower_expr. I did a DHAT run (more on that in a moment) and this is the main culprit:

    ExprKind::Tup(ref elts) => {
        hir::ExprTup(elts.iter().map(|x| self.lower_expr(x)).collect())
    } 

push_scope/pop_scope account for another ~15%.

@arielb1
Copy link
Contributor

arielb1 commented Oct 28, 2016

@nnethercote

The push_scope/pop_scope is MIR things. I think interning lvalues (they take 16 bytes, but occur multiple times per statement/terminator - they should take 4) and boxing rvalues (many statements are storage statements that don't have an rvalue) should claw back some performance.

Also, enum Operand is very common, in most contexts just an lvalue, and takes 72 bytes. Something should be done about this, even just boxing constants (which would reduce it to 24 bytes, or 16 bytes with interning lvalues, or 8 bytes with interning lvalues and bitpacking).

@arielb1
Copy link
Contributor

arielb1 commented Oct 28, 2016

Having a 32-bit MIR "shared object representation" MirValue with 3/4 tag bits and the rest an index to an interner would basically apply all of these suggestions (where MirValue can represent all of Constant, Lvalue, Rvalue, Operand, and we keep the structs as strongly-typed newtypes).

We can then intern MirValues and/or occasionally GC-intern them, for further memory gain.

bors added a commit that referenced this issue Sep 24, 2017
put empty generic lists behind a pointer

This reduces the size of hir::Expr from 128 to 88 bytes (!) and shaves
200MB out of #36799.

This is a performance-sensitive PR so please don't roll it up.

r? @eddyb
@michaelwoerister
Copy link
Member

@arielb1 Oh right, with one CGU memory usage should be the same unless there's some kind of bug.

bors added a commit that referenced this issue Sep 25, 2017
encode region::Scope using fewer bytes

Now that region::Scope is no longer interned, its size is more important. This PR encodes region::Scope in 8 bytes instead of 12, which should speed up region inference somewhat (perf testing needed) and should improve the margins on #36799 by 64MB (that's not a lot, I did this PR mostly to speed up region inference).

This is a perf-sensitive PR. Please don't roll me up.

r? @eddyb

This is based on  #44743 so I could get more accurate measurements on #36799.
@arielb1
Copy link
Contributor

arielb1 commented Sep 27, 2017

@petrochenkov's span reform PR had lost us 400MB of space here, which means making this compile is that much harder. Maybe we should just use enough bits for the crate when encoding spans?

@arielb1
Copy link
Contributor

arielb1 commented Sep 27, 2017

or use 40 bit spans, which should be enough to handle 64MB crates?

@michaelwoerister
Copy link
Member

@arielb1 I'm not sure I understand: The span PR makes this test case use 400 MB more -- or less?

@arielb1
Copy link
Contributor

arielb1 commented Sep 28, 2017

it makes the test case use 400MB more, because the span just overflows the 24-bit length we have.

@arielb1
Copy link
Contributor

arielb1 commented Sep 28, 2017

The current peak is during LLVM translation where we are using memory from both MIR and LLVM. I think that after miri lands, we could easily store constants as byte arrays, which should bring this well into the green zone.

@nikomatsakis
Copy link
Contributor

triage: P-medium

It seems like we are going to wait until we can fix this properly.

@rust-highfive rust-highfive added P-medium Medium priority and removed P-high High priority labels Nov 9, 2017
@oli-obk
Copy link
Contributor

oli-obk commented Apr 17, 2018

miri has landed. I'm not entirely sure how to reproduce this though.

@urschrei
Copy link
Contributor Author

urschrei commented Apr 17, 2018 via email

@oli-obk
Copy link
Contributor

oli-obk commented Apr 17, 2018

jup. it's even in beta (although somewhat broken)

@urschrei
Copy link
Contributor Author

Just ran into #49930, which causes…some failures. Will hold off and try again when the backport lands.

@oli-obk
Copy link
Contributor

oli-obk commented Apr 19, 2018

we have new nightlies!

@urschrei
Copy link
Contributor Author

i686 is failing on ac3c228 (2018-04-18) with exit code: 3221225501: https://ci.appveyor.com/project/urschrei/lonlat-bng/build/job/qaipuqj88243xs4c#L115

(Which is a memory exhaustion error I think?)

@oli-obk
Copy link
Contributor

oli-obk commented Apr 20, 2018

looks like it. Because the x86_64 target host works. You can probably build for the i686 target on the x86_64 host.

@urschrei
Copy link
Contributor Author

@oli-obk Oh, I had no idea – can you point me at some details?

@oli-obk
Copy link
Contributor

oli-obk commented Apr 21, 2018

You need to install the cross toolchain via rustup and invoke cargo with the target flag for that cross target

@pnkfelix pnkfelix added the I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. label Apr 12, 2019
@dkg
Copy link

dkg commented Apr 22, 2022

Not sure whether this is exactly the same issue, but compiling the test suite for v0.8.5 of the xxhash-rust crate also failed on 32-bit platforms due to rustc memory exhaustion. in that crate, test-vector.rs contained a series of ~5K assert_eq! expansions, and rustc was consuming > 6GiB of RAM during the build on 64-bit platforms (which obviously won't be possible for a 32-bit rustc process).

Future versions of that crate will work around the failure by changing the test.

I don't know how to run the more detailed RAM diagnostics @nikomatsakis and @alexcrichton report above, but i offer this as another example of excessive memory allocation that fails on 32-bit platforms. I can also report this as a separate issue, if that would be useful.

@Noratrieb Noratrieb added O-x86_32 Target: x86 processors, 32 bit (like i686-*) and removed O-x86-all labels Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. I-compilemem Issue: Problems and improvements with respect to memory usage during compilation. I-crash Issue: The compiler crashes (SIGSEGV, SIGABRT, etc). Use I-ICE instead when the compiler panics. O-x86_32 Target: x86 processors, 32 bit (like i686-*) P-medium Medium priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests