`item_bodies_checking` CPU time and `liveness_and_intrinsic_checking` mem usage explode on a generated ~2.5MB .rs file #79671
Comments
@yupferris you can see memory usage per-query by using |
I tried this, but for some reason there's no memory usage reported; is this not available on mac perhaps? `$ cargo rustc --release -- -Z time-passes`
That's super weird, it shows up for me on linux: -Z time-passes
Huh, I guess it's only available on windows and linux :( (see rust/compiler/rustc_data_structures/src/profiling.rs, lines 603 to 635 in 1f95c91)
Aha, good to know! Unfortunately, the build kills my windows system and I don't have a linux box, so I'm not sure I can be of much help diagnosing further.
This was kind of interesting, mem usage didn't seem to increase much in |
I updated the -Z time-passes log, it was |
Aha, yep, there it is - on my box I observed upwards of 11GB (just watching rustc in the activity monitor), so it's definitely more!
Adding |
This prevents the compiler from performing expensive and unnecessary lints on the generated code, saving several seconds and, more importantly, several GB(!!) of memory during compilation of large sims. See rust-lang/rust#79671 (comment) for more info.
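As a minimal sketch of the workaround described above, assuming the attribute in question is the `#[automatically_derived]` that appears (commented out) in the generator later in this thread: the generator writes the attribute on the line before the generated `impl`. The function name `emit_module` and the `suppress_lints` switch are hypothetical, and how rustc treats this attribute on an inherent impl may differ between compiler versions.

```rust
/// Emits a tiny generated module as a string.
/// `suppress_lints` is a hypothetical switch for this sketch; when set,
/// the impl is marked `#[automatically_derived]`, which (per this thread)
/// lets rustc skip some lint passes over the generated body.
fn emit_module(suppress_lints: bool) -> String {
    let mut out = String::new();
    out.push_str("pub struct Lol {\n    pub i: bool,\n    pub o: bool,\n}\n\n");
    if suppress_lints {
        // The attribute goes directly before the impl item.
        out.push_str("#[automatically_derived]\n");
    }
    out.push_str("impl Lol {\n    pub fn prop(&mut self) {\n");
    out.push_str("        let temp0 = !self.i;\n");
    out.push_str("        self.o = temp0;\n");
    out.push_str("    }\n}\n");
    out
}

fn main() {
    // Print the module with the attribute enabled.
    print!("{}", emit_module(true));
}
```

Comparing `-Z time-passes` output for both variants is one way to check whether the attribute has the reported effect on a given toolchain.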
Indeed, that's a huge help for the memory issue! I'm still a bit curious about compile time and whether or not it can be improved/worked around, but this certainly unblocks the project at least!
Is there a non-dropbox location for the source code? Some general advice: If you generate large functions, splitting up generated code into smaller ones may help. Generating explicit type annotations may help too.
Good to know, I will see if there's some kind of partitioning I can do, thanks!
Indeed this helped a bit - with the memory workaround in, a full compile takes ~60s on my system. With type annotations on top of that, it drops to ~50s. Hopefully I can come up with a graph partitioning scheme that allows generation of smaller functions to try and work around these scaling issues.
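The two suggestions above (explicit type annotations and smaller functions) could be combined roughly as follows. This is only a sketch under stated assumptions: it uses the chained-negation shape of the toy reproducer from this thread, not the real simulator graph, and the names `generate_chunked` and `prop_chunk_N` are hypothetical. A real design would need an actual graph partitioning scheme to decide which values cross chunk boundaries.

```rust
use std::fmt::Write;

/// Emits `count` chained negations, split into helper fns of at most
/// `chunk_size` bindings each, with every temporary annotated as `bool`.
fn generate_chunked(count: usize, chunk_size: usize) -> String {
    assert!(chunk_size > 0, "chunk_size must be nonzero");
    let mut out = String::new();
    out.push_str("pub struct Lol {\n    pub i: bool,\n    pub o: bool,\n}\n\nimpl Lol {\n");
    // Ceiling division: the last chunk may be shorter than `chunk_size`.
    let num_chunks = (count + chunk_size - 1) / chunk_size;
    for chunk in 0..num_chunks {
        let start = chunk * chunk_size;
        let end = (start + chunk_size).min(count);
        writeln!(out, "    fn prop_chunk_{}(input: bool) -> bool {{", chunk).unwrap();
        let mut last = String::from("input");
        for i in start..end {
            // Explicit `: bool` so type checking has nothing to infer.
            writeln!(out, "        let temp{}: bool = !{};", i, last).unwrap();
            last = format!("temp{}", i);
        }
        writeln!(out, "        {}", last).unwrap();
        out.push_str("    }\n");
    }
    // `prop` only threads one value through the chunk helpers.
    out.push_str("    pub fn prop(&mut self) {\n");
    out.push_str("        let value: bool = self.i;\n");
    for chunk in 0..num_chunks {
        writeln!(out, "        let value: bool = Self::prop_chunk_{}(value);", chunk).unwrap();
    }
    out.push_str("        self.o = value;\n    }\n}\n");
    out
}

fn main() {
    print!("{}", generate_chunked(5, 2));
}
```

With `count = 5` and `chunk_size = 2` this emits three helper fns; whether chunking actually improves compile time for a given design would have to be measured.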
FWIW it's actually pretty easy to reproduce with a dumb generator like this:
    use std::env;
    use std::fs::File;
    use std::io::{self, Write};

    fn main() -> io::Result<()> {
        let output_file_name = env::args().skip(1).nth(0).expect("Missing output file name arg");
        let mut output = File::create(output_file_name)?;
        writeln!(output, "pub struct Lol {{")?;
        writeln!(output, "pub i: bool,")?;
        writeln!(output, "pub o: bool,")?;
        writeln!(output, "}}")?;
        // Uncomment to ignore `liveness_and_intrinsic_checking` etc
        //writeln!(output, "#[automatically_derived]")?;
        writeln!(output, "impl Lol {{")?;
        writeln!(output, "pub fn prop(&mut self) {{")?;
        // Modify to see mem explosion in `liveness_and_intrinsic_checking` and nonlinear time scaling in `item_bodies_checking`
        let count = 30000;
        let mut last_temp_name = None;
        for i in 0..count {
            let temp_name = format!("temp{}", i);
            writeln!(output, "let {} = !{};", temp_name, last_temp_name.unwrap_or("self.i".into()))?;
            last_temp_name = Some(temp_name);
        }
        writeln!(output, "self.o = {};", last_temp_name.unwrap())?;
        writeln!(output, "}}")?;
        writeln!(output, "}}")?;
        Ok(())
    }
This generates a .rs file that can be included as a module in another project, and even if it's unused, compilation will scale poorly as in the original provided project(s).
This is to reduce the number of (unnecessary) bindings in the generated prop fn, as this has been found to have nonlinear time and mem scaling in the rust compiler currently (see rust-lang/rust#79671 and an initial/related workaround in 7c2ff85), and it's been recommended to reduce these. As expected, this appears to be quite effective!
A side effect of this is that IR expressions can now have unbounded depth, so lowering those also has to be iterative instead of recursive.
I've introduced parens in some cases which may be unnecessary (to avoid the complexity of determining whether or not we actually need them), so we now mark the impl item with an attribute to ignore these.
There are a couple of cases like Signal::bits which end up being lowered to a shift and a bitmask in addition to a cast (if necessary). If a user builds a graph with several of these calls on the same Signal, the intermediates between these steps aren't refcounted like normal Signals are (refcounting happens too early for that) - so we won't be able to flatten these to a single temporary. There are a few other such cases as well. I've chosen to simply ignore all of them - they're not practically an issue, and we can always revisit them later if need be (though this will likely require the introduction of yet another IR and perhaps more complex passes such as CSE on it to be effective).
Also rename Node -> Frame for the intermediate types used when traversing the graph iteratively, as this less ambiguously describes what these types represent.
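To illustrate why unbounded expression depth calls for iterative lowering, here is a minimal sketch that walks an expression with an explicit stack instead of recursion. Everything in it is an assumption made for illustration: the tiny `Expr` IR, the arena-of-indices layout, and the `Frame` enum (named after the rename mentioned above) are not the project's actual types. Like the commit, it always emits parentheses rather than working out when they are needed.

```rust
/// Hypothetical miniature IR stored in an arena; operands are indices.
#[derive(Clone, Copy)]
enum Expr {
    Input,
    Not(usize),
    And(usize, usize),
}

/// Work items for the explicit stack that replaces the call stack.
enum Frame {
    Enter(usize),       // lower the expression at this arena index
    Emit(&'static str), // write a literal once earlier frames are done
}

/// Lowers `arena[root]` to Rust source text without recursion.
fn lower(arena: &[Expr], root: usize) -> String {
    let mut out = String::new();
    let mut frames = vec![Frame::Enter(root)];
    while let Some(frame) = frames.pop() {
        match frame {
            Frame::Emit(text) => out.push_str(text),
            Frame::Enter(index) => match arena[index] {
                Expr::Input => out.push_str("self.i"),
                Expr::Not(operand) => {
                    out.push_str("!(");
                    // Pushed in reverse order: the operand is popped first.
                    frames.push(Frame::Emit(")"));
                    frames.push(Frame::Enter(operand));
                }
                Expr::And(lhs, rhs) => {
                    out.push('(');
                    frames.push(Frame::Emit(")"));
                    frames.push(Frame::Enter(rhs));
                    frames.push(Frame::Emit(" & "));
                    frames.push(Frame::Enter(lhs));
                }
            },
        }
    }
    out
}

fn main() {
    // Index 0 is the leaf; each later node negates the previous one.
    let arena = vec![Expr::Input, Expr::Not(0), Expr::Not(1)];
    println!("{}", lower(&arena, 2)); // prints "!(!(self.i))"
}
```

Because pending work lives in a heap-allocated `Vec`, the depth of the expression is limited by available memory rather than by the thread's stack size.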
Changes from #79727 reduced max rss for |
I've been working on a compiler project to be able to describe digital logic in rust. This logic can be compiled to Verilog for use on FPGA or in silicon, and in order to test, it can be compiled to a simulator as pure rust code. The idea is that a user would use a `build.rs` script to compile the relevant module(s) and output to a `modules.rs` file, and then the hw module(s) can be tested/verified with regular rust code/tests.
The main project I'm building with this is getting to the point where the simulator exposes some scaling issues. In particular, when compiling the generated rust code, rustc's memory usage stays around 100-150MB for most of the process, but then suddenly jumps to 11GB+(!!!) towards the end, before either completing (mac) or causing my system to come to a grinding halt (windows) and requiring a hw reset (due to the exorbitant memory usage causing constant swapping). Additionally, the compilation process takes well over a minute for this single file.
I've packed one such `modules.rs` into a project. It's enough to `cargo build [--release]` to reproduce the issue. The file contains a single simulator `struct`. Its `new` fn is relatively normal, as is its `posedge_clk` fn. Its `prop` fn is very large - this is where the majority of the logic lives to propagate signals through the design. So I would imagine there's something that scales superlinearly that tends to work fine under more typical circumstances, but for a large fn with loads of bindings, it blows up.
I did a bit of preliminary profiling out of curiosity. Running `cargo rustc --release -- -Z self-profile` yields this:
(summary output for `a`)
Unfortunately there doesn't seem to be any memory usage info in this report (at least not that I know how to extract), but most of the time is spent in `typeck`, which may be relevant, but I'm not at all familiar with rustc's internals.
For the sake of trying something I did hack my compiler to output types for all the temporaries in the `prop` fn (that version is available here). While this does make things about 10 seconds faster, the mem usage spike is still there, so it's not really a viable workaround.
(summary output for `b`)
Meta
`rustc --version --verbose`:
This issue occurs for `stable` as well. I have not tried `beta`. It also appears in windows builds.
Thanks for your time, and let me know if there's anything else I can do to help identify the issue!