Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store metadata in import libraries for MSVC #39669

Closed
wants to merge 1 commit into from

Conversation

Diggsey
Copy link
Contributor

@Diggsey Diggsey commented Feb 9, 2017

This reduces the size of rust dylibs by an order of magnitude in some cases.

@rust-highfive
Copy link
Collaborator

r? @arielb1

(rust_highfive has picked a reviewer for you, use r? to override)

@arielb1
Copy link
Contributor

arielb1 commented Feb 9, 2017

r? @alexcrichton

@rust-highfive rust-highfive assigned alexcrichton and unassigned arielb1 Feb 9, 2017
@bors
Copy link
Contributor

bors commented Feb 10, 2017

☔ The latest upstream changes (presumably #39694) made this pull request unmergeable. Please resolve the merge conflicts.

@@ -889,6 +889,24 @@ fn get_metadata_section_imp(target: &Target,
let blob = MetadataBlob::Raw(buf);
verify_decompressed_encoding_version(&blob, filename)?;
return Ok(blob);
} else if target.options.is_like_msvc {
// Treat the import library as though it were an rlib!
let ref implib_filename = filename.with_extension("dll.rlib");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks basically equivalent to the rlib branch above, can these branches be shared?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, will do that

let _ = fs::create_dir_all(dst_dir);
let ref dst = dst_dir.join(filestem).with_extension("dll.lib");

// Modify import library to remove metadata before linking
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to emit the symbol with a visibility level that we don't have to do this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or otherwise, can you comment as to why we're doing this?

@alexcrichton
Copy link
Member

cc @vadimcn

This looks pretty good to me, but ideally we'd unify code paths as much as possible to avoid too much divergence between platforms. It seemed a little odd to me to be frequently frobbing between .dll.rlib and .dll.lib, would it be possible to avoid those transition costs? Similarly, could you add in documentation somewhere the high level overview here? (I had to just reverse engineer what a .dll.rlib was)

@retep998
Copy link
Member

Does the metadata really have to be removed from the .dll.rlib before it can be passed to the linker?

@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 14, 2017

I'll add more of a description in the comments, but for anyone reading, the new .dll.rlibs are to .dll.libs as .rlibs are to .libs (ie. they are the same import library generated for the dll, but contain an additional object storing the rust metadata).

WRT the back and forth between .dll.lib and .dll.rlib: when producing an import library, there doesn't appear to be a way to include the metadata as part of the build, so the .dll.lib -> .dll.rlib conversion step is necessary.

For the reverse direction, I tested both with and without the code to remove the metadata prior to linking. While not removing the metadata seemed to work™️, I wasn't confident enough that there wasn't some edge case where it would fail, say under different linker flags. It's also what we do for rlibs, and I don't know what's different here that would mean that rlibs must have their metadata removed before linking, while import libraries need not?

@vadimcn
Copy link
Contributor

vadimcn commented Feb 15, 2017

Sounds like we are doing a bunch of unnecessary I/O every time we are linking. All for the sake of not having an extra file around?

@alexcrichton: IMHO, we should just bite the bullet and move metadata to a sidecar file for all library formats on all platforms.

@retep998
Copy link
Member

@Diggsey If not removing the metadata before linking works, and the resulting binary doesn't suddenly become bigger, then we should definitely not waste time removing the metadata until someone can demonstrate an edge case where it does fail.

@alexcrichton
Copy link
Member

@Diggsey I'm not sure what you mean about rlibs, we don't store metadata as an object file in there? We take it out manually when we pass -Wl,--whole-archive to the linker (which we only do some of the time) because otherwise ld will error on the unknown format.

@vadimcn do you feel that should block this PR? Or happen in parallel?

@vadimcn
Copy link
Contributor

vadimcn commented Feb 15, 2017

@alexcrichton: I am not super-opposed to landing this, but opposed nevertheless - on the grounds that every special case we add becomes an extra bit maintenance burden later. It is already hard to follow all the special cases we have in link.rs

@alexcrichton
Copy link
Member

@vadimcn yeah that's true. @Diggsey can you elaborate on the technical rationale for doing this on Windows?

@vadimcn I think we'd want to consider such a change carefully though as it'd be changing the binary layout of other platforms which is susceptible to cause breakage. Whereas with windows there's already an extra file to place metadata into it seems there's a much lower risk of breaking things (although I'd prefer to avoid changing the name if possible)

@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 18, 2017

@alexcrichton

I'm not sure what you mean about rlibs, we don't store metadata as an object file in there? We take it out manually when we pass -Wl,--whole-archive to the linker (which we only do some of the time) because otherwise ld will error on the unknown format.

I simply meant that the metadata was stored as an additional file within the archive, not that it was a valid object file.

I think my confusion stems from the fact that there is no --whole-archive option used on MSVC, and yet we still remove the metadata from any rlibs when producing a dylib. If it's not necessary to remove the metadata, then we shouldn't need to remove it when producing a dylib either.

It might be better if the Linker trait operated at a slightly higher level, because at the moment it's providing a very leaky abstraction.

can you elaborate on the technical rationale for doing this on Windows?

I explained this in the RFC, but to summarise:

  • Space saving when distributing rust applications, including the rust compiler itself (eg. the libstd DLL is almost entirely metadata)
  • Metadata is only needed at build time
  • The MSVC toolchain already has a file for storing metadata about a DLL that's only needed at build time: an import library.
  • Rustc already knows how to embed/extract metadata into/from a static library.

@alexcrichton
Copy link
Member

I agree with @vadimcn that more platform-specific code means more platform-specific code, introducing more vectors for bugs and such. I also agree though that this is the "proper" solution to the problem on MSVC, however. I don't think the technical benefits here are really all that much, almost no one is shipping Rust dylibs much less on MSVC.

I'd still personally be ok landing this PR, however. (with comments addressed)

@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 21, 2017

Cool, I'll make the changes ASAP.

(Also, you are shipping rust dylibs on MSVC 😉 )

@Diggsey Diggsey force-pushed the metadata-in-import-libs branch from d84efba to 43a5561 Compare February 22, 2017 00:42
Remove metadata objects from rlib import libraries

Fix tidy issues

Don't try to remove metadata before linking

Undo change to dylib linking
@Diggsey Diggsey force-pushed the metadata-in-import-libs branch from 43a5561 to 9a58016 Compare February 22, 2017 01:17
@Diggsey
Copy link
Contributor Author

Diggsey commented Feb 22, 2017

Updated and squashed

@alexcrichton
Copy link
Member

I'm a little unsure about how to proceed with this PR. I agree with @vadimcn that all the extra casing in an already very complicated backend isn't great. The technical benefits seem slim to only improving file size of the dll itself, but my guess is that this PR as-is actually inflates the size of the rustc distribution at least (due to metadata being stored uncompressed in import libraries.

I think it'd be a slam dunk, though, if we had some concrete numbers showing something like the compiler loads X% faster or we save Y bytes in the distribution. @Diggsey you wouldn't happen to have any of those numbers would you?

@retep998
Copy link
Member

retep998 commented Mar 2, 2017

due to metadata being stored uncompressed in import libraries.

If metadata is stored compressed in the dll originally, is there any reason we can't store it compressed in the import library?

@alexcrichton
Copy link
Member

Not particularly, no. It's not what this PR does, nor is it what we do on any other platform. I don't think it would help the "this is just adding more special cases" counterargument as well.

@Diggsey
Copy link
Contributor Author

Diggsey commented Mar 11, 2017

@alexcrichton Here is a list of DLL sizes and the changes:

Name                              Before    After Saved
----                              ------    ----- -----
arena-*.dll                       103424    83456   19%
flate-*.dll                       149504   143360    4%
fmt_macros-*.dll                  120320   103936   14%
getopts-*.dll                     178176   139776   22%
graphviz-*.dll                    136192   102912   24%
log-*.dll                         138752   118272   15%
proc_macro_plugin-*.dll           340992   331264    3%
proc_macro-*.dll                  135168   125952    7%
rustc_back-*.dll                  717312   643584   10%
rustc_borrowck-*.dll              969728   714240   26%
rustc_const_eval-*.dll            556544   446464   20%
rustc_const_math-*.dll            200192   131072   35%
rustc_data_structures-*.dll       392704   131072   67%
rustc_driver-*.dll               2099200  1955328    7%
rustc_errors-*.dll                366080   260608   29%
rustc_incremental-*.dll           830464   648704   22%
rustc_lint-*.dll                  368128   325120   12%
rustc_llvm-*.dll                30444544 30369280    0%
rustc_metadata-*.dll             1357312  1097728   19%
rustc_mir-*.dll                  1043456   802304   23%
rustc_passes-*.dll                401920   361472   10%
rustc_platform_intrinsics-*.dll   354816   314368   11%
rustc_plugin-*.dll                138752   129536    7%
rustc_privacy-*.dll               202752   188416    7%
rustc_resolve-*.dll              1125888   960512   15%
rustc_save_analysis-*.dll        1490432   988160   34%
rustc_trans-*.dll                2161152  1667584   23%
rustc_typeck-*.dll               2527744  1928704   24%
rustc-*.dll                      7609856  3973632   48%
rustdoc-*.dll                    3517440  2946048   16%
serialize-*.dll                   600064   244224   59%
std-*.dll                        2287104   880640   61%
syntax_ext-*.dll                 1174528  1034240   12%
syntax_pos-*.dll                  178688   113152   37%
syntax-*.dll                     5542400  3714048   33%
term-*.dll                        286720   244224   15%
test-*.dll                        408576   317440   22%
-------------------------------------------------------
Total                           70657024 58680832   17%

Of these DLLs, the ones that exist in /bin correspond directly to space savings, since the import libraries won't be distributed. For those in rustlib/<target>/lib, we have to distribute the import libraries too, so there will be an overall size increase for that specific folder, due to import libraries seemingly storing the metadata in a less efficient form:

   Before       After   Gained
101226956   121319960      20%

Overall, the losses and gains balance out, and there is almost no change to the overall rust distribution size. However, there are real savings for programs built with rust which use dynamic linking. Even the smallest dynamically linked rust program saves ~1.4mb due to the reduced size of std-*.dll, which must be distributed with the main binary.

@alexcrichton
Copy link
Member

Ok but to clarify, we don't know of any programs using dynamic linking? (it's barely supported by Cargo/rustc).

Also to clarify, what do you mean that a dynamically linked program will save memory? The contents of the dll don't go into the executable? Do you mean the metadata sections don't need to get loaded into memory?

@Diggsey
Copy link
Contributor Author

Diggsey commented Mar 11, 2017

I didn't mention memory? However, now that you bring it up, the DLLs being smaller means that they should use less memory when loaded.

As far as programs using dynamic linking: the RLS, clippy, and rust itself use dynamic linking, not because they rely on a plugin architecture (which AFAIK is the only benefit of dynamic linking not well supported by rust atm) but because there are multiple binaries with common dependencies, and so dynamic linking has the potential to save space.

Most programs (unlike rust) are not compilers, and so would not need to distribute the import libraries.

@alexcrichton
Copy link
Member

Sorry I meant that you claimed a rust program save 1.4mb, but I don't know where this is coming from.

None of those programs distribute dlls though, so I don't think there's any gains to be won there?

It sounds like this change as-is does:

  • Increases the size of the distribution
  • May decrease the memory footprint of a process (this isn't measured one way or another)
  • Adds more special casing to the backend

Overall I'm not particularly in favor of this given these numbers unfortunately.

@alexcrichton
Copy link
Member

ping @Diggsey, any thoughts on my last comment?

@Diggsey
Copy link
Contributor Author

Diggsey commented Mar 27, 2017

None of those programs distribute dlls though, so I don't think there's any gains to be won there?

If clippy were to ship in binary form, it should be distributing those DLLs. RLS doesn't need to because it's going to be shipped as part of rust.

Overall, I think the biggest point in favour of this is libstd. People are going to be dynamically linking their rust programs as support for it improves, and libstd, which would need to be distributed in every case, sees the single biggest space saving with these changes.

@alexcrichton
Copy link
Member

But none of the reflects reality? The one user shipping dlls, the compiler, is hurt by this change?

@alexcrichton
Copy link
Member

none of those*

@Diggsey
Copy link
Contributor Author

Diggsey commented Mar 28, 2017

Not to my knowledge, although I wouldn't necessarily know - for example, do either servo or firefox dynamically link their rust components?

Based on comparable C/C++ applications, distributing DLLs is exceedingly common. However, if you think that rust is sufficiently far from reaching that point that it's not worth optimizing for yet, then I'll close this.

@bors
Copy link
Contributor

bors commented Mar 29, 2017

☔ The latest upstream changes (presumably #40867) made this pull request unmergeable. Please resolve the merge conflicts.

@nrc
Copy link
Member

nrc commented Apr 5, 2017

triage: we discussed this in the tools meeting. It sounds like the benefits of this PR will not be realised until we have better dynamic library support and it adds complexity to an already complex part of the code base. Closing for now, potentially worth revisiting if/when we distributing DLLs is more common-place.

@michaelwoerister suggested that an alternative that works today is making a cdylib where we already strip metadata. This works even with Rust code, as long as you only need the C API.

@nrc nrc closed this Apr 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants