-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stop exporting every symbol #37530
Comments
If |
@retep998 Adding cdylib seems to have no effect on my linux system (although admittedly I had no idea this distinction was made :] ) I thought perhaps it needed at least an unmangled symbol to "kick" in, but this doesn't seem to help either.
|
@m4b that's likely a bug in cdylibs if it's doing the wrong thing here. The intention of cdylibs are to only export a C interface, which in the past I've inspected via |
wasn't aware of
This does not affect the importability of |
This may be related to #32887 as well. I don't really understand most of the terms being mentioned there and how they relate to each other personally, but patches are of course always welcome! |
I think that is related to another issue I saw with rust internal functions being assigned JUMP_SLOT relocations, which is very unusual, but didn't raise an issue (yet). I will read the above issue more thoroughly to see if it's related, but I think your suspicion that they're related might be correct. If the internal functions are given PLT entries, (and hence the JUMP_SLOT relocations), this would probably qualifiy them to be pushed into the dynamic, exported symbols array by the static linker :/ |
MotivationSo. I think we can fix this (and it is something that eventually needs fixing). To illustrate what I think we're on target for, the snappy shared object library is a good comparison because:
The resulting ELF ABI exactly parallels the two API headers, i.e.:
On my distro, this binary is stripped, and AlgoAnyway, I've found a few places for attack, potential candidates being mostly in So I'm not really familiar with the rust compiler internals at all and all the if the crate is the base crate (i.e., the final artifact we're compiling) && it is a cdylib then: I don't know enough about rustc internals to easily write "is the base crate" (i assume it's crate num zero, but I don't see it documented anywhere), or whether currently there is such a distinction. I also don't see an easy way to write "is defined in base crate", as the first link I gave just ends up slurping all the symbols into a vector of reachable symbols, which is just an array of strs, but I'm sure it's somewhere; probably in one of those context references ;) The algo might seem a little strange, but modulo me messing up booleans (which is likely), I think it's correct. Exposing a C API should not re-export all public or even extern symbols from either dynamic or statically linked dependencies. It should only export public extern definitions which we ourselves have created; hence, only public, extern symbols defined in our crate/lib. This is what The only other thing to watch out for will be the (also, on cursory examination, this appears to not be working either? passing Anyway, yea I think it's doable, with some kind of riffing on the above algorithm. P.S. I suspect this will remove every single one of those JUMP_SLOT relocations for the pub symbols coming from dependencies which are statically linked in, since they're now hidden, not in the dynamic array (or any symbol table), and so the linker should be free to optimize. |
cc me |
@m4b AFAIK yeah dealing with symbol visibility (particularly hidden) is the way to tackle this. The only real hard case I think is what to do when one of the output artifacts is an rlib. If all we're producing is a cdylib, dylib, executable, or staticlib, then we can make as many symbols hidden as possible as they're all private impl details. If we're producing an rlib, however, then this rlib will become one of a few things:
We unfortunately don't know which, we we'd have to pessimistically assume that it could become a public dependency of a dylib, causing the symbols to not be hidden... I may not be understanding what hidden means though. We should really just play around with the options here and see what happens. If the bootstrap and tests succeed then whatever system we have should work fine. |
FYI: I've implemented what is probably a small improvement in the right direction: It makes sure that symbols get hidden if the only reason they are externally visible was that they are shared between object files of the same crate. You can give it a try if you build that branch. I haven't really thought hard about all of this though. One thing to note though: We might be switching to MIR-only rlibs at some point, which would make all of this a lot easier, I suspect. |
@michaelwoerister Your patch looks like a good start! Unfortunately it does not seem to affect symbol visibility? I compiled several binaries with your changes, and the only difference I noticed was that rust static libs had PC32 relocations instead PLT32 irrc. However, strangely enough, I noticed when passing the
Strangely though, a few functions were exported and I'm not sure why (i.e.,
Regardless, whatever P.S., I suspect most of size that disappeared here was simply the dynamic string table massively shrinking since the symbols are no longer exported (another reason to get this resolved correctly):
|
I just read up on this a little and -- if LLVM IR behaves like C here, which is likely -- we'll also need to mark the declarations as |
OK, so I pushed an update to the branch from above. Now the compiler should do the right thing when compiling with multiple codegen units. |
@michaelwoerister so I'm a little confused, why aren't all It just seems to me Rust should have chosen the keyword "export", or whatever, and that was that - it would be exported. This is a very well defined notion in all 3 binary formats. It's what wasm did, etc. |
With monomorphization, you can easily get dynamic links even to private items. That is, some things can't be instantiated at all until their type is fully resolved, and those might still reference internal details from their original crate. |
I don't really follow. I assume the compiler has full knowledge of every resolved symbol when it's compiling, otherwise it's an undefined reference. Again, either the symbol is marked exported (in which case it's exported in the very defined manner in every binary format) or it isn't. I just don't see a problem here. |
For a silly example: fn get_random_number() -> usize {
4 /* guaranteed by xkcd 221 */
}
pub fn foo<T>(value: T) -> (T, usize) {
(value, get_random_number())
} This It would be very annoying if you had to manually mark every such reachable item as exported, especially when the compiler can figure this out for you. |
So, looking at In both cases, all information is known, all symbols are resolved, all types have instances and their corresponding code. If it's being compiled into an executable, and since the executable didn't define that function, it isn't exported. And that's that. (really you shouldn't export anything in executables, but whatever). If it's being compiled into a library, and the library doesn't mark it as "exported" (or Yes maybe the intermediate object file "exports" it (but there isn't a dynamic symbol table in object files anyway), but that is irrelevant, since the final target doesn't export, so it isn't exported. (and yea, I agree it would be annoying to mark every reachable item as exported, but I don't think marking which functions you want exports would require this - it is as you say, something the compiler can - and should - figure out) |
@cuviper So, I think I see what you're getting at, but I think we have two different perspectives. Correct me if I'm wrong, but you're essentially asking what does it mean to "export" a rust generic function? And the first problem with this is that, imagining it, for example being used dynamically, it may have to expose the helper function to any consumers of the generic symbol in order for them to complete the instantiation of the generic symbol in their code? (or something like that :P) |
Right, I think we're almost in agreement. :) Suppose we're compiling my example as a shared library AIUI, determining that minimal set of reachable items (and hiding the rest) is exactly what @michaelwoerister is working on. |
Yea ok, wow. 💥 |
From my point of view there are 3 concepts at play here that partially overlap:
It's not entirely clear whether the exporting part should be inferred from the other two. In GCC you need an explicit annotation to control this:
That last option doesn't sound too bad to me but one might still want to explicitly control what's exported and what isn't. A |
To recap what the intentions have been in the past:
That is, we've attempted to infer from syntactic visibility and |
OK, I can see that there are a few bugs in the implementation then. I'll update my branch to fix those. |
I've pushed a new version of the compiler to https://github.com/michaelwoerister/rust/tree/symbol-visibility. This version should now do the right thing for cdylibs (i.e. hide anything that isn't marked with |
Unfortunately this doesn't seem to have any effect for me, unless I'm not doing it correctly. How are you testing/checking for whether the symbols are exported? Might I suggest we agree upon either:
For a cdylib with a single exported symbol (ie no_mangle, extern C fn), I would expect no more than 20 symbols (being conservative, really there should be 1 but whatever), not the 2k we're seeing now. After building, I basically copy all the std stage two artifacts to the stage 2 rustc build directory, copy the tools, create a simple file with an exported symbol and then do:
And check the resulting exports, which is still 2k+ Are you seeing different results (with one of the above method for checking exports)? |
So executables on x86 with unpatched compiler seem fine actually (likely because default for executable is everything is private). This includes no unnecessary plt relocations. Been messing with arm and aarch64 and they also seem fine with unpatched compiler. |
Well, "just" fixing cdylibs and staticlibs is good too I have a pretty cleaned up version of my fixes here: Just needs a test case or two, then I'll make a PR. |
Improve symbol visibility handling for dynamic libraries. This will hopefully fix issue #37530 and maybe also #32887. I'm relying on @m4b to post some numbers on how awesome the improvement for cdylibs is `:)` cc @rust-lang/compiler @rust-lang/tools @cuviper @froydnj r? @alexcrichton
I think this is resolved for cdylibs so closing, feel free to reopen if desired for some reason. |
I think this is even more relevant now with |
Hi, seems that I still face the problem. I use It is Thanks! |
Lower case types like |
@bjorn3 Thank you! Let me have an experiment. |
@bjorn3 Yes, it does work. Thanks again! |
@dcow Sure. Simply call (Indeed originally I called that, but not sure why the pipeline change later omits that strip call. - but that is unrelated to this question) |
Hello, There is one case where having all symbols being dynamic makes sense for cdylib. Some C programs use libc's backtrace_symbol() to generate a backtrace. However backtrace_symbol only reads dynamic (global) symbols and not local ones, in an attempt to contain the ELF handling logic in libc (ref: https://bugzilla.redhat.com/show_bug.cgi?id=169017#c1). So this present issue effectively removes backtrace_symbol()'s ability to provide symbols for frames in the Rust .so file (the cdylib). As an alternative, I tried to build the crate as dylib, which keeps all symbols dynamic. However it then fails to be loaded from C (mangling? specific linker options? my Rust .so itself depends on a C .so). Question: is there an option for cdylib to preserve the old behavior, having all symbols dumped in the dynamic symbol table? |
Even with the old behavior we didn't include all symbols in the dynamic symbol table. Many have to be made private to prevent symbol conflicts. That includes all In any case backtrace_symbols isn't all that reliable even if all symbols are in the dynamic symbol table. It requires frame pointers to be preserved (not preserved by default on most targets for rustc) rather than walking the stack using the information stored in the Maybe you could export a function from the rust cdylib doing |
That's an interesting take. Will try it out to see how that could work. Thank you @bjorn3 ! |
This problem still exists if you link rust object files into a macos framework. It seems related to #73295 and this is also for chromium, but the problem I'm seeing is that all rust symbols are exported in the framework and we don't want any of them to be (we only want to export the symbols for the framework methods we are making available). |
@danakj cxx symbols are usually the first ones that show up in the error list (and then truncated there due to the number of errors), but there are tens of thousands of them including rust core lib functions that all get exported. The static library we used to generate from cargo had tens of thousands at least. See |
Yeah, ok you're right, we're seeing different things on Windows vs Mac. On Windows, it is just the What I don't understand is why Windows doesn't end up with all the Rust symbols exported but Linux/Mac do. |
I believe that's what this issue is, but this was mainly targeted at dynamic libs. It doesn't seem like a major issue for Linux because they are presumably removed when stripping the exe? The problem is with something like macos framework (or other dynamic lib) that needs to export some symbols, but doesn't want to export all symbols |
On Linux, a typical rust ELF dynamic library will export every symbol on earth, e.g.:
which looks like most of
core
andstd
, e.g.:Why not do this? Well, it's just The Right Thing To Do ™️ but also, technically it unnecessarily increases the size of the binary by adding dynamic symbol entries + string table entires, in addition to bloom filter words.
It might slightly increase dynamic loading time but I'm probably lying now to increase bullet points for why we shouldn't export all the symbols.
I suspect this may have come up already but I can't seem to find it, but I recall perhaps it had something to do with a requirement of std to make all their symbols public and this is somehow reflected in the binary? or perhaps it was something to do with more 3rd party linker nonsense.
Anyway, if there's a technical possibility to stop it, we should try and be good binary citizens.
The text was updated successfully, but these errors were encountered: