Reloading a dylib stops working when moving from 1.19.0 to 1.20.0 #44365
Comments
Dylibs, like rlibs, have a completely unstable ABI. Among countless other problems, the compiler is free to inline code from a dylib into crates using the dylib, and will in fact do this for A reliable way to do this (insofar as any hot-reloading can be reliable) would be to use the cdylib crate type and interact with the shared object through a C interface with a stable ABI. That is, declare (You can share type definitions if you structure your code accordingly, and the function bindings might be easier to generate automatically than with C.) |
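As a rough illustration of the C-interface approach suggested above, the library side might look something like the following. This is only a sketch: `State`, `update_state`, and `UpdateStateFn` are hypothetical names that do not come from the thread, and the crate is assumed to be built with `crate-type = ["cdylib"]`.

```rust
// Library-side sketch, assuming crate-type = ["cdylib"] in Cargo.toml.
// `State`, `update_state` and `UpdateStateFn` are hypothetical names.

/// Plain-data state passed across the boundary; `#[repr(C)]` fixes its layout.
#[repr(C)]
pub struct State {
    pub counter: i64,
}

/// Exported under the unmangled symbol name `update_state` with the C calling
/// convention, so a host can look it up by name after loading the library.
/// Toolchains older than 1.82 (including those discussed in this thread)
/// spell the attribute `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn update_state(state: *mut State) {
    // A null pointer is ignored rather than dereferenced.
    if let Some(state) = unsafe { state.as_mut() } {
        state.counter += 1;
    }
}

/// The function-pointer type a host would cast the looked-up symbol to.
pub type UpdateStateFn = extern "C" fn(*mut State);
```

The host would then depend only on the symbol name and the `UpdateStateFn` signature, not on the library crate's Rust ABI.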
@rkruppe I believe I've followed your suggestion in this branch, (diff here), please let me know if I missed anything! The code in that branch doesn't reload the library on 1.20.0 or 1.19.0. Oddly, if I don't change In both of those branches, the code compiles and runs on 1.19.0 and 1.20.0 just fine, the reloading just doesn't work except in the 1.19.0 plain dylib case. |
One detail that's missing in your code is that you still get a Reading over the diff, I wonder whether Edit: Indeed something's fishy with the way you use the cdylib. I'm surprised that having a dependency on a cdylib crate works at all! A cdylib is more like a staticlib or executable in that it's a finished artifact for consumption by the outside world, rather than something specifically for other Rust crates to consume. |
I ran When you talk about "having a dependency on a cdylib crate" are you specifically referring to these lines? This might be clear already, but my intention is to do this live reloading stuff only in debug mode and to have state_manipulation be a normal crate in release mode. In order to do this I need to comment out the cdylib/dylib line in state_manipulation's Cargo.toml before building in release mode. I build in release mode infrequently enough that this is no big deal. I had so far in this issue only been running the code in debug mode, but I tried it in release mode and after fixing |
I am referring to the fact that (I'm booted into Windows at the moment so I can't try to reproduce the actual error (reloading failing to have an effect) in debug mode. The fact that |
That's the error I get if I don't comment out the The version of the I think these debug mode vs release mode differences might be making things more complicated than necessary. So I made another branch from the Just in case it might be relevant, have you noticed these lines in the root Cargo.toml?
I haven't used workspaces besides this so I'm not sure of the full effects of that |
I decided to see if I could get reloading working at all on windows. If I try to just build while the program is running I get an error saying the build process cannot remove the
I tried that procedure on Windows, on a fresh download of the simplified C_ABI branch (note, without changing cdylib to dylib), using both 1.19.0 and 1.20.0. In both cases, it worked; the increment amount became 1000. Performing the same procedure on Linux, (using [1]
|
I investigated some more (on a linux machine). I can reproduce the behavior you describe, but haven't been able to identify the cause. I'm at my wit's end: with both dylib and cdylib, the right .so file changes and contains the code I'd expect, and the file change is correctly noticed and the library is reloaded, yet results differ. Maybe someone else can diagnose this? |
I searched back through the nightlies and found that if I set the Rust version with This works using this branch where |
Based on the commit hashes that are printed when I run |
This issue is currently on the 4th page of the issues list, so I doubt someone who could quickly make further progress will just find it. @rkruppe You probably know better than I, who would be most able to help. Can you ask someone else to have a look at this bug? @ mentioning everyone involved in that list of commits seems like overkill. |
#42899 and #42727 seem like the most likely candidates, though only because they're the only things in the list that look like they touch linking at all. So: cc @alexcrichton |
Hm there's a lot of discussion here and anything dealing with loading/unloading dylibs is absolutely fraught with unsafety, is there a distillation of the problem at this point? |
No idea what the cause might be, but there is a reasonably small reproduction: https://github.com/Ryan1729/tiny-live-code-example/tree/3636b8e29af3e141c12f4e00514b34840f99c12e The problem that occurs on Linux with
@Ryan1729 I have noticed that this reproduction could be even smaller, though. For example, you could remove |
I've removed |
Is this a rust bug? Testing https://github.com/Ryan1729/tiny-live-code-example/tree/f7b1b07801866db6de3a525f6ec2ebb1cac05549 I get the same behavior on 1.19.0 and 1.20.0, which is that the changes don't seem to make a difference when the library is reloaded. Additionally I saw via |
If |
Ok that does reproduce for me, but this is pretty much super uncharted territory. The |
Is there a less uncharted territory way to achieve the same effect, that is, swapping in new behaviour at runtime? For my use case I'm fine with using or not using a C ABI. Also, given it didn't cause problems I'd be fine with an extra copy of the standard library, since I'm only using the reloading during development, (the first version of the example program only reloaded in debug mode and used |
Maybe? I'm not particularly well versed in dynamic libraries. If you're mixing two Rust programs together via dynamic libraries, though, the one being |
I've written a version of the example that performs the rename-then-copy mentioned above and exits with a zero exit code on 1.19.0 and a non-zero exit code on 1.20.0. It's on this branch. Hopefully that's useful for |
I've taken that automatic rename-then-copy version and ran It reported the following:
That's the final commit in the range, and it's only a change to a CI script. |
After a few false starts I've run
which points to the problem being in #42727. |
One thing I've noticed is that the symbols for allocation are exported from cdylibs when they shouldn't be (they should be internal symbols). That may fix the bug here, but it also may not. |
Would the code that exports those symbols be in #42727? If so, can you point me in its general direction? 115 changed files is a lot to search through. |
Oh sure! So right now you can see for yourself:
All of the |
Here's the symbols that are exported after compiling the automated branch with the compiler at Using cdylib
Using plain dylib
|
Yeah for the cdylib case the |
I changed these lines to the following:
let export_level = if special_runtime_crate {
    // We can probably do better here by just ensuring that
    // it has hidden visibility rather than public
    // visibility, as this is primarily here to ensure it's
    // not stripped during LTO.
    //
    // In general though we won't link right if these
    // symbols are stripped, and LTO currently strips them.
    if &*name == "rust_eh_personality" ||
       &*name == "rust_eh_register_frames" ||
       &*name == "rust_eh_unregister_frames" {
        SymbolExportLevel::C
    } else {
        SymbolExportLevel::Rust
    }
} else if (&*name).starts_with("__rdl_") ||
          (&*name).starts_with("__rust_") {
    SymbolExportLevel::Rust
} else {
    export_level(scx, def_id)
};
then recompiled the automated branch. If I set the two inner crates to be cdylibs then those symbols are removed
but the reloading still doesn't work. If I set the crates to be dylibs then the reloading still doesn't work, but the symbols are still exported, that is, I still see the following results from
Should that code change have caused the dylib symbols to be internal? Or are those symbols handled separately? |
I checked out 4c225c4, the last commit the reloading works on, and I edited In the dylib case the reloading still works and I get this result from
Which is the same result I get if I use the unaltered version. |
Ah then I'm not sure what's going on :( |
I don't understand what's going on either, but I've managed to cause the symptoms with this 42 line diff. You can check out 4c225c4 and note that using that version to compile this version of the test results in an executable that reloads dynamic libraries without an issue. Then you can apply that diff with |
I'm not sure why the use of liballoc_system affects anything, but it's generally a bad idea to rely on dlclose actually fully unloading a library. Instead, I strongly suggest using a different name for each version of the library. Not just the filename on disk should differ, but also the path embedded in the library file itself, if any: at least on macOS, such a path is always included, and defaults to the path the library is written to. |
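One way the "different name for each version" suggestion could be approached is to copy the freshly built library to a new file name before every load. The sketch below covers only the on-disk name; it does not change any path embedded in the library file itself, which the comment above says also matters on macOS. The helper names `versioned_path` and `copy_for_load` and the generation counter are assumptions made for illustration.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds a distinct file name for a given reload generation, so that
/// `libstate.so` with generation 3 becomes `libstate.3.so`.
/// (Hypothetical helper; the naming scheme is an assumption.)
pub fn versioned_path(original: &Path, generation: u64) -> PathBuf {
    let stem = original
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("lib");
    let name = match original.extension().and_then(|e| e.to_str()) {
        Some(ext) => format!("{}.{}.{}", stem, generation, ext),
        None => format!("{}.{}", stem, generation),
    };
    original.with_file_name(name)
}

/// Copies the freshly built library to a path that has not been loaded
/// before and returns that path; the caller would pass it to its loader.
pub fn copy_for_load(original: &Path, generation: u64) -> io::Result<PathBuf> {
    let target = versioned_path(original, generation);
    fs::copy(original, &target)?;
    Ok(target)
}
```

A host would bump the generation on every reload and load the returned path, leaving earlier copies in place (or cleaning them up at exit) rather than relying on the old library being fully unloaded.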
@comex I have now done some reading about I will try your suggestion. Having to deal with multiple copies of the library on disk seems like it will be annoying, but if I can get it to work it sounds better than being stuck on |
On Linux, you can use |
@jethrogb Interesting. I think I would like the library reloading to be portable, and it's nice to be able to let libloading deal with the platform specific portions. But it's nice to know that |
Since the POSIX standard (currently?) doesn't provide a way to reload dynamic libraries without the workaround, I suppose this issue can be closed. |
I'm also dealing with the issue for live reloading in games (I wrote live-reload), so it's good to know that this is going on and it's not something I changed. |
I have been writing programs which check if a dynamic library's modification time has changed and reload it if so. This allows me to interact with the program, producing transient state, then make changes to the dynamic library's code and reload it to see the effect of the changes on that transient state without closing the program.
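A minimal sketch of the modification-time check described above, using only the standard library. `ModificationWatcher` is a hypothetical type, not taken from the example program, and the actual loading step is left to the caller.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Remembers the last modification time seen for one library file.
/// (Hypothetical type, for illustration only.)
pub struct ModificationWatcher {
    path: PathBuf,
    last_seen: Option<SystemTime>,
}

impl ModificationWatcher {
    pub fn new<P: AsRef<Path>>(path: P) -> ModificationWatcher {
        ModificationWatcher {
            path: path.as_ref().to_path_buf(),
            last_seen: None,
        }
    }

    /// Records `modified` and reports whether it differs from the last value.
    /// The first observation counts as a change, so the library is loaded once.
    pub fn observe(&mut self, modified: SystemTime) -> bool {
        let changed = self.last_seen != Some(modified);
        self.last_seen = Some(modified);
        changed
    }

    /// Reads the file's current modification time and feeds it to `observe`.
    pub fn poll(&mut self) -> io::Result<bool> {
        let modified = fs::metadata(&self.path)?.modified()?;
        Ok(self.observe(modified))
    }
}
```

The host loop would call `poll` periodically and trigger a reload whenever it returns `Ok(true)`.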
This feature of my programs has stopped working since I updated to 1.20.0. If I switch back to 1.19.0 with
rustup default 1.19.0
then it works again. Switching back with
rustup default stable
breaks it again.
I've made a small example program that does the lib reloading, which you can see here. On 1.19.0, if I launch the program and change, say, this line to something like
state.counter += 1000;
and rebuild the dynamic library (or just the whole crate), the running version of the program will eventually notice the different modification time and start incrementing the counter by 1000 instead of 1. On 1.20.0 this no longer works. On this branch in particular you can see that the new modification time is noticed and the library reloading code is run, but it doesn't do anything.
Looking through the patch notes for 1.20.0 I noticed that
ManuallyDrop
had been stabilized. So I tried using it in this branch but it didn't cause any (noticeable) change in behaviour (besides compile errors on 1.19.0, of course).