How to distribute THREAD_MODEL=posix builds?
#326
cc: @mingqiusun, @penzn
I would suggest that, at least initially, you just build a completely separate sysroot. e.g. In the long run we could come up with some kind of scheme where clang selects the library subdirectory within the sysroot differently, just like it does for architecture. e.g. But I think two different sysroots should work for the time being.
Here's what we discussed today: wasi-libc would get distributed with a new subdirectory
I think we discussed not a separate sysroot, just a new directory inside the existing sysroot (e.g. The completely separate sysroot is what we could use to try this out without first landing the clang patch.
Oh, ok, then I was misunderstanding what was said. So the contents of
Basically yes, although crt1.o tends to be handled separately, so it might need special handling. Take a look at the current clang driver code for how it picks the lib directory, along with how it picks crt1.o.
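To make the "new directory inside the existing sysroot" idea concrete, here is a minimal sketch of how a driver might pick the libc directory and crt1.o for each flavor. It assumes a layout of <sysroot>/lib/<triple> for the default build plus a variant subdirectory for the threaded build; the subdirectory name "posix" and the helper names are hypothetical placeholders, not clang's actual driver code.

```rust
use std::path::{Path, PathBuf};

/// Thread model requested by the user, mirroring wasi-libc's
/// THREAD_MODEL make variable ("single" or "posix").
#[derive(Clone, Copy, Debug, PartialEq)]
enum ThreadModel {
    Single,
    Posix,
}

/// Hypothetical name of the subdirectory holding the THREAD_MODEL=posix
/// build; a placeholder, not a name anyone has agreed on.
const POSIX_VARIANT_SUBDIR: &str = "posix";

/// Directory holding libc.a, assuming <sysroot>/lib/<triple> for the
/// default build plus a variant subdirectory for the threaded one.
fn libc_dir(sysroot: &Path, triple: &str, model: ThreadModel) -> PathBuf {
    let base = sysroot.join("lib").join(triple);
    match model {
        ThreadModel::Single => base,
        ThreadModel::Posix => base.join(POSIX_VARIANT_SUBDIR),
    }
}

/// crt1.o is resolved through the same helper, so the startup object can
/// never come from a different flavor than libc.a.
fn crt1_path(sysroot: &Path, triple: &str, model: ThreadModel) -> PathBuf {
    libc_dir(sysroot, triple, model).join("crt1.o")
}
```

Routing crt1.o through the same lookup as libc.a is one way to address the concern that crt1.o is handled separately: the two can't drift apart.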
It can also be treated as a 'feature' of the target, and maybe the triple can become
Sure, if you think we can shoehorn it into the triple, that would be great!
That might be a bit tricky though... what happens if you build with
I guess when the triple gets expanded from what the user specified on the command line and the missing components are added, we could inject the
Probably. I'll look at the driver and see what would be easier.
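The "inject it while expanding the triple" idea can be sketched as follows. This is a simplified, purely positional parse (real LLVM triple normalization is considerably fuzzier), and the environment component "threaded" is a hypothetical placeholder for whatever name would actually be chosen.

```rust
/// Hypothetical environment component marking a threaded build.
const THREAD_ENV: &str = "threaded"; // placeholder name

/// Expands a user-supplied triple such as "wasm32-wasi" into the form
/// arch-vendor-os[-env], filling in "unknown" for a missing vendor, and
/// appends THREAD_ENV when -pthread was requested and the user did not
/// already give an environment component.
fn expand_triple(user: &str, pthread: bool) -> String {
    let parts: Vec<&str> = user.split('-').collect();
    let (arch, vendor, os, env) = match parts.as_slice() {
        [arch, os] => (*arch, "unknown", *os, None),
        [arch, vendor, os] => (*arch, *vendor, *os, None),
        [arch, vendor, os, env] => (*arch, *vendor, *os, Some(*env)),
        _ => return user.to_string(), // leave unrecognized shapes alone
    };
    // An explicit environment always wins over the injected one.
    let env = match env {
        Some(e) => Some(e.to_string()),
        None if pthread => Some(THREAD_ENV.to_string()),
        None => None,
    };
    match env {
        Some(e) => format!("{arch}-{vendor}-{os}-{e}"),
        None => format!("{arch}-{vendor}-{os}"),
    }
}
```

For example, "wasm32-wasi" with -pthread would expand to "wasm32-unknown-wasi-threaded", while the same triple without -pthread stays "wasm32-unknown-wasi".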
One thing that may be worth pointing out here from a Rust perspective is that Rust can probably only really support this if the string is a brand new target. Rust additionally has precompiled artifacts (the Rust standard library), and I don't think that, infrastructurally, it's really possible to change the Rust compiler and distribution infrastructure for just the wasm target to have two sysroots in a similar manner to what's being proposed for C here. Personally I have always hoped to avoid having a threaded and a non-threaded variant of every wasm target, but at the current point in time I think it's the best solution, at least for Rust. I'm sure, though, that whatever naming scheme is used for a C target (if that's settled on) will want to be mirrored in Rust as well.
To add to what Alex said here, adding a new triple ends up having a lot of administrative overhead. I'd like to second the above ideas of having threading-enabled artifacts available in the same sysroot, under the same target triple, and teaching the clang and Rust drivers how to find them, so that we don't need a whole separate target. On a tangential note, I expect I'll have a hard time remembering what the name "mt" stands for; could we name the artifacts something with "thread" in the name?
To clarify, though, as far as I know that's not a viable strategy for Rust (using the same target with separate sysroots). Implementing that for Rust would involve a fair amount of change to the release process, building, etc., all of it wasm-specific and without precedent. I suspect that, unless someone has already signed up to do the work, this would never get done because it's too different from what's already there. Put another way, if a new target isn't made, I don't think Rust will, in any near-to-mid timeframe, gain support for precompiled artifacts with multithreading. Broadly speaking I do personally care about Rust having good multithreading support for wasm, but not to the point that I can justify refactoring Rust's own distribution mechanism to accommodate this use case. I very much agree that adding new targets is a very big hammer, but that's what I mean when I say I don't see a better solution at this time. Rust does have an intermediate solution that C lacks, though: at least on the nightly Rust channel, it's relatively easy to compile the standard library from source with atomics/threads enabled.
I'm not advocating for separate sysroots. I'm advocating for a single sysroot that has multiple libc.a's, etc., at different paths within the sysroot, and having the Rust driver know how to select the ones it needs. I haven't looked into it in detail, but I imagine this is something that doesn't require major refactoring.
Thanks for the clarification, @alexcrichton! In my opinion, adding a new target is not that big of a deal. I personally lean towards that, but it will ultimately depend on how easy it is to implement. To recap, the three options so far are:
We can probably treat the middle one as the least useful, given the Rust situation. @alexcrichton, would Rust be able to pick different versions of the standard library based on whether threads are enabled or disabled? (Edit) As an aside, the way the wasm target handles features is different from what native toolchains do: there you don't have to recompile the entire libc with different flags to allow it to be used in multi-threaded builds.
I'm not aware of any precedent in Rust targets to enable picking different libraries based on features. Rust is also a much different ecosystem for distributing the standard library than C, and Rust further does not have any precedent to draw on for distributing "two targets in one" which is effectively what this is. The first might be easy-ish to solve with a "oh let's just add a few The closest precedent I can think of is the "soft float" and "hard float" targets that Rust has. I believe there's
In general Rust is not nearly as "loose" as C compilers with compiler options and target-specific internals. Most of Rust's internals are shaped along the lines of "basically all targets look like this", so there are relatively few target-specific options unless every other target can be configured with the same options (e.g. target features are naturally target-specific, but every target has some set of them). AFAIK there's generally not much appetite in rustc development for significant target-specific configuration unless there's strong motivation for it. Given the nascent stage of multithreading for wasm (and, to an extent, of wasm in general), I don't predict that it would be easy to land lots of wasm-specific bits inside of rustc at this time.
To me this makes sense. Even from a C compiler's point of view, threaded and non-threaded builds would be different targets; you can't really mix them in one link line. This brings me back to the aside above: the way the wasm toolchain handles these features is different from what native toolchains do. There you don't have to recompile libc or user objects with different flags to use them in multi-threaded builds; you only need to do so to the extent required to support multi-threading. In the wasm case, linking fails if not all of the objects have been built with thread support (as an extreme example, even objects that don't access memory at all). I feel this is a bit too restrictive, but I don't have a better idea yet. From this perspective, though, it is probably better to keep two separate targets for C as well.
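The all-or-nothing behavior described above can be modeled with a small check. This is a deliberately simplified model of the rule as stated in the comment (every object must carry atomics and bulk-memory for a shared-memory link), not wasm-ld's actual feature-checking algorithm; the Object type and function name are hypothetical.

```rust
use std::collections::HashSet;

/// Simplified stand-in for an object file: its name plus the wasm target
/// features it was compiled with.
struct Object {
    name: &'static str,
    features: HashSet<&'static str>,
}

/// Features every object is assumed to need for a shared-memory link.
const THREADED_REQUIRED: [&str; 2] = ["atomics", "bulk-memory"];

/// Returns the names of objects that would block a threaded link. Under
/// this model a single object lacking the features is enough to fail,
/// even if it never touches memory, unlike a native toolchain.
fn blocking_objects(objects: &[Object]) -> Vec<&'static str> {
    objects
        .iter()
        .filter(|o| THREADED_REQUIRED.iter().any(|f| !o.features.contains(f)))
        .map(|o| o.name)
        .collect()
}
```

A libc built without THREAD_MODEL=posix would show up here as the blocking object, which is exactly the failure that motivates shipping a second, threaded build.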
In terms of target features vs. native toolchains, I think it depends a bit on the feature. For example, the precompiled wasi-libc doesn't use SIMD, but you're allowed to use SIMD in your own object files and have it all link together. Personally I view threads as a significant underlying change since, at least for the Rust standard library, all the internals need to change. The Overall I personally see that so much is different for a threaded build that it sort of justifies the existence of a separate target, because the standard libraries are so significantly different internally.
I was picturing something like If we do a new target, then it's
and we can make that work, but it's not clear to me what the advantage is.
For option (3) above: I assume that it's up to the Rust driver to decide how to construct the Wouldn't option (3) be mostly about adding a little complexity to the code that creates that linker flag? I imagine the complexity needed in the Rust driver to handle this would be pretty much identical to that needed in the clang driver.
@sunfishcode most of what you describe already works today. You can compile with A new target does not imply a new To reiterate again as well though, the complexity is not in building a In any case though, I mostly wanted to bring up how avoiding a new target string will cause significant complexities for Rust. Whether or not wasi-libc wants to take that into account is outside of my wheelhouse, and there's no reason that C and Rust couldn't have different stories. Personally I'd like them to be the same, but that's not my call.
This is somewhat tangential, but perhaps relevant to this discussion: Debian, and by extension Ubuntu, has been shipping wasi-libc (as well as libstd-rust-dev-wasm32) for some time now. I am a Debian Developer, but not the maintainer of these packages, nor have I been involved at all in packaging these two or any other relevant packages like LLVM/Clang. However, I have been working recently on adding compiler-rt packages for wasm32/wasm64, as well as libc++ for wasm32-wasi, all generated by the llvm-toolchain-NN source packages (which are shipped in Debian, Ubuntu, and apt.llvm.org). (See Debian Salsa MR#97 and Debian Salsa MR#103.) I believe that this will alleviate the need for using a sysroot on these platforms and, more broadly, the WASI SDK as a whole. In other words, once these land, it will be possible to just run Given the above, I'm interested in what the future holds for WASI, including how we could handle these new pthread builds.
Finally, this may be a very naive question, but perhaps worth asking: in a future where a (p)threaded ecosystem exists and is stable, is there value in shipping the single-threaded one? In other words, given it's early days for WASI in Debian, would it make sense for Debian to just switch the "wasi-libc" package (and libc++ etc.) to a THREAD_MODEL=posix build, and not ship the "single" builds at all? This is all pretty new to me, so excuse my ignorance! Would love to hear your feedback and thoughts -- thanks!
On the general approach of avoiding a sysroot: one big difference between the normal "multi-lib" approach of Debian/Ubuntu and wasi-sdk is that WASI is not Linux. It's more like a true cross-compile to a different OS/platform, so treating it as just another multi-lib might not make sense. Is there any precedent for non-Linux targets in the "multi-lib" setup? For example, is that how you package the Android SDK? (Actually Android is Linux, so that isn't a good example, but still.) One issue that might come up is that, normally in a multi-lib environment the
Regarding the ongoing need to support both a threaded and a non-threaded version of the SDK, I think that will likely continue for several years. There are performance advantages to building specifically for single-threaded wasm, even when you know your target supports threads. Regarding "shipping multiple variants of every package", surely that will be true whether you use a separate target or a subdirectory within a target? Any package that contains a library file will need a separate package for each flavor, right? The only unnecessary duplication that I can see is the libc headers, which in theory can be shared. If you really don't want to ship multiple versions of the libc headers (because they happen to be the same for the threaded and non-threaded SDKs), we could maybe have clang's default search path contain several entries, from most to least specific. e.g.
That way, if you want to have a single set of shared headers, you could install them in However, all of this assumes the header trees will be identical between flavors. This may be true today, but I'm not sure we would want to guarantee it. As a hypothetical example, I can imagine that for the non-threaded version of the SDK we may not want to include
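A most-to-least-specific header search could look roughly like the sketch below. The prefix-stripping rule (try the full triple, then each shorter dash-separated prefix, then the generic include directory) and the "wasm32-wasi-threaded" triple used in the example are hypothetical illustrations, not clang's current behavior.

```rust
use std::path::{Path, PathBuf};

/// Builds an include search list ordered from most to least specific:
/// the directory for the exact triple, then each shorter dash-separated
/// prefix of it (stopping before a bare arch), then <sysroot>/include.
fn include_dirs(sysroot: &Path, triple: &str) -> Vec<PathBuf> {
    let inc = sysroot.join("include");
    let parts: Vec<&str> = triple.split('-').collect();
    let mut dirs = Vec::new();
    // "a-b-c" yields "a-b-c" then "a-b"; a single-component triple yields none.
    for n in (2..=parts.len()).rev() {
        dirs.push(inc.join(parts[..n].join("-")));
    }
    dirs.push(inc);
    dirs
}
```

With this ordering, headers shared by both flavors can be installed once in the less specific directory, while a flavor that needs different headers (for instance, omitting some in the non-threaded build) can still override them in its own, more specific directory.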
(This is a bit off-topic to the current issue, but it's super helpful to me and the insight is invaluable! Let me know if I should move this somewhere else, and where!)
The decision to place wasi-libc in /usr/include/wasm32-wasi was made by the wasi-libc maintainer when it was first introduced, seemingly in ~March 2020, so I can't really speak to it. To respond directly to your question, from what I can tell from a few Android packages, they are installed in /usr/include/android, and (e.g.) /usr/lib/x86_64-linux-gnu/android/. A good WASI-equivalent example however may actually be mingw, which seems to be placed under /usr/x86_64-w64-mingw32/include, /usr/x86_64-w64-mingw32/lib etc. So you are raising a good point!
This is spot on! So the LLVM WebAssembly driver really assumes a --sysroot argument is present, and would include paths such as (absolute) I went back and forth on whether
Yes, I think we are getting a bit off topic here. Perhaps we should open an issue in the upstream LLVM repo about if/when to include /include on the include path? On the specific issue of how to package the pthread flavor of wasi-sdk, I think we have mostly agreed at this point to make it a separate triple. This would mean that if you installed both flavors, they would both contain (most likely) the same headers. What do you think of the idea of having clang search
wasi-libc Debian maintainer here. I did this because it felt natural and to avoid conflict with regular
Generally by analogy with the FHS for MinGW's usage of If we ever have an entire OS written in wasm32-wasi, then sure, the host compiler can look for wasm32-wasi include files in
Where does this assertion come from? As far as I can tell, this is not currently the case. When cross compiling, both gcc and clang seem to add
(Note the So it seems that /include and /usr/include are considered to be fallbacks, searched after the arch-specific directories. I guess that might make sense when you are cross compiling to another architecture, but maybe not when you are cross compiling to an entirely different OS? I think your clang patch has the right idea. It adds
I meant this prescriptively, though I should probably have been more precise and said "triplet". The existing practise of "falling back" to IMO the behaviour should not be different between giving an explicit or an implicit sysroot, because this results in non-uniform and surprising behaviour and makes things awkward when e.g. scripting and passing through default values, which sometimes might need to be done explicitly. Another flag like To summarise, my order of preference for the behaviour is this:
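The orthogonality argument can be illustrated with a small sketch: the driver first resolves where the sysroot comes from, then computes search paths from the resolved value alone, so an explicit --sysroot and the default behave identically. The default of "/usr" and all names below are hypothetical; this is not how clang is currently structured.

```rust
use std::path::{Path, PathBuf};

/// Where the sysroot came from. Per the argument above, this should not
/// change which directories are searched.
enum Sysroot<'a> {
    Explicit(&'a Path), // passed with --sysroot
    Default,            // chosen by the driver
}

/// Hypothetical compiled-in default sysroot.
const DEFAULT_SYSROOT: &str = "/usr";

fn resolve(s: &Sysroot) -> PathBuf {
    match s {
        Sysroot::Explicit(p) => p.to_path_buf(),
        Sysroot::Default => PathBuf::from(DEFAULT_SYSROOT),
    }
}

/// Triple-specific directory first, generic directory as a fallback. The
/// origin of the sysroot is resolved up front and then ignored, keeping
/// the two concerns orthogonal.
fn include_search(s: &Sysroot, triple: &str) -> Vec<PathBuf> {
    let root = resolve(s);
    vec![root.join("include").join(triple), root.join("include")]
}
```

Because include_search never inspects which variant it was given, a wrapper script that passes the default sysroot through explicitly gets exactly the same search order as one that omits --sysroot.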
BTW GCC apparently has separate
I guess I don't quite see why we shouldn't assume that "sysroot-is-target" when I'm also not sure how easy it will be for clang to decide if the host triplet is "compatible"; perhaps by just comparing some parts of the host and target triplets? I don't have particularly strong opinions, but it would seem a little redundant for sysroot builders (e.g. NDK, emscripten, wasi-sdk) to have
I like the idea of just accepting this. It's advantageous for WASI to be able to "fit in" within existing cross-compilation filesystem layouts.
I'm saying these two things are orthogonal, and in general one should preserve orthogonality in flag behaviours. If you assume "sysroot-is-target", this should be done regardless of whether --sysroot was passed explicitly or a default value was chosen for it. This general principle of programming makes it easier to compose lower-level tools into higher-level tools, as the behaviours are more predictable. Take your example about sysroot builders: yes, if they only focus on wasm32-wasi today it may seem "redundant", but if they extend their functionality or someone composes them with another tool, then it's not redundant, and in fact it would be less messy.
I think you're confusing me with @paravoid; we are both Debian developers, but I'm giving an alternative opinion here. I didn't write any patches for clang, but I do maintain rustc in Debian as well (where the question about include paths is moot).
Regarding the duplication of headers, assuming we stick with our decision to treat a. Ship the headers twice, once under each target subdirectory (Any other options?) I imagine there is some precedent for (a)? e.g.
The wasi-sdk headers appear to be about 9 MiB installed. That's not free, but if it's significantly simpler to duplicate the headers than to figure out some kind of sharing configuration, I propose that we just duplicate them.
Can we close this now that #331 has landed?
Will #331 be shipped in the next version of wasi-sdk? I'm not too sure what the release process is, but the original intent of this issue is that future versions of wasi-libc would have both targets. I think we can close this once that is figured out.
Yes, the next version should have two different target triples in the sysroot, I think... although there may be a little more work needed to make that happen.
The question of how we should distribute is kind of answered, though: under a different target triple.
I agree, but I'm concerned with solving the end problem: @sbc100 or @sunfishcode, what else needs to happen for the One other point, though: as I was writing up the proposal to add this new target to Rust, I began to wonder whether |
I'd be fine with calling the clang triple
It's interesting that using
(earlier comment deleted)
The current release process isn't very automated. I take the build artifacts from CI and manually upload them to the release. So to add more artifacts here, we just need to (a) make sure CI produces them, and (b) make sure whoever does the next release knows to upload those artifacts.
Ok, I think the "how" is resolved at this point, so we can close this. @yamt has WebAssembly/wasi-sdk#274 open to add
When attempting to use wasi-sdk to compile a program that might use threads (or shared memories, etc.), the program will fail to link because the libc used has not been compiled with all of the right features (i.e., --shared-memory, which relies on --features='atomics,bulk-memory'):

There should be some way to package up the output of a make THREAD_MODEL=posix build in the released binaries and helpfully link in the right things when, e.g., -pthreads or -mthread-model=posix are used. Any thoughts on how to move forward on this?
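The request at the heart of the issue, "link in the right things when -pthreads or -mthread-model=posix are used", amounts to mapping driver flags onto a build configuration. Below is a minimal sketch under stated assumptions: the LinkConfig struct and its field names are hypothetical, the flag spellings are taken verbatim from the issue, and the feature list is the atomics,bulk-memory pair named above.

```rust
/// Settings a driver would need to derive from the user's flags to link a
/// threaded wasm program; the struct and field names are illustrative.
#[derive(Debug, PartialEq)]
struct LinkConfig {
    thread_model: &'static str,  // value matching wasi-libc's THREAD_MODEL
    features: Vec<&'static str>, // wasm features every object needs
    shared_memory: bool,         // whether to request --shared-memory at link time
}

/// Chooses the threaded configuration if any of the flags named in the
/// issue appears on the command line, and the single-threaded one otherwise.
fn link_config(args: &[&str]) -> LinkConfig {
    let threaded = args
        .iter()
        .any(|a| matches!(*a, "-pthread" | "-pthreads" | "-mthread-model=posix"));
    if threaded {
        LinkConfig {
            thread_model: "posix",
            features: vec!["atomics", "bulk-memory"],
            shared_memory: true,
        }
    } else {
        LinkConfig { thread_model: "single", features: vec![], shared_memory: false }
    }
}
```

Tying all three settings to one decision keeps them from disagreeing: a threaded libc is only selected together with the features and shared memory it was built for.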