Make stdlibs use the artifact system #33973

Open
KristofferC opened this issue Nov 28, 2019 · 23 comments
Assignees
Labels
artifacts stdlib Julia's standard library

Comments

@KristofferC
Member

KristofferC commented Nov 28, 2019

It would be nice if the stdlibs started using the artifact system to declare what libraries they depend on and how to get them for the different platforms. That would make the stdlibs easier to move out from the julia repo and in cases where one doesn't want to bundle all stdlibs in a sysimage (e.g. in an "app") it would be clear what libraries can be excluded from bundling as well.

@KristofferC KristofferC added the stdlib Julia's standard library label Nov 28, 2019
@StefanKarpinski
Member

@staticfloat, you seem like the prime candidate for this 😁

@staticfloat
Member

So here's the thinking that Stefan and I have briefly discussed:

We should firm up some of the implicit laziness that the stdlibs have relied upon with respect to binary dependencies, and simultaneously use this as an opportunity to take a step towards decoupling stdlibs from the Julia build system both at build time and at run time.

Properties we want

  • Libraries used by Julia's stdlibs should be represented through JLL packages and Artifacts, so that the resolver knows about them. This should ease the difficulties we've had in the past with e.g. MbedTLS.jl needing to stay in strict lock-step with those shipped alongside Julia.
  • The stdlibs should remain performant with respect to code loading. Things baked into the system image should still be fast to load. This implicitly requires us to not upgrade packages within the system image.

Implementation strategy

To represent stdlib binary dependencies through JLL packages and Artifacts, Stefan and I think the best way is to start shipping a read-only depot with Julia that gets added on to the default list of depots, that contains all of our stdlibs, their JLL packages, and their Artifacts. This would clean out the majority of the libraries from <prefix>/lib/julia, and would instead rely on some hoops we jump through to load them from <prefix>/share/julia/stdlib/vX.Y.Z/artifacts/. It will be a fun challenge to make this work for everything including LLVM. Unsure if we can get there, but we'll give it a good shot. Once these stdlib packages are baked into the system image, we would have a list of things that the resolver shouldn't mess with, so that it doesn't accidentally install a new version of e.g. OpenBLAS_jll, which would just confuse everyone.

  • At build time, we need a flexible way to download Julia packages along with their dependencies and create a read-only depot from them. While I am a little shy of introducing Julia as a build-time dependency for Julia, that would be the easiest way to do this; we could run the resolver, say "we need LinearAlgebra, HTTP, SuiteSparse, etc. pinned at these versions", and that list of packages gets downloaded and instantiated into a depot, which is what we then read from to create our initial system image. We can of course generate the list of packages offline and just hardcode it to avoid the bootstrapping dependency. We're already doing this with Pkg; I intend to do it with everything.
  • At install time, we need to settle on a structure for how the system depot will be organized. This means we'll need to add to the default DEPOT_PATH (even though the Julia code is embedded within the system image, things like artifacts will still need to be found). I see we already have a <prefix>/share/julia/stdlib/vX.Y directory, perhaps the only change here is to version it by the patch version as well and add artifacts to it.
  • At Julia init time, we'll need to change how libraries are loaded. For most packages this will be trivial; we'll have them import JLL packages which will already be loaded (as they are part of the system image) and they'll bring their libraries along. For others, like loading LLVM, it would be great for libjulia to be able to find libLLVM from the artifacts directory, but we may continue to have some libraries that are more "special" than others.
  • At Pkg.add time, we'll need a flexible way for Pkg to know what is baked into the system image and shouldn't be touched. Similarly to how it has a list of stdlibs right now, we'll just need to add the JLL packages and whatnot to them. This might come for free if I'm reading this code right. I am particularly interested in how this mechanism can be integrated with PackageCompilerX.
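A minimal sketch of the install-time lookup described in the list above, written in Python purely for illustration (the real lookup would live in Julia). It assumes a depot is just a directory containing an artifacts/<tree hash> subdirectory and that depots are searched in order; the function name find_artifact, the demo layout, and the placeholder hash are all hypothetical.

```python
import os
import tempfile


def find_artifact(depot_path, tree_hash):
    """Return the first <depot>/artifacts/<tree_hash> directory that exists, else None."""
    for depot in depot_path:
        candidate = os.path.join(depot, "artifacts", tree_hash)
        if os.path.isdir(candidate):
            return candidate
    return None


def demo():
    # Hypothetical layout: a writable user depot first, then the read-only
    # depot shipped with Julia appended to the end of the depot list.
    with tempfile.TemporaryDirectory() as root:
        user_depot = os.path.join(root, "home", ".julia")
        system_depot = os.path.join(root, "prefix", "share", "julia", "stdlib", "vX.Y.Z")
        tree_hash = "0" * 40  # placeholder, not a real tree hash
        os.makedirs(user_depot)
        os.makedirs(os.path.join(system_depot, "artifacts", tree_hash))
        found = find_artifact([user_depot, system_depot], tree_hash)
        # Report the hit relative to the temporary root so the result is stable.
        return os.path.relpath(found, root)


if __name__ == "__main__":
    print(demo())
```

Because the user depot comes first, the same function would also pick up a user-installed copy if one existed, which is one reason the resolver needs a list of things it should not touch.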

Possible different strategies

  • Instead of installing a full julia package depot complete with artifacts, we could instead continue to ship libraries like we do now, and generate an Overrides.toml that points to the <prefix>/lib/julia directory for all the artifacts that we care about. This is almost 100% doable right now, although it does require a few alterations (JLL package artifacts expect libraries to be available at <artifact_dir>/lib/libfoo.so and Overrides.toml only allows you to override artifact_dir. Additionally, Overrides.toml files only accept absolute paths, so we'd need to allow relative paths within them.) I'm a little bearish on this since I think it would be better to have a fully-flexible stdlib selection pipeline; this would significantly close the gap between what we do to make a Julia package distribution versus what we tell people to do in order to make a Julia "app", which I think is a good thing.

@staticfloat
Member

Thinking more about things like Julia needing to be able to find libLLVM at dynamic-link time, it will be sufficient on non-Windows platforms to bake in RPATHs to look in $ORIGIN/$(datarootdir_rel)/stdlib/vX.Y.Z/artifacts/<LLVM_jll tree hash>. The only snag here is Windows; we can bake in a call to AddDllDirectory() within init.c, but that's a little unsatisfactory. The reason why I'm thinking about this is that I'd like to make it as straightforward as possible for us to truly have Julia use JLL packages for stdlibs, such that eventual rebuilds of Julia system images with newer versions of stdlibs can actually use their binaries in as natural a way as possible.
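To make the RPATH idea concrete, here is a small Python sketch that derives an $ORIGIN-relative entry from two install paths. The /opt/julia prefix and the helper name origin_rpath are assumptions for illustration; the tree hash is left as the same placeholder used above.

```python
import posixpath


def origin_rpath(lib_dir, target_dir):
    """Build an $ORIGIN-relative RPATH entry pointing from lib_dir to target_dir."""
    rel = posixpath.relpath(target_dir, start=lib_dir)
    return "$ORIGIN/" + rel


# Hypothetical install layout; libjulia is assumed to live in <prefix>/lib.
prefix = "/opt/julia"
lib_dir = posixpath.join(prefix, "lib")
artifact_dir = posixpath.join(
    prefix, "share/julia/stdlib/vX.Y.Z/artifacts", "<LLVM_jll tree hash>"
)

if __name__ == "__main__":
    print(origin_rpath(lib_dir, artifact_dir))
    # $ORIGIN/../share/julia/stdlib/vX.Y.Z/artifacts/<LLVM_jll tree hash>
```

Since the entry is relative to the library's own location, the install tree stays relocatable, which an absolute RPATH would not allow.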

We also need a plan for dealing with from-source builds. Assuming that we still want from-source Julia builds to work, we're going to have to engage in some benign falsehoods; when we build libopenblas, we'll have to bundle it up as a "fake" OpenBLAS_jll product. It's not too hard, we basically just install openblas like we always do (we're already careful to keep the OpenBLAS recipe in Yggdrasil as close as possible to the from-source build in JuliaLang/julia) but then slap it into the share/julia/stdlib/vX.Y.Z/artifacts/<tree hash> directory that we will know it needs to go into from parsing the OpenBLAS_jll/Artifacts.toml file.

Some pros/cons of what I've considered so far:

  • We get to keep the ability to build completely from source (still important, despite the disadvantages, IMO).
  • I get the joy of writing a TOML reader/writer in make/bash/python so as to avoid a Julia dependency.
  • I need to write a TOML writer so that I can modify the Artifacts.toml files in e.g. OpenBLAS_jll. This is necessary so that when we compile a local OpenBLAS, the tree hash listed in the Artifacts.toml matches the tree hash of the files on-disk. This isn't that important right now as we don't check if the files in a folder actually match, but we might in the future, and we want to avoid the (rare but possible) case of someone explicitly trying to update their OpenBLAS, and files not getting installed because the Pkg system thinks they already exist.
  • Because the JLL packages are installed to the stdlib folder, they're going to be considered stdlibs by Pkg (via the load_stdlib() function) so attempts to e.g. add OpenBLAS_jll are going to immediately return, and loading OpenBLAS_jll will always return the stdlib version, rather than any version the user may have endeavored to install.
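For the tree-hash matching mentioned above, a rough Python sketch of computing a git-style tree hash for a locally built directory might look like the following. It assumes the artifact tree hash is the git tree SHA-1 of the directory contents and that empty directories are skipped as git does; edge cases that Pkg handles may differ, so treat this as an illustration rather than a drop-in replacement.

```python
import hashlib
import os
import stat
import tempfile

EMPTY_TREE = hashlib.sha1(b"tree 0\0").digest()


def blob_hash(data):
    """SHA-1 digest of a git blob object holding `data`."""
    return hashlib.sha1(b"blob %d\0" % len(data) + data).digest()


def _tree_digest(root):
    """Digest of the git tree for `root`, or None if it contains no files."""
    entries = []
    for name in os.listdir(root):
        path = os.path.join(root, name)
        st = os.lstat(path)
        if stat.S_ISLNK(st.st_mode):
            # A symlink is stored as a blob containing its target path.
            mode, sha = b"120000", blob_hash(os.fsencode(os.readlink(path)))
        elif stat.S_ISDIR(st.st_mode):
            sha = _tree_digest(path)
            if sha is None:
                continue  # git does not record empty directories
            mode = b"40000"
        else:
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            with open(path, "rb") as f:
                sha = blob_hash(f.read())
        raw_name = os.fsencode(name)
        # git sorts entries as if directory names ended in "/".
        sort_key = raw_name + (b"/" if mode == b"40000" else b"")
        entries.append((sort_key, mode, raw_name, sha))
    if not entries:
        return None
    body = b"".join(m + b" " + n + b"\0" + s for _, m, n, s in sorted(entries))
    return hashlib.sha1(b"tree %d\0" % len(body) + body).digest()


def tree_hash(root):
    """Hex git tree hash of the directory `root`."""
    digest = _tree_digest(root)
    return (digest if digest is not None else EMPTY_TREE).hex()


if __name__ == "__main__":
    print(blob_hash(b"").hex())  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

With something like this at build time, the hash written into the modified Artifacts.toml could be derived from the files actually installed instead of being copied from the upstream JLL.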

@staticfloat
Member

Digging into this over the past few days, I've come up with a few difficulties that may take some calm thinking to untangle properly:

First off, there's a philosophical decision to be made; do we want the actual binaries themselves to live in a $prefix/share/julia/artifacts/<tree hash>/lib/libfoo.so location, or do we want them to continue living in $prefix/lib/julia? For some binaries, it doesn't matter that much, but for others, it matters quite a bit.

Personally, I would like to push as much as possible for stdlibs and even the basic requirements for Julia (like LLVM, MPFR, GMP) to use artifacts. This is doable with enough scaffolding construction such that Julia can find things, but we need to answer if the necessary scaffolding is worthwhile:

  • For libjulia to load at all, it's going to need an RPATH embedded within it to find libLLVM
  • For libgit2 to find libssh2, we'll need to use the actual JLL package or something equivalent to build our own web of dynamic linker subversion. Otherwise, libgit2 sitting in one treehash directory won't find libssh2 sitting in some other treehash directory.
  • For JLL loading to work this early in the bootstrap process, we'll need to change how JLL packages work, to avoid a dependency on Pkg (or even other stdlibs, in the case of things like GMP and MPFR, since they're in Base). Either breaking their dependency on Pkg by splitting Pkg.Artifacts and Pkg.BinaryPlatforms out into something that can be loaded very early on, or by generating special JLL packages that don't know anything about Artifacts and re-implementing themselves more-or-less from scratch. Personally, I think this is not that bad of an option, but it would be some extra work.
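One way to picture the "web of dynamic linker subversion" is as a dependency-ordered load: open each library by absolute path only after everything it links against is already open, so the dynamic linker can typically resolve the dependency from what is already loaded. The Python sketch below only computes such an order; the dependency edges are illustrative assumptions, not the real link graph.

```python
from graphlib import TopologicalSorter

# Hypothetical edges: library -> set of libraries it links against.
DEPS = {
    "libgit2": {"libssh2"},
    "libssh2": {"libmbedtls"},
    "libmbedtls": set(),
}


def load_order(deps):
    """Return the libraries ordered so that dependencies come before dependents."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for lib in load_order(DEPS):
        # A real loader would dlopen the absolute path inside the
        # corresponding artifacts/<tree hash>/lib directory here.
        print("would load", lib)
```

This is roughly what a chain of JLL packages gives for free, since each one loads its dependency JLLs before opening its own library.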

Let's remind ourselves as to why we're doing this; with this kind of a system, it makes system image building much more modular and easy to understand; the distance between binaries users install and the binaries that ship with Julia shrinks. The resolver can see that LLVM_jll already exists on the user's machine and is of a particular version; attempts to Pkg.add("OpenBLAS_jll") naturally succeed immediately, as it's an stdlib, and using it is blazingly fast, as we would expect.

I don't have a concrete solution in mind yet, this is the third time I've written out this comment because I keep on experimenting with different things and finding new problems. The good news is that I have artifact downloading implemented in Make/Python, and putting JLL packages/artifacts into the share/julia folder works; but these bootstrapping issues are thorny.

@ViralBShah
Member

I would be fine with them living as artifacts. Making the system image more modular will allow us to build smaller system images for deployment - so that's the right direction, imo.

@staticfloat
Member

I've made great strides in this on my branch. I've converted everything that it makes sense to, excepting LLVM. LLVM is a special case that I will address after this. First, the changelog:

Changelog

  • I've added JLL package downloading/artifact construction to the deps/ makefiles. JLL packages that are used verbatim get altered somewhat to eliminate any dependency on Pkg. This is easy since the only two things we use Pkg for (getting binaries and knowing which platform we're running on) are both known at build time. Note that this means the platform gets baked into the stdlib JLLs, which is fine, it doesn't tend to change.

  • If you choose to build something from source, a "fake" JLL package is generated with the same UUID, but it won't look in the artifacts directory for its binaries; it instead looks in Julia's bundled lib/julia directory. Note that @vchuravy raised good points in Pkg.jl#1704 ("Paths to library directories in Overrides.toml") that it would be nice if this could be done via Overrides.toml, but since it's not quite flexible enough to do this across all platforms (if all platforms could create symlinks this would be easy) it's easier for us to just generate fake JLL packages. Perhaps in a future Pkg release we can work this out with a more flexible Overrides.toml syntax.

  • The Makefile system has been spruced up to have deps with dependencies be able to find their dependencies at compile-time, such that if you want to, for instance, build SuiteSparse from source but provide OpenBLAS from BB, the libraries get found properly.

  • Libdl has been moved into Base as Base.Libc.Libdl. This is necessary so that we can do dlopen() inside of Base.

Splitting up LLVM_jll

It seems to me that we have an issue; we want to provide libLLVM alongside Julia in a JLL such that when users ask for a handle to libLLVM in a Pkg-informed way (e.g. through Pkg.add("LLVM_jll")) they are locked to the version that ships with Julia, and thereby get the same version that comes with their Julia version. However, LLVM_jll provides a lot more than what Julia itself ships with; it contains nice things like clang and opt and whatnot. I don't really think we should therefore start shipping clang with Julia, rather the opposite.

I think we should split LLVM_jll up into multiple packages; perhaps having a LibLLVM_jll and then have LLVM_jll depend on LibLLVM_jll, and only LibLLVM_jll is shipped with Julia. @maleadt and @vchuravy I am very interested in both of your thoughts on this.

@maleadt
Member

maleadt commented Mar 4, 2020

I think we should split LLVM_jll up into multiple packages; perhaps having a LibLLVM_jll and then have LLVM_jll depend on LibLLVM_jll, and only LibLLVM_jll is shipped with Julia.

Sounds good to me. The CUDA compiler really only needs libllvm, however, with the addition of some additional API calls from this source file. Maybe those should also be provided by the LibLLVM_jll?
For other LLVM-based WIP I also need the headers and binaries, but that's just to build a tool so would be fine to put in a LLVM_jll package that only gets installed as part of a build_tarballs.jl.

It's not entirely clear to me though how we would version this thing (e.g., with multiple builds of the aforementioned tool, one for each LLVM version, and I just want to install whichever one's compatible with the user provided LLVM while maintaining semver of the tool), but that's orthogonal to this refactor.

@staticfloat
Member

The CUDA compiler really only needs libllvm, however, with the addition of some additional API calls from this source file. Maybe those should also be provided by the LibLLVM_jll?

Won't the symbols in the file you linked be a part of libjulia? Those symbols will then always be available, right?

@maleadt
Member

maleadt commented Mar 4, 2020

Won't the symbols in the file you linked be a part of libjulia? Those symbols will then always be available, right?

Sure, but since they are essentially an extension of libllvm's C API it might make sense to put them there?

@staticfloat
Member

Ah, I see what you mean; these aren't used by the rest of Julia, they're only for the benefit of LLVM.jl.

Since we need to still support users building LLVM from source, I think we should probably keep it as a part of Julia's source.

@vchuravy
Member

vchuravy commented Mar 4, 2020

Regarding LLVM_jll, the right approach is probably to follow what Linux distros have been doing and break it up into LLVM_jll (with opt/llc/llvm-*) and Clang_jll for Cxx.jl and CxxWrap.jl.

@staticfloat
Member

My branch now works on Linux; macOS support is pending a new OpenBLAS JLL (as macOS is more sensitive to things like dylib IDs than Linux is), and then finally Windows. The great triumph is that a default build (i.e. one with no USE_BINARYBUILDER_XYZ=0 settings) has only the following libraries outside of the main package depot's artifacts directory, with the vast majority being served from artifacts:

julia> using Libdl; filter(l -> !occursin("artifacts", l), Libdl.dllist())
9-element Array{String,1}:
 "linux-vdso.so.1"
 "/home/sabae/src/julia-jllstdlibs/usr/bin/../lib/libjulia.so.1"
 "/lib/x86_64-linux-gnu/libdl.so.2"
 "/lib/x86_64-linux-gnu/librt.so.1"
 "/lib/x86_64-linux-gnu/libpthread.so.0"
 "/lib/x86_64-linux-gnu/libc.so.6"
 "/lib/x86_64-linux-gnu/libm.so.6"
 "/lib64/ld-linux-x86-64.so.2"
 "/home/sabae/src/julia-jllstdlibs/usr/lib/julia/sys.so"

julia> length(filter(l -> occursin("artifacts", l), Libdl.dllist()))
32

@ViralBShah
Member

Can the system image eventually be served as an artifact - so that I can then have many different system images for different projects?

@staticfloat
Member

I think the piece that needs to be solved is getting Julia to load a project-specific sysimage. Right now you need to pass -J which isn't very user-friendly. It would be nice to have something similar to --project and JULIA_PROJECT. I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.

@ViralBShah
Member

Well then we could even do optimized system images by architecture!

@staticfloat
Member

We already do that; we have images by architecture (e.g. x86_64, i686, etc...) and then within an image, we compile functions multiple times such that newer processors have versions of functions with expanded instruction sets.

@tkf
Member

tkf commented Mar 20, 2020

I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.

Wouldn't it require various tools to agree on where to look for the system image? For example, you may want to use the same sysimage in your editor and in stand-alone scripts.

Maybe the UI/API in Pkg.jl or PackageCompiler.jl can include something that creates a simple text file (say) <path>/.sysimages/sys.$(triplet).link containing the path to the actual sys.$(triplet).$(dlext) file? I guess it is then easy enough to handle within libjulia? Also I guess you can use sysimage downloaded in ~/.julia/artifacts this way. It'd be nice if re-locatable sysimgs with non-stdlib packages can be distributed and used in different projects.
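A minimal Python sketch of how such a lookup could work, combining the .sysimages/sys.$(triplet).$(dlext) convention with the proposed .link file. Nothing here is an existing Julia feature; the function name resolve_sysimage and the "link file wins, direct file as fallback" precedence are assumptions made for illustration.

```python
import os
import tempfile


def resolve_sysimage(project_dir, triplet, dlext):
    """Return the sysimage path for a project, or None if none is configured.

    A <project>/.sysimages/sys.<triplet>.link text file, if present and
    non-empty, names the actual sysimage; otherwise fall back to
    <project>/.sysimages/sys.<triplet>.<dlext> when that file exists.
    """
    base = os.path.join(project_dir, ".sysimages", "sys." + triplet)
    link = base + ".link"
    if os.path.isfile(link):
        with open(link, encoding="utf-8") as f:
            target = f.read().strip()
        if target:
            return target
    direct = base + "." + dlext
    return direct if os.path.isfile(direct) else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as project:
        os.makedirs(os.path.join(project, ".sysimages"))
        link = os.path.join(project, ".sysimages", "sys.x86_64-linux-gnu.link")
        with open(link, "w", encoding="utf-8") as f:
            # Hypothetical target; it could just as well point into ~/.julia/artifacts.
            f.write("/path/to/shared/sys.so\n")
        print(resolve_sysimage(project, "x86_64-linux-gnu", "so"))
```

The logic is only file existence checks and one small read, so it is simple enough to reimplement in C inside libjulia without running any Julia code.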

@davidanthoff
Contributor

I'm thinking something like --project=<path> could imply -J<path>/.sysimages/sys.$(triplet).$(dlext) or something. It's a little tricky because we would need to do all of this without running Julia code. It would be a lot easier for this to happen in the context of an editor rather than the REPL, as an editor can automatically pass options and whatnot easily.

The VS Code Julia extension has been shipping with exactly something like that for more than a year: https://www.julia-vscode.org/docs/dev/userguide/compilesysimage/.

@tkf
Member

tkf commented Jun 30, 2020

@davidanthoff See #35794 that adds it to Julia.

@KristofferC
Member Author

I don't think this is a release blocker for 1.6 so removing milestone. @staticfloat please put it back if you see fit.

@KristofferC KristofferC removed this from the 1.6 features milestone Oct 20, 2020
@StefanKarpinski
Member

It isn't going to make it for 1.6 but will be in 1.7.

@ViralBShah
Member

Did this make it into 1.7, and have we done sufficient work to close this?

@staticfloat
Member

Sadly no, it did not. There is still some significant work to be done, but some smaller pieces have made it in.

9 participants