Support Library Variants #136

samoht opened this issue Jun 9, 2017 · 47 comments

@samoht
Member

samoht commented Jun 9, 2017

A few libraries use the "linking trick" to compile against a single cmi and select the correct implementation at link time. This is similar to functors, where multiple implementations can share the same signature, but it is done outside the language using build system invocations, hence the name "linking trick".

A good application for this is to select at link time the best implementation for a given signature: mtime does this to select between the JavaScript-native way of getting the time when generating a JavaScript app vs. POSIX when generating a native app, as described by @dbuenzli here. This could also be applied to SHA implementations, crypto, etc. We could even imagine using something similar in MirageOS, when we don't need to have multiple implementations of the same signature living in the same app.
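
For comparison, the in-language analogue mentioned above can be sketched with a signature and a functor. This is only an illustration: the module names (CLOCK, Os_clock, Jsoo_clock, Make_app) are hypothetical and the clock bodies are stubs. With the linking trick, the choice made here by functor application would instead be made by the build system at link time.

```ocaml
(* One signature, two implementations. All names here are made up for
   illustration; the bodies are stubs, not real clock code. *)
module type CLOCK = sig
  val backend : string
  val now_ns : unit -> int64
end

module Os_clock : CLOCK = struct
  let backend = "os"
  let now_ns () = 0L (* stub: a real version would call a POSIX clock *)
end

module Jsoo_clock : CLOCK = struct
  let backend = "jsoo"
  let now_ns () = 0L (* stub: a real version would call a JavaScript API *)
end

(* Client code is written once against the signature. *)
module Make_app (C : CLOCK) = struct
  let describe () = Printf.sprintf "using the %s clock" C.backend
end

(* The implementation is chosen by functor application. *)
module Native_app = Make_app (Os_clock)
module Js_app = Make_app (Jsoo_clock)
```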

@ghost

ghost commented Jun 9, 2017

This seems fine to me. As for everything in jbuilder, the user shouldn't have to think about the low-level machinery to implement this.

Here are some possible ways to present it:

Proposal 1

The user writes this:

(library
 ((name plop)
  (public_name plop)
  (implementations
   (((name foo)
    (libraries (...)) ;; deps specific to the foo backend
   )
   ((name bar)
    (libraries (...))
   )))
  ))

Then for the modules that have several implementations, instead of writing the implementation in file.ml, they would write file.foo.ml and file.bar.ml.

Proposal 2

The user writes this:

;; src/jbuild
(library
 ((name plop)
  (public_name plop)
  (virtual_modules (blah))
   ))

And in this library blah.ml must not exist.

Then:

;; src/foo/jbuild
(library
 ((implements (plop))
  (public_name plop.foo)))

And in src/foo, we must only have blah.ml.

I think I quite like the second proposal. In the end this should be quite easy to implement. We should make sure to report errors properly: for instance, if a user tries to link an executable with virtual libraries without providing the corresponding implementations, jbuilder should report a proper error instead of letting the compiler die with an error that is impossible to understand.

@samoht
Member Author

samoht commented Jun 9, 2017

I like the second proposal better, as you usually want to distribute the implementations in different packages (for instance github-unix and github-js are supposed to implement github.cmi and might be installed separately). Also, it is in general nice if other people can decide to implement a virtual module distributed separately.

@dbuenzli

dbuenzli commented Jun 9, 2017

I'd also really prefer the second, if only to avoid source (re)naming tricks (but then I know jbuilder is not shy of doing that).

A few more comments about the general idea. Since I don't see this as a trick but as a legitimate software engineering practice, I nowadays avoid calling it a trick and instead say "link time selection of libraries" or "library variants". I think it would be worthwhile to get @lpw25's input on how he plans to solve this in namespaces as it could inform the design.

One of the things that has always annoyed me about the current state of affairs is that, since you only get to use one of the libraries in a compilation, it feels illegitimate to me for them to exist in the namespace of library (or ocamlfind package) names.

Suppose I have an app that uses mtime and ptime which compiles to a regular operating system binary or to JavaScript. Fundamentally I don't want to have to describe these build dependencies twice: once with the mtime.os, ptime.os packages and another time with the mtime.jsoo, ptime.jsoo variants. I would really like to say I'm using mtime and ptime and have an additional syntactic construct that selects the library variants (os vs jsoo) that I use for a given build.

Somehow this is what I was trying to do with module selectors when I was getting into the territory of providing compilation helpers in odig (something I put on hold since @lpw25 told me namespaces would solve all these problems).

@lpw25

lpw25 commented Jun 9, 2017

The namespaces proposal supports multiple library implementations via the following mechanism:

  1. You install the .cmi files in a directory called foo
  2. You install the .cmx files of the implementations (bar and baz) in directories called foo.bar and foo.baz.

Essentially the compiler will treat a directory called foo.bar as corresponding to the Foo module -- ignoring everything after the ".".
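
A minimal sketch of that naming rule, assuming it amounts to dropping everything from the first dot and capitalising the rest. `module_of_dir` is a hypothetical helper written for illustration, not code from the compiler or from the namespaces proposal.

```ocaml
(* Sketch of the naming rule only: drop everything from the first '.' and
   capitalise. [module_of_dir] is a hypothetical helper, not compiler code. *)
let module_of_dir dir =
  let base =
    match String.index_opt dir '.' with
    | Some i -> String.sub dir 0 i
    | None -> dir
  in
  String.capitalize_ascii base
```

Under this reading, the directories foo, foo.bar and foo.baz would all map to the same module name.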

So at compile time you specify -P blah/lib/foo and at link time you specify -P blah/lib/foo.bar. Or, if you are using OCAML_NAMESPACES, you don't need to specify anything at compile time and you specify -P +foo.bar at link time.

Mostly this is orthogonal to what jbuilder has to do: the namespaces proposal only cares about how things are laid out in the install directories, whereas the main thing for jbuilder is how you want to lay things out in the source directories.

> I would really like to say I'm using mtime and ptime and have an additional syntactic construct that selects the library variants (os vs jsoo) that I use for a given build.

This seems quite a lot harder to support. It requires different libraries to agree on the different variants that there are (os and jsoo in this case). It makes some sense for things like target architecture, although possibly that would be better supported through opam -- so I would have a 4.04.1+jsoo switch which would always install the foo.jsoo implementation as foo -- since the target should be the same for all packages used to build something for that target.

@dbuenzli

dbuenzli commented Jun 9, 2017

> This seems quite a lot harder to support. It requires different libraries to agree on the different variants that there are (os and jsoo in this case).

If the only problem is the agreement I don't see it as a huge problem. Somehow having to specify the library variant for all of the libraries seems to push the problem back onto the end user and the build systems, where it could simply be an ocamlc -variant jsoo ... where variants for a library name are looked up iff they exist.

But indeed e.g. for jsoo it was never clear whether we would be better off treating it as an architecture and try to fit it in the larger cross-compilation problem. But that is only pushing the problem further since we never got a real story for this in opam (e.g. do we use separate switches for build and host, do we put many host architectures in the same switch etc.)

@lpw25

lpw25 commented Jun 9, 2017

I suppose ocamlc -variant jsoo could use foo.jsoo for all -P foo and make all bar.jsoo in the OCAML_NAMESPACES available as Bar. I'm probably not going to try and solve this in the initial namespaces proposal, but it is certainly something we can look at down the line.

@dra27
Member

dra27 commented Jun 10, 2017

> But that is only pushing the problem further since we never got a real story for this in opam (e.g. do we use separate switches for build and host, do we put many host architectures in the same switch etc.)

Possibly relevant to the options being considered here - I'm not a betting man, but I think jbuilder is likely to have a story for cross-compilation using multiple opam switches before opam itself has an official story for cross-compilation.

@ghost

ghost commented Jun 12, 2017

I agree that we shouldn't have to specify n times jsoo if we want the jsoo implementations. Especially it'd be a pity to have to update all the revdeps of a library if it needs to switch from one implementation to two.

Looking at cases where one would want variants, or where we used to use findlib predicates it seems to me that the system could choose the implementation simply based on the set of libraries being linked. For instance:

  • choosing the jsoo implementations seems to correspond to exactly the cases where we link with the js_of_ocaml library
  • choosing the multi-threaded implementations seems to correspond to exactly the cases where we link with the threads library. And in fact ocamlfind itself has heuristics to automatically add the mt predicates, which seems to indicate that in general users expect these properties to be inferred

Moreover there are libraries that work with only one of the alternatives. For instance Core forces the use of threads, and I'm sure there are libraries that are for javascript only. This means that we'd need some way to specify that the threads and js_of_ocaml libraries force a specific variant.
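
The inference described above could be sketched as a lookup from linked libraries to the variants they force. This is a rough illustration only: the `forced_variants` table and `infer_variants` function are hypothetical names, and the two entries simply restate the js_of_ocaml and threads examples from this comment.

```ocaml
(* Hypothetical rule table: linking one of these libraries forces a variant.
   The pairs come from the two examples above; nothing here is jbuilder API. *)
let forced_variants = [ ("js_of_ocaml", "jsoo"); ("threads", "mt") ]

(* Collect the variants forced by a list of linked libraries,
   without duplicates, in order of first appearance. *)
let infer_variants libs =
  List.fold_left
    (fun acc lib ->
      match List.assoc_opt lib forced_variants with
      | Some v when not (List.mem v acc) -> acc @ [ v ]
      | _ -> acc)
    [] libs
```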

What would be the advantage of introducing a new concept of variants?

@dbuenzli

> What would be the advantage of introducing a new concept of variants?

Not sure exactly what you mean by this but it seems to me that:

> it seems to me that the system could choose the implementation simply based on the set of libraries being linked.
> [...]
> This means that we'd need some way to specify that the threads and js_of_ocaml libraries force a specific variant.

Is strictly equivalent to introducing the concept of a variant, but in a less direct way and thus potentially more obscure for build problem analysis.

I don't mind if you can store -variant flags in cm[x]as, the way C flags can be stored, but it's nicer if they show up on the CLI in my opinion.

@ghost

ghost commented Jun 12, 2017

Actually I was thinking that we might end up having to write (variants (mt os ...)) for every executable that we define, so using only the set of libraries seemed simpler to me.

But we shouldn't have to: typically the threads library would only implement the mt variant. Then if you depend on threads you shouldn't have to specify mt at all since there is only one possible choice. For jsoo/os it's even simpler: jbuilder would make the obvious choice depending on what you are building (a native executable or a JavaScript application).

@ghost

ghost commented Jun 14, 2017

OK, so to sum up this discussion a bit: we have a concrete proposal that would support choosing the implementation of a library at link time and would be compatible with the namespace proposal. Eventually we want to take advantage of namespaces and the notion of variants to make things simpler for users, and especially to avoid having to specify the implementation for every single library we link against.

We can proceed as follows:

  • we implement the proposal 2 mentioned in this thread
  • once variants are accepted upstream in OCaml, we integrate them in jbuilder, backport them using an alternative mechanism for previous versions of OCaml and deprecate giving public names to implementations

I think in addition to proposal 2, we also need to add the limitation that all implementations must be defined in the same scope as the virtual library, i.e. they can be part of separate opam packages, but they must all be part of the same project. This will let us know all the available implementations and infer the right one automatically when there is a single possible choice. We can lift this limitation later if variants work differently.

Regarding how things are installed, there is just one thing that's not completely clear to me: if <lib> is the output of opam config var lib and a package installs a library foo and an implementation foo.bar, it would create both <lib>/foo and <lib>/foo.bar. @lpw25 is that correct? I thought packages were supposed to install their artifacts only in <lib>/foo. Additionally, if a second implementation was distributed as a separate package, would the package have to be named foo.baz?

@lpw25

lpw25 commented Jun 15, 2017

> is that correct?

I think that is the easiest way to do it. I would suggest that opam's rules should allow the foo package to install into any <lib>/foo.bar. Packages called foo.baz would be expected to provide an implementation of foo in <lib>/foo.baz. It might also be useful to allow packages called something like foo/bar which would be installed into <lib>/foo/bar and essentially add a Bar module to the Foo library. One could also imagine requiring the foo package to give permission for things to install implementations or submodules of it. It is really up to the opam-repository maintainers how they want to do it.

@dbuenzli

@lpw25 I think it would be better to keep variants of lib in the lib directory rather than at the same level, in order not to subvert too much the package prefix property opam is trying to enforce (though in practice it will have to be for packages extending namespaces of others).

Why don't you simply use another convention for denoting variants? In general I find it a bit confusing if variants are distinguished and specified by "." which has meaning in the module system. For example you could use lib/@variant/ for the implementation of variant variant of lib.

@lpw25

lpw25 commented Jun 15, 2017

> Why don't you simply use another convention for denoting variants ?

No need for another convention: <lib>/foo/foo.bar would work fine too if people want to put them in subdirectories.

@dbuenzli

> No need for another convention: <lib>/foo/foo.bar would work fine too if people want to put them in subdirectories.

That's fine with me but I'd prefer if there was a single way of doing that (i.e. always in subs).

@bobot
Collaborator

bobot commented Jun 29, 2017

Is someone trying to implement @diml's proposal 2?

@bobot
Collaborator

bobot commented Jun 29, 2017

@diml How does your proposal work when you use such a library? Does the META give specific information to know which package is the implementation of what?

@ghost

ghost commented Jun 30, 2017

I suppose we can have a variable implements = "findlib-package" to be able to report errors properly when one specifies multiple implementations of the same library. We also need a virtual = "true" variable on the virtual library to be able to detect when no implementations are provided.

@dbuenzli

Why do you want to add new variables to META files exactly?

Though it would be a noble goal that the implementation packages do not show up as ocamlfind packages themselves (see the -variant discussion above), if that means adding new things to META files I would be rather against it, to ease the transition toward the hypothetical namespace proposal.

@ghost

ghost commented Jun 30, 2017

Just to report better error messages. When the compiler knows about variants, it should be able to report proper errors. In the meantime, it will just report that the implementation of some unit is missing or present twice, which won't be helpful to debug the problem.

@dbuenzli

Not sure exactly what your problem is. I have had libraries with variants (see e.g. mtime, ptime.clock) for quite some time now without the problems you mention.

@ghost

ghost commented Jun 30, 2017

I'm thinking that if someone lists mtime as a dependency without specifying mtime.os or mtime.jsoo when linking an executable, they will get an error from the compiler saying that some module has no implementation. If we add a few things to META files, the error will instead be something along the lines of:

blah depends on library mtime. However mtime has virtual modules and no implementation was specified. You need to add either mtime.os or mtime.jsoo to the list of library dependencies.

Which is more helpful.
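
A rough sketch of such a check, assuming the set of implementations of a virtual library is known (for example from the META variables suggested earlier). The `virtual_lib` record and `check_implementation` function are hypothetical names for illustration, and the wording of the messages is only modelled on the example above.

```ocaml
(* Hypothetical description of a virtual library and its known
   implementations, e.g. as it might be read from META variables. *)
type virtual_lib = { name : string; implementations : string list }

(* Check that exactly one implementation of [vlib] appears in [deps]. *)
let check_implementation ~user vlib deps =
  match List.filter (fun d -> List.mem d vlib.implementations) deps with
  | [ impl ] -> Ok impl
  | [] ->
    Error
      (Printf.sprintf
         "%s depends on library %s. However %s has virtual modules and no \
          implementation was specified. You need to add one of: %s."
         user vlib.name vlib.name
         (String.concat ", " vlib.implementations))
  | impls ->
    Error
      (Printf.sprintf "%s links several implementations of %s: %s." user
         vlib.name (String.concat ", " impls))
```

The `Error` payload is a plain string here to keep the sketch short; a build tool would more likely carry structured data and source locations.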

@dbuenzli

> I'm thinking that if someone lists mtime as a dependency without specifying mtime.os

N.B. this is for mtime.clock.{os,jsoo}, mtime has backend independent representations.

But in fact this doesn't happen because I do not install the .cmi alone in a separate dir so you have to choose immediately. This may help understand:

> tree $(opam config var mtime:lib)
/Users/dbuenzli/.opam/4.03.0/lib/mtime
├── META
├── jsoo
│   ├── mtime_clock.a
│   ├── mtime_clock.cma
│   ├── mtime_clock.cmi
│   ├── mtime_clock.cmti
│   ├── mtime_clock.cmx
│   ├── mtime_clock.cmxa
│   ├── mtime_clock.cmxs
│   └── mtime_clock.mli
├── mtime.a
├── mtime.cma
├── mtime.cmi
├── mtime.cmti
├── mtime.cmx
├── mtime.cmxa
├── mtime.cmxs
├── mtime.mli
├── mtime_top.a
├── mtime_top.cma
├── mtime_top.cmx
├── mtime_top.cmxa
├── mtime_top.cmxs
├── mtime_top_init.ml
├── opam
├── opam.config
└── os
    ├── libmtime_clock_stubs.a
    ├── mtime_clock.a
    ├── mtime_clock.cma
    ├── mtime_clock.cmi
    ├── mtime_clock.cmti
    ├── mtime_clock.cmx
    ├── mtime_clock.cmxa
    ├── mtime_clock.cmxs
    └── mtime_clock.mli

The disadvantage of this approach though is that you can't have another library coding against Mtime_clock in a backend independent fashion: such a library would also need to propagate the variants itself (unless it uses a functor). The advantage is that you do not cut yourself off from inlining optimizations.

@ghost

ghost commented Jun 30, 2017

I see, what I had in mind with my proposal was this:

mtime
├── META
├── mtime.a
├── mtime.cma
├── mtime.cmi
├── mtime.cmti
├── mtime.cmx
├── mtime.cmxa
├── mtime.cmxs
├── mtime.mli
├── mtime_top.a
├── mtime_top.cma
├── mtime_top.cmx
├── mtime_top.cmxa
├── mtime_top.cmxs
├── mtime_top_init.ml
├── opam
├── opam.config
└── clock
    ├── mtime_clock.cmi
    ├── mtime_clock.cmti
    ├── mtime_clock.mli
    ├── jsoo
    │   ├── mtime_clock.a
    │   ├── mtime_clock.cma
    │   ├── mtime_clock.cmx
    │   ├── mtime_clock.cmxa
    │   └── mtime_clock.cmxs
    └── os
        ├── libmtime_clock_stubs.a
        ├── mtime_clock.a
        ├── mtime_clock.cma
        ├── mtime_clock.cmx
        ├── mtime_clock.cmxa
        └── mtime_clock.cmxs

I think this is closer to the namespace proposal.

Then, when you write a library, either you depend on mtime.clock and are backend independent, or you depend on mtime.clock.os and benefit from cross-library inlining.

@dbuenzli

Yes I think what you propose makes sense. I'm just always a bit wary when people want to add things to ocamlfind.

@bobot
Collaborator

bobot commented Jul 3, 2017

In the details, I'm not sure I understand whether people agree on what should be done for the META. To summarize my understanding of the problem:

  1. it should be compatible with the current ocamlfind without modification
  2. one should be able to write a backend independent library
  3. jbuilder could make things simpler (predicates automatically selected) and more efficient (automatically produce all possible cmx for backend independent libraries in a dependent way) when used but it should not be a requirement.

The proposal 2.A (META implementation details of proposal 2):

  1. uses predicates (how to manage the namespace of predicates?) starting with v.: v.os, v.jsoo.
  2. implementation packages and virtual packages are ocamlfind packages
  3. the virtual package requires the implementation package using predicates: in mtime.clock packages requires(v.os) = "mtime.clock.os"
  4. implementation packages do not use the predicate "v.*" because they should be used only in this case, they define the variable implements so that jbuilder could simplify things when used.
  5. implementation packages define error variables with the complementary implementation predicates, for forbidding to use them with the wrong predicates/variants.
  6. use -opaque when building the .cmi of the virtual packages so that we have no warning when building in a backend independent way (ie. without .cmx)

@dbuenzli

dbuenzli commented Jul 3, 2017

TBH I'm not sure it's worth trying to do all the erroring stuff and make ocamlfind aware of the variants. The fewer obscure corners of ocamlfind I'm using, the happier I am. E.g. if I'm not mistaken, a META for @diml's proposal could simply be:

description = "Monotonic wall-clock time for OCaml"
version = "%%VERSION_NUM%%"
requires = ""
archive(byte) = "mtime.cma"
archive(native) = "mtime.cmxa"
plugin(byte) = "mtime.cma"
plugin(native) = "mtime.cmxs"

package "top" (
  description = "Mtime toplevel support"
  version = "%%VERSION_NUM%%"
  requires = "mtime"
  archive(byte) = "mtime_top.cma"
  archive(native) = "mtime_top.cmxa"
  plugin(byte) = "mtime_top.cma"
  plugin(native) = "mtime_top.cmxs"
)

package "clock" (
  description = "Monotonic time clocks"
  version = "%%VERSION_NUM%%"
  directory="clock"
  requires = ""

  package "os" (
    directory="os"
    description = "Mtime_clock for native OS"
    version = "%%VERSION_NUM%%"
    requires = "mtime mtime.clock"
    archive(byte) = "mtime_clock.cma"
    archive(native) = "mtime_clock.cmxa"
    plugin(byte) = "mtime_clock.cma"
    plugin(native) = "mtime_clock.cmxs"
    exists_if = "mtime_clock.cma" )

  package "jsoo" (
    directory="jsoo"
    description = "Mtime_clock for js_of_ocaml"
    version = "%%VERSION_NUM%%"
    requires = "js_of_ocaml mtime.clock"
    archive(byte) = "mtime_clock.cma"
    archive(native) = "mtime_clock.cmxa"
    plugin(byte) = "mtime_clock.cma"
    plugin(native) = "mtime_clock.cmxs"
    exists_if = "mtime_clock.cma" )
)

@dbuenzli

dbuenzli commented Jul 4, 2017

In general my experience with ocamlfind has been less than stellar and I prefer not to solve problems in META files.

@ghost

ghost commented Jul 13, 2017

If we do something clever with findlib predicates and it's not compatible with what ends up being merged in OCaml, it's going to be painful. Hopefully, namespaces are for a not too distant future, so for now I think we should implement the minimum to get people going, even if it means having to specify the same variant multiple times.

Once variants are merged, then we can do something clever to emulate them with findlib predicates, so that we can start using them without having to drop compatibility with older versions of OCaml.

@bobot
Collaborator

bobot commented Jul 19, 2017

@diml could you answer the second problem I see with @dbuenzli's proposition, namely that during linking you need to specify all the library variants in the right order and the tool must keep the relative order of independent dependencies?

if it is explicitly added for linking through -package nothing ensure that it is linked before lib_foo

If your build system is not able to guarantee manually specified linking order, better find another one.

If jbuilder does it, it is not specified anywhere and #172 (the part about the use of Top_closure) was not rejected on that front, even if it could break this property.

Additional question: does jbuilder keep the relative order of independent modules when creating a library?

Of course jbuilder could implement its own logic and encode its own information in the META files in order to make the use of such libraries first class only for jbuilder user and not for ocamlfind, ocamlbuild, oasis, ... users.

@diml Could you add to your second proposal how people describe the use and linking of such libraries in jbuild format? Just to settle whether people must list all the implementation libraries, in order or not, or just at least one to select the variant. Thank you.

@ghost

ghost commented Jul 19, 2017

What I had in mind is that when you use such libraries, you simply list the concrete implementations:

(executable
  ((name blah)
   (libraries (mtime.clock.os lwt foo.os bar.os))))

I hadn't considered the ordering problem you describe, which is indeed a blocker. I guess we could ask users to specify ordering constraints when linking an executable, but TBH I'm not excited by this idea. It's not going to be very practical when using a lot of libraries and it's weird to have to re-specify at link time something you already wrote in jbuild files before.

I suppose we could simply use @dbuenzli's layout. We couldn't write backend independent libraries without using functors, but at least we avoid the ordering problem.

I agree that we could do something better right now, and encode it in ocamlfind predicates. At this point I would rather leave all the bike-shedding about variants to OCaml and the namespace proposal and simply follow what comes out of it.

@lpw25

lpw25 commented Jul 19, 2017

I think that the "removing .cma files" section of the namespaces proposal would be sufficient to solve the ordering problem. Unlike modules linked from .cma files, modules linked from -I options are topologically sorted by dependency.
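
To make "topologically sorted by dependency" concrete, here is a minimal depth-first sketch. It is not the compiler's algorithm, only an illustration of the ordering property; `link_order` and the unit names in the usage note are hypothetical.

```ocaml
(* [deps] maps each unit to the units it depends on. The result lists every
   unit after all of its dependencies, which is the order a linker needs.
   Cycles are not detected in this sketch. *)
let link_order (deps : (string * string list) list) : string list =
  let visited = Hashtbl.create 16 in
  let order = ref [] in
  let rec visit u =
    if not (Hashtbl.mem visited u) then begin
      Hashtbl.add visited u ();
      (* Emit the dependencies of [u] before [u] itself. *)
      List.iter visit (try List.assoc u deps with Not_found -> []);
      order := u :: !order
    end
  in
  List.iter (fun (u, _) -> visit u) deps;
  List.rev !order
```

For instance, given a `main` unit depending on `foo` and `mtime_clock`, with `foo` itself depending on `mtime_clock`, the implementation of `mtime_clock` is placed first whatever order the units were listed in.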

@dbuenzli

> Unlike modules linked from .cma files, modules linked from -I options are topologically sorted by dependency.

You mean by the compiler?

@lpw25

lpw25 commented Jul 20, 2017

Yeah

@ghost

ghost commented Aug 3, 2017

I had another look at this and at the namespace proposal and talked with @lpw25 about variants etc...

Updated proposal

Defining a library with multiple implementations

The user writes this:

;; src/jbuild
(library
 ((name plop)
  (public_name plop)
  (virtual_modules (blah))
   ))

And in this library:

  • blah.ml must not exist
  • blah.mli must exist

Then to define an implementation:

;; src/os/jbuild
(implementation
 ((implements plop)
  (variant os)))

And the only ml file src/os should contain is blah.ml.

The main difference with the original proposal is that implementations have no name and specify a variant instead.

Using such a library and selecting an implementation

Executable stanzas now allow an additional field to specify a list of variants:

(executable
 ((name prog)
   (variants (os))))

Implementation with namespaces or findlib

This scheme maps directly to the concept of variants in https://github.com/lpw25/namespaces, so we'll just need to install the cmx files at the same place.

With findlib, the most natural encoding is to map every variant to a findlib predicate and select the archive and library dependencies depending on this predicate. We would generate META files that look like this:

requires(os) = "unix"
requires(jsoo) = "js_of_ocaml"
archive(byte, os) = "@os/plop.cma"
archive(byte, jsoo) = "@jsoo/plop.cma"
...

However, to match the semantics of variants exactly, we must ensure that it is an error to specify none of os or jsoo or to specify both, so we need to generate some additional lines to enforce this:

error(-os, -jsoo) = "must specify exactly one of 'os' or 'jsoo'"
error(os, jsoo) = "variants 'os' and 'jsoo' cannot be used together"
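
As a sketch of how such lines might be generated for an arbitrary list of variants: `error_lines` is a hypothetical helper, and the generalisation to more than two variants (one line forbidding "none", plus one line per pair) is an assumption rather than something specified in this proposal.

```ocaml
(* Hypothetical generator for the two kinds of [error] lines shown above,
   generalised to any list of variant names. *)
let error_lines variants =
  let quoted = List.map (Printf.sprintf "'%s'") variants in
  (* One line rejecting the case where no variant is selected. *)
  let none =
    Printf.sprintf "error(%s) = \"must specify exactly one of %s\""
      (String.concat ", " (List.map (fun v -> "-" ^ v) variants))
      (String.concat " or " quoted)
  in
  (* One line per unordered pair of distinct variants. *)
  let rec pairs = function
    | [] -> []
    | v :: rest ->
      List.map
        (fun w ->
          Printf.sprintf
            "error(%s, %s) = \"variants '%s' and '%s' cannot be used together\""
            v w v w)
        rest
      @ pairs rest
  in
  none :: pairs variants
```

Called with the two variants os and jsoo, this reproduces the two lines above.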

As a side effect, this method should also fix #208 since it will effectively give a way to specify custom findlib predicates.

@samoht
Member Author

samoht commented Aug 3, 2017

The new proposal seems fine. Will it allow implementations to be defined outside of the package defining the virtual module? This is sometimes required because you want to test a new variant, or because you don't have ownership of the main repo, etc.

@ghost

ghost commented Aug 3, 2017

Ah, no it doesn't. We could support having implementations in separate packages, but only as long as all the packages are part of the same project, since we need to know all the variants in advance. Well, at least with the findlib implementation; namespaces don't have this limitation since there is no META file.

In any case it means that you have to install files in other package installation directories, which is slightly annoying, but I don't see a way around it without explicitly naming implementations.

@bobot
Collaborator

bobot commented Aug 7, 2017

The new proposal is neat, I like it. Use of variants could be extended from executable definitions to library definitions in order to support the case of a library that depends on a specific variant.

@dinosaure

I just want to ping this issue to find out what the state is now, 2-3 months after the last reply.

rgrinberg changed the title from "support the linking trick, e.g. adding only a cmi in a library" to "Support Library Variants" on Dec 13, 2017

@rgrinberg
Member

Will libraries with virtual_modules also require (wrapped false)? I ask since this will require having an .mli-only module.

@ghost

ghost commented Jan 9, 2018

@dinosaure we haven't started working on it. It's quite a bit of work, but it's still definitely on the TODO list.

@rgrinberg nope. There are no mli only modules here, at the time of linking the final executable, all implementations will be provided.

A virtual library is just a library with holes that are filled when linking.

@avsm
Member

avsm commented Dec 31, 2018

fixed in #1430 and will be in dune 1.7.0+

avsm closed this as completed on Dec 31, 2018