RFC: adds conditional modules (#6195) #6884
Conditional modules are executed when a dependency module becomes available and/or replaced. They have the following syntax:
module A requires B, C.D
    using B
    import C.D
    # do stuff with B, C.D
    # module A will be rerun whenever
    # B or C.D are (re)defined
end
What's the advantage of this approach over adding general load hooks that trigger when a named module is loaded? If we end up going with this approach, I think that it really ought to be |
tl;dr; It's trivial to simulate one approach with the other, so it is not a question of which approach is more general. However, I think that module-based load-hooks are actually more sane & more general. They both start executing code, so the difference is in how they are intended to be used. I expect that the function approach will require the use
module A
    export myf
    myf() = 1
    LOAD_HOOKS["B"] = function()
        @eval begin
            using B
            myf(::B) = 2
        end
    end
end
vs.
module A
    export myf
    myf() = 1
    module AB when B
        importall ..A
        using B
        myf(::B) = 2
    end
end
Theoretically, the module version would also be more amenable to caching partial results, resulting in much better optimization (e.g. faster load times) for the module approach over the function approach.
Finally, the namespace segregation of the module version may be helpful to the user, since it makes it harder for the library to provide a constant or function that is only available after the user imports a different module. This is simply speculation, but my expectation is that the module approach forces the user to clearly define what is part of the default interface of the module, and what is only possible when another module is defined. For example, in the
Finally, the module method could be smart enough to take a call to
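For readers comparing the two styles, here is a minimal sketch of the function-hook side written in present-day Julia (1.x assumed). `LOAD_HOOKS` in the comment above is not an existing Base API, and `on_load`, `mark_loaded`, and `LOADED` below are equally hypothetical names; `mark_loaded` only stands in for whatever the loader would call once a module has finished loading.

```julia
# Hypothetical load-hook registry; none of these names exist in Base.
const LOAD_HOOKS = Dict{String,Vector{Function}}()  # module name => pending callbacks
const LOADED = Set{String}()                        # modules already "loaded"

# Register `f` to run once `name` is loaded; run it immediately if it already is.
function on_load(f::Function, name::AbstractString)
    if name in LOADED
        f()
    else
        push!(get!(() -> Function[], LOAD_HOOKS, name), f)
    end
    return nothing
end

# Stand-in for the loader: record `name` as loaded and fire its pending hooks once.
function mark_loaded(name::AbstractString)
    push!(LOADED, name)
    for f in pop!(LOAD_HOOKS, name, Function[])
        f()
    end
    return nothing
end

# Usage: glue code for B is deferred until B shows up.
glue_ran = Ref(false)
on_load("B") do
    glue_ran[] = true
end
mark_loaded("B")   # the deferred glue code runs here
```

The sketch shows why the two approaches can simulate each other: a conditional module is roughly a hook whose body is a module definition, and a hook is roughly a conditional module whose body is a single call.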
Solid arguments. I wanted to do the least invasive thing that would work, but it may well be that we want much deeper integration for this kind of feature.
@JeffBezanson can I get your comments? I feel like this is something worth merging before 0.3 (rather than shortly after), since it instantly breaks backwards-compatibility. |
I think we really need the functionality provided here. One question is whether in the example
the |
Sounds like a nice solution. I wonder whether it would be possible to unload modules if one wants to. This is not high priority, but how would it fit in the design if one wanted to allow this in the future? In R it happens quite often that you want to unload/detach a package, e.g. because it conflicts with another one.
Probably this can be done the same way as variable unloading #2385 ;-)
I agree that we really need this and it should happen before 0.3 because it will be a major compatibility watershed. It's a good point that the function hook approach can actually be made to require less boilerplate (especially that
Since modules are generally constant, I think the only way to unload things would be to reload everything except for the thing you're unloading.
We're using In MathProgBase, we need to choose a default package to solve each problem class (linear programming, integer programming, etc.). See https://github.com/JuliaOpt/MathProgBase.jl/blob/master/src/defaultsolvers.jl. How can we use this syntax for that? |
Would it be enough that a module that is not referenced anymore can be |
IMHO the main point is that exported symbols are hidden. Freeing memory would of course be good, but I'm not sure it would make such a big difference. |
I envisioned that you would use isdefined at the point of usage (or in
Using Pkg.installed (or Pkg.dir) seems brittle to me, since the user/sysadmin may install the module elsewhere.
Unloading is very hard, since modules often add methods to external functions (and affect type inference). I expect it is (mostly) possible with this solution (and not possible with the function hook approach), but it would be very time consuming to implement, so it is unlikely to happen anytime soon.
In our case we want to try to automatically load one (but not all) of the default packages. That is, if |
This kind of complex requirement is actually an argument for the hook approach. You can add hooks for A, B and C – and each hook unhooks the other hooks. There's no way to have enough syntax to express complex things like that declaratively. But I'm not sure I understand the situation correctly. This is for code that you always want to have loaded when both this and some other module are loaded, whereas what @mlubin is describing sounds like something else. |
Yes, in our case we want to try to load a certain set of packages (with nontrivial precedence rules) ourselves instead of forcing the user to explicitly import them. |
I'll post some possible example code for your use case later tonight (when I'm at a computer) If the user explicitly loaded a package, would you always use that, or always default to the highest precedence? Or could the user load multiple and select between them, but defaulting to one of them? |
The way it currently works, we default to the highest precedence. I think this makes more sense so that the behavior doesn't change based on the status of the user's session (which packages might have been loaded before). Users can override the default choice by loading their own packages and explicitly setting a solver to use. For example:
uses the default linear programming solver, and
uses Clp. At a higher level in JuMP, we have
uses the default solver, and
overrides it to use Gurobi. Note that "solvers" aren't just the modules themselves, they are instances of a type that implements a certain interface. These instances can also carry parameters for the optimization algorithm, like |
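The selection rule described above (default to the highest-precedence solver that is available, unless the user explicitly picks one) can be sketched as follows. The solver names, the precedence list, and `pick_solver` are all hypothetical and are not MathProgBase's actual code or ordering.

```julia
# Hypothetical solver names and precedence; not MathProgBase's actual list.
const LP_PRECEDENCE = ["SolverA", "SolverB", "SolverC"]

# Return the explicit override if given, otherwise the highest-precedence
# solver found in `available`. The result does not depend on load order.
function pick_solver(available; precedence=LP_PRECEDENCE, override=nothing)
    override !== nothing && return override
    for name in precedence
        name in available && return name
    end
    error("no default solver is available; tried: ", join(precedence, ", "))
end

default_choice = pick_solver(["SolverC", "SolverB"])
explicit_choice = pick_solver(["SolverC", "SolverB"]; override="SolverC")
```

Because the loop walks the precedence list rather than the list of loaded packages, the default stays the same regardless of which packages happened to be loaded earlier in the session.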
There's lots of ways to structure this, to achieve various results. I've decided to use Winston as an example, since it is pretty similar, but a lot simpler. The goal here is to seamlessly provide multiple backends (Gtk and Tk), and use whichever is available. Unlike the solver interface, on OS X, Tk is observed to segfault if another toolkit (such as Gtk) is loaded in the same process, so it isn't as much of a concern to have the Julia interfaces for them play together nicely.
module Winston
    module TkW requires Tk
        import ..Winston
        window(...) = ...
        __init__() = Winston.set_available(TkW)
    end
    module GtkW requires Gtk
        import ..Winston
        window(...) = ...
        __init__() = Winston.set_available(GtkW)
    end
    function set_available(toolkit)
        global window = toolkit.window
    end
    function __init__()
        if !isdefined(:TkW) && !isdefined(:GtkW)
            # try to initialize a GUI
            try
                require("Gtk")
            catch
                try
                    require("Tk")
                catch
                    warn("Could not load the Gtk or Tk GUI toolkit. Interactive plotting will be unavailable.")
                end
            end
        end
    end
end
One of the key observations is that, unlike the function hook approach (which needs eval heavily), I never have a need to use
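The nested try/catch fallback in `__init__` above generalizes to an ordered list of backends. The following is a small sketch in present-day Julia (1.x assumed); `first_available` is a hypothetical helper, and the loader closures merely simulate what `require("Gtk")` and `require("Tk")` do in the example, so that the snippet runs without either toolkit installed.

```julia
# Try each (name => loader) pair in order and return the first backend that loads.
# Returns `nothing` (after a warning) when every loader throws.
function first_available(loaders)
    for (name, load) in loaders
        try
            return (name, load())
        catch err
            @debug "backend failed to load" name err
        end
    end
    @warn "Could not load any GUI toolkit. Interactive plotting will be unavailable."
    return nothing
end

# Simulated loaders: the first fails, the second returns a `window` function.
loaders = [
    "Gtk" => () -> error("Gtk is not installed"),
    "Tk"  => () -> (title -> "Tk window: " * title),
]
backend, window = first_available(loaders)
```

With more than two candidates this avoids one extra level of nesting per backend, at the cost of routing every backend through a common loader signature.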
Seeing this in action is great, Jameson. I think you have found a really nice solution for a problem that arises in various packages. +1 to get something like this into 0.3 if it's mature (I have not tested it yet).
Wow, really nice @vtjnash.
Ping @SimonDanisch as there are also lots of package interdependencies in the OpenGL package family he is working on. I see this for providing optional OpenGL image export without the need to have a hard dependency on Cairo.jl |
I'm happy with the extra module syntax if it can cover all cases, but let me play devil's advocate: It might be useful – and powerful – to have a function hook system with a macro to cover the common case (and hide the boilerplate). For example, say you have an
module A
    f(::Int) = 2
    @require B f(::B.T) = 4
    @require B begin
        # Multiple definitions, submodules etc.
        f(::B.S) = 5
    end
    # Resolve arbitrarily complex dependencies
    foo_loaded = false
    bar_loaded = false
    @require Foo begin
        foo_loaded = true
        set_default_implementation()
    end
    @require Bar begin
        bar_loaded = true
        set_default_implementation()
    end
    # Submodules
    @require (Gtk, Winston) module DisplayA
        # Something...
    end
end
Some advantages:
Arguably, you get the best of both worlds that way.
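As a rough illustration of how a macro can hide the hook boilerplate, here is a sketch in present-day Julia (1.x assumed). It is not the `@require` implementation proposed above: `@when_loaded`, `HOOKS`, and `fire` are hypothetical names, and `fire` stands in for the loader notifying us that a package was loaded. The flags are `Ref`s because a plain assignment inside the generated closure would create a local variable, and method definitions inside the body would additionally need `@eval`.

```julia
# Hypothetical registry: package name => closures to run once it is loaded.
const HOOKS = Dict{Symbol,Vector{Function}}()

# Wrap `body` in a zero-argument closure and queue it under `pkg`.
macro when_loaded(pkg::Symbol, body)
    return quote
        push!(get!(() -> Function[], HOOKS, $(QuoteNode(pkg))), () -> $(esc(body)))
        nothing
    end
end

# Stand-in for the loader: run and discard every hook queued for `pkg`.
function fire(pkg::Symbol)
    for hook in pop!(HOOKS, pkg, Function[])
        hook()
    end
    return nothing
end

# Usage, mirroring the foo_loaded / bar_loaded flags in the example above.
foo_loaded = Ref(false)
bar_loaded = Ref(false)
@when_loaded Foo begin
    foo_loaded[] = true
end
@when_loaded Bar begin
    bar_loaded[] = true
end
fire(:Foo)   # only the Foo hook runs; the Bar hook stays queued
```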
hiding the we already need to write an optimization for loading modules. if syntax is part of I have no problem with having an |
The thing is, if defining functions in a module via What I'm thinking is that you have a process that goes like
That way you solve the problem of a module's methods being dependent on available modules (because during step 1 they aren't). The only issue with this is that the dependent code itself isn't cached. I suspect that won't be a huge problem, since it will mostly be small amounts of glue code, but if you really want to cache as much as possible you can then use the inner module technique you've suggested. Dependent inner modules is a neat solution to this, so I'm not trying to put that down – I'm just trying to point out that using load hooks and |
Right, I meant to say /after/ the
Right, which is exactly what I was talking about – as I said, I can (and do) call You've said yourself that it's trivial to simulate one approach with the other – it must follow that any problems introduced by one are also introduced by the other. If load hooks preclude optimisations, and this PR enables load hooks, this PR precludes optimisations, surely? |
I really think it's important to get this settled for 0.3 and had tagged it as such, but someone seems to have removed the tag. |
@vtjnash and I did a bunch of thinking about this yesterday, in particular looking for some kind of prior art. In short, it's hard to find a package system that does anything like this, and many people seem to think it's a bad idea. For example see npm/npm#930 Somebody in the second link is quite scathing on this (perhaps too much). From reading npm threads, my main takeaway is that while they have similar issues, they mostly assume that "plugins" or "bridge" packages will be separate packages, and the debates are around how to express the dependencies and how to detect which plugins are available. In contrast, we've tried to minimize the number of packages and
I'm not convinced it's so bad to tell users to write an extra npm has "optional dependencies" (which simply means that failing to build/install the dependency is not fatal) and advocates the equivalent of |
I think the following analogy is more accurate: I own a car and recently bought a bike rack. Cars do not have a common interface (the trunks are similar, but not identical), nor do bike racks come in unique models for each car. Instead, it came with an instruction booklet that lists cars by make and model and explains how to adapt the bike rack for the trunk configuration of my car. Conditional modules are that instruction booklet.
For prior art, I feel that package managers have historically been disconnected from the language design, and thus lag the language in functionality. I think the closest prior art would be the helper apps for the kernel that make it responsive to environment changes / user action: udev, dbus, etc. (also perhaps subsets of zeroconf / bonjour / mdns; and launchd / systemd behaviors). In contrast, I think autoconf / cmake are fairly representative of the traditional static solution to this (and lead to one of the most problematic elements for package systems: variants). They have traditionally gone with the build-time detection approach (and failed badly at being a robust solution, due to the lack of language support, exacerbated by their need to be language agnostic).
I'm against any design that results in the necessity for the user to clear the cache (or conversely, for clearing the cache to cause a behavioral change). If the cache is not able to resolve dependency links correctly, it is failing badly at its primary purpose (which is to transparently accelerate repeated operations). That doesn't mean we need conditional modules, just that we need to implement this dependency link.
@vtjnash: so what's your position on this issue then? |
Ok, I agree on manual cache-clearing. That would not be good. We could potentially have
In addition to whether cache-clearing is manual, there's the issue of when it happens. Currently you need to restart and reload a package for us to look at whether a recompile is needed. That might be preferable to packages constantly re-configuring themselves at runtime based on which other packages you load.
There are multiple kinds of use cases here. A major distinction I think is based on the amount of code involved. If a piece of "adapter" code is a large module, it should be a separate package installed manually. But many uses of
But I really think most of these cases are better solved by dependency injection, as here: JuliaIO/JSON.jl#129, or with interfaces. Consider this tiny method definition: https://github.com/dcjones/Gadfly.jl/blob/51e245da67aa97e8a3cc3ae5731ac1e8e6d8bc95/src/Gadfly.jl#L47
It seems weird to me that we'd need conditional loading hacks and the involvement of a package or module system just to express what seems to be a small fact about Plots.
Dependency injection works well enough for return types but I worry about type instability and performance implications. Especially for dependencies injected via keyword arguments.
In a sense, the issue here is that the resolution of |
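To illustrate the dependency-injection idea and the type-stability worry raised here, below is a small sketch in present-day Julia (1.x assumed). `build_table` is a hypothetical function, not JSON.jl's API. The container type is passed as a positional `Type{D}` argument, so the method specializes on it, rather than as a keyword argument; `IdDict` stands in for a dictionary type that would normally come from another package.

```julia
# The caller injects the container type, so this code needs no dependency
# on the package that defines it.
function build_table(::Type{D}, pairs) where {D<:AbstractDict}
    table = D()
    for (key, value) in pairs
        table[key] = value
    end
    return table
end

# Convenience method with a default container type.
build_table(pairs) = build_table(Dict{String,Any}, pairs)

default_table = build_table(["a" => 1, "b" => 2])
injected_table = build_table(IdDict{String,Any}, ["a" => 1])
```

The library never names the injected type, which is the decoupling the comment above argues for; whether this covers the display-oriented cases discussed later in the thread is a separate question.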
In most cases the depended-on package is going to be loaded, if at all, after the parent package has loaded, so an if statement doesn't work. E.g. Juno boots and Gadfly is not loaded yet; user loads Gadfly; we define
The current alternatives to this seem to be to:
(1) could work if we had really good ways of expressing display in a frontend-agnostic way, but that's a hard (/intractable) problem in itself. Base's current display system, at least, is a long way from that ideal and probably always will be. (2) and (3) don't remotely scale. I really need an approach that solves this problem, even if that approach is "allow Requires.jl to exist in peace". |
Isn't the design already unscalable if new code needs to be written for every combination of graphic generator and graphic displayer? I guess the reason it might not be is that the new code needed is very small, like I see here:
I find it hard to believe that there is no way to define an interface that allows this one line to move into Gadfly. |
This is another example where evaluating |
There's a middle ground between complete coupling of generators/displayers and complete decoupling. Right now we aim for things to be generic where possible and pave over the gaps where necessary. If it's not necessary too often then it scales well enough. Again, I'm not arguing that this stuff is technically impossible without conditional loading, just that the large amount of extra coordination needed between package authors will slow things down a lot, among other disadvantages. Stefan's right that lazily-loading types / methods would cover this use case, if it's possible to implement that. |
Lazy type evaluations were touched upon on julia-dev two days ago: https://groups.google.com/d/msg/julia-dev/U1HxsML0w4Q/jcOpk1eGAwAJ The context was somewhat similar with slimming of Base but keeping the un-used bits around. |
This is intriguing, because you'd really rather have those definitions always be part of Jewel.jl, but only get called if an object of one of those types arises. I imagine in other dynamic languages you might write code like
We would need to figure out the evaluation rule. When we see
Something like header files comes to mind. What we need to insert a method is just the top line of a type definition: its name, parameters, and supertype. Of course we could also use something like this for mutually-referencing type definitions. However I don't yet see a natural place to put such things when working across packages.
I like this train of thought a lot. It would make a ton of sense to write something like
extern DiskData.BigVector{T} <: AbstractVector{T}
foo(::BigVector) = ...
Something like this could also address some of the issues with sharing generic functions across packages when they're not defined in Base.
I think that could work. Of course it's annoying to need to duplicate the declaration, but it doesn't increase the coupling too much since in these cases you already have to know about the other package's type. |
+1 for |
Does an extern declaration even need to provide that much information? |
With that amount of information we can handle method definitions as normal. Without it, we would need to do something much more complicated along the lines of deferring the method insert until the type is fully defined. |
I think |
Some combination of |
I suspect
extern X.foo
X.foo(::MyType) = ...
Would just work in light of
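The `extern` syntax discussed here is a proposal and is not valid Julia. For comparison, the closest pattern available in present-day Julia (1.x assumed) is a small shared interface module that declares a generic function with no methods via `function name end`. The sketch below uses that existing feature in place of `extern`; the module and function names are hypothetical.

```julia
# Hypothetical lightweight interface module that both sides can depend on.
module PlotInterface
export render
function render end   # generic function with zero methods
end

# The producer adds a method for its own type.
module Producer
import ..PlotInterface: render
struct Chart
    title::String
end
render(c::Chart) = "chart: " * c.title
end

# The consumer calls the generic function without knowing about Producer.
module Consumer
using ..PlotInterface
show_it(x) = render(x)
end

shown = Consumer.show_it(Producer.Chart("demo"))
```

Unlike the `extern` idea, this still requires both packages to agree on and depend on the shared interface module, which is the coordination cost raised earlier in the thread.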
Are lazy types still on the table? If so, this would also resolve being able to declare mutually referential types?
Presumably, using |
@mauro3 Yes! I think this approach is viable. Types already exist in a partially-constructed state very briefly, to allow field types to depend on their enclosing type. We just need to allow this state to persist longer. We can also introduce provisional modules, which can only contain |
Possible syntax for provisional modules:
extern module Foo
    # type declarations and function signatures go here
end
Closing in favor of #15705.
conditional modules (described in julep #6195) are executed when a dependency module becomes
available and/or replaced. it has the following syntax:
TODO:
A.B.* when the dependencies for B are not loaded / add a placeholder module for A.B to reserve the name
using A.B should autoload the declared dependencies for B
Possible changes:
tuple expressions
module ... requires ... or module ... when ...
... should be automatically imported inside the module, to avoid repetition