package interactions #2025
Comments
One partial way to cope with this is to establish canonical types used as interfaces between packages: this is part of the reason that we created
The trouble is that so many methods will need to have DataFrames as the canonical type if the method is to be robust to missing data. |
I think the only thing core julia can do to help with this situation is some kind of conditional loading (your second option). |
Right, but I'm thinking of a very particular kind of conditional loading:
|
This arrangement allows you to provide glue code for a package to make it work nicely with as many other packages as you want, without any of the packages depending on each other. If you happen to load both, you get the appropriate glue; if you only load one or the other, then you don't. |
This seems like it will put a big burden on DataFrames, no? |
It should be more like an optional dependency, so only one of those glue directories is needed. |
Neither glue directory is required – they're only loaded if they exist. The main reason to look for both of them is so that the order in which requires occur doesn't affect what gets loaded. Afaict, "optional dependency" is an oxymoron. |
Typically for a foundational package like DataFrames, the other packages will provide the glue. |
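The glue arrangement discussed in the comments above can be sketched as a small simulation. This is only an illustration under stated assumptions: `register_glue!`, `simulated_require`, and the three global containers are hypothetical names that exist in neither Julia nor its package manager, and the package names are plain strings rather than real packages. The point it shows is that checking for glue in both directions makes the result independent of load order.

```julia
# Hypothetical sketch: none of these names exist in Julia or its package manager.
const LOADED = Set{String}()                          # packages loaded so far
const GLUE = Dict{Tuple{String,String},Function}()    # (owner, other) => glue code
const APPLIED = String[]                              # log of glue that has run

# Package `owner` ships glue to be used together with package `other`.
register_glue!(f::Function, owner::String, other::String) = (GLUE[(owner, other)] = f)

# Simulated `require`: mark the package as loaded, then look in both
# directions for glue, so the outcome does not depend on load order.
function simulated_require(pkg::String)
    pkg in LOADED && return nothing
    push!(LOADED, pkg)
    for other in collect(LOADED)
        other == pkg && continue
        for key in ((pkg, other), (other, pkg))
            haskey(GLUE, key) && GLUE[key]()
        end
    end
    return nothing
end

# kNN provides the glue; DataFrames stays unaware of kNN.
register_glue!("kNN", "DataFrames") do
    push!(APPLIED, "kNN methods for DataFrames")
end

simulated_require("DataFrames")   # only one of the pair is loaded: no glue yet
simulated_require("kNN")          # both are loaded: the glue runs once
```

Swapping the two `simulated_require` calls gives the same `APPLIED` log, which is the property the two-directory lookup is meant to guarantee.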
What's wrong with a separate glue package? Though see my last comment in #1809. If you need to override (not just extend) the behavior of another module to achieve what you want, I guess evalfile(fname::String, mod::Module) = eval(mod, parse(readall(fname))[1]) might be useful. |
The issue with a separate glue package is that it we support loading a third package when |
I suspect that all this points towards making requirements declarative rather than imperative. |
Let me elaborate on that. I think I've figured out what "optional dependency" means: if |
This looks related to my current situation: https://groups.google.com/forum/?hl=es&fromgroups=#!topic/julia-users/wwxKj0QoKzM I have been thinking about this too. When one package needs another, you pay a load-time penalty. For example:
The Benchmark package uses DataFrames [ https://github.com/johnmyleswhite/Benchmark.jl ]:
The load time of this small package is huge because of the load time of DataFrames. Maybe the better option is to compile only what is used, so that you don't compile the full DataFrames package if you only use one type and two methods from it. When Julia becomes compilable, is this going to happen? |
You can't use only a few things from a package/module...
|
Diego, your focus on how fast Julia programs load hints to me that you may be doing something wrong. Why is starting Julia such a bottleneck for you? That said, the package interactions thing is clearly an issue (hence me opening this issue). |
You can only use a module as a module. |
I usually run scripts, many times over. I know I can't avoid this by running everything inside Julia. The problem I see is not specific to Julia; it affects a lot of languages. For example, I wrote scripts in Python using Bio, for myself and to share with co-workers in my group. Doing this, I noticed the load time of the modules. Yes, it's only seconds! But I run them in pipelines 188086 times (at one second each, that is a little more than two days spent only loading packages). I'm afraid of Julia going the same way. Maybe when it becomes compilable this won't be a problem. But at the moment I don't know whether I should design a faster-to-load package or not. If the answer is to try to make packages a little faster to load (even for a compilable Julia), then interactions between packages make that hard to achieve. |
Loading Julia programs will be fast when we have a compiler. Until then it will continue to be slow. |
Fast enough that I don't have to worry about load times and package size? Or is it still good to try to make packages smaller? |
It's always good to make things smaller. Honestly though, if you're starting a program 1 BILLION TIMES, you should really consider trying to run things in a single long-running process. Starting a C program that just exits is not instantaneous either. |
Julia is a general-purpose language with good support for running external programs--perhaps consider using Julia as the glue for your pipelines? |
+1. Julia is good at this kind of thing: http://docs.julialang.org/en/latest/manual/running-external-programs/ |
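As a rough illustration of that suggestion, the sketch below runs several external commands from a single Julia process, so the startup cost is paid once rather than once per script. It assumes a current Julia (1.x) and a Unix-like system where `echo` and `sort` are on the PATH; the `inputs` list is a placeholder, not data from this thread.

```julia
# Run several external commands from one long-running Julia process,
# instead of paying interpreter startup once per script invocation.
inputs = ["alpha", "beta", "gamma"]    # placeholder inputs, purely illustrative

results = String[]
for name in inputs
    # Backticks build a Cmd object; no shell is involved, and $name is
    # interpolated as a single argument.
    out = read(`echo $name`, String)   # run the command and capture its stdout
    push!(results, chomp(out))
end

# Commands can also be chained; `pipeline` connects stdout to stdin.
sorted = read(pipeline(`echo gamma`, `sort`), String)
```

Because interpolation into backticks never goes through a shell, file names with spaces or special characters need no quoting, which is one practical advantage over a bash loop.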
I have read about that, but I haven't had a chance to try it yet. I'm used to using bash. It sounds like a good idea; I'm going to try it ;) |
Getting back to the point of this issue (excuse me for the noise): I think it would be great to be able to define a method for a DataFrame without importing the package, for example. Being able to declare methods for types without importing all of their packages would be useful. |
Are you proposing lazy loading of dependencies? |
I don't know if lazy loading is the right expression, and I don't know if Stefan means the same thing by declarative instead of imperative. I'm saying that if you are going to use k-means on a matrix, you don't need to load DataFrames. Maybe it's more like the ability to define methods for types you haven't loaded, so that Julia can use them once those types are loaded. Maybe lazy loading (load only when you need it) can be a good option too. |
+1 for allowing loading glue when a specific set (pair) of packages is installed |
It seems to me that this "glue plan" calls for introducing a "CONFLICTS" file for packages alongside "REQUIRES": suppose packages A and B have some glue code stored within A, then B gets updated in an incompatible way, and the glue code in A doesn't work with the new version of B. Since the two packages do not explicitly depend on each other, the packaging system would have no way to know this, unless told explicitly somehow. Maybe there are better ways to deal with this situation than introducing conflicts, but this is the easiest I can think of. BTW I'd also like to have the "glue code" feature. |
There is now a request to add a JSON serialiser for DataFrames: aviks/JSON.jl#10. However, given the relative sizes and use cases of the two packages, I am loath to add a dependency on DataFrames to JSON. Any thoughts on a way out at present? |
Is there some visible development on this? |
I believe at this point this can and needs to be added as a feature in 1.x. |
So a fix for the 30-second wait before Gadfly displays its first plot has to wait until then? |
I don't believe that delay is related to conditional dependencies? Am I missing something? |
Yes, Gadfly doesn't even have conditional dependencies. It doesn't use Requires.jl and it doesn't |
@ChrisRackauckas Just for the record, Gadfly renders via Compose and Compose has some infrastructure like https://github.com/GiovineItalia/Compose.jl/blob/master/src/Compose.jl#L30-L47, so you're technically right that Gadfly doesn't have conditional dependencies, but it depends on it ... |
Interesting. I didn't know Compose did that. Then it is the same problem as Plots.jl. Why does it have to be lazy though? It's the lazy loading part that makes it difficult. |
Maybe we see (compared to Plots.jl) some convergence here: similar problems bounded by the same constraints lead to similar solutions. Compose actually manages two backends, a homegrown SVG and a link to Cairo for other formats. |
The idea for fixing Plots.jl backends is much simpler than having some kind of Base hook for conditional deps. Instead, it's to pull in the backends with |
What do you mean by just? Isn't that just giving up on modularization (which is (imho) re-using code without changing it)? |
No. It's putting the backend code into a separate package like |
Sorry, I'm lost. I thought that the backend code is already in a separate package (i.e. GR.jl). And in your example, isn't PlotsGR the above-mentioned glue package? And when is the decision taken to execute/precompile PlotsGR? |
When the user calls |
Now that we have |
@timholy That sounds great. Could you explain a bit what that PR does? Does it fix all issues with the approach currently adopted by Requires? |
At one time Requires did a lot of "sneaky stuff" (i.e., overwrite methods in Core and Base), but over time it has worked more harmoniously with base julia; in particular, the addition of
All this is provided by Base; now, on to Requires.
First, let me describe the state of Requires master, which is largely the work of @MikeInnes. Requires defines a single callback function and pushes it to
This Dict gets populated through

    julia> macroexpand(Main, quote
               @require JSON="682c06a0-de6a-54ab-a142-c8b1cf79cde6" include("morecode.jl")
           end)
    if !Requires.isprecompiling()
        Requires.listenpkg(Base.PkgId(Base.UUID("682c06a0-de6a-54ab-a142-c8b1cf79cde6"), "JSON")) do
            Requires.withpath(@__DIR__) do
                Requires.err(@__MODULE__, "JSON") do
                    const JSON = Base.require(JSON [682c06a0-de6a-54ab-a142-c8b1cf79cde6])
                    include("morecode.jl")
                end
            end
        end
    end

All that basically does is register the following:

    const JSON = Base.require(JSON [682c06a0-de6a-54ab-a142-c8b1cf79cde6])
    include("morecode.jl")

to be executed (whenever JSON gets loaded) inside whichever module you used
In the master branch of Requires,
If that gets merged, I think it's fair to say that Requires is a clean and straightforward solution(*) to the problem of executing code that is dependent upon other modules having been loaded. That may not be the full list of ways we want to support interaction among packages, but it's the big one, and the one for which there aren't as good alternatives.
Again, most of this progress has been from the work of @MikeInnes and those who designed the
Unfortunately, I don't think it's a deprecatable change, so it's pretty heavily breaking.
(*) some might object to monkeying with |
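The general idea of registering code to run once a given package has been loaded can be illustrated with a small stand-alone simulation. This is a sketch under stated assumptions and not the Requires.jl implementation: `listen`, `on_package_loaded`, `LISTENERS`, and `LOADED_PKGS` are hypothetical names, packages are identified by plain strings instead of `Base.PkgId`, and the loader hook is called by hand.

```julia
# Hypothetical sketch of "run this code once package X is loaded".
# These names are illustrative only and are not the Requires.jl API.
const LISTENERS = Dict{String,Vector{Function}}()   # package name => pending callbacks
const LOADED_PKGS = Set{String}()

# Register `f` to run when `pkg` is loaded; run it right away if it already is.
function listen(f::Function, pkg::String)
    if pkg in LOADED_PKGS
        f()
    else
        push!(get!(() -> Function[], LISTENERS, pkg), f)
    end
    return nothing
end

# Stand-in for the hook that a loader would call after a package finishes loading.
function on_package_loaded(pkg::String)
    push!(LOADED_PKGS, pkg)
    for f in pop!(LISTENERS, pkg, Function[])
        f()
    end
    return nothing
end

events = String[]
listen("JSON") do
    push!(events, "ran JSON-dependent code")
end
on_package_loaded("JSON")            # fires the pending callback
listen("JSON") do                    # JSON is already loaded: runs immediately
    push!(events, "ran late registration")
end
```

Keying the pending callbacks by package means a load event only touches the code that was waiting for that package, and running late registrations immediately keeps the behavior independent of the order in which packages are loaded.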
That sounds great, @timholy! In the future it might be good to make this entire business a little nicer to use and more official, but for now it sounds like we have everything we need. I support making the breaking change now so that Requires becomes a "clean and straightforward solution". |
@timholy You mention above this a solution of one half of the problem, what would be the other? |
@lobingera, meaning |
Tim's PR is merged and tagged; I concur that it's a strong and stable solution for package negotiations. |
Especially in the presence of multiple dispatch, there are situations where there exists glue code that you want to load only when using both of two packages. For example, the k-nearest neighbors algorithm makes perfect sense to apply just to plain old matrices – but of course one also wants it to work for data frames, data matrices and various other containers of data. Currently the only way to make this work is to have the kNN package depend on DataFrames and add the appropriate DataFrame-specific methods. This is going to get out of hand very quickly.
I can think of two solutions. One way is to write the kNN code in a more generic fashion so that it isn't coupled with the DataFrames package but uses an interface for containers of data which DataFrames happens to provide. This is generally a good idea, but I kind of suspect that it may be rather hard to make work in all cases. The other way is to provide a mechanism for loading glue code only when both kNN and DataFrames are loaded.
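Both options can be illustrated in a few lines. This is a minimal sketch, not code from any real package: the modules `KNN` and `TinyTables`, the functions `nearest` and `as_matrix`, and the type `Table` are all made-up names standing in for a kNN package and a DataFrames-like package. The generic interface is the single function `as_matrix`; the glue is the one method defined outside both modules.

```julia
# Hypothetical sketch: `KNN`, `TinyTables`, `nearest`, `as_matrix`, and `Table`
# are made-up names, not real packages or APIs.
module KNN
# The interface: any container that can present itself as a numeric matrix
# (one observation per row) works with `nearest`.
as_matrix(x::AbstractMatrix) = x

# Index of the row of `data` closest (squared Euclidean distance) to `query`.
function nearest(data, query::AbstractVector)
    m = as_matrix(data)
    dists = [sum(abs2, m[i, :] .- query) for i in 1:size(m, 1)]
    return argmin(dists)
end
end # module KNN

module TinyTables                  # stand-in for a package like DataFrames
struct Table
    columns::Vector{Vector{Float64}}
end
end # module TinyTables

# Glue code: it needs both modules to be loaded and belongs to neither.
KNN.as_matrix(t::TinyTables.Table) = reduce(hcat, t.columns)

points = [0.0 0.0; 1.0 1.0; 5.0 5.0]
table = TinyTables.Table([[0.0, 1.0, 5.0], [0.0, 1.0, 5.0]])

i_matrix = KNN.nearest(points, [0.9, 1.2])   # plain matrix: no glue needed
i_table = KNN.nearest(table, [4.0, 4.5])     # works only once the glue method exists
```

Without the glue line, the second call fails with a `MethodError`, while `KNN` remains fully usable on matrices. The open question in this issue is where that one line should live and when it should be loaded.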