package interactions #2025

StefanKarpinski · 2013-01-12T22:52:44Z

Especially in the presence of multiple dispatch, there are situations where there exists glue code that you want to load only when using both of two packages. For example, the k-nearest neighbors algorithm makes perfect sense to apply just to plain old matrices – but of course one also wants it to work for data frames, data matrices and various other containers of data. Currently the only way to make this work is to have the kNN package depend on DataFrames and add the appropriate DataFrame-specific methods. This is going to get out of hand very quickly.

I can think of two solutions. One way is to write the kNN code in a more generic fashion so that it isn't coupled with the DataFrames package but uses an interface for containers of data which DataFrames happens to provide. This is generally a good idea, but I kind of suspect that it may be rather hard to make work in all cases. The other way is to provide a mechanism for loading glue code only when both kNN and DataFrames are loaded.

johnmyleswhite · 2013-01-12T23:00:30Z

One partial way to cope with this is to establish canonical types used as interfaces between packages: this is part of the reason that we created vector and matrix in DataFrame. Then you can write

knn(a::Any, b::Any) = knn(matrix(a), matrix(b))

The trouble is that so many methods will need to have DataFrames as the canonical type if they are to be robust to missing data.
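The canonical-type idea can be sketched in modern Julia as follows. This is a minimal illustration, not code from DataFrames or any kNN package: `as_matrix` plays the role of the `matrix` conversion mentioned above, and `nearest_neighbors` and `ColumnTable` are hypothetical names. The algorithm is written once against matrices, and another container opts in by adding one conversion method.

```julia
# Hypothetical conversion hook: a container package opts in by adding a method,
# so the algorithm package never has to depend on it.
as_matrix(x::AbstractMatrix) = x

# Toy k-nearest-neighbour search written once, against the canonical type.
# Rows are observations; for each query row, return the indices of the k
# closest training rows by squared Euclidean distance.
function nearest_neighbors(train::AbstractMatrix, query::AbstractMatrix, k::Integer)
    return [sortperm([sum(abs2, train[j, :] .- query[i, :]) for j in axes(train, 1)])[1:k]
            for i in axes(query, 1)]
end

# Generic fallback: convert whatever we were given to the canonical type.
nearest_neighbors(train, query, k::Integer) =
    nearest_neighbors(as_matrix(train), as_matrix(query), k)

# Hypothetical stand-in for a data-frame-like container from another package.
struct ColumnTable
    columns::Vector{Vector{Float64}}
end
as_matrix(t::ColumnTable) = reduce(hcat, t.columns)  # columns side by side
```

Two plain matrices hit the specialized method directly; anything else goes through the fallback, which is the "make the container provide the interface" option from the opening comment.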

JeffBezanson · 2013-01-12T23:25:57Z

I think the only thing core julia can do to help with this situation is some kind of conditional loading (your second option).

StefanKarpinski · 2013-01-12T23:34:52Z

Right, but I'm thinking of a very particular kind of conditional loading: require("kNN") when DataFrames has already been loaded or require("DataFrames") when kNN has already been loaded both trigger the loading of the following two files if they exist:

kNN/glue/DataFrames.jl
DataFrames/glue/kNN.jl

StefanKarpinski · 2013-01-12T23:35:58Z

This arrangement allows you to provide glue code for a package to make it work nicely with as many other packages as you want, without any of the packages depending on each other. If you happen to load both, you get the appropriate glue; if you only load one or the other, then you don't.
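A toy model of the proposed rule, assuming hypothetical names (`GlueLoader`, `require!`) and simulating both package loading and the existence of glue files in memory rather than touching the file system. It shows the property the proposal is after: the same glue is picked up whichever package is required first, and none is loaded when only one package is.

```julia
# Toy model of the proposal (hypothetical names; nothing is read from disk).
# `glue_files` stands in for the glue files that exist, e.g. "kNN/glue/DataFrames.jl".
struct GlueLoader
    loaded::Vector{String}       # packages, in the order they were required
    glue_files::Set{String}      # glue files that "exist"
    glue_loaded::Vector{String}  # glue files that have been "included"
end
GlueLoader(glue_files) = GlueLoader(String[], Set{String}(glue_files), String[])

function require!(g::GlueLoader, pkg::String)
    pkg in g.loaded && return g
    push!(g.loaded, pkg)
    for other in g.loaded
        other == pkg && continue
        # Look in both directions, so the result doesn't depend on load order.
        for f in ("$pkg/glue/$other.jl", "$other/glue/$pkg.jl")
            f in g.glue_files && push!(g.glue_loaded, f)
        end
    end
    return g
end
```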

johnmyleswhite · 2013-01-12T23:36:24Z

This seems like it will put a big burden on DataFrames, no?

JeffBezanson · 2013-01-12T23:41:46Z

It should be more like an optional dependency, so only one of those glue directories is needed.

StefanKarpinski · 2013-01-12T23:43:23Z

Neither glue directory is required – they're only loaded if they exist. The main reason to look for both of them is so that the order in which requires occur doesn't affect what gets loaded. Afaict, "optional dependency" is an oxymoron.

StefanKarpinski · 2013-01-12T23:44:19Z

Typically for a foundational package like DataFrames, the other packages will provide the glue.

timholy · 2013-01-12T23:55:50Z

What's wrong with a separate glue package?

Though see my last comment in #1809. If you need to override (not just extend) the behavior of another module to achieve what you want, I guess

evalfile(fname::String, mod::Module) = eval(mod, parse(readall(fname))[1])

might be useful.

StefanKarpinski · 2013-01-13T00:03:43Z

The issue with a separate glue package is that we support loading a third package when kNN or DataFrames is used, but not when both kNN and DataFrames are used, which is what you want for glue packages. Glue packages could modify existing code, and they could be guaranteed to be loaded after both of the packages they connect.

StefanKarpinski · 2013-01-13T00:13:43Z

I suspect that all this points towards making requirements declarative rather than imperative.

StefanKarpinski · 2013-01-13T00:45:30Z

Let me elaborate on that. I think I've figured out what "optional dependency" means: if A is an optional dependency of B then if A and B are both required, A should be loaded before B. If we can arrange for that to happen, we don't need a special glue mechanism since B can simply check for the presence of A when it's loading and execute "glue code" conditionally. However, it seems to me that this entire notion implies that requirements must be declarative since otherwise you can't know if A is going to be required if B is loaded before A.
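Assuming every package declares its optional dependencies up front (the declarative requirements suggested here), a load order that puts each requested optional dependency before the package that wants it is a depth-first topological sort restricted to the requested packages. A minimal sketch; `load_order` and the `Dict` format for declarations are hypothetical:

```julia
# `optional` maps a package to the optional dependencies it declares.
# Returns the requested packages in an order where any requested optional
# dependency of B comes before B. Unrequested dependencies are ignored.
function load_order(requested::Vector{String}, optional::Dict{String,Vector{String}})
    order = String[]
    state = Dict{String,Symbol}()  # :visiting or :done, for cycle detection
    function visit(p)
        s = get(state, p, :new)
        s === :done && return
        s === :visiting && error("cyclic optional dependency involving $p")
        state[p] = :visiting
        for d in get(optional, p, String[])
            d in requested && visit(d)
        end
        state[p] = :done
        push!(order, p)
    end
    foreach(visit, requested)
    return order
end
```

Because the order is computed before anything loads, it does not matter which package the user names first; this only works if the requirements are known in advance, which is the argument for making them declarative.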

diegozea · 2013-01-17T20:30:17Z

This looks related to my current situation: https://groups.google.com/forum/?hl=es&fromgroups=#!topic/julia-users/wwxKj0QoKzM

I'm thinking about this too... When one package needs another, you pay a penalty at load time. For example:

julia> @elapsed require("DataFrames")
3.3809280395507812

Package Benchmark uses DataFrames [ https://github.com/johnmyleswhite/Benchmark.jl ] :

julia> @elapsed require("Benchmark") 
3.6832501888275146

And the load time of this small package is huge because of the load time of DataFrames.

Maybe the better option is to compile only what is used, so that you don't compile the full DataFrames package if you only use one type and two methods from it.

When Julia becomes compilable, are these things going to happen?
Or maybe a check at run time could help avoid this, allowing conditional dependencies to be loaded only when they are needed?

diegozea · 2013-01-17T21:08:03Z

You can't use only a few things from a package/module...

julia> using DataFrames.DataFrame
invalid using statement: name exists but does not refer to a module

StefanKarpinski · 2013-01-17T21:45:06Z

Diego, your focus on how fast Julia programs load hints to me that you may be doing something wrong. Why is starting Julia such a bottleneck for you? That said, the package interactions thing is clearly an issue (hence me opening this issue).

StefanKarpinski · 2013-01-17T21:47:08Z

You can only use a module as a module.

diegozea · 2013-01-17T22:09:22Z

I usually run script programs, a lot of times. I know I can't avoid this without running everything inside Julia. The problem I see is not specific to Julia; it's a problem for a lot of languages. For example, I created Python scripts using Bio, for myself and to share with co-workers in my group. Doing this, I noticed the load time of modules. Yes... it's only seconds! But I run them in pipelines 188086 times (at one second each, that gives me a little more than two days just loading packages). I'm afraid of Julia going that way. Maybe when it becomes compilable this won't be a problem... But for now, I don't know whether I should design a faster-to-load package or not?

If the answer is to try to make packages a little faster to load (even for a compilable Julia)... interactions between them are a problem for making that possible.

StefanKarpinski · 2013-01-17T22:14:38Z

Loading Julia programs will be fast when we have a compiler. Until then it will continue to be slow.

diegozea · 2013-01-17T22:17:59Z

Fast enough that I don't have to worry about load times and package size... Or is it still good to try to make packages smaller?

StefanKarpinski · 2013-01-17T22:24:26Z

It's always good to make things smaller. Honestly though, if you're starting a program 1 BILLION TIMES, you should really consider trying to run things in a single long-running process. Starting a C program that just exits is not instantaneous either.

pao · 2013-01-17T22:29:39Z

Julia is a general-purpose language with good support for running external programs--perhaps consider using Julia as the glue for your pipelines?

StefanKarpinski · 2013-01-17T22:30:49Z

+1. Julia is good at this kind of thing: http://docs.julialang.org/en/latest/manual/running-external-programs/

diegozea · 2013-01-17T22:48:21Z

I read about that, but I haven't had a chance to try it yet. I'm used to using bash. It would be a good idea, I'm going to try it ;)

diegozea · 2013-01-22T03:38:43Z

Getting back to the point of this issue (excuse me for the noise)

I think it would be great to be able to define a method for a DataFrame, for example, without importing the package. Being able to declare methods for types without importing all of their packages would be useful.

johnmyleswhite · 2013-01-22T04:12:19Z

Are you proposing lazy loading of dependencies?

diegozea · 2013-01-22T14:34:28Z

I don't know if lazy loading is the right term... And I don't know if Stefan means the same thing by declarative instead of imperative.

I'm saying that if you are going to use k-means on a matrix, you don't need to load DataFrames.
But if you load DataFrames, you can use k-means on DataFrames if the method is defined.

Maybe it's more like the ability to define methods for types you haven't loaded, so that Julia can use them once those types are loaded.

Maybe lazy loading (load only when you need it) can be a good option too.

mlubin · 2013-01-23T04:33:54Z

+1 for allowing loading glue when a specific set (pair) of packages is installed

carlobaldassi · 2013-01-23T12:53:20Z

It seems to me that this "glue plan" calls for introducing a "CONFLICTS" file for packages alongside "REQUIRES": suppose packages A and B have some glue code stored within A, then B gets updated in an incompatible way, and the glue code in A doesn't work with the new version of B. Since the two packages do not explicitly depend on each other, the packaging system would have no way to know this, unless told explicitly somehow. Maybe there are better ways to deal with this situation than introducing conflicts, but this is the easiest I can think of.
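One possible shape for such a CONFLICTS entry, as a minimal sketch: the package holding the glue records the first version of the other package that the glue is known not to work with, and the glue is skipped for that version or later. `Conflict` and `glue_allowed` are hypothetical names; a real format would likely need version ranges rather than a single lower bound.

```julia
# Hypothetical CONFLICTS-style entry kept by the package that holds the glue:
# "this glue does not work with `package` at version `from` or later".
struct Conflict
    package::String
    from::VersionNumber
end

# The glue for `pkg` may be loaded unless the installed version hits a conflict.
glue_allowed(conflicts::Vector{Conflict}, pkg::String, installed::VersionNumber) =
    !any(c -> c.package == pkg && installed >= c.from, conflicts)
```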

BTW I'd also like to have the "glue code" feature.

aviks · 2013-02-07T03:16:52Z

There is now a request to add a JSON serialiser for DataFrames. aviks/JSON.jl#10

However, given the relative sizes, and use-cases, of the two packages, I am loath to add a dependency on DataFrames to JSON. Any thoughts on a way out at present?

lobingera · 2017-12-12T15:04:57Z

Is there some visible development on this?

JeffBezanson · 2017-12-31T00:22:23Z

I believe at this point this can and needs to be added as a feature in 1.x.

bjarthur · 2017-12-31T00:32:44Z

so the 30-sec wait until the first plot is displayed in Gadfly won't be fixed until then?

JeffBezanson · 2017-12-31T01:06:50Z

I don't believe that delay is related to conditional dependencies? Am I missing something?

ChrisRackauckas · 2017-12-31T01:32:54Z

Yes, Gadfly doesn't even have conditional dependencies. It doesn't use Requires.jl and it doesn't @eval using statements (which is Plots.jl's issue). Gadfly's first time to plot is completely orthogonal to this and just due to precompilation not capturing most of what users "think it would/should".

lobingera · 2017-12-31T08:13:35Z

@ChrisRackauckas Just for the record, Gadfly renders via Compose and Compose has some infrastructure like https://github.com/GiovineItalia/Compose.jl/blob/master/src/Compose.jl#L30-L47, so you're technically right that Gadfly doesn't have conditional dependencies, but it depends on it ...

ChrisRackauckas · 2017-12-31T08:19:20Z

Interesting. I didn't know Compose did that. Then it is the same problem as Plots.jl. Why does it have to be lazy though? It's the lazy loading part that makes it difficult.

lobingera · 2017-12-31T09:41:59Z

Maybe we see (compared to Plots.jl) some convergence here: similar problems bounded by the same constraints lead to similar solutions. Compose actually manages two backends, a homegrown SVG and a link to Cairo for other formats.

ChrisRackauckas · 2017-12-31T12:39:16Z

The idea for fixing Plots.jl backends is much simpler than having some kind of Base hook for conditional deps. Instead, it's to pull in the backends with using and work off of that syntax, i.e. using PlotsGR instead of gr() doing that kind of stuff, and then having it add new dispatches to core functions using some abstract type. I think that's a sane thing to do and it fixes the precompilation problem. It just requires a re-write of the backend code to do it.

lobingera · 2017-12-31T12:52:59Z

It just requires a re-write of the backend code to do it.

What do you mean by just? Isn't that just giving up on modularization (which is (imho) re-using code without changing it)?

ChrisRackauckas · 2017-12-31T12:57:35Z

No. It's putting the backend code into a separate package like PlotsGR and having that implement a documented function interface by implementing dispatch on a concrete subtype of some abstract backend dispatching type. It's more modular and allows more code re-use, at the cost of having to have the backend code in a separate repo. But if Pkg3 can handle separate submodules in the same package well (with precompilation), then it can be one repo.

lobingera · 2017-12-31T13:02:24Z

Sorry, I'm lost. I thought that the backend code is already in a separate package (i.e. GR.jl). And in your example, isn't the PlotsGR the abovementioned glue package? And when is the decision taken to execute/precompile PlotsGR?

ChrisRackauckas · 2017-12-31T13:05:54Z

When the user calls using PlotsGR. That would be how a backend is chosen, then the package's init call could set a global in Plots to make the backend choice reflected in the latest using. Then each plot call can have an optional argument passing through this global that says what the current backend is in terms of a type, and then core functions can be overloaded for specific backends by new dispatches in PlotsGR. So the decision to execute PlotsGR code is done when the user calls using PlotsGR, and the code to precompile are the new dispatches.
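The backend-selection design described above can be sketched with two modules in one file. This is a hypothetical illustration (`MiniPlots`, `MiniPlotsGR`, `render`, and `CURRENT` are made-up names, not Plots.jl's API): loading the backend module adds methods for its concrete backend type and, from `__init__`, records itself as the current backend; `plot` takes the backend as an optional keyword that defaults to that global.

```julia
module MiniPlots
abstract type AbstractBackend end
struct NoBackend <: AbstractBackend end

# Global holding the most recently loaded backend.
const CURRENT = Ref{AbstractBackend}(NoBackend())

# Core entry point: the backend is an optional argument defaulting to the global,
# and the actual work is dispatched on its type.
plot(data; backend::AbstractBackend = CURRENT[]) = render(backend, data)
render(::NoBackend, data) = error("no backend loaded; `using` a backend package first")
end

module MiniPlotsGR
import ..MiniPlots

struct GRBackend <: MiniPlots.AbstractBackend end

# New dispatch on a core function: this is the code that would be precompiled.
MiniPlots.render(::GRBackend, data) = "GR: $(length(data)) points"

# Loading this module is what selects the backend.
function __init__()
    MiniPlots.CURRENT[] = GRBackend()
end
end
```

Nothing here is loaded lazily: the choice is made by an ordinary `using`, which is why this style sidesteps the precompilation problem mentioned above.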

timholy · 2018-07-08T19:43:31Z

Now that we have Base.package_callbacks as a "blessed" interface, in my opinion JuliaPackaging/Requires.jl#46 seems like a non-objectionable solution to the other half of this problem. If that gets merged then perhaps we can close this.

nalimilan · 2018-07-10T08:44:16Z

@timholy That sounds great. Could you explain a bit what that PR does? Does it fix all issues with the approach currently adopted by Requires?

timholy · 2018-07-10T12:45:34Z

At one time Requires did a lot of "sneaky stuff" (i.e., overwrite methods in Core and Base), but over time it has worked more harmoniously with base julia; in particular, the addition of Base.package_callbacks in 0.6 gave us an official interface for calling a function whenever a new package has been loaded. The interface for a package callback is f(id::Base.PkgId), thus passing information to the callback about which package just got loaded. The entire list of callbacks gets called every time you load a new package. You'll note a comment that the interface was marked as experimental, but it hasn't changed during the entire 0.7 cycle (lots of stuff about loading has changed, but not the package_callbacks interface) and since we're about to release I think we can consider it safe. At least Revise.jl uses the same interface, so Requires is not the only consumer.

All this is provided by Base; now, on to Requires. First, let me describe the state of Requires master, which is largely the work of @MikeInnes. Requires defines a single callback function and pushes it to Base.package_callbacks. Requires also maintains a Dict of thunks for use by its callback function; the Dict is indexed by PkgId, which does not require that the module itself exists (yet); the values stored in the Dict are just lists of functions (thunks) to call conditional on the loading of that package.
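A simplified model of that design, for illustration only; it is not the Requires source, and `listenpkg`, `on_package_loaded`, `install_callback`, `pending`, and `loaded_ids` are hypothetical names. One callback is meant to be pushed onto `Base.package_callbacks` (which calls it with a `Base.PkgId` after each package loads), and a `Dict` keyed by `PkgId` holds the thunks to run, so the dependency's module need not exist yet. `loaded_ids` is a stand-in for Base's own record of loaded packages, and the sketch assumes a thunk registered after its package has loaded should run immediately.

```julia
# Thunks waiting for a package, keyed by PkgId (the module need not exist yet).
const pending = Dict{Base.PkgId, Vector{Function}}()
# Stand-in for Base's record of which packages have already been loaded.
const loaded_ids = Set{Base.PkgId}()

# Run `f` once the package identified by `id` is loaded.
function listenpkg(f::Function, id::Base.PkgId)
    if id in loaded_ids
        f()                                            # already loaded: run now
    else
        push!(get!(Vector{Function}, pending, id), f)  # otherwise defer
    end
    return nothing
end

# The single callback: Base calls it with the PkgId of each newly loaded package.
function on_package_loaded(id::Base.PkgId)
    push!(loaded_ids, id)
    for f in pop!(pending, id, Function[])
        f()
    end
end

# Registration, e.g. from a package's __init__ (an interface marked experimental).
install_callback() = push!(Base.package_callbacks, on_package_loaded)
```

Calling `on_package_loaded` by hand, as below, simulates Base firing the callback when JSON is loaded.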

This Dict gets populated through @require calls. @require is a bit complicated in master, so now let me turn to my PR. In my PR, @require just does this (I've edited this heavily so it looks more like regular code):

julia> macroexpand(Main, quote
           @require JSON="682c06a0-de6a-54ab-a142-c8b1cf79cde6" include("morecode.jl")
       end)

        if !Requires.isprecompiling()
            Requires.listenpkg(Base.PkgId(Base.UUID("682c06a0-de6a-54ab-a142-c8b1cf79cde6"), "JSON")) do 
                Requires.withpath(@__DIR__) do 
                    Requires.err(@__MODULE__, "JSON") do 
                        const JSON = Base.require(JSON [682c06a0-de6a-54ab-a142-c8b1cf79cde6])
                        include("morecode.jl")
                    end
                end
            end
        end

All that basically does is register the following:

const JSON = Base.require(JSON [682c06a0-de6a-54ab-a142-c8b1cf79cde6])
include("morecode.jl")

to be executed (whenever JSON gets loaded) inside whichever module you used @require in (that's what the @__MODULE__ is about). The last touch is setting the path (via @__DIR__) for finding "morecode.jl". In my PR, this @require statement must occur inside the module's __init__ function, which means that we register this JSON-dependency at the time of module initialization.

In the master branch of Requires, @require does a little bit more stuff because it supports having a @require statement outside __init__; it basically stores all the @require calls in a module-global array __inits__ and then creates an __init__ function that iterates through the list and registers them. IMO this is a bad idea because it is exclusive with having a user-written __init__ function. (You can iterate over __inits__ yourself, but this appears to be undocumented, and the lack of error output about why it failed is problematic.) So I would describe this as the one dicey remaining feature in Requires, which is why I stripped it out. So on that branch Requires plays well with precompilation, custom initialization, and all the other fancy things we now know we need.

If that gets merged, I think it's fair to say that Requires is a clean and straightforward solution(*) to the problem of executing code that is dependent upon other modules having been loaded. That may not be the full list of ways we want to support interaction among packages, but it's the big one, and the one for which there aren't as good alternatives. Again, most of this progress has been from the work of @MikeInnes and those who designed the Base.package_callbacks interface; all I did was give this a nudge to fix a couple of bugs and strip out the last bit of problematic behavior.

Unfortunately, I don't think it's a deprecatable change, so it's pretty heavily breaking.

(*) some might object to monkeying with task_local_storage to set the path, but by my reading (I could be wrong) it's safe.

StefanKarpinski · 2018-07-10T15:54:39Z

That sounds great, @timholy! In the future it might be good to make this entire business a little nicer to use and more official, but for now it sounds like we have everything we need. I support making the breaking change now so that Requires becomes a "clean and straightforward solution".

lobingera · 2018-07-10T17:14:02Z

@timholy You mention above that this is a solution to one half of the problem; what would be the other?

timholy · 2018-07-10T18:06:09Z

@lobingera, meaning Base.package_callbacks makes it possible to do this correctly (it's "the backend") and Requires is what implements the specific logic ("the frontend").

MikeInnes · 2018-07-11T15:59:58Z

Tim's PR is merged and tagged; I concur that it's a strong and stable solution for package negotiations.

aviks mentioned this issue Feb 7, 2013

Add support for data frames JuliaIO/JSON.jl#10

Closed

JeffBezanson added the triage This should be discussed on a triage call label Dec 31, 2017

StefanKarpinski modified the milestones: 1.0, 1.x Jan 4, 2018

JeffBezanson removed the triage This should be discussed on a triage call label Jan 4, 2018

timholy mentioned this issue Jul 10, 2018

RFC: test, fix, and redesign Requires JuliaPackaging/Requires.jl#46

Merged

mauro3 mentioned this issue Jul 18, 2018

Reorganizing optional package loading JuliaPlots/Plots.jl#918

Closed

alyst mentioned this issue Jul 25, 2018

0.7updates JuliaData/RData.jl#43

Closed

simonbyrne closed this as completed Aug 10, 2018

kmsquire mentioned this issue Apr 15, 2019

ERROR: error compiling anonymous: unsupported or misplaced expression export in function anonymous #8051

Closed

DilumAluthge mentioned this issue Jul 18, 2019

Implement the MLJ model API without needing to depend on external dependencies such as CSV.jl, CategoricalArrays.jl, etc. JuliaAI/MLJBase.jl#19

Closed
