Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triggering extensions from within package #49990

Closed
MilesCranmer opened this issue May 29, 2023 · 8 comments
Closed

Triggering extensions from within package #49990

MilesCranmer opened this issue May 29, 2023 · 8 comments
Labels
packages Package management and loading

Comments

@MilesCranmer
Copy link
Member

MilesCranmer commented May 29, 2023

Right now when a user wants to use an extension, they load the dependency themselves, and this triggers the extension. However, this requires the user to know when for what part of the code need to use it.

I think it would be very useful for downstream users of a package if there was a way for a package maintainer to automatically trigger a particular extension from within their package.

For example, say I want to use Zygote.jl inside my package, but only for some rarely-used branch of my code. Due to the 300ms load time, I don't want to always load it. I could set up an extension, but it would require all downstream users to load Zygote.jl manually. But downstream users might not even know where Zygote is being used inside the package, or if it is being used at all.

So, it would be great if there could be a way to manually trigger an extension to load. Something like Base.load_extension(@__MODULE__, :ZygoteExt), similar to get_extension.

@KristofferC wdyt?

Related discussion: johnnychen94/LazyModules.jl#10


Edit: here's a clarifying post: #49990 (comment)

@inkydragon inkydragon added the packages Package management and loading label May 30, 2023
@KristofferC
Copy link
Member

KristofferC commented May 31, 2023

Extensions work best when the user does not need to know about them. For example, in https://github.com/KristofferC/PGFPlotsX.jl/blob/master/ext/StatsBaseExt.jl there is no need to know that a StatsBase extension exists because it just extends existing functionality and there is no difference in behavior if StatsBase is a normal dependency or used as an extension in the package.

Regarding the suggestion here, it feels equivalent to doing Base.require(Base.UUID("$ZYGOTE_UUID"), "Zygote"). This will load Zygote and also an eventual Zygote extension. But I don't think such an API is very good.

@johnnychen94
Copy link
Member

Extensions work best when the user does not need to know about them.

However, the user does need to know about the triggering package, which is "StatsBase" here...

The current extension UX can sometimes be troublesome because you can't expect users to remember and understand what every triggering package does. For instance, ImageIO exists because we don't want our target users to remember the name of every image IO backends (JpegTurbo, PNGFiles, TiffImages, etc), but instead, we want them to only remember ImageIO and load/save. This is why I don't think we should embrace package extension for ImageIO.

The UX that LazyModules provides is users know the API they care about (and not the triggering packages, unless we say triggering packages is a part of an API 😆), and the modules get loaded when the function gets first called. -- it's even lazier than the loading of extensions. To be honest, I don't know if the current extension design has the ability to support such lazy-loading usage.

@KristofferC
Copy link
Member

KristofferC commented May 31, 2023

However, the user does need to know about the triggering package, which is "StatsBase" here...

No, since the extension does not change any API they don't. The package just supports inputs from StatsBase types and if that support is implemented as an extension or a normal dependency it makes no visible difference to users.

For instance, ImageIO exists because we don't want our target users to remember the name of every image IO backends (JpegTurbo, PNGFiles, TiffImages, etc), but instead, we want them to only remember ImageIO and load/save.

FWIW, I find the FileIO, ImageIO API stuff very confusing. Trying to use it:

julia> using ImageIO

julia> ImageIO.load("/home/kc/Pictures/pro.png")
ERROR: MethodError: no method matching load(::String)

Closest candidates are:
  load(::FileIO.Stream{FileIO.DataFormat{:PNG}}; kwargs...)
   @ ImageIO ~/.julia/packages/ImageIO/xMHN9/src/ImageIO.jl:64

Hm okay, why would ImageIO.load not load an image..? Seems like you are supposed to do:

julia> using FileIO

julia>: FileIO.load("/home/kc/Pictures/pro.png")
833×1064 Array{RGBA{N0f8},2} with eltype ColorTypes.RGBA{FixedPointNumbers.N0f8}:
 RGBA{N0f8}(1.0,1.0,1.0,1.0)  RGBA{N0f8}(1.0,1.0,1.0,1.0)    RGBA{N0f8}(1.0,1.0,1.0,1.0)

So I add a package (ImageIO) but I don't load it or use it? Very strange.

But let's look at what FileIO.load did. Looking at Base.loaded_modules it seems it loaded a bunch of packages behind the scenes. Where do these come from?
Oh, ImageIO declares all those as normal dependencies https://github.com/JuliaIO/ImageIO.jl/blob/015fc91cf2c0c570b1511f19b9a01b70918ff99d/Project.toml#L9-L16 so all of them has to be installed and precompiled when installing ImageIO, even though I just want to load a png.

Let's try with just FileIO this time:

julia> FileIO.load("/home/kc/Pictures/pro.png")
Errors encountered while load File{DataFormat{:PNG}, String}("/home/kc/Pictures/pro.png").
All errors:
===========================================
ArgumentError: Package ImageIO [82e4d734-157c-48bb-816b-45c225c6df19] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

===========================================
ArgumentError: Package ImageMagick [6218d12a-5da1-5696-b52f-db25d2ecc6d1] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.

===========================================

Fatal error:
ERROR: ArgumentError: Package ImageIO [82e4d734-157c-48bb-816b-45c225c6df19] is required but does not seem to be installed:
 - Run `Pkg.instantiate()` to install all recorded dependencies.
 
Stacktrace:
  [1] _require(pkg::Base.PkgId, env::Nothing)
    @ Base ./loading.jl:1739
...
 [Two 10+ frame long stacktraces here]

Okay, so I do need to know the names of the packages to be able to use FileIO, it's just that ImageIO bundled a bunch of these into a "mega loader". So the only convenience FileIO brings is that the packages gets loaded automatically, but you still need to add them.

Imo, it is better to leave it to the users to explicitly load the package. For example, in FileIO, there could be an extension to ImageMagick that provides e.g. a png loader. If that is not loaded, you get an error message saying that you need ImageMagick, the user adds that, loads it, and off they go.

@johnnychen94
Copy link
Member

johnnychen94 commented May 31, 2023

so all of them has to be installed and precompiled when installing ImageIO, even though I just want to load a png.

If you just want to load a png and you care about the package size, you can just use PNGFiles. In this case, you're not the target user of ImageIO 😆

Okay, so I do need to know the names of the packages to be able to use FileIO, it's just that ImageIO bundled a bunch of these into a "mega loader".

As a developer, you do need to know and add it to your Project.toml. But as a common user, you don't. For instance, users of Images.jl could just using Images and call load("image.png").
This defers the loading of PNGFiles when the user really needs to load a PNG file, and he/she doesn't need to know the existence of PNGFiles or ImageIO or ImageMagick.

Whether ImageIO or ImageMagick is used to support the load function is an implementation detail, and I believe this shouldn't be something that common users should care about.
However, the extension mechanism would require either the name "ImageIO" or "ImageMagick" to be a part of the API.

If one algorithm becomes one standalone package for latency reasons, imagine that users have to remember all the sorting algorithm names to start using sort..

using RadixSort
using QuickSort
ssort(x) # a smart sort function that internally delegates to the most efficient implementation, if available

Revisit JuliaLang/Pkg.jl#2005, it seems that we don't have a clear standard way to really let import A.B to only load B without loading A.
The extension mechanism almost does this but in an indirect way.
When @MilesCranmer opened this issue, I think he meant to ask about this feature...

@MilesCranmer
Copy link
Member Author

MilesCranmer commented Jun 1, 2023

@KristofferC there is no need to know that a StatsBase extension exists because it just extends existing functionality and there is no difference in behavior if StatsBase is a normal dependency or used as an extension in the package.

I think I wasn't clear in my original post.

Extensions are great for extend a user interface to the types of an external package (such as your example of defining a custom PGFPlotsX.TableData method in your package for StatsBase input data), and I have also found them quite useful in my own Julia packages.

What am I more asking about is how one could exploit package extensions for internal use within a package.


Here's an example to help clarify what I mean. Say that my package includes an optimization routine of a user-defined function, using Optim.jl. Optim.jl lets you pass a function f to minimize, as well as a gradient function g! to avoid using finite difference gradients.

But, I don't want the user to have to write g!. I want to create the gradient function for them automatically, based on their passed function f.

The optimal choice depends on various factors; one being on the number of input parameters. I might choose to use ReverseDiff.jl for >10 optimization parameters, or ForwardDiff.jl for <= 10 optimization parameters. But, I don't want to load both ReverseDiff.jl and ForwardDiff.jl. I also don't want the user to have to worry about this.

In code, this would look like:

function optimize_f(f, params)
    n = length(params)
    g! = if n > 10
        @eval import ReverseDiff
        # ... create g! here
    else
        @eval import ForwardDiff
        # ... create g! here
    end
    optim(f, g!, params, ...)
end

This example code is not robust, sadly, due to world age issues. I also encountered world age problems when I tried to use Base.require. It would be great to use extensions here, but it would require the user to load ForwardDiff or ReverseDiff - they would have to know which branch the code will take a priori. It would also be great to exploit precompilation for both branches.

Does this help explain what I mean?

@MilesCranmer
Copy link
Member Author

@KristofferC I tried again with Base.require, with a full extension to conditionally load Zygote. Base.require does not seem to trigger the extension.

Here's a summary (full example here). In my package file src/OperatorEnumConstruction.jl, I call the following function, which uses Zygote to compute some gradients:

generate_diff_operators(args...) = error("`Zygote` not loaded.")

function create_diffs(binary_operators, unary_operators)
    Base.require(@__MODULE__, :Zygote)
    generate_diff_operators(binary_operators, unary_operators)
end

then, in ext/DynamicExpressionsZygoteExt.jl, I define the specialized behavior for generate_diff_operators:

module DynamicExpressionsZygoteExt

import Zygote: gradient
import DynamicExpressions: generate_diff_operators

function generate_diff_operators(
    binary_operators::Vector{Function}, unary_operators::Vector{Function}
)
    ...
end

end

I would hope that running Base.require(@__MODULE__, :Zygote) from within a function call would trigger the extension to load. But it does not seem to trigger it.

Would that be possible for the trigger code to check? If that works, then I think this would already be possible with extensions.

@MilesCranmer MilesCranmer changed the title Manually triggering extensions from within package Triggering extensions from within package Jun 1, 2023
@MilesCranmer
Copy link
Member Author

MilesCranmer commented Jun 1, 2023

Update:

It seems like 1.9 backport #49680 actually fixes the above issue! Because of #49701, extensions now will load after internal loading of a module, like is done here. So after that patch, running Base.require(@__MODULE__, :Zygote) does in fact load the extension! 🎉

The one sharp edge is that you need to still call Base.invokelatest on the extension-overloaded function. If you don't do this, you would have to call it twice to get the updated version. But as long as that overloaded function is the one defining a kernel (as a closure), the overhead of invokelatest won't be an issue.


@KristofferC one bug I noticed is that using an extension like this requires you to simultaneously define the package in both [deps] and [weakdeps] of the Project.toml. I'm not sure if this has to do with how the extension code checks things, but it might be good to check both keys just in case someone is using an extension like this:

julia> using DynamicExpressions
ERROR: KeyError: key "Zygote" not found
Stacktrace:
 [1] getindex
   @ ./dict.jl:484 [inlined]
 [2] _insert_extension_triggers(parent::Base.PkgId, extensions::Dict{String, Any}, weakdeps::Dict{String, Any})
   @ Base ./loading.jl:1236
 [3] insert_extension_triggers(env::String, pkg::Base.PkgId)
   @ Base ./loading.jl:1179
 [4] insert_extension_triggers
   @ ./loading.jl:1164 [inlined]
 [5] _require_prelocked(uuidkey::Base.PkgId, env::String)
   @ Base ./loading.jl:1665
 [6] macro expansion
   @ ./loading.jl:1648 [inlined]
 [7] macro expansion
   @ ./lock.jl:267 [inlined]
 [8] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:1611

However, if I define it in both [deps] and [weakdeps], code which uses my package as a dependency complains about a partially installed environment.

I can raise this as a bug separately.

@MilesCranmer
Copy link
Member Author

Moved to bug report in #50028

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
packages Package management and loading
Projects
None yet
Development

No branches or pull requests

4 participants