Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for method-wise cache invalidation. #71

Conversation

willow-ahrens
Copy link

@willow-ahrens willow-ahrens commented Jan 5, 2021

As requested, part 1 of ? of my previous too-big PR. I believe this patch allows for precompilation-safe method-wise cache invalidation needed in #59. Cache lookup works as follows:

Use Julia's method lookup to find a method based on a target signature. Then look in the module the method was defined in for a global dictionary named __memories__. If the dictionary exists, then look for the method based on the structure of its signature, rather than the Method object itself (for some reason, it seems the the hash identity of the methods changes during precompilation).

We determine if we are overwriting a method by looking up the signature in an earlier world age because the function name we are looking for may not yet be defined. Note that finding a method with which is not enough to know we have overwritten the method: their signatures must be ==.

I have tested the effects of precompilation manually, I'm not sure if there's a way to test for this automatically.

@willow-ahrens
Copy link
Author

@cstjean Alright, I've added a few tests for precompilation, so I think this is ready for review, unless you'd like me to rebase first. Method caches are stored in the module that defined the method. Methods will always have a cache in the defining module even after precompilation. If the method is called in a module during precompilation, the result will be preserved after precompilation. If, during precompilation, we call a memoized method defined in a different module, that result will not be preserved, since caches are stored in the defining module, not the calling module. I believe this is also a limitation of the current design on master.

@willow-ahrens
Copy link
Author

This PR can be simplified by using the which(f, args) form included in Base. The reason I included the custom _which(signature) function was to support callable objects in the future. If we have a callable type Foo, I wanted to be able to determine if there was a matching method of a Foo without constructing foo. Note that which(Foo, args...) asks for methods of type Foo rather than object Foo.

I also used the custom version of _which so that I could give it a world age. This allows me to define an anonymous function to get its name, then check if there were any methods with the same name before I just defined the function.

However, since this PR doesn't apply to anonymous functions or callable types, I can simplify it if the reviewers request as such.

@willow-ahrens
Copy link
Author

I find myself unsure of the desired semantics of cache invalidation and scoping for closures. While this PR makes a lot of sense for global functions, I think that closures might want to use local scope to store their caches. Consider the following example:

function currypow(n)
    function innerpow(x)
        return x^n
    end
end

In this example, each call to currypow will invalidate the cache of the last call to currypow, even though that closure might still be callable. Interestingly, I think it will still allocate dictionaries for each closure to be memoized separately. It's hard to imagine a case where one would want the results of an inner function to be memoized globally, but also want the function to remain an inner function.

We can access variables that were closed over via getproperty, so I think it might be possible to solve this by checking all the closure fieldnames that were gensymed by Memoize to see if one matches the appropriate type signature.

@cstjean
Copy link
Collaborator

cstjean commented Jan 24, 2021

This PR can be simplified by using the which(f, args) form included in Base. The reason I included the custom _which(signature) function was to support callable objects in the future. If we have a callable type Foo, I wanted to be able to determine if there was a matching method of a Foo without constructing foo. Note that which(Foo, args...) asks for methods of type Foo rather than object Foo.

Thank you for clearing that up, I wondered. Supporting callables is neat, but I'm not sure it warrants a custom which. Let's leave it in the PR for now...

In this example, each call to currypow will invalidate the cache of the last call to currypow, even though that closure might still be callable.

Yes, that's not ideal. A similar problem is that

function foo()
    @memoize bar(z) = det(z)
    return bar(randn(1000,1000))
end

This should not leak memory (i.e. it shouldn't hold on to the big matrix after foo() has returned)

With #70 (comment), it wouldn't be an issue, because the memo Dict for @memoize inner_pow would just be a local variable. As you say,

I think that closures might want to use local scope to store their caches

👍

@willow-ahrens
Copy link
Author

willow-ahrens commented Jan 24, 2021

Supporting local scope is trickier than I thought, but I think it is accomplished in willow-ahrens#1. The trickiest part of method overwriting, etc. in local scope is that the method is sometimes callable before the definition is executed and the cache gets initialized. Consider

function foo()
    function bar()
        return 1
    end
    bar()
    qux = 2
    function bar()
        return qux
    end
end

which, when called, gives UndefVarError: qux not defined. I think that it would be fair to disallow calling memoized functions before their definitions are evaluated.

@cstjean
Copy link
Collaborator

cstjean commented Jan 24, 2021

Calling a local function before defining it is bad style as far as I'm concerned, so I wouldn't worry about it.

Comment on lines +80 to +108
local $cache = ($tail, $cache_dict)

$scope = nothing

if isdefined($__module__, $(QuoteNode(scope)))
function $f end

# If overwriting a method, empty the old cache.
# Notice that methods are hashed by their stored signature
try
local $meth = which($f, $tail)
if $meth.sig == $sig && isdefined($meth.module, :__memories__)
empty!(pop!($meth.module.__memories__, $meth.sig, (nothing, []))[2])
end
catch
end
end

$(combinedef(def_dict_unmemoized))
Base.@__doc__ $(combinedef(def_dict))
local $result = Base.@__doc__($(combinedef(def_dict)))

if isdefined($__module__, $(QuoteNode(scope)))
if !@isdefined __memories__
__memories__ = Dict()
end
# Store the cache so that it can be emptied later
local $meth = $which($f, $tail)
__memories__[$meth.sig] = $cache
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm wrong, but this is where I believe that my approach will be just a few extra lines (mostly just to build the cache name with Symbol(function_name, arg_names...) instead of ~20. Macro complexity is especially bad, so unless my suggested approach has a fatal flaw, I believe it will be the way I will take.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose our difference of opinion lies in the trade-off between correctness and complexity. I suppose I'll just put my code in a separate package. Thanks for the helpful discussion!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose our difference of opinion lies in the trade-off between correctness and complexity.

Yes, worse is better comes to mind.

I'm sorry that it took so long to reach an impasse. Thank you for all your efforts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants