Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Refactor decorators in git utils #296

Conversation

francesco086
Copy link
Contributor

Contributes to #25

@francesco086
Copy link
Contributor Author

francesco086 commented Oct 26, 2022

Hi @aguschin , here a first draft of what I am changing, for now just worked on the cloning decorator. I am writing few points to guide you:

  • in git_utils:
    • changed the signature of the cloning decorator. It needs: a. the repo b. a controller to decide when to clone c. arguments to feed the controller with
    • renamed @clone_on_remote_repo -> @clone, I think that its arguments make very clear what happens
    • introduced _check_required_args_are_provided, that can be re-used in the other decorators
  • in test_git_utils:
    • having all the tests in a single file is really painful, I tried at least to cluster them in classes. I still think that separate files would have been clearer, but I don't want to refactor all tests
  • in api:
    • update the decorations
  • in test_api
    • just check that the decorator is applied to all the commands that needs it to support remote repos

If you think that this change looks good, I can apply the same to all other decorators.

@francesco086
Copy link
Contributor Author

francesco086 commented Nov 1, 2022

On a second thought, I believe that passing the controller and its arguments to the cloning decorator is too much.
I should have stuck to the original idea. Also, perhaps, clone_if_remote instead of clone is necessary to make the code readable...? (I ended up including the controller in the arguments to keep the name short, because I still found it not explicit enough).

However, while working it out, I realized that all decorators in the git_utils module have the same structure:

  1. they use some arguments given to f, and they should check if they were indeed provided
  2. there is a if condition (what I called controller), if false run f as usual, otherwise do something else

Probably I could abstract it away to avoid code repetition. What do you think?

gto/api.py Outdated
Comment on lines 311 to 314
@clone(repo_arg="repo", controller=is_url_of_remote_repo, controller_args=["repo"])
def check_ref(repo: Union[str, Repo], ref: str):
"""Find out what have been registered/assigned in the provided ref"""
reg = GitRegistry.from_repo(repo)
return reg.check_ref(ref)
Copy link
Member

@skshetry skshetry Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something here, but what is preventing us to do something like follows?

Suggested change
@clone(repo_arg="repo", controller=is_url_of_remote_repo, controller_args=["repo"])
def check_ref(repo: Union[str, Repo], ref: str):
"""Find out what have been registered/assigned in the provided ref"""
reg = GitRegistry.from_repo(repo)
return reg.check_ref(ref)
def check_ref(repo: Union[str, Repo], ref: str):
"""Find out what have been registered/assigned in the provided ref"""
with open_repo(repo) as r:
reg = GitRegistry.from_repo(r)
return reg.check_ref(ref)

Copy link
Contributor

@aguschin aguschin Nov 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you mean to remove decorator at all here? So open_repo will actually clone repo to temp dir if needed? Or didn't get you right here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had a long discussion about it in #25. We decided it's better to use decorator instead of writing this. It looked like a good way to do this at that moment. Maybe it's time to revisit that decision.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, if we need to also pass params like --commit and --push (to automatically commit and push commit or tag), could it work like with open_repo(repo, commit=commit, push=push) as r: ...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@francesco086, do you think it would make the tests much easier? I didn't have time to dig into your tests or try to rewrite them, so relying on your opinion here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry but I am lost, what would make tests easier? Option 1, 2, or 3 in #296 (comment)? Could you perhaps rephrase? I am not sure I understand what is it that you mean now.

Copy link
Contributor

@aguschin aguschin Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last suggestion that you formulated, i.e. use with gto.registry.GitRegistry.from_repo(r) as reg: instead of decorators. Asking because tests complexity + "it's not right" feeling were main motivations for this discussion IMO)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now I understand, thanks :)

The answer is: it depends on how we are going to add the remote functionalities to GitRegistry.from_repo(...). For what concern the testing difficulties, this change means only to move the target (if we keep the decorators) :D

This would reduce the number of tests for the cloning functionality, because that would happen only in from_repo(). All other functionalities (commit, push) will have to be checked on all (or at least, most) of the GitRegistry methods. And if we keep using decorators, there will be the same difficulties.

But, to be honest, when I look at the code itself, I very much like the decorator approach, because there is a very neat separation of concern. What I find very cumbersome is to test them, but maybe it's me, I don't know how to do it in the right way. Perhaps this could be formulated in this way (in simple terms):

@g
def f():
    ...

Suppose we are working TDD. Suppose that I fully tested f when g does nothing (e.g. in our case, if the repo is not remote) and I tested that when I decorate any function with g, everything works as expected. So, I can trust f and g taken singularly. What is the "right" way to ensure that f has been actually decorated with g? The tests I have before would pass whether f is decorated or not. What is the smallest test I need to add that would fail, and therefore would force me to decorate f with g?

So far the only answer to this question seemed to be inspect.getsource and check that there is a decorator. But I find this approach of parsing the source code a little weird...

One thing more (I hope not to confuse you, please just skip it if this is the case): I talked about TDD according to the London school. If we take the Chicago school, then probably we could say "we don't check the decorators with unittest because they are private methods, and rather check the functionalities in the public ones (api)." In practice this would mean to leave the tests of the api as they were, and delete the tests of the decorators in git_utils. Even though this may sound simple, it actually would lead to a higher complexity if one wants to test everything properly (one should check all possible combinations of inputs, and by adding remote_or_not, commit, and push, means 8 times more test). Much easier to test 2 cases (remote_or_not), then other 2 cases (commit), then other 2 cases (push), separately, and then make sure that they combine properly.

In short, if we change the decorators to context managers (notice: external to api and registry, so not by including them in GitRegistry), tests may get simpler. At the price of changing the code. If someone has a good idea to solve the testing problem I explained above with f and g, that would be ideal...
With this said, I have nothing against "moving the target" and decorate GitRegistry instead of the api functions, if this helps performance etc.

Copy link
Contributor Author

@francesco086 francesco086 Nov 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My suggestion, considering all:

  • move the remote repo functionalities into GitRegistry directly (so, suggestion from @skshetry will be realized)
  • I am unsure whether to keep decorators or to switch to context managers. When I am unsure, I prefer to be lazy, so eventually I think I would stick to the decorators.

But I leave the decision to you :)

What about grouping tests into classes, is that ok? Like in here: https://github.com/iterative/gto/pull/296/files#diff-dca8d5901c13c8fdb3912ed1fa77246f037bdda082b903fac553851476779773R31

@aguschin
Copy link
Contributor

aguschin commented Nov 1, 2022

Thanks for great summary @francesco086. It helped a lot. WDYT about @skshetry suggestion above?

@francesco086
Copy link
Contributor Author

At the beginning I liked very much the idea of the decorator because it is not intrusive (I didn't need to change the functions in api). If I remember right, this was the reason why we decided to go for decorators, we didn't want to add a with statement in each gto command.

However, I found quite painful to work with decorators, especially for what concerns the tests. But perhaps it comes from my inexperience with them.

In the end I don't know what is best :D I would let you decide

@francesco086
Copy link
Contributor Author

PS: instead of with open_repo(repo, commit=commit, push=push) as r: ... one could write a decorator as @open_repo(repo_arg="repo", commit_arg="commit", push_arg="push").

I'm not very convinced about the name open_repo though, but also in this case I would leave the decision to you.

@aguschin
Copy link
Contributor

aguschin commented Nov 1, 2022

we didn't want to add a with statement in each gto command

Yes, this was one of the reasons as I remember it. Besides CLI commands, Studio will have to refactor a lot of code, if using with will be the only option to work with local repos. Maybe using with for remote repos and keeping an option to don't use with for local ones? Not sure will it look good from the GTO internals perspective though.

Ok, let's see what @skshetry have in mind. I'll play around with this in meantime to see how it can be like.

@francesco086
Copy link
Contributor Author

One question/suggestion: what if open_repo(repo) (or the corresponding as decorator), would ALWAYS return a gitpython repo?
Meaning, that one can remove GitRegistry.from_repo(r).

Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.

Contra I see: tight binding with gitpython (which is there already, but this would reinforce it).

@skshetry
Copy link
Member

skshetry commented Nov 1, 2022

Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.

That's what I meant by open_repo, that's how we use in dvc with Repo.open() that always gives you a Repo instance:
#296 (comment)

@francesco086
Copy link
Contributor Author

Hi @aguschin ! I am living with a bad conscience about the code I left behind me.

Any decision on which direction to take? Just let me know what you consider best.

@aguschin
Copy link
Contributor

aguschin commented Nov 7, 2022

Hi @francesco086! Sorry, simply didn't have time to investigate this. Will take a look soon.

@codecov-commenter
Copy link

codecov-commenter commented Nov 10, 2022

Codecov Report

Base: 86.71% // Head: 86.71% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (7978a8e) compared to base (c2985fd).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #296      +/-   ##
==========================================
- Coverage   86.71%   86.71%   -0.01%     
==========================================
  Files          16       16              
  Lines        2025     2032       +7     
==========================================
+ Hits         1756     1762       +6     
- Misses        269      270       +1     
Impacted Files Coverage Δ
gto/cli.py 72.69% <ø> (-0.33%) ⬇️
gto/api.py 93.87% <100.00%> (ø)
gto/git_utils.py 99.30% <100.00%> (+0.03%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@francesco086 francesco086 force-pushed the refactor_decorators_in_git_utils branch from 7f03654 to ca243a9 Compare November 10, 2022 08:17
@francesco086
Copy link
Contributor Author

Hi @aguschin , first of all I wanted to thank you for the discussion we had, it is really expanding my understandings :)

I pushed a minor change with a simplification of the clone decorator, and renamed it remote_to_local.

Following our discussion I thought "for now we took a functional programming style", let's try to see how it would work if pushed to its consequences.
So I made a little experiment that you can see here: https://github.com/francesco086/gto/blob/pushed_functional_programming/gto/git_utils.py#L62 (check the lines above to understand what happens)
I decompose remote_to_local into building blocks ("extract until you drop"), in a functional programming fashion and using decorators. I find it quite enlightening, and I am quite satisfied to see how it worked :)

Don't know if this will help us, just thought it was interesting enough to share :)

@francesco086
Copy link
Contributor Author

francesco086 commented Nov 10, 2022

...aaaaaand I think I got it! This is how you can assert that a decorator has been called: https://github.com/francesco086/gto/blob/test_decorator_call/tests/fp_test.py#L17

I don't know what do you think. In my opinion, it was interesting to see, but I am getting the strong impression that python is not really functional programming friendly :/ For each function you need its "core" implementation separated, eventually using the decorator notation (@...) itself would not work, because you need to be able to reference the non-decorated function....

Shall we switch to object oriented (and in this case, context managers)?

@aguschin
Copy link
Contributor

Sorry, I'm not that fast 😂 was trying to understand your examples 😅

  1. Thanks for thorough explanation about why tests are complex. I checked out your examples and it indeed looks like we lack some simple ways to check that functions are decorated.
  2. Looks like moving clone logic to GitRegistry.from_repo should be easier to test, since we don't need to test all API methods I believe.
  3. Your example with taking functional approach to its limits is very interesting indeed. And the last one with testing that the function is decorated - as well. IMHO, at this level of complexity I'd prefer this to be done with context managers. (TBH, from my side this is just the lack of knowledge on how to do it properly in functional style).
  4. About joining tests in classes. Sure, you can do it like that is that's easier. I hope at some point I'll refactor all the tests to be written in the same fashion - but definitely not today 😂

TDD according to the London school. If we take the Chicago school

Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)

@francesco086
Copy link
Contributor Author

Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)

Sure, I would recommend this one about the difference between London and Chicago school: https://youtu.be/_S5iUf0ANyQ
If I don't remember wrong the video also mentions where to learn TDD (Kent Beck book).

I first learned it from Uncle Bob in his video lectures on Clean Code, but don't have a O'Reilly account anymore and can't point to it anymore sry :(

@francesco086
Copy link
Contributor Author

francesco086 commented Nov 12, 2022

In my stubbornness, I came across at again another "discovery" about decorators. I was wondering: can you reference a non-decorated function? How?

...and when you are stubborn enough to check the python-lang source code, you find out that the reference to the original function (the no_decorator) is done by functools.wraps here, it's __wrapped__. So:

from functools import wraps

def g(f):
    @wraps(f)
    def wrapped_f(*args, **kwargs):
        print("decorated!")
        return f(*args, **kwargs)

    return wrapped_f

@g
def f():
    print("I am f")

f.__wrapped__()
>> I am f

Here we go! I think this is good to know

@francesco086
Copy link
Contributor Author

The other piece I needed for what I wanted to do (actually, the most important), a get_applied_decorators function.
Showcase:

import ast
import inspect
from dataclasses import dataclass
from functools import wraps
from typing import Callable


def g(a):
    def wrap(f):
        @wraps(f)
        def wrapped_f(*args, **kwargs):
            return f(*args, **kwargs)
        return wrapped_f
    return wrap


@g("x")
@g(a="x")
def f(x):
    pass


@dataclass
class AppliedDecorator:
    f: str
    args: list
    kwargs: dict

def get_applied_decorators(f: Callable):
    o = ast.parse(inspect.getsource(f))
    ast_f = [c for c in ast.iter_child_nodes(o) if c.name == f.__name__][0]
    return [
        AppliedDecorator(f=d.func.id,
                         args=[a.value for a in d.args],
                         kwargs={k.arg: k.value.value for k in d.keywords})
        for d in ast_f.decorator_list
    ]


dec = get_applied_decorators(f)
print(dec)

>> [AppliedDecorator(f='g', args=['x'], kwargs={}), AppliedDecorator(f='g', args=[], kwargs={'a': 'x'})]

Unfortunately it requires parsing the source code :( But thanks to the module ast it is actually reasonable. Just quite some boilerplate code. I must say that I find strange that such functionalities are not built-in in python, perhaps in the inspect module...

@francesco086 francesco086 force-pushed the refactor_decorators_in_git_utils branch from ca243a9 to 7978a8e Compare November 25, 2022 05:19
@francesco086
Copy link
Contributor Author

Hi @aguschin , I started working out the alternative approach, here a draft PR where you can have a look at the changes:
francesco086#3

Notice how many rather intrusive changes were required. The most annoying part is init_index_manager, which I only started to address.

Notice also that the same exact changes were required for GitRegistry and EnrichmentManager. To avoid code duplication I introduced FromRemoteRepoMixin, which is basically the equivalent of @clone_on_remote_repo, just less general.

How does it look in your opinion? Do you think it's a promising direction?

@francesco086
Copy link
Contributor Author

I close this one in favor of #309

@aguschin aguschin mentioned this pull request Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants