WIP: Refactor decorators in git utils #296

francesco086 · 2022-10-26T16:14:23Z

Contributes to #25

francesco086 · 2022-10-26T16:21:57Z

Hi @aguschin , here a first draft of what I am changing, for now just worked on the cloning decorator. I am writing few points to guide you:

in git_utils:
- changed the signature of the cloning decorator. It needs: a. the repo b. a controller to decide when to clone c. arguments to feed the controller with
- renamed @clone_on_remote_repo -> @clone, I think that its arguments make very clear what happens
- introduced _check_required_args_are_provided, that can be re-used in the other decorators
in test_git_utils:
- having all the tests in a single file is really painful, I tried at least to cluster them in classes. I still think that separate files would have been clearer, but I don't want to refactor all tests
in api:
- update the decorations
in test_api
- just check that the decorator is applied to all the commands that needs it to support remote repos

If you think that this change looks good, I can apply the same to all other decorators.

francesco086 · 2022-11-01T08:45:50Z

On a second thought, I believe that passing the controller and its arguments to the cloning decorator is too much.
I should have stuck to the original idea. Also, perhaps, clone_if_remote instead of clone is necessary to make the code readable...? (I ended up including the controller in the arguments to keep the name short, because I still found it not explicit enough).

However, while working it out, I realized that all decorators in the git_utils module have the same structure:

they use some arguments given to f, and they should check if they were indeed provided
there is a if condition (what I called controller), if false run f as usual, otherwise do something else

Probably I could abstract it away to avoid code repetition. What do you think?

skshetry · 2022-11-01T09:00:46Z

gto/api.py

+@clone(repo_arg="repo", controller=is_url_of_remote_repo, controller_args=["repo"])
 def check_ref(repo: Union[str, Repo], ref: str):
    """Find out what have been registered/assigned in the provided ref"""
    reg = GitRegistry.from_repo(repo)
    return reg.check_ref(ref)


I may be missing something here, but what is preventing us to do something like follows?

Suggested change

@clone(repo_arg="repo", controller=is_url_of_remote_repo, controller_args=["repo"])

def check_ref(repo: Union[str, Repo], ref: str):

"""Find out what have been registered/assigned in the provided ref"""

reg = GitRegistry.from_repo(repo)

return reg.check_ref(ref)

def check_ref(repo: Union[str, Repo], ref: str):

"""Find out what have been registered/assigned in the provided ref"""

with open_repo(repo) as r:

reg = GitRegistry.from_repo(r)

return reg.check_ref(ref)

Don't you mean to remove decorator at all here? So open_repo will actually clone repo to temp dir if needed? Or didn't get you right here?

Sorry, fixed.

We had a long discussion about it in #25. We decided it's better to use decorator instead of writing this. It looked like a good way to do this at that moment. Maybe it's time to revisit that decision.

Btw, if we need to also pass params like --commit and --push (to automatically commit and push commit or tag), could it work like with open_repo(repo, commit=commit, push=push) as r: ...?

@francesco086, do you think it would make the tests much easier? I didn't have time to dig into your tests or try to rewrite them, so relying on your opinion here.

I am sorry but I am lost, what would make tests easier? Option 1, 2, or 3 in #296 (comment)? Could you perhaps rephrase? I am not sure I understand what is it that you mean now.

The last suggestion that you formulated, i.e. use with gto.registry.GitRegistry.from_repo(r) as reg: instead of decorators. Asking because tests complexity + "it's not right" feeling were main motivations for this discussion IMO)

Ok, now I understand, thanks :)

The answer is: it depends on how we are going to add the remote functionalities to GitRegistry.from_repo(...). For what concern the testing difficulties, this change means only to move the target (if we keep the decorators) :D

This would reduce the number of tests for the cloning functionality, because that would happen only in from_repo(). All other functionalities (commit, push) will have to be checked on all (or at least, most) of the GitRegistry methods. And if we keep using decorators, there will be the same difficulties.

But, to be honest, when I look at the code itself, I very much like the decorator approach, because there is a very neat separation of concern. What I find very cumbersome is to test them, but maybe it's me, I don't know how to do it in the right way. Perhaps this could be formulated in this way (in simple terms):

@g def f(): ...

Suppose we are working TDD. Suppose that I fully tested f when g does nothing (e.g. in our case, if the repo is not remote) and I tested that when I decorate any function with g, everything works as expected. So, I can trust f and g taken singularly. What is the "right" way to ensure that f has been actually decorated with g? The tests I have before would pass whether f is decorated or not. What is the smallest test I need to add that would fail, and therefore would force me to decorate f with g?

So far the only answer to this question seemed to be inspect.getsource and check that there is a decorator. But I find this approach of parsing the source code a little weird...

One thing more (I hope not to confuse you, please just skip it if this is the case): I talked about TDD according to the London school. If we take the Chicago school, then probably we could say "we don't check the decorators with unittest because they are private methods, and rather check the functionalities in the public ones (api)." In practice this would mean to leave the tests of the api as they were, and delete the tests of the decorators in git_utils. Even though this may sound simple, it actually would lead to a higher complexity if one wants to test everything properly (one should check all possible combinations of inputs, and by adding remote_or_not, commit, and push, means 8 times more test). Much easier to test 2 cases (remote_or_not), then other 2 cases (commit), then other 2 cases (push), separately, and then make sure that they combine properly.

In short, if we change the decorators to context managers (notice: external to api and registry, so not by including them in GitRegistry), tests may get simpler. At the price of changing the code. If someone has a good idea to solve the testing problem I explained above with f and g, that would be ideal...
With this said, I have nothing against "moving the target" and decorate GitRegistry instead of the api functions, if this helps performance etc.

My suggestion, considering all:

move the remote repo functionalities into GitRegistry directly (so, suggestion from @skshetry will be realized)

I am unsure whether to keep decorators or to switch to context managers. When I am unsure, I prefer to be lazy, so eventually I think I would stick to the decorators.

But I leave the decision to you :)

What about grouping tests into classes, is that ok? Like in here: https://github.com/iterative/gto/pull/296/files#diff-dca8d5901c13c8fdb3912ed1fa77246f037bdda082b903fac553851476779773R31

aguschin · 2022-11-01T09:20:50Z

Thanks for great summary @francesco086. It helped a lot. WDYT about @skshetry suggestion above?

francesco086 · 2022-11-01T11:27:25Z

At the beginning I liked very much the idea of the decorator because it is not intrusive (I didn't need to change the functions in api). If I remember right, this was the reason why we decided to go for decorators, we didn't want to add a with statement in each gto command.

However, I found quite painful to work with decorators, especially for what concerns the tests. But perhaps it comes from my inexperience with them.

In the end I don't know what is best :D I would let you decide

francesco086 · 2022-11-01T11:33:19Z

PS: instead of with open_repo(repo, commit=commit, push=push) as r: ... one could write a decorator as @open_repo(repo_arg="repo", commit_arg="commit", push_arg="push").

I'm not very convinced about the name open_repo though, but also in this case I would leave the decision to you.

aguschin · 2022-11-01T11:56:23Z

we didn't want to add a with statement in each gto command

Yes, this was one of the reasons as I remember it. Besides CLI commands, Studio will have to refactor a lot of code, if using with will be the only option to work with local repos. Maybe using with for remote repos and keeping an option to don't use with for local ones? Not sure will it look good from the GTO internals perspective though.

Ok, let's see what @skshetry have in mind. I'll play around with this in meantime to see how it can be like.

francesco086 · 2022-11-01T12:30:14Z

One question/suggestion: what if open_repo(repo) (or the corresponding as decorator), would ALWAYS return a gitpython repo?
Meaning, that one can remove GitRegistry.from_repo(r).

Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.

Contra I see: tight binding with gitpython (which is there already, but this would reinforce it).

skshetry · 2022-11-01T12:32:09Z

Basically it would be a generalized GitRegistry.from_repo(r), including the remote repos.

That's what I meant by open_repo, that's how we use in dvc with Repo.open() that always gives you a Repo instance:
#296 (comment)

francesco086 · 2022-11-04T12:35:08Z

Hi @aguschin ! I am living with a bad conscience about the code I left behind me.

Any decision on which direction to take? Just let me know what you consider best.

aguschin · 2022-11-07T11:35:28Z

Hi @francesco086! Sorry, simply didn't have time to investigate this. Will take a look soon.

codecov-commenter · 2022-11-10T08:09:27Z

Codecov Report

Base: 86.71% // Head: 86.71% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (7978a8e) compared to base (c2985fd).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #296      +/-   ##
==========================================
- Coverage   86.71%   86.71%   -0.01%     
==========================================
  Files          16       16              
  Lines        2025     2032       +7     
==========================================
+ Hits         1756     1762       +6     
- Misses        269      270       +1

Impacted Files	Coverage Δ
gto/cli.py	`72.69% <ø> (-0.33%)`	⬇️
gto/api.py	`93.87% <100.00%> (ø)`
gto/git_utils.py	`99.30% <100.00%> (+0.03%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

francesco086 · 2022-11-10T08:59:12Z

Hi @aguschin , first of all I wanted to thank you for the discussion we had, it is really expanding my understandings :)

I pushed a minor change with a simplification of the clone decorator, and renamed it remote_to_local.

Following our discussion I thought "for now we took a functional programming style", let's try to see how it would work if pushed to its consequences.
So I made a little experiment that you can see here: https://github.com/francesco086/gto/blob/pushed_functional_programming/gto/git_utils.py#L62 (check the lines above to understand what happens)
I decompose remote_to_local into building blocks ("extract until you drop"), in a functional programming fashion and using decorators. I find it quite enlightening, and I am quite satisfied to see how it worked :)

Don't know if this will help us, just thought it was interesting enough to share :)

francesco086 · 2022-11-10T09:20:42Z

...aaaaaand I think I got it! This is how you can assert that a decorator has been called: https://github.com/francesco086/gto/blob/test_decorator_call/tests/fp_test.py#L17

I don't know what do you think. In my opinion, it was interesting to see, but I am getting the strong impression that python is not really functional programming friendly :/ For each function you need its "core" implementation separated, eventually using the decorator notation (@...) itself would not work, because you need to be able to reference the non-decorated function....

Shall we switch to object oriented (and in this case, context managers)?

aguschin · 2022-11-10T09:56:20Z

Sorry, I'm not that fast 😂 was trying to understand your examples 😅

Thanks for thorough explanation about why tests are complex. I checked out your examples and it indeed looks like we lack some simple ways to check that functions are decorated.
Looks like moving clone logic to GitRegistry.from_repo should be easier to test, since we don't need to test all API methods I believe.
Your example with taking functional approach to its limits is very interesting indeed. And the last one with testing that the function is decorated - as well. IMHO, at this level of complexity I'd prefer this to be done with context managers. (TBH, from my side this is just the lack of knowledge on how to do it properly in functional style).
About joining tests in classes. Sure, you can do it like that is that's easier. I hope at some point I'll refactor all the tests to be written in the same fashion - but definitely not today 😂

TDD according to the London school. If we take the Chicago school

Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)

francesco086 · 2022-11-10T10:03:14Z

Could you maybe share something for a novice to read on that? Would be great to get some basic knowledge about TDD and these nuances, if they're important :)

Sure, I would recommend this one about the difference between London and Chicago school: https://youtu.be/_S5iUf0ANyQ
If I don't remember wrong the video also mentions where to learn TDD (Kent Beck book).

I first learned it from Uncle Bob in his video lectures on Clean Code, but don't have a O'Reilly account anymore and can't point to it anymore sry :(

francesco086 · 2022-11-12T18:28:47Z

In my stubbornness, I came across at again another "discovery" about decorators. I was wondering: can you reference a non-decorated function? How?

...and when you are stubborn enough to check the python-lang source code, you find out that the reference to the original function (the no_decorator) is done by functools.wraps here, it's __wrapped__. So:

from functools import wraps

def g(f):
    @wraps(f)
    def wrapped_f(*args, **kwargs):
        print("decorated!")
        return f(*args, **kwargs)

    return wrapped_f

@g
def f():
    print("I am f")

f.__wrapped__()
>> I am f

Here we go! I think this is good to know

francesco086 · 2022-11-12T21:26:46Z

The other piece I needed for what I wanted to do (actually, the most important), a get_applied_decorators function.
Showcase:

import ast
import inspect
from dataclasses import dataclass
from functools import wraps
from typing import Callable


def g(a):
    def wrap(f):
        @wraps(f)
        def wrapped_f(*args, **kwargs):
            return f(*args, **kwargs)
        return wrapped_f
    return wrap


@g("x")
@g(a="x")
def f(x):
    pass


@dataclass
class AppliedDecorator:
    f: str
    args: list
    kwargs: dict

def get_applied_decorators(f: Callable):
    o = ast.parse(inspect.getsource(f))
    ast_f = [c for c in ast.iter_child_nodes(o) if c.name == f.__name__][0]
    return [
        AppliedDecorator(f=d.func.id,
                         args=[a.value for a in d.args],
                         kwargs={k.arg: k.value.value for k in d.keywords})
        for d in ast_f.decorator_list
    ]


dec = get_applied_decorators(f)
print(dec)

>> [AppliedDecorator(f='g', args=['x'], kwargs={}), AppliedDecorator(f='g', args=[], kwargs={'a': 'x'})]

Unfortunately it requires parsing the source code :( But thanks to the module ast it is actually reasonable. Just quite some boilerplate code. I must say that I find strange that such functionalities are not built-in in python, perhaps in the inspect module...

… git_utils.clone to take repo_arg

…erface

francesco086 · 2022-11-25T09:21:38Z

Hi @aguschin , I started working out the alternative approach, here a draft PR where you can have a look at the changes:
francesco086#3

Notice how many rather intrusive changes were required. The most annoying part is init_index_manager, which I only started to address.

Notice also that the same exact changes were required for GitRegistry and EnrichmentManager. To avoid code duplication I introduced FromRemoteRepoMixin, which is basically the equivalent of @clone_on_remote_repo, just less general.

How does it look in your opinion? Do you think it's a promising direction?

francesco086 · 2022-12-06T08:50:08Z

I close this one in favor of #309

aguschin requested review from skshetry and aguschin November 1, 2022 08:54

aguschin mentioned this pull request Nov 1, 2022

set black line-length to 120 to improve readibility #291

Closed

skshetry reviewed Nov 1, 2022

View reviewed changes

francesco086 force-pushed the refactor_decorators_in_git_utils branch from 7f03654 to ca243a9 Compare November 10, 2022 08:17

Francesco Calcavecchia added 5 commits November 25, 2022 06:18

rename git_utils.clone_on_remote_repo -> git_utils.clone and refactor…

3607112

… git_utils.clone to take repo_arg

clone receives a controller and a controller_args

67c6499

introduce and use _check_required_args_are_provided in git_utils

2562f58

skip test for windows

5b59f3d

rename "clone" -> "remote_to_local" in git_utils and simplify its int…

7978a8e

…erface

francesco086 force-pushed the refactor_decorators_in_git_utils branch from ca243a9 to 7978a8e Compare November 25, 2022 05:19

francesco086 closed this Dec 6, 2022

aguschin mentioned this pull request Dec 6, 2022

Remote git registry #309

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: Refactor decorators in git utils #296

WIP: Refactor decorators in git utils #296

francesco086 commented Oct 26, 2022

francesco086 commented Oct 26, 2022 •

edited

Loading

francesco086 commented Nov 1, 2022 •

edited

Loading

skshetry Nov 1, 2022 •

edited

Loading

aguschin Nov 1, 2022 •

edited

Loading

skshetry Nov 1, 2022

aguschin Nov 1, 2022

aguschin Nov 1, 2022

aguschin Nov 9, 2022

francesco086 Nov 9, 2022

aguschin Nov 9, 2022 •

edited

Loading

francesco086 Nov 9, 2022

francesco086 Nov 9, 2022 •

edited

Loading

aguschin commented Nov 1, 2022

francesco086 commented Nov 1, 2022

francesco086 commented Nov 1, 2022

aguschin commented Nov 1, 2022

francesco086 commented Nov 1, 2022

skshetry commented Nov 1, 2022

francesco086 commented Nov 4, 2022

aguschin commented Nov 7, 2022

codecov-commenter commented Nov 10, 2022 •

edited

Loading

francesco086 commented Nov 10, 2022

francesco086 commented Nov 10, 2022 •

edited

Loading

aguschin commented Nov 10, 2022

francesco086 commented Nov 10, 2022

francesco086 commented Nov 12, 2022 •

edited

Loading

francesco086 commented Nov 12, 2022

francesco086 commented Nov 25, 2022

francesco086 commented Dec 6, 2022

WIP: Refactor decorators in git utils #296

WIP: Refactor decorators in git utils #296

Conversation

francesco086 commented Oct 26, 2022

francesco086 commented Oct 26, 2022 • edited Loading

francesco086 commented Nov 1, 2022 • edited Loading

skshetry Nov 1, 2022 • edited Loading

Choose a reason for hiding this comment

aguschin Nov 1, 2022 • edited Loading

Choose a reason for hiding this comment

skshetry Nov 1, 2022

Choose a reason for hiding this comment

aguschin Nov 1, 2022

Choose a reason for hiding this comment

aguschin Nov 1, 2022

Choose a reason for hiding this comment

aguschin Nov 9, 2022

Choose a reason for hiding this comment

francesco086 Nov 9, 2022

Choose a reason for hiding this comment

aguschin Nov 9, 2022 • edited Loading

Choose a reason for hiding this comment

francesco086 Nov 9, 2022

Choose a reason for hiding this comment

francesco086 Nov 9, 2022 • edited Loading

Choose a reason for hiding this comment

aguschin commented Nov 1, 2022

francesco086 commented Nov 1, 2022

francesco086 commented Nov 1, 2022

aguschin commented Nov 1, 2022

francesco086 commented Nov 1, 2022

skshetry commented Nov 1, 2022

francesco086 commented Nov 4, 2022

aguschin commented Nov 7, 2022

codecov-commenter commented Nov 10, 2022 • edited Loading

Codecov Report

francesco086 commented Nov 10, 2022

francesco086 commented Nov 10, 2022 • edited Loading

aguschin commented Nov 10, 2022

francesco086 commented Nov 10, 2022

francesco086 commented Nov 12, 2022 • edited Loading

francesco086 commented Nov 12, 2022

francesco086 commented Nov 25, 2022

francesco086 commented Dec 6, 2022

francesco086 commented Oct 26, 2022 •

edited

Loading

francesco086 commented Nov 1, 2022 •

edited

Loading

skshetry Nov 1, 2022 •

edited

Loading

aguschin Nov 1, 2022 •

edited

Loading

aguschin Nov 9, 2022 •

edited

Loading

francesco086 Nov 9, 2022 •

edited

Loading

codecov-commenter commented Nov 10, 2022 •

edited

Loading

francesco086 commented Nov 10, 2022 •

edited

Loading

francesco086 commented Nov 12, 2022 •

edited

Loading