Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a non-"set" dirwalk::Options method that took self by reference #1721

Merged
merged 2 commits into from
Dec 11, 2024

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented Dec 10, 2024

The bug

While looking into the (quite different) bug reported in #1629, I initialized an Options instance like:

let options: gix::dirwalk::Options = repo
    .dirwalk_options()?
    .recurse_repositories(false)
    .emit_pruned(false)
    .emit_ignored(None)
    .for_deletion(None)
    .emit_untracked(gix::dir::walk::EmissionMode::Matching)
    .emit_empty_directories(false)
    .classify_untracked_bare_repositories(true)
    .emit_collapsed(None)
    .symlinks_to_directories_are_ignored_like_directories(false);

Before the symlinks_to_directories_are_ignored_like_directories(false) call was added, I was able to pass it to dirwalk_iter, but then I was unable to do so. Adding that last call changed the type of options from Options to &mut Options.

It turns out that, unlike the other gix::dirwalk::Options methods whose names don't start with set_, which take and return self by value, the symlinks_to_directories_are_ignored_like_directories method takes and returns self by mutable reference, which is to say that it is inadvertently equivalent to its set_* counterpart.

The fix

The methods of gix::dirwalk::Options are paired, where for each option X of Options, a method named like X takes and returns self by value, and a method set_X takes and returns self by mutable reference.

But in symlinks_to_directories_are_ignored_like_directories, both took self by mutable reference. This fixes that. The effect of this fix is to allow building Options with a call to that method as the last factory method call (and using it where Options is accepted but &mut Options is not).

Most code that consumes the crate should be unaffected, but:

  • Code where calls were ordered unnaturally to avoid putting such a call last should be able to be improved.
  • Code that used the method like its set_* countepart set_symlinks_to_directories_are_ignored_like_directories will be broken. That's what makes this fix a breaking change. Any such code can be fixed by modifying it to call the set_* version instead, which is probably what would have been intended anyway.

I didn't write a test

I'm not sure if a test is desired for this. I lean toward thinking that no test is needed. But it shouldn't be hard to write a test that "fails" by failing to compile in the absence of this change, along the lines of the above code.

Other changes included here

This also renames some of those methods' parameters more systematically. This was motivated by those two methods' non-self parameters being named differently, but this second change goes slightly beyond fixing that.

The methods of `gix::dirwalk::Options` are paired, where for each
option `X` of `Options`, a method named like `X` takes and returns
`self` by value, and a method `set_X` takes and returns `self` by
mutable reference.

But in `symlinks_to_directories_are_ignored_like_directories`, both
took `self` by mutable reference. This fixes that. The effect of
this fix is to allow building `Options` with a call to that method
as the last factory method call (and using it where `Options` is
accepted but `&mut Options` is not).

Most code that consumes the crate should be unaffected, but:

- Code where calls were ordered unnaturally to avoid putting such
  a call last should be able to be improved.

- Code that used the method like its `set_*` countepart
  `set_symlinks_to_directories_are_ignored_like_directories` will
  be broken. That's what makes this fix a breaking change. Any such
  code can be fixed by modifying it to call the `set_*` version
  instead, which is probably what would have been intended anyway.
This adjusts the names of parameters to `X` and `set_X` methods of
`gix::dirwalk::Options` (where `X` is an option name) to use a
systematic naming convention:

- For the same option `X`, the `X` and `set_X` methods now always
  have the same name of the parameter that specifies a value for an
  option.

- Options whose type is `bool` are named `toggle`, in keeping with
  the prevailing convention in this code.

- Options of `Option` type are named `value` (this required no
  changes).

- Options of a non-`Option` type `*Mode` -- currently this is just
  `EmissionMode` -- are named `mode`.
@EliahKagan EliahKagan force-pushed the run-ci/dirwalk-options branch from 149d310 to c0f4da5 Compare December 10, 2024 20:17
@EliahKagan EliahKagan marked this pull request as ready for review December 10, 2024 20:18
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, a great catch!

Also just to throw it out there, somewhat unrelated: I feel I have an unsolved problem with 'overhead'. In order to make gix stable, I basically have to wrap every type coming up from plumbing to make it extendable. This can mean to use the builder pattern everywhere, which is quite some busy-work, which in turn is error prone. There are crates for this, but I was afraid of using them even though they might as well be the answer.

Finally, there is a problem with error handling as the deeply nested error types cannot ever be stable for my definition of stability, unless they are all marked non-exhaustive. Also, having to maintain these error types by hand is cumbersome, even though it's doable. The biggest problem I see there is that it's clumsy to match on it, and it's an annoyance to have to find places for all test types.

I feel that choosing a taxonomy of nested modules is the right way, but the cost of it in terms of complexity is very high. Interestingly I can find types often, but not always, and seeing a flat structure like git2 at least gives the impression that one can easily find out what's there. It has other problems, but it's always like gix is more difficult to use and this type hierarchy might be a part of that. Probably that's not going to change though, but I think I won't get around to review each and every Platform that is there and see if there should be platform-less versions of that, even if it means to collect into a Vec right away. In practice, git2 does this a lot and it's fast enough, and most applications have no time to squeeze out every last inkling of performance.

Alright, unrelated brain-dump is done now :D.

@Byron Byron merged commit cd9060a into GitoxideLabs:main Dec 11, 2024
20 checks passed
@EliahKagan EliahKagan deleted the run-ci/dirwalk-options branch December 11, 2024 10:14
@EliahKagan
Copy link
Member Author

EliahKagan commented Dec 11, 2024

There are crates for this, but I was afraid of using them even though they might as well be the answer.

What's the potential downside of using them?

Probably that's not going to change though, but I think I won't get around to review each and every Platform that is there and see if there should be platform-less versions of that, even if it means to collect into a Vec right away.

Why are they called Platform?

Alright, unrelated brain-dump is done now :D.

I hope you will not mind if I brain-dump in turn. :) I had considered opening and issue or discussion question for this, but I think the most relevant context for it is actually the changes in this PR.

In c0f4da5, I'm not sure I picked the best naming convention for builder method parameters. Actually, I did not try to pick the best convention. My goal was to include the change

-    pub fn set_symlinks_to_directories_are_ignored_like_directories(&mut self, value: bool) -> &mut Self {
-        self.symlinks_to_directories_are_ignored_like_directories = value;
+    pub fn set_symlinks_to_directories_are_ignored_like_directories(&mut self, toggle: bool) -> &mut Self {
+        self.symlinks_to_directories_are_ignored_like_directories = toggle;

so it would be clear that the only important distinction between the set and non-set versions was in how self was taken/returned. Since toggle was the parameter name for the other builder method parameters of type bool, both here and elsewhere in gitoxide, I changed value to toggle.

Although toggle is the established convention in this code base for such parameters, I recall having initially been unsure at the semantics of methods with a toggle parameter, because one of the meanings of "toggle" is to change something to whichever of two states it is not in already. For example, if I say "this button toggles light and dark mode," or "this is the light/dark mode toggle," it means that pressing it when in light mode changes to dark mode, and pressing it while in dark mode changes to light mode.

That's not what is going on in bool-valued builder methods. For example, calling emit_pruned(true) does not necessarily change whether pruned entries will be emitted; even if it was already set to true, it remains set to true. Likewise, emit_pruned(false) does not necessarily refrain from changing whether pruned entries will be emitted; if it was set to true before, it will be changed to false.

But other choices of names based on the type being bool, such as enable, enabled, and flag, have their own arguments against them.

I wonder if value should be used for all such parameters of builder methods, of all types including bool, at least when they are always directly setting the corresponding struct member to the same value and type.

Another convention, which personally I like, is to use the same name for the parameter as for the member being set. So, for example:

pub fn emit_pruned(mut self, emit_pruned: bool) -> Self
pub fn set_emit_pruned(&mut self, emit_pruned: bool) -> &mut Self

The question is just whether that would be considered too unwieldy when the name is long:

pub fn symlinks_to_directories_are_ignored_like_directories(mut self, symlinks_to_directories_are_ignored_like_directories: bool) -> Self
pub fn set_symlinks_to_directories_are_ignored_like_directories(&mut self, symlinks_to_directories_are_ignored_like_directories: bool) -> &mut Self

When there are paired X and set_X methods for an option X, this approach of naming the parameter X has the slight further advantage of potentially helping to prevent the mistake of thinking that the "set_" part is part of the option name (i.e. descriptive of future behavior controlled by the option, rather than of the action right now of specifying the option).

Some editors show the parameter name (which provides no extra information in this context), and on many screens one would have to scroll right to see the value. On the other hand, some kinds of mistakes, such as adjusting the wrong method call when several are present, might be less likely to make. Overall, I would consider the increased verbosity a slight disadvantage of this approach.

Anyway, I am not worried about any other occurrences of "toggle" besides those for bool-valued parameters. For example, the phrase "feature toggle" is used throughout the documentation. That doesn't bother me at all. Unlike parameters named toggle, it didn't confuse me or make my doubt my understanding. The only plausible meaning of "feature toggle" is that it's a feature one can turn on or off.

(To be clear, I consider "toggle" to be correct everywhere it is currently used in gitoxide. My concern about it as a parameter name is not that it is incorrect, but that its specific meaning there is not obvious, since it is one of multiple correct meanings that could plausibly apply.)

@Byron
Copy link
Member

Byron commented Dec 12, 2024

Thanks for sharing in return, it's much appreciated!

There are crates for this, but I was afraid of using them even though they might as well be the answer.

What's the potential downside of using them?

Besides adding another non-optional dependency for something that technically is a convenience, it's relying on an external crate to define quite a substantial portion of your API.
It's a matter of trust, but also a severe lack of knowledge about what's out there, and whether it's usable enough.

Probably that's not going to change though, but I think I won't get around to review each and every Platform that is there and see if there should be platform-less versions of that, even if it means to collect into a Vec right away.

Why are they called Platform?

That's the best name I could come up with for something that holds temporary, cached data to support more operations.

I wonder if value should be used for all such parameters of builder methods, of all types including bool, at least when they are always directly setting the corresponding struct member to the same value and type.

I can totally follow you here and interestingly also had value in mind even before reading this. Please do feel free to change the parameter names to establish a better convention.

When there are paired X and set_X methods for an option X, this approach of naming the parameter X has the slight further advantage of potentially helping to prevent the mistake of thinking that the "set_" part is part of the option name (i.e. descriptive of future behavior controlled by the option, rather than of the action right now of specifying the option).

Some editors show the parameter name (which provides no extra information in this context), and on many screens one would have to scroll right to see the value. On the other hand, some kinds of mistakes, such as adjusting the wrong method call when several are present, might be less likely to make. Overall, I would consider the increased verbosity a slight disadvantage of this approach.

I definitely can follow, and usually there are benefits in having descriptive parameters names. In this specific case with setters (consuming or not), I think though that the method name is descriptive enough to compensate for a brief parameter name.

Since three are tool-specific merits, in order to make a decision towards repeating the field name as parameter name, I think one would have to visualize/show how major editors of today handle display this.
It's definitely something that could be done prior to stabilisation, even though it would never be blocking it as changes there can be done at any time.

Personally I'd hope that tools of the future will be smart enough to show relevant context no matter the convention, based on an actual understanding of the codebase.

@EliahKagan
Copy link
Member Author

Since three are tool-specific merits, in order to make a decision towards repeating the field name as parameter name, I think one would have to visualize/show how major editors of today handle display this.

This is not really a visualization, but for VS Code, in Rust with the usual rust-analyzer extension, the parameter name is shown when the value is not a variable of the same name. As far as I can tell, it is not hidden based on matching the method name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants