Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Design Doc - Explicit/Implicit and Builder Patterns #112

Closed
zbraniecki opened this issue Jun 4, 2020 · 11 comments · Fixed by #142
Closed

Design Doc - Explicit/Implicit and Builder Patterns #112

zbraniecki opened this issue Jun 4, 2020 · 11 comments · Fixed by #142
Assignees
Labels
C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library
Milestone

Comments

@zbraniecki
Copy link
Member

zbraniecki commented Jun 4, 2020

This issue is intended to verify that we have a consensus on the two areas I'd like to extend our design doc with.

Explicit vs Implicit memory management (clone/copy)

// BAD
impl PluralRule {
    fn new(locale: &Locale) -> PluralRules {
        PluralRules { locale: locale.clone() }
    }
}

// GOOD
impl PluralRule {
    fn new(locale: Locale) -> PluralRules {
        PluralRules { locale }
    }
}

Rationale: If we know we'll want to store a Clone of a struct, we should construct the API to get a cloned version already. This allows the reader of the user of this API to understand the memory cost.

Builder vs. Constructor Options

Builder

struct MyStruct {
  locale: Option<Locale>,
  minFractionDigits: Option<usize>,
  maxFractionDigits: Option<usize>,
  frozen: bool,
}

impl MyStruct {
    pub fn new() -> Self {
        Self {
            locale: None,
            minFractionDigits: None,
            maxFractionDigits: None,
            frozen: false
        }
    }
    
    pub fn with_locale(&mut self, locale: Locale) -> &mut Self {
        self.locale = Some(locale);
        self
    }

    pub fn build(&mut self) -> Result<&mut Self, _> {
        // validate the options
        self.frozen = true;
        self
    }
}

fn main() {
    let s = MyStruct::new()
        .withLocale(locale)
        .minFractionDigits(5)
        .maxFractionDigits(10)
        .build().expect("Building failed");
}

vs.

Constructor Options

#[derive(Default)]
struct MyStructOptions {
    pub minFractionDigits: usize,
    pub maxFractionDigits: usize,
}

impl MyStructOptions {
    pub fn calculateFractionDigits(min: usize, max: usize) {
        // do some calculations
        min..max
    }
    pub fn validate() -> bool {
         // validate the current options internal consistency
    }
}

struct MyStruct {
  locale: Locale,
  fractionDigits: Range<usize>,
}

impl MyStruct {
    pub fn new(locale: Locale, options: MyStructOptions) -> Result<Self, ()> {
        Self {
            locale: locale,
            fractionDigits: options.calculateFractionDigits()
        }
    }
}

fn main() {
    let mut options = MyStructOptions::default();
    options.setFractionMinMax(5, 10);
    assert!(options.validate());

    let s = MyStruct::new(locale, options).expect("Construction failed.");
}

I'm a proponent of the latter, and I'd recommend it for us, but I know that ICU uses the former and I've seen Rust APIs going both ways.
My reasons are that I like that MyStructOptions have their own logic, calculations and validations and the user can perform the options at any point and in stages, and instead of build step,
it would just get passed to the constructor of the final struct.
Also, the final struct doesn't need fields for intermittent data that gets computed out in the build step, is leaner and has less logic.
Finally, its mutability can be easier controlled separately from mutability of the options struct.

@sffc sffc added C-meta Component: Relating to ICU4X as a whole discuss Discuss at a future ICU4X-SC meeting T-docs-tests Type: Code change outside core library labels Jun 4, 2020
@filmil
Copy link
Contributor

filmil commented Jun 4, 2020

Hi folks. Here's my two cents.

I believe it is helpful to highlight the reasons and situations when one may prefer one construction approach over another. The tradeoffs have different weights depending on the programming language used. Without an attempt to argue any specific choice, here is a laundry list of some concerns that I noticed over the years.

  • Setters. I am seeing a lot of setter methods specifically in older-style C++. And EJB has poisoned Java with accidental setters so much that it's not even funny. I think this may have come out of early OO fascination with embedded state, and the fact that we didn't write a lot of multi-threading code in C++ for a long time.
  • Options-based initialization is useful when you are able to configure your object through data alone (i.e. no transforms needed). I am seeing options-based initialization in older C++ code, specifically in places that have code style forbidding constructors from doing "work". This I think is is because doing "work" could throw exceptions, which if handled incorrectly could lead to memory leaks. And in C++ it is all too easy to initialize an object in a way that exception recovery leads to memory leaks. C++11 and later make it possible to write code that doesn't suffer from that problem, but you can still reach for that footgun.
  • Builder-style initialization is useful in the following scenarios (the list is not exhaustive, just the ones that come to mind):
    1. You want to separate the "mutable API" from an immutable API. I found this useful in Java, which does not have const methods. It's also nice from the perspective of the user to not have to care about the mutable API if they don't need to mutate.
    2. You want to create a family of almost identical, but ever so slightly different objects. In practice, I don't think I ever have had a need to do this.
    3. In general, I guess, if you want to split your APIs up into topics: here's the API you use to create the thing; here's the API you use to use the thing.
    4. You need "generative" initialization, such as: initialize this Unicode set with every Unicode code point divisible by 2.
  • I personally dislike the "freeze" APIs: I believe this was convenient in the pre-C++11 world when making builder-style APIs was annoying. Internally they cause the object to have different representations, but retain the same API. With move semantics, it is as cheap as can be to produce a read-only version of the object, which has a smaller, read-only API, than to throw an exception if a mutable method is called on a frozen object. There should be no need today to have a "freeze" method.
  • Options approach allows tighter coupling between an object and its builder, which is convenient for the implementors, but in a large code base perhaps makes code analysis harder.
  • Options vs Builder: the constructor can mess with Options, and it typically can't mess with a builder.

tl;dr: we should probably check how much any of the above named concerns play a role in Rust. With my library user hat on, I prefer smaller API surfaces: if an object's interface can be separated cleanly into "mutable" and "immutable" parts, I'd go for that.

@zbraniecki
Copy link
Member Author

Thanks @filmil!

I think you're using slightly different vocabulary than I do when you describe your position. Can you help me coalesce on one? I named the two appraoches Builder pattern and Constructor Helper Struct, but I dislike the names, and I hope there are better ones.
You use Options vs Builder, but you say "Options approach alows tighter coupling between an object and its builder" which confuses me.

Can you help me name the two code snippets and if you think there's a third one, add it, so that we can use the consistent naming when expressing opinions?

@filmil
Copy link
Contributor

filmil commented Jun 4, 2020

Can you help me coalesce on one? I named the two appraoches Builder pattern and Constructor Helper Struct, but I dislike the names, and I hope there are better ones.

Well, I'm not a particularly qualified or well read arbiter of style. But you asked. I'll try but take it with a grain of salt.

  1. Builder pattern is widely documented and known, so we should probably keep using the name.
  2. Perhaps "constructor options" or just "options" could be a good common name for what you called "construction helper struct".
  • I think the "constructor" part is important to emphasize when it matters the constructor (as in, the object itself) has the control over what object gets made.
    • Contrast to Builder, where Builder is in control.
  • "options" may be simpler than "helper struct" just because it's usually already called "options" in prior art. Including the ECMA-402 APIs.
  • I am not aware of an established "trade name" for this. "Options" seems popular in libraries because it makes the constructor type signatures uniform, which is ergonomic for the user.

You use Options vs Builder, but you say "Options approach alows tighter coupling between an object and its builder" which confuses me.

Reflecting a bit on this, I guess the distinction I meant to make was that of what object is in control of the creation of Self.

  • In the Options approach, Self creates an instance of its own type by inspecting a passive Options struct and deciding based on that.
  • In the Builder approach, Self instance is constructed by Builder, and Builder gets to pick actively the state of Self, whether to validate or not etc.

So, rewriting my remark quoted above with this in mind, it would become "Options approach allows Self to stay in control of the way a Self instance is constructed, in contrast to a Builder approach".

HTH.

@zbraniecki
Copy link
Member Author

Thank you! Updated the names to make it easier to continue the conv.

@echeran
Copy link
Contributor

echeran commented Jun 5, 2020

I agree with everything said above. The way I've understood the role of builders has been more simplistic (without previous experience of the example of C++ memory leak in a constructor):

  • plug a hole in the language where using maps (associative data) is neither easy or idiomatic. Using associative data is great because it is flexible and extensible, as Filip detailed. Unlike Java, Rust seems to promote struct usage and has syntax for extending structs, etc. In Java, AutoValue fills that gap, and Protobuf does that cross-platform.
  • allow immutable data structures to have small controlled regions of mutability for efficiency's sake. Guava collections does this as do others.

I also like the model of starting off thinking of data/struct of interest as immutable except in controlled regions when necessary vs. the freeze() mechanism.

As far as naming, FWIW, the AutoValue, Protobuf, and Guava collections cases all use the "Builder" name. In the way I've thought about uses of builders, with the discussion of "Options" vs. "Builder", maybe Options corresponds to the first (associative data) and Builder corresponds to the second (building up immutable data structure of arbitrary shape). My opinion & guess is that from a user perspective, since the code using them looks similar, the benefit of having just one term instead of two would outweigh the precision of naming<->semantics of two terms.

@zbraniecki
Copy link
Member Author

@echeran do I understand correctly that from the snippets in my comment you suggest swapping the names? The first is "Options" and the latter is "Builder"?

@echeran
Copy link
Contributor

echeran commented Jun 5, 2020

I think the names you have in those 2 cases sound fine. I guess I'm just pointing out that I hadn't thought of builders/options in terms of where the work occurs (whether in the constructor or not), but rather builders as being both a thing to build up associative data and an immutable collection. I'm slightly leaning towards calling everything one name ("Builder" ?) to incur less mental overhead for the user, but that's a hunch, and so not a strong preference. If you feel more strongly about having "Builder" and "Options", then that's fine with me.

@EvanJP
Copy link
Contributor

EvanJP commented Jun 5, 2020

In terms of the choice between Builder and Options, I am also favored towards the idea of Options. All the Builder functions in ICU4C have a check for if the UnicodeSet isFrozen, so by having the Options instead, we would be performing the same function in a cleaner manner.

I am unsure of how users are currently using UnicodeSet (and I assume not many are doing this), but a potential drawback of using Options is that if a user wishes to use UnicodeSet functions on the UnicodeSet then modify it afterwards, having an Options constructor would mean the user would have to create multiple UnicodeSets. I would assume though, correct me if I am wrong, that most users would create a pattern and freeze the UnicodeSet before using any of the functions such as contains or span. As @filmil pointed out, however, freeze in ICU4C is causing UnicodeSet to optimize/compact itself while keeping the same API, which does not seem to be an issue with Rust.

tldr: What @zbraniecki has in the issue statement currently for Builder vs Options semantics is correct, and I agree with what everyone has said about using Options instead since freeze seems to be archaic. Something to be wary about is whether or not we would be requiring the user to create multiple UnicodeSets in unusual situations and if that is even a problem.

@zbraniecki
Copy link
Member Author

a potential drawback of using Options is that if a user wishes to use UnicodeSet functions on the UnicodeSet then modify it afterwards, having an Options constructor would mean the user would have to create multiple UnicodeSets.

Depending on how much intermediate information we need to store, we may be able to support it.

We could have UnicodeSet::ExtendWith(&mut self, ...) which would allow you to recalculate the inner set to include new data.

If the recalculation requires a lot of data we want to have on the Options but wouldn't want to carry in the UnicodeSet just in case someone wants to extend, then I'd be in favor of making it immutable and have a method to construct a new one from an old one + new options.
If it's possible to recompute from a computed fields + new info, then extend would work.

@sffc sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jun 12, 2020
@zbraniecki
Copy link
Member Author

I'm going to extend this with proposed details on Errors in ICU4X, and file a PR.

@sffc sffc added this to the 2020 Q2 milestone Jun 17, 2020
zbraniecki added a commit that referenced this issue Jun 20, 2020
Fixes #112.

I shied away from extending the `errors` bit because the changes are large enough as is, and I'd prefer to keep #118 for another revision later.
@zbraniecki
Copy link
Member Author

I ended up not including #118 in my PR. Will keep it for a separate PR.

zbraniecki added a commit that referenced this issue Jun 25, 2020
* Style Guide revision 2

Fixes #112.

I shied away from extending the `errors` bit because the changes are large enough as is, and I'd prefer to keep #118 for another revision later.

* Update style-guide.md

* Address Shane's feedback

* Reviewers feedback

* Update style-guide.md

* Update style-guide.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-meta Component: Relating to ICU4X as a whole T-docs-tests Type: Code change outside core library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants