Design Doc - Explicit/Implicit and Builder Patterns #112

zbraniecki · 2020-06-04T18:18:00Z

This issue is intended to verify that we have a consensus on the two areas I'd like to extend our design doc with.

Explicit vs Implicit memory management (clone/copy)

// BAD
impl PluralRule {
    fn new(locale: &Locale) -> PluralRules {
        PluralRules { locale: locale.clone() }
    }
}

// GOOD
impl PluralRule {
    fn new(locale: Locale) -> PluralRules {
        PluralRules { locale }
    }
}

Rationale: If we know we'll want to store a Clone of a struct, we should construct the API to get a cloned version already. This allows the reader of the user of this API to understand the memory cost.

Builder vs. Constructor Options

Builder

struct MyStruct {
  locale: Option<Locale>,
  minFractionDigits: Option<usize>,
  maxFractionDigits: Option<usize>,
  frozen: bool,
}

impl MyStruct {
    pub fn new() -> Self {
        Self {
            locale: None,
            minFractionDigits: None,
            maxFractionDigits: None,
            frozen: false
        }
    }
    
    pub fn with_locale(&mut self, locale: Locale) -> &mut Self {
        self.locale = Some(locale);
        self
    }

    pub fn build(&mut self) -> Result<&mut Self, _> {
        // validate the options
        self.frozen = true;
        self
    }
}

fn main() {
    let s = MyStruct::new()
        .withLocale(locale)
        .minFractionDigits(5)
        .maxFractionDigits(10)
        .build().expect("Building failed");
}

vs.

Constructor Options

#[derive(Default)]
struct MyStructOptions {
    pub minFractionDigits: usize,
    pub maxFractionDigits: usize,
}

impl MyStructOptions {
    pub fn calculateFractionDigits(min: usize, max: usize) {
        // do some calculations
        min..max
    }
    pub fn validate() -> bool {
         // validate the current options internal consistency
    }
}

struct MyStruct {
  locale: Locale,
  fractionDigits: Range<usize>,
}

impl MyStruct {
    pub fn new(locale: Locale, options: MyStructOptions) -> Result<Self, ()> {
        Self {
            locale: locale,
            fractionDigits: options.calculateFractionDigits()
        }
    }
}

fn main() {
    let mut options = MyStructOptions::default();
    options.setFractionMinMax(5, 10);
    assert!(options.validate());

    let s = MyStruct::new(locale, options).expect("Construction failed.");
}

I'm a proponent of the latter, and I'd recommend it for us, but I know that ICU uses the former and I've seen Rust APIs going both ways.
My reasons are that I like that MyStructOptions have their own logic, calculations and validations and the user can perform the options at any point and in stages, and instead of build step,
it would just get passed to the constructor of the final struct.
Also, the final struct doesn't need fields for intermittent data that gets computed out in the build step, is leaner and has less logic.
Finally, its mutability can be easier controlled separately from mutability of the options struct.

The text was updated successfully, but these errors were encountered:

filmil · 2020-06-04T20:24:09Z

Hi folks. Here's my two cents.

I believe it is helpful to highlight the reasons and situations when one may prefer one construction approach over another. The tradeoffs have different weights depending on the programming language used. Without an attempt to argue any specific choice, here is a laundry list of some concerns that I noticed over the years.

Setters. I am seeing a lot of setter methods specifically in older-style C++. And EJB has poisoned Java with accidental setters so much that it's not even funny. I think this may have come out of early OO fascination with embedded state, and the fact that we didn't write a lot of multi-threading code in C++ for a long time.
Options-based initialization is useful when you are able to configure your object through data alone (i.e. no transforms needed). I am seeing options-based initialization in older C++ code, specifically in places that have code style forbidding constructors from doing "work". This I think is is because doing "work" could throw exceptions, which if handled incorrectly could lead to memory leaks. And in C++ it is all too easy to initialize an object in a way that exception recovery leads to memory leaks. C++11 and later make it possible to write code that doesn't suffer from that problem, but you can still reach for that footgun.
Builder-style initialization is useful in the following scenarios (the list is not exhaustive, just the ones that come to mind):
1. You want to separate the "mutable API" from an immutable API. I found this useful in Java, which does not have const methods. It's also nice from the perspective of the user to not have to care about the mutable API if they don't need to mutate.
2. You want to create a family of almost identical, but ever so slightly different objects. In practice, I don't think I ever have had a need to do this.
3. In general, I guess, if you want to split your APIs up into topics: here's the API you use to create the thing; here's the API you use to use the thing.
4. You need "generative" initialization, such as: initialize this Unicode set with every Unicode code point divisible by 2.
I personally dislike the "freeze" APIs: I believe this was convenient in the pre-C++11 world when making builder-style APIs was annoying. Internally they cause the object to have different representations, but retain the same API. With move semantics, it is as cheap as can be to produce a read-only version of the object, which has a smaller, read-only API, than to throw an exception if a mutable method is called on a frozen object. There should be no need today to have a "freeze" method.
Options approach allows tighter coupling between an object and its builder, which is convenient for the implementors, but in a large code base perhaps makes code analysis harder.
Options vs Builder: the constructor can mess with Options, and it typically can't mess with a builder.

tl;dr: we should probably check how much any of the above named concerns play a role in Rust. With my library user hat on, I prefer smaller API surfaces: if an object's interface can be separated cleanly into "mutable" and "immutable" parts, I'd go for that.

zbraniecki · 2020-06-04T20:34:36Z

Thanks @filmil!

I think you're using slightly different vocabulary than I do when you describe your position. Can you help me coalesce on one? I named the two appraoches Builder pattern and Constructor Helper Struct, but I dislike the names, and I hope there are better ones.
You use Options vs Builder, but you say "Options approach alows tighter coupling between an object and its builder" which confuses me.

Can you help me name the two code snippets and if you think there's a third one, add it, so that we can use the consistent naming when expressing opinions?

filmil · 2020-06-04T22:03:58Z

Can you help me coalesce on one? I named the two appraoches Builder pattern and Constructor Helper Struct, but I dislike the names, and I hope there are better ones.

Well, I'm not a particularly qualified or well read arbiter of style. But you asked. I'll try but take it with a grain of salt.

Builder pattern is widely documented and known, so we should probably keep using the name.
Perhaps "constructor options" or just "options" could be a good common name for what you called "construction helper struct".

I think the "constructor" part is important to emphasize when it matters the constructor (as in, the object itself) has the control over what object gets made.
- Contrast to Builder, where Builder is in control.
"options" may be simpler than "helper struct" just because it's usually already called "options" in prior art. Including the ECMA-402 APIs.
I am not aware of an established "trade name" for this. "Options" seems popular in libraries because it makes the constructor type signatures uniform, which is ergonomic for the user.

You use Options vs Builder, but you say "Options approach alows tighter coupling between an object and its builder" which confuses me.

Reflecting a bit on this, I guess the distinction I meant to make was that of what object is in control of the creation of Self.

In the Options approach, Self creates an instance of its own type by inspecting a passive Options struct and deciding based on that.
In the Builder approach, Self instance is constructed by Builder, and Builder gets to pick actively the state of Self, whether to validate or not etc.

So, rewriting my remark quoted above with this in mind, it would become "Options approach allows Self to stay in control of the way a Self instance is constructed, in contrast to a Builder approach".

HTH.

zbraniecki · 2020-06-04T22:38:25Z

Thank you! Updated the names to make it easier to continue the conv.

echeran · 2020-06-05T17:09:13Z

I agree with everything said above. The way I've understood the role of builders has been more simplistic (without previous experience of the example of C++ memory leak in a constructor):

plug a hole in the language where using maps (associative data) is neither easy or idiomatic. Using associative data is great because it is flexible and extensible, as Filip detailed. Unlike Java, Rust seems to promote struct usage and has syntax for extending structs, etc. In Java, AutoValue fills that gap, and Protobuf does that cross-platform.
allow immutable data structures to have small controlled regions of mutability for efficiency's sake. Guava collections does this as do others.

I also like the model of starting off thinking of data/struct of interest as immutable except in controlled regions when necessary vs. the freeze() mechanism.

As far as naming, FWIW, the AutoValue, Protobuf, and Guava collections cases all use the "Builder" name. In the way I've thought about uses of builders, with the discussion of "Options" vs. "Builder", maybe Options corresponds to the first (associative data) and Builder corresponds to the second (building up immutable data structure of arbitrary shape). My opinion & guess is that from a user perspective, since the code using them looks similar, the benefit of having just one term instead of two would outweigh the precision of naming<->semantics of two terms.

zbraniecki · 2020-06-05T18:17:47Z

@echeran do I understand correctly that from the snippets in my comment you suggest swapping the names? The first is "Options" and the latter is "Builder"?

echeran · 2020-06-05T18:50:09Z

I think the names you have in those 2 cases sound fine. I guess I'm just pointing out that I hadn't thought of builders/options in terms of where the work occurs (whether in the constructor or not), but rather builders as being both a thing to build up associative data and an immutable collection. I'm slightly leaning towards calling everything one name ("Builder" ?) to incur less mental overhead for the user, but that's a hunch, and so not a strong preference. If you feel more strongly about having "Builder" and "Options", then that's fine with me.

EvanJP · 2020-06-05T18:57:11Z

In terms of the choice between Builder and Options, I am also favored towards the idea of Options. All the Builder functions in ICU4C have a check for if the UnicodeSet isFrozen, so by having the Options instead, we would be performing the same function in a cleaner manner.

I am unsure of how users are currently using UnicodeSet (and I assume not many are doing this), but a potential drawback of using Options is that if a user wishes to use UnicodeSet functions on the UnicodeSet then modify it afterwards, having an Options constructor would mean the user would have to create multiple UnicodeSets. I would assume though, correct me if I am wrong, that most users would create a pattern and freeze the UnicodeSet before using any of the functions such as contains or span. As @filmil pointed out, however, freeze in ICU4C is causing UnicodeSet to optimize/compact itself while keeping the same API, which does not seem to be an issue with Rust.

tldr: What @zbraniecki has in the issue statement currently for Builder vs Options semantics is correct, and I agree with what everyone has said about using Options instead since freeze seems to be archaic. Something to be wary about is whether or not we would be requiring the user to create multiple UnicodeSets in unusual situations and if that is even a problem.

zbraniecki · 2020-06-06T23:31:50Z

a potential drawback of using Options is that if a user wishes to use UnicodeSet functions on the UnicodeSet then modify it afterwards, having an Options constructor would mean the user would have to create multiple UnicodeSets.

Depending on how much intermediate information we need to store, we may be able to support it.

We could have UnicodeSet::ExtendWith(&mut self, ...) which would allow you to recalculate the inner set to include new data.

If the recalculation requires a lot of data we want to have on the Options but wouldn't want to carry in the UnicodeSet just in case someone wants to extend, then I'd be in favor of making it immutable and have a method to construct a new one from an old one + new options.
If it's possible to recompute from a computed fields + new info, then extend would work.

zbraniecki · 2020-06-12T18:48:37Z

I'm going to extend this with proposed details on Errors in ICU4X, and file a PR.

Fixes #112. I shied away from extending the `errors` bit because the changes are large enough as is, and I'd prefer to keep #118 for another revision later.

zbraniecki · 2020-06-20T20:28:17Z

I ended up not including #118 in my PR. Will keep it for a separate PR.

* Style Guide revision 2 Fixes #112. I shied away from extending the `errors` bit because the changes are large enough as is, and I'd prefer to keep #118 for another revision later. * Update style-guide.md * Address Shane's feedback * Reviewers feedback * Update style-guide.md * Update style-guide.md

zbraniecki mentioned this issue Jun 4, 2020

UnicodeSet implementation & API #91

Closed

sffc added C-meta Component: Relating to ICU4X as a whole discuss Discuss at a future ICU4X-SC meeting T-docs-tests Type: Code change outside core library labels Jun 4, 2020

sffc assigned zbraniecki Jun 4, 2020

sffc mentioned this issue Jun 5, 2020

Add JSON data provider with sample number data #61

Merged

sffc removed the discuss Discuss at a future ICU4X-SC meeting label Jun 12, 2020

zbraniecki mentioned this issue Jun 12, 2020

Should we test for error messages and debug traits? #118

Closed

sffc added this to the 2020 Q2 milestone Jun 17, 2020

zbraniecki added a commit that referenced this issue Jun 20, 2020

Style Guide revision 2

edfd416

Fixes #112. I shied away from extending the `errors` bit because the changes are large enough as is, and I'd prefer to keep #118 for another revision later.

zbraniecki mentioned this issue Jun 20, 2020

Style Guide revision 2 #142

Merged

zbraniecki closed this as completed in #142 Jun 25, 2020

sffc mentioned this issue Jun 29, 2020

OmnICU API surface unicode-org/rust-discuss#15

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Design Doc - Explicit/Implicit and Builder Patterns #112

Design Doc - Explicit/Implicit and Builder Patterns #112

zbraniecki commented Jun 4, 2020 •

edited

Loading

filmil commented Jun 4, 2020 •

edited

Loading

zbraniecki commented Jun 4, 2020

filmil commented Jun 4, 2020

zbraniecki commented Jun 4, 2020

echeran commented Jun 5, 2020

zbraniecki commented Jun 5, 2020

echeran commented Jun 5, 2020

EvanJP commented Jun 5, 2020

zbraniecki commented Jun 6, 2020

zbraniecki commented Jun 12, 2020

zbraniecki commented Jun 20, 2020

Design Doc - Explicit/Implicit and Builder Patterns #112

Design Doc - Explicit/Implicit and Builder Patterns #112

Comments

zbraniecki commented Jun 4, 2020 • edited Loading

Explicit vs Implicit memory management (clone/copy)

Builder vs. Constructor Options

Builder

Constructor Options

filmil commented Jun 4, 2020 • edited Loading

zbraniecki commented Jun 4, 2020

filmil commented Jun 4, 2020

zbraniecki commented Jun 4, 2020

echeran commented Jun 5, 2020

zbraniecki commented Jun 5, 2020

echeran commented Jun 5, 2020

EvanJP commented Jun 5, 2020

zbraniecki commented Jun 6, 2020

zbraniecki commented Jun 12, 2020

zbraniecki commented Jun 20, 2020

zbraniecki commented Jun 4, 2020 •

edited

Loading

filmil commented Jun 4, 2020 •

edited

Loading