Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constructor Options Bag normalization #5839

Closed
zbraniecki opened this issue Nov 19, 2024 · 7 comments · Fixed by #5852
Closed

Constructor Options Bag normalization #5839

zbraniecki opened this issue Nov 19, 2024 · 7 comments · Fixed by #5852
Labels
C-meta Component: Relating to ICU4X as a whole

Comments

@zbraniecki
Copy link
Member

With introduction of Preferences we now have the final shape of the first argument to constructors and implemented the normalized model for how to converge Locale and Preferences.

The remaining question is how to handle the second argument which we currently have three approaches to:

  1. Many constructors take a custom OptionsBag, similar to PreferencesBag but for developer-induced options
  2. Some constructors take no second argument
  3. Finally several take an explicit argument

It looks like this:

// OptionsBag
struct FooOptions {
    type: FooType
}

impl Foo {
    pub fn try_new(prefs: FooPreferences, rule_type: FooOptions) -> Result<Self, Error>;
}

// No second argument
impl Foo {
    pub fn try_new(prefs: PluralRulesPreferences) -> Result<Self, Error>;
}

// Explicit argument
impl Foo {
    pub fn try_new(prefs: FooPreferences, type: FooType) -> Result<Self, Error>;
}

We have an opportunity to normalize it and conform all constructors to Option 1, if we want to, but it doesn't feel like a natural obvious choice, so I'd like to discuss it.

How we modeled preferences, locales and options

The current architecture separates Preferences which represent user-provided values from Options which represent developer-provided values.
This is novel and different from how ECMA-402 handles the constructors. In ECMA-402 we have (locale, options) pair where Options is a bag combining keys set by user and developer and the user-ones get merged with locale extension keys.

In our model the Preferences and Locale get merged as they represent the user-driven keys, and the Options are split into two "types".

First type are options that are affecting data slicing. In that case we push them to the constructor itself to allow for DCE to remove unnecessary code. Second type is options that don't affect DCE. It looks like this:

enum FooType {
    Bar,
    Baz,
}

// Option as argument
impl Foo {
    pub fn try_new(prefs: FooPreferences, type: FooType) -> Result<Self, Error>;
}

// Option as part of the constructor
impl Foo {
    pub fn try_new_bar(prefs: FooPreferences) -> Result<Self, Error>;
}
impl Foo {
    pub fn try_new_baz(prefs: FooPreferences) -> Result<Self, Error>;
}

Value proposition of normalization

If we were to normalize the second argument I see three main value props:

  1. It simplifies the DX by providing consistent API shape for all constructors
  2. It allows for introduction of patterns to normalize options merging, converging, converting and cascading (similar to Preferences)
  3. It allows for an easy introduction of Option keys that are universal across all constructors - ECMA-402 has one like that called localeMatcher

Why we shouldn't do this

There are several reasons for which we may not want to normalize it:

  1. Not all constructors have options. Introducing an empty Bag just for the sake of normalization and asking developers to pass Default::default(), None or options!{} to each constructor just to have consistency may not be worth it.
  2. As for ECMA-402 compatibility we'll have to introduce ECMA-402 OptionsBag anyway and write logic to convert from ECMA-402 behavior to ICU4X, so the value of normalized OptionsBag in ICU4X is reduced.
  3. The only known universal-option across all components in ECMA-402 - LocaleMatcher has questionable value, is basically never used and while we may have to find a way to support it, it is not clear if we should design ICU4X component API around it just because LocaleMatcher exists.

Counters:

  1. While not all constructors have options, it doesn't cost a lot to have two-argument constructors. We can even argue that the argument should be Option<OptionsBag> to allow for None if we think it's better DX than Default::default().
  2. Building conversion logic from ECMA-402 bag to ICU4X Preferences+Options can be abstracted to macros easier than if each constructor has it's own shape.
  3. If we decide that we have to support LocaleMatcher (to pass test-262) then we have to find a way to pass this argument to each constructor somehow... Even if no other Options are needed by that constructor.

Is it urgent?

I'm not sure if it is. It would be nice to have a consistent, ECMA-402 compatible, constructor story for 2.0 and the PR is going to be quite simple to write if we decide to go for it, but it may be that we have different opinions and should give ourselves time to think it through and plan this for 3.0.

@zbraniecki zbraniecki added the discuss-priority Discuss at the next ICU4X meeting label Nov 19, 2024
@sffc
Copy link
Member

sffc commented Nov 19, 2024

About LocaleMatcher specifically: this only affects data loading, right? I don't think most individual component constructors will have anything special to do with LocaleMatcher except for passing it down to the data provider. Things that are passed down to the data provider can be done externally via a custom data provider. This is just to say that I don't think LocaleMatcher by itself is a very strong argument in favor of requiring options.

What we've done in the past, and what I lean toward doing here, is to keep try_new functions the way they are, and if we need a version with options in the future, we add try_new_with_options, and in the next major release, we can switch it to be the default.

@zbraniecki
Copy link
Member Author

This is just to say that I don't think LocaleMatcher by itself is a very strong argument in favor of requiring options.

That's a great point. Basically we can handle ECMA-402 locale matcher even by something like:

let provider;

provider.setLocaleMatcherStrategy(LocaleMatcherStrategy::Foo);

let dtf = FooFormatter::try_new(provider, preferences);

this way we don't need it in Options.

What we've done in the past, and what I lean toward doing here, is to keep try_new functions the way they are, and if we need a version with options in the future, we add try_new_with_options, and in the next major release, we can switch it to be the default.

Yes, I'd like us to decide if we are ok having effectively three patterns for runtime arguments: 1) no arguments, 2) options bag, 3) single option. One half-way convergence would be to migrate (3) to (2) and have only two scenarios.

@sffc
Copy link
Member

sffc commented Nov 19, 2024

I think the following choices are fine:

  1. try_new with options
  2. try_new with no options
    • To add options later: create try_new_with_options and reconcile at next major release
  3. try_new_with_foo with a positional argument named foo
    • To add options later: create a function named try_new

Things we should not approve:

  1. try_new with a positional argument or any arguments other than provider, preferences and options

@zbraniecki
Copy link
Member Author

Thanks! Here's a list of cases that do not follow your heuristics:

./components/plurals/src/lib.rs
322:    pub fn try_new_unstable(
368:    pub fn try_new_cardinal_unstable(
425:    pub fn try_new_ordinal_unstable(
618:    pub fn try_new_unstable(
660:    pub fn try_new_cardinal_unstable(
699:    pub fn try_new_ordinal_unstable(
747:    pub fn try_new_with_rules_unstable(

./components/datetime/src/neo.rs
186:    pub fn try_new(
211:    pub fn try_new_unstable<P>(
236:    fn try_new_internal<P, L>(
411:    pub fn try_new(
436:    pub fn try_new_unstable<P>(
459:    fn try_new_internal<P, L>(

./components/segmenter/src/sentence.rs
149:    pub fn try_new_unstable<D>(provider: &D) -> Result<Self, DataError>
173:    pub fn try_new_with_options_unstable<D>(

./components/segmenter/src/word.rs
238:    pub fn try_new_auto_unstable<D>(provider: &D) -> Result<Self, DataError>
268:    pub fn try_new_auto_with_options_unstable<D>(
362:    pub fn try_new_lstm_unstable<D>(provider: &D) -> Result<Self, DataError>
391:    pub fn try_new_lstm_with_options_unstable<D>(
476:    pub fn try_new_dictionary_unstable<D>(provider: &D) -> Result<Self, DataError>
504:    pub fn try_new_dictionary_with_options_unstable<D>(

./components/segmenter/src/line.rs
415:    pub fn try_new_auto_unstable<D>(provider: &D) -> Result<Self, DataError>
456:    pub fn try_new_lstm_unstable<D>(provider: &D) -> Result<Self, DataError>
494:    pub fn try_new_dictionary_unstable<D>(provider: &D) -> Result<Self, DataError>
534:    pub fn try_new_auto_with_options_unstable<D>(
584:    pub fn try_new_lstm_with_options_unstable<D>(
641:    pub fn try_new_dictionary_with_options_unstable<D>(

./components/calendar/src/any_calendar.rs
619:    pub fn try_new_with_any_provider<P>(
666:    pub fn try_new_with_buffer_provider<P>(
712:    pub fn try_new_unstable<P>(provider: &P, kind: AnyCalendarKind) -> Result<Self, DataError>
782:    pub fn try_new_for_locale_unstable<P>(

./components/calendar/src/week_of.rs
40:    pub fn try_new_unstable<P>(provider: &P, locale: &DataLocale) -> Result<Self, DataError>

./components/experimental/src/transliterate/transliterator/mod.rs
242:    pub fn try_new(locale: &Locale) -> Result<Self, DataError> {
251:    pub fn try_new_with_any_provider(
263:    pub fn try_new_with_buffer_provider(
275:    pub fn try_new_unstable<PT, PN>(
329:    pub fn try_new_with_override<F>(locale: &Locale, lookup: F) -> Result<Self, DataError>
342:    pub fn try_new_with_override_with_any_provider<F>(
359:    pub fn try_new_with_override_with_buffer_provider<F>(
376:    pub fn try_new_with_override_unstable<PT, PN, F>(

./components/experimental/src/dimension/currency/long_formatter.rs
62:    pub fn try_new(
103:    pub fn try_new_unstable<D>(

./components/experimental/src/dimension/units/formatter.rs
95:    pub fn try_new(
133:    pub fn try_new_unstable<D>(

./components/experimental/src/displaynames/displaynames.rs
55:    pub fn try_new_unstable<D: DataProvider<RegionDisplayNamesV1Marker> + ?Sized>(
122:    pub fn try_new_unstable<D: DataProvider<ScriptDisplayNamesV1Marker> + ?Sized>(
190:    pub fn try_new_unstable<D: DataProvider<VariantDisplayNamesV1Marker> + ?Sized>(
253:    pub fn try_new_unstable<D: DataProvider<LanguageDisplayNamesV1Marker> + ?Sized>(
337:    pub fn try_new_unstable<D>(

@sffc sffc added this to the ICU4X 2.0 Beta ⟨P1⟩ milestone Nov 21, 2024
@sffc
Copy link
Member

sffc commented Nov 21, 2024

Thoughts:

  • Plurals: add an options bag
  • Datetime: it is fine as-is; the field sets are options bags
  • Segmenter: please fix. I think there is another issue.
  • Calendar: there is another issue to fix.
  • WeekOf: add preferences here.
  • Transliterator: wait till rob get back
  • Currency, units: add an options bag
  • Display names: needs more discussion and it is experimental

@zbraniecki
Copy link
Member Author

In #5852 I

  • Plurals - fixed with an options bag
  • WeekOf - fixed by switching DataLocale to Preferences
  • DisplayNames - fixed with preferences + options

Left:

  • Segmenter - didn't fix. It doesn't take a locale or options now. I think we should revisit it when we decide on locale in Segmenter
  • Calendar - Didn't know how to fix. It takes a Calendar and Provider, no locale.
  • Transliterator - Left for Rob to eval
  • Currency & Unit - Didn't fix. It takes (Preferences, Currency) & (Preferences, Unit, Options). I think the scenario makes sense as the currency/unit is not an option, but the object that the struct is operating on. (it's a bit confusing in the Currency one where the currency is passed in the constructor and in the format function, assuming it's because it's experimental)

@sffc
Copy link
Member

sffc commented Nov 22, 2024

Split segmenter discussion into #5856.

Opened and merged a PR for AnyCalendar.

The experimental components can be discussed as they continue to iterate. I don't think we need separate issues for them.

sffc added a commit that referenced this issue Nov 22, 2024
Partially fixes #5839

---------

Co-authored-by: Shane F. Carr <[email protected]>
@sffc sffc added C-meta Component: Relating to ICU4X as a whole and removed discuss-priority Discuss at the next ICU4X meeting labels Dec 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-meta Component: Relating to ICU4X as a whole
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants