-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Default impls that load data #5554
Comments
That's not what that says. It says
"This comes up often with structs that define a set of options", but is in no way limited to that. Compiled data is reasonable default data, that's why we're publishing it.
So? Some APIs are gated, and that includes impls.
It only exists when
Call sites can be written using |
The issue title misrepresents what's happening. There is never any "data loading" in a impl Default for SentenceSegmenter {
fn default() -> Self {
Self::new()
}
}
impl SentenceSegmenter {
pub fn new() -> Self {
Self {
payload: DataPayload::from_static_ref(
crate::provider::Baked::SINGLETON_SENTENCE_BREAK_DATA_V2_MARKER,
),
payload_locale_override: None,
}
}
} |
I don't think I agree with any of these points.
I don't agree with this. I've seen
Seems fine. A lot of things are gated on
There's no expectation of
This is a rather subjective style choice and i don't see a need to force it on our clients. They can always call |
Ok so it's not noncontroversial I'm reasonably convinced by the counter-arguments on points 1. On point 4, I'm mainly concerned when it is used for passing these things as unnamed positional arguments, but we can avoid that by not adding such APIs. I'm not convinced by the counter-arguments on points 2 and 3, and there are more I'd like to add:
I'll add to this: the fact that it is gated on
I used SentenceSegmenter as an example because it has two fields, one singleton and one non-singleton. Currently the non-singleton has no data in To respond to @Manishearth's "There's no expectation of Default to be cheap": my expectation is that it is cheap. For example, it is used in bounds on things like constructing an array and other operations that where the impl is called a large number of times. Is your argument an "in my experience" argument or "this is a documented convention" argument? New points:
|
On the point 7: we have generally been conservative with adding impls to our types, and I think it has served us well. We have a very large API surface, and every impl adds cost. The only impl we agreed to universally add has been |
Right, but if they're doing that they presumably need some default, without
Okay, you should not hold on to that expectation because it is not one followed by the ecosystem. My argument is one based on the ecosystem, and the fact that it is not documented as being cheap. I would consider this something that needs to be explicitly documented were it an expectation. Worth noting that our
I don't consider this to be a requirement of the
I'm open to having some types have
I can provide the motivation for the clippy lint: I definitely do not think this is a case in which the clippy lint is wrong. My position is that deviating from clippy here is something that needs justification, not the new impl. |
Example developer journey: they are using postcard data. They apply |
This ... seems like exactly what we want to happen? They now are forced to consider where their data should come from. That's good. That should exactly be a factor of the decision to disable the feature. I don't really think "they don't fully understand so they leave it alone" is that likely. It doesn't seem that confusing. We could add a "enabled by feature foo" tag on the impl if we really want to improve this experience. |
My point is that they should be forced to consider where their data comes from up front by using an API explicitly documented as "with compiled data", which was key to us agreeing to have the compiled data functions have short, convenient names. |
The problem with docs on trait impls is that they are not discoverable. Often, Default is used implicitly, as you noted with |
I mean, if you get a "missing Default impl" when you play with features I don't really see why it would be hard to figure out what happened. The docs on the trait impl are just an added bonus. I'm not convinced of the sketched developer story being particularly onerous.
I think that's fine? For singletons if someone is using a type they almost certainly need the data and 99% of users will get the data from the compiled data; the main benefit of loading data from a file for singletons is if it's some large program that wishes to lazily load everything. I can see a potential additional benefit of wishing to load the latest data. I want most of our singleton users to be able to just People who care about data to that granularity are going to be caeful about bounds anyway. Just out of curiosity, this is the current list of gated
|
I think the fundamental disagreement is: we agree as a TC that offering developers a way to switch data sources is fundamental, but we don't agree on the degree to which opting into specific data sources can and should impact developer ergonomics. I have taken the position that, given its importance to ICU4X's value proposition, we should generally favor being explicit about data sources, even if it takes a hit on developer ergonomics. The decision to name constructors If I had to guess, @Manishearth is likely to take the position that Default impls are, indeed, a high-impact area for developer ergonomics. I'm happy to debate that point, but, from my perspective, it is moot, because we don't have an alternative on the table for developers making explicit decisions about data sources. With |
Based on the above, I could agree to this policy:
|
TLDR: I think the proposed policy works but I think we should talk about what we think of as singletons here a bit more
Okay, thanks for sketching this out, and for the proposed policy. I think I see the disagreement here, and it's subtler and in two parts. ICU4X's initial design didn't include baked data, and as such it was extremely important we gave people a lot of control over switching data sources. Of course, control over data is a core value proposition, but the way in which that control works has changed over time. Specifically, with the advent of baked data, we've gone out of our way to make baked data easy to use. We could have made baked data need a The impression I had, at least from the way prior discussions have gone and especially points made by @robertbastian around the topic of baked data, was that for singletons at least we no longer considered control over data to be so central to the design, mostly because "it makes sense". Rather, we consider it crucial that there be some way to have control over data, and implicitly recognized that people who want control over data for singletons were an ultimately niche (but important!) use case. This was the position I was coming from when recently discussing #5440, where my preference was that the "main entry type" for singletons be the baked, borrowed one, with the owned version having the qualified name because it was more niche. Anyway, hopefully that explains where I stand on "offering developers a way to switch data is fundamental". I think it's fundamental, as you said, but I think specifically for singletons, it is a niche use case. I was also under the impression that this was where everyone else roughly stood, at the very least @robertbastian but I think I've expressed opinions based on this in the past and not received pushback from others. It's potentially worth reaffirming precisely where we stand there. I'm not opposed to a position that data loading is non-niche for every type, that's just not my default (heh) position right now Secondly, and here is where it gets a bit interesting, I think perhaps we've been dancing around the topic of what is truly a "singleton". Previously in the discussion you brought up SentenceSegmenter and I said:
which I was me considering SentenceSegmenter to not be a singleton, though not saying so explicitly perhaps. It's possible that a lot of our types with Default fall into a similar bucket! As I mention later below, I think timezone stuff may be similar. In your new proposal, you say this:
I think, to me, this is where I'm coming from when I'm talking about "singletons", both previously in this thread and in the first half of this post. True singletons, to me, are ones where there is only one possible "best" value, which means runtime data loading is going to be extremely rare. The main use cases for those are:
I feel like properties, casemapper, and the auxiliary normalizer types fit in this . The main normalizer types have a split NFC/NFKC constructor, they're ineligible. The properties set types also are not eligible, because we type erase them and have a generic constructor. Locale directionality probably fits here too, and ... maybe segmenters? Probably not? I'm not sure about that one but curious as to what others think. I think timezone stuff doesn't really fit this: I think "load timezone from system tz files" is going to be a very common use case. There are more in the list I posted, but this should be a rough sketch of my position. I suspect a productive next step for us is to agree on the policy you stated and work on figuring out what it actually means for something to be a "true singleton" (yes, I recognize that this is a confusing distinction I have potentially just made up, I'd love to hear terminology ideas) |
Some discussion:
|
Here's another point. A Default impl could be one of two things: with-default-data or without-data. We have a small number of classes, such as LocaleFallbacker, that work in both modes. The argument "it has a new function so clearly there is a default value" is really precarious to apply here because I still stand by my position that |
I think this is a misrepresentation of my position, I don't think you're arguing against something anyone else here agrees with. I explicitly said early in the discussion:
I think those are some cases where we should perhaps not have a |
Two positions:
More discussion:
Types Shane thinks should not be defualt:
What if: for these types we should call the functions (@sffc and @Manishearth like this)
Conclusion: On a case by case bases, if a thing is in a situation where it is:
we do not provide a Concrete next steps for 2.0:
Agreed: @Manishearth @sffc (@robertbastian) @zbraniecki Additional action items:
Discussion:
|
Currently we have a number of
impl Default
on classes that have anew()
function that load data. The data is usually singleton but doesn't need to be (see SentenceSegmenter for instance).This is bad because:
impl Default
is to fill in default values for fields when there are reasonable default values. But these are opaques, not bag-of-field structs, andimpl Default
is just not in the right spirit.impl Default
is gated on thecompiled_data
feature, which is dangerous since it can be disabled.impl Default
might do work, since it is loading data. It should ideally exist only when POD can be filled in.TimeZone
+ResolvedTimeZone
#5549, where an unnamed positional argument that isn't an options bag is elided with&Default::default()
.We never had these impls until we added compiled_data constructors, and then we added them only because Clippy was telling us to. I don't think we ever really had a thorough discussion about it.
I hope this is a no-brainer and non-controversial.
@Manishearth @robertbastian @zbraniecki @hsivonen
The text was updated successfully, but these errors were encountered: