Add design doc on function composition #753

Merged: 15 commits, May 20, 2024

catamorphism
Collaborator

This design doc partially addresses #515. It covers some of the same ground as the design doc in #645, but should be seen as a prequel to #645.

Before getting into solutions, I thought it would be useful to collect together some examples and questions, omitting the last two sections from the design doc template.

This document is also an attempt to summarize the ICU-TC mailing list discussion that included @macchiati, @markusicu, @echeran and @richgillam. (Some of the comments, which fall more under "proposed solutions", will be included in a future sequel to this design doc.)

My goal is to get this document into a state that everyone can agree on before finishing part 2 (with the proposed solutions).

I invited some reviewers on an "in case you're interested" basis.

@catamorphism added the labels design (Design principles, decisions) and LDML46 (LDML46 Release, Tech Preview - October 2024) on Mar 26, 2024

Because the implementations for formatters and selectors
naturally have different type signatures
(a formatter consumes and produces a resolved value,
while a selector produces a list of keys),
Member

@macchiati macchiati Mar 27, 2024

I think this is a flaw in the current specification. I think it would be much more general for a selector to match keys, rather than produce them.

Collaborator

As currently specified, keys are indeed "matched", but the result of that match is a preferential list of keys.

It is also possible to simplify this API in an implementation so that the resolved value of the selector is expected to have a selectKey(keys) method that returns a single key from its input list, rather than a preferential ordering, though the selection algorithm then gets a bit more complicated as it needs to track additional state (as done here). This works because the method can be called multiple times, each time leaving out the previously-best key from the input, to determine the full preference order.
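
A minimal sketch of the repeated-call idea described above, in TypeScript. Only the `selectKey(keys)` method name comes from the comment; the `Selectable` interface, `preferenceOrder`, and the toy `numberSelector` are hypothetical names used for illustration and are not taken from the spec or from any implementation.

```typescript
// Hypothetical shape: a resolved value that can pick its single best key.
interface Selectable {
  selectKey(keys: string[]): string | undefined;
}

// Recover the full preference order by asking for the best key repeatedly,
// leaving out each previous winner before asking again.
function preferenceOrder(value: Selectable, keys: string[]): string[] {
  const remaining = keys.slice();
  const ordered: string[] = [];
  while (remaining.length > 0) {
    const best = value.selectKey(remaining);
    if (best === undefined) break; // none of the remaining keys match
    const idx = remaining.indexOf(best);
    if (idx === -1) break; // defensive: a key that was never offered
    ordered.push(best);
    remaining.splice(idx, 1);
  }
  return ordered;
}

// Toy selector (assumption): an exact numeric key beats a plural category,
// and the plural rule is a crude English-like stand-in.
function numberSelector(n: number): Selectable {
  const category = n === 1 ? "one" : "other";
  return {
    selectKey(keys: string[]): string | undefined {
      const exact = String(n);
      if (keys.indexOf(exact) !== -1) return exact;
      if (keys.indexOf(category) !== -1) return category;
      return undefined;
    },
  };
}
```

With these toy rules, `preferenceOrder(numberSelector(1), ["other", "one", "1"])` yields `["1", "one"]`: the key `other` never matches the value 1, so it is not part of the ordering. The extra state the comment mentions is the `remaining` list that the caller has to maintain between calls.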

Member

@macchiati macchiati left a comment

I like this design document; it sets out the options and their consequences pretty clearly.

Collaborator

@eemeli eemeli left a comment

Added some inline comments.

Not having participated in the preceding ICU-TC mailing list discussion, I find the text in the document a little hard to follow. To me, it seems to be muddling together somewhat two spec layers:

  1. What's possible in general with functions and resolved values.
  2. What's the right thing for :number to do.

Untangling those would make it easier to follow what's being considered, as we need answers to both, and the specific answer for :number is unlikely to be right for all other functions.

Comment on lines 730 to 731
* Are named values essentially strings with metadata,
or are they structured? (Model 1 vs. Model 2)
Collaborator

I'm not sure that I understand what the difference between these is, or what the ambiguity here is. Doesn't a "string with metadata" need to have an object wrapper of some sort providing slots for where to put the metadata? How is that different from being "structured"?

Collaborator Author

You're right, this is vague. I tried to address this in 09aeff1, which runs the hazard of getting too implementation-specific, but I'm not sure how else to express it.

Comment on lines +789 to +793
However, it _should_ make the requirements
for "resolved values"
clear enough so that implementors can
make a well-informed decision
on what these types and operations should be.
Collaborator

Is this the expected primary outcome of this work? As in, normative text or examples of (custom) functions operating on values with some expected results?

Collaborator Author

The answer is that I'm not sure, and part of the goal of writing a doc that doesn't present a proposed solution is to narrow down what the outcome should be: normative changes and/or a set of use cases that must be supported (which would also be normative, but would be different from the style of the existing spec). I was hoping that would happen as part of the discussion of this design doc.

I added some text along those lines in cfa882a. I also added some text at the end about different directions that a proposed design could go in. If there are any broad categories of solution that I left out, I'd like to include them.

I think that to answer your question, we have to pick one of those general directions.

Comment on lines +805 to +808
A second constraint is
the difficulty of developing a precise definition of "resolved value"
that can be made specific in the interface for custom functions,
which is implementation-language-neutral.
Collaborator

@eemeli eemeli Mar 27, 2024

In general, I would be very happy for us to include a more concrete definition or description of "resolved values". Historically, that was one of the first parts of the formatting spec that I tried to include, in #198. Unfortunately, that PR ended up being stuck for a year, before needing to be dismissed. The preceding lengthy discussion in #190 is also relevant background, though note of course that these discussions are from over two years ago, so will not fully reflect our current thinking or nomenclature.

Given this past history, I would find it useful to hear @mihnita's current thoughts on defining "resolved value" more explicitly, as he in particular was previously against the idea.

Collaborator Author

I would also be interested to hear from @mihnita -- in terms of the implementations, I took inspiration from his code in defining a FormattedPlaceholder type that represents the return value of a function implementation, and trying to figure out how that type should actually work (in combination with #515) is how I got here.

Collaborator Author

(and I'll take a look at those threads.)

Collaborator Author

I tried to extract some of the top-level points from the lengthy discussion and added a "prior work" section: fa625c3.

If you think I've misrepresented anything or left out an important point, please let me know! I'm hoping that some of the questions that were challenging in those prior discussions might be easier to resolve now due to spec changes that have happened in the meantime. But not having been there for those discussions, I can't be certain of it.

Collaborator

Re-reviewing this PR and the changes since my last review, I believe further progress on this is still blocked by waiting on input from @mihnita, as requested above. I would find it important to get buy-in on the general approach before spending more time on this.

@catamorphism
Collaborator Author

> Added some inline comments.
>
> Not having participated in the preceding ICU-TC mailing list discussion, I find the text in the document a little hard to follow. To me, it seems to be muddling together somewhat two spec layers:
>
> 1. What's possible in general with functions and resolved values.
> 2. What's the right thing for `:number` to do.
>
> Untangling those would make it easier to follow what's being considered, as we need answers to both, and the specific answer for :number is unlikely to be right for all other functions.

:number is just meant to be an example because it's a known function that has several different options. The same issue could potentially come up with any custom function. I tried to clarify this in 8e31845.

I suspect that just this one paragraph might not address your concern about untangling the two concerns, though. I'll try to keep that in mind while answering the other comments.

Collaborator

@echeran echeran left a comment

This design doc is good -- it covers the space of design decisions well around functions whose return values effectively get passed to other functions.

I left a few comments and small optional suggested edits, but they shouldn't be considered blockers.

These issues are important to decide upon so that we can avoid ambiguity and complexity.

Comment on lines +23 to +29
The objective of this design document is not to make
a concrete proposal, but rather to explore a problem space.
This space is complicated enough that agreement on vocabulary
is desired before defining a solution.

Instead of objectives, we present a primary problem
and a set of subsidiary problems.
Member

I think this is a good description of this document.

I think that we still will want a concrete design document, but this is a necessary preamble to that effort.


So we must address both problems together:

* Problem 1: Define what it means for functions to compose with each other.
Member

I don't know that this is actually a problem. I think it is a side-effect of not resolving Problem 2.

I suspect the behavior is that some (but not all) functions and some (but not all) options can affect the operand.

Example:

.input {$d :datetime dateStyle=medium timeZone=|America/Los_Angeles|}
.local $t = {$d :datetime timeStyle=short}
{{What prints? {$d} {$t}}}

In the above, I don't want to have to type the timeZone option every time I touch the value. I probably want that to be transitive. I don't want $t to print both the date and time, though: I asked for the time when assigning the local value. $d is probably some Temporal value (or even a classical incremental time value). (It doesn't matter if you don't agree with whether I'm right about the date/time styles being transitive.)

Similarly, think about this message:

.local $i0 = {|Ϊ́| :casefold}  // literal is U+03AA U+0301
.local $i1 = {$i0 :normalize form=nfkc}
.local $i2 = {$i1 :casefold}
.local $i3 = {$i2 :normalize form=nfkc}
{{Prints the intended U+0390: {$i3}? see https://www.w3.org/TR/charmod-norm/#normalizationAndCasefold}}
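
A rough TypeScript sketch of the chain in this message, where each declaration consumes the output of the previous one. This is an illustration under stated assumptions: `casefold` and `normalize` are hypothetical helpers, and `toLowerCase()` only stands in for Unicode case folding, which JavaScript does not expose directly. `String.prototype.normalize` is the standard built-in.

```typescript
type NormalizationForm = "NFC" | "NFD" | "NFKC" | "NFKD";

// Assumption: lowercasing approximates case folding for this input.
// Full case folding is expected to expand U+0390 into U+03B9 U+0308 U+0301,
// which is why the message normalizes again after the second :casefold.
function casefold(s: string): string {
  return s.toLowerCase();
}

function normalize(s: string, form: NormalizationForm): string {
  return s.normalize(form);
}

// Each step takes the previous step's result as its operand.
function runChain(literal: string): string[] {
  const i0 = casefold(literal);
  const i1 = normalize(i0, "NFKC");
  const i2 = casefold(i1);
  const i3 = normalize(i2, "NFKC");
  return [i0, i1, i2, i3];
}

const steps = runChain("\u03AA\u0301");
const codePoints = steps.map((s) =>
  Array.from(s).map((c) => "U+" + c.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")).join(" ")
);
```

The point for composition is that `$i1` has to see the case-folded string rather than the original literal; a resolved value that exposed only the source operand to the next function would make a chain like this produce a different result.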

Collaborator Author

> I don't know that this is actually a problem. I think it is a side-effect of not resolving Problem 2.

I disagree with this, because I think that even if we resolved Problem 2, we would still have ambiguities.

In your first example: suppose we solve problem 2 by adopting a representation of function results that includes all the options that were passed in. Without further spec work, though, it's still unclear whether $t is printed with the time zone or not. Problem 2 is "what do functions return?", while problem 1 is "what do functions do with their inputs?" Maybe the latter question can only be answered in a way that's specific to each function.
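
To make the remaining ambiguity concrete, here is a small TypeScript sketch. It assumes a resolved value that carries a base value plus the options passed so far (one possible answer to Problem 2), and then shows two policies for what a second `:datetime` call does with them. All names (`Resolved`, `datetimeInheritAll`, `datetimeInheritSome`) are hypothetical and neither policy is what the spec prescribes.

```typescript
type Options = Record<string, string>;

// Assumed representation: the base value plus the options applied so far.
interface Resolved {
  value: Date;
  options: Options;
}

function merge(base: Options, extra: Options): Options {
  const out: Options = {};
  for (const k of Object.keys(base)) out[k] = base[k];
  for (const k of Object.keys(extra)) out[k] = extra[k];
  return out;
}

function baseOf(operand: Date | Resolved): Resolved {
  return operand instanceof Date ? { value: operand, options: {} } : operand;
}

// Policy A: every option on the operand carries over, new options win.
function datetimeInheritAll(operand: Date | Resolved, options: Options): Resolved {
  const prev = baseOf(operand);
  return { value: prev.value, options: merge(prev.options, options) };
}

// Policy B: only an allow-list of options carries over (here just timeZone).
function datetimeInheritSome(
  operand: Date | Resolved,
  options: Options,
  transitive: string[] = ["timeZone"]
): Resolved {
  const prev = baseOf(operand);
  const kept: Options = {};
  for (const k of transitive) {
    if (k in prev.options) kept[k] = prev.options[k];
  }
  return { value: prev.value, options: merge(kept, options) };
}

const d = datetimeInheritAll(new Date(0), { dateStyle: "medium", timeZone: "America/Los_Angeles" });
const tAll = datetimeInheritAll(d, { timeStyle: "short" });
const tSome = datetimeInheritSome(d, { timeStyle: "short" });
```

Under Policy A, `tAll` ends up with `dateStyle`, `timeZone`, and `timeStyle`, so `$t` would print both date and time. Under Policy B, `tSome` keeps `timeZone` and `timeStyle` only. Both policies use the same representation of the function result, which is the sense in which settling Problem 2 does not settle Problem 1.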

Member

I don't think example 1 is an exemplar of "what do functions do with their inputs" as I don't think that options are necessarily the same thing as "inputs". Also, it's probably not a great example, since Temporal types provide ways to float/unfloat time values (making the time zone a part of the datetime value--that is, modifying the operand, as it were) rather than merely being a random option.

My concern is that, whenever I see "we have to solve two problems at the same time", that suggests extra effort to ensure that there's been complete decomposition of the requirements.

Collaborator Author

Re: decomposition of the requirements, fair enough... what do you think about the change to this section I made in 86c26ff (emphasizing that this document is mainly about solving the resolved value problem, which is only a part of the bigger function composition problem)?

@aphillips aphillips merged commit a3d13dc into unicode-org:main May 20, 2024
1 check passed
@macchiati
Member

The following seems contradictory:

> Model 1:
> ...
> 2. Let X be the result of the function. X is an object encapsulating the following fields:
>
> - The source value, "0.33333"
> - The fully-evaluated options, {"maxFrac": "2"}
> - The formatted result, a FormattedNumber object representing the string "0.33"
>
> Model 2:
> 2. Let F be the result of the function. F is a FormattedNumber object representing the string "0.33"

> • Alternative 1: A function returns a "formatted value". This matches model 1, where formatted values are bound to names.
>
> • Alternative 2: A function returns a composite value that conceptually pairs a base value (possibly the operand of the function, but possibly not; see Example B1) with options. This matches model 2. If we preserve the single usage of "resolved value" in the spec, this implies that the (base value, options) representation applies to all resolved values, not just those returned by functions.

That is, Alternative 1 (formatted value only) seems to match Model 2, not Model 1.

This happens in a few other places as well. I think the readability would be improved dramatically if the Models had names instead of numbers, something like:

Model 1 --> Bundle Model
Model 2 --> Formattable Model
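
As a rough illustration of the two models quoted above, using the suggested names, the following TypeScript sketch contrasts what a later function would receive in each case. The types and helper names are hypothetical, and `toFixed()` is a simplification of a real maximum-fraction-digits option.

```typescript
type Options = Record<string, string>;

// Hypothetical stand-in for an implementation's formatted-number type.
interface FormattedNumber {
  text: string;
}

// "Bundle Model" (Model 1): source value, evaluated options, formatted result.
interface Bundle {
  source: string;
  options: Options;
  formatted: FormattedNumber;
}

// Toy formatter; toFixed() pins the digit count exactly, unlike a true maximum.
function formatNumber(source: string, options: Options): FormattedNumber {
  const maxFrac = options["maxFrac"];
  const n = Number(source);
  return { text: maxFrac === undefined ? String(n) : n.toFixed(Number(maxFrac)) };
}

// Bundle Model: the function result keeps everything.
function numberAsBundle(source: string, options: Options): Bundle {
  return { source: source, options: options, formatted: formatNumber(source, options) };
}

// "Formattable Model" (Model 2): the function result is only the formatted value.
function numberAsFormattable(source: string, options: Options): FormattedNumber {
  return formatNumber(source, options);
}

const x = numberAsBundle("0.33333", { maxFrac: "2" });
const f = numberAsFormattable("0.33333", { maxFrac: "2" });

// Only the bundle lets a later function reformat from the original source.
const reformatted = formatNumber(x.source, { maxFrac: "4" });
```

Here `x.formatted.text` and `f.text` are both `"0.33"`, but only `x` still carries `"0.33333"`, so `reformatted.text` is `"0.3333"`. A function that receives `f` can only work from the already-rounded result.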

5 participants