Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change AIP-149 guidance from "should" to "should not" for proto3 optional on primitives #863

Open
SirGitsalot opened this issue Apr 12, 2022 · 13 comments

Comments

@SirGitsalot
Copy link

AIP-149 says:

Services defined in protocol buffers should use the optional keyword for primitives
if and only if it is necessary to distinguish setting the field to its default value
(0, false, or empty string) from not setting it at all

I propose changing this to "Services defined in protocol buffers should not use the optional keyword". While it's easy for a service that's built using protocol buffers and gRPC to use optional for field presence detection, it's difficult (and sometimes impossible) for clients using other encodings and transports (e.g., JSON over HTTP). As a concrete example, most GCP APIs use proto3 JSON mapping to convert protos to and from JSON. That mapping explicitly says that omitting a field entirely, setting it to null, and setting the field to its default value are equivalent. That's not true for optional fields, and in turn that means that JSON-comsuming clients must create special case code to support APIs that use optional.

Instead of using optional APIs should:

  • Follow the additional guidance in the AIP ("Services should not need to distinguish between the default value and unset most of the time...")
  • If that's not possible, be explicit about a field's presence by including in-band information about that field's presence

Following this new guidance the AIP-149 example would look like:

// A representation of a book in a library.
message Book {
    // The name of the book.
    string name = 1;

    // Whether or not the the book has a rating.
    bool is_rated = 2;

    // The rating for the book, from 0 to 5.
    // 0 is distinct from no rating. Only valid
    // when is_rated == true.
    int32 rating = 3;
}
@SirGitsalot
Copy link
Author

AIP-149 should also add the guidance:

Important: Declarative-friendly resources (AIP-128) must not use optional in their protocol buffer definitions

@rileykarson
Copy link

rileykarson commented Apr 14, 2022

I'm strongly supportive of this. In addition to @SirGitsalot's points:

  • optional fields are not well documented in API references (at least in GCP ones), and it's a breaking change for a client to change from a primitive to an optional primitive. It's also possible that a harmful setting will be enabled on user resources accidentally, if a client sets an optional scalar to its zero value.
  • Not all clients are language client libraries with fully-fledged type systems, and REST/RESTful/JSON inspired clients will often map empty and null into the same state. This is true of two declarative clients, KCC and Terraform. Terraform is unable to express null without wrapping the type into an object, and KCC uses null to indicate a field should not be managed.

@rileykarson
Copy link

rileykarson commented Apr 14, 2022

For a concrete example of the danger of optional fields, Google Cloud Storage Buckets' lifecycle rules (https://cloud.google.com/storage/docs/lifecycle) has an isLive optional bool (reference. Given a GCS bucket with versioning disabled, a client that sends false for isLive when the user intended for null (i.e. nothing) to be sent would apply the lifecycle action to no objects, incurring higher than anticipated storage costs as none of their objects would get deleted.

This was a real bug that happened to us in Terraform! hashicorp/terraform-provider-google#1608

@devchas
Copy link

devchas commented Apr 14, 2022

Just playing devil's advocate here. The way @SirGitsalot worded the proposed AIPs, it sounds to me like the recommendation is never to use optional under any circumstances. If that's the case, why do we need the optional keyword at all? You did bring up one example, where field presence detection is important for services built using protocol buffers and gRPC. While I would imagine the AIP should be implementation agnostic, is there a way we can account for the cases in which the optional keyword should be used?

I don't know if there are other implementations similar to the proto3 JSON mapping mentioned, but it sounds to me like the proposed change is specifically meant to cater to that implementation. Are there other, similar tools that fall victim to the same problem?

To that end, is there another alternative where the proto3 JSON mapping implementation processes the proto messages differently as not to cause this problem. It also sounds like part of the problem is that this isn't documented correctly since

That mapping explicitly says that omitting a field entirely, setting it to null, and setting the field to its default value are equivalent. That's not true for optional fields

I would think that, at a minimum, that updating that documentation to correctly reflect the behavior with optional fields would help mitigate the issue.

Finally, regarding the comment of @rileykarson - I'm not familiar with Terraform, but in reading through the real-world case you brought up, it seems like the request in question was intended to target objects in the bucket that were live (is_live set to true) and that the underlying API assumed that excluding the parameter (or setting it to null) targeted live buckets by default, which is why the documentation said the is_live parameter isn't required. If that's the case, fixing that documentation to state that is_live should be set to true would presumably fix the problem, and that's what it sounds like was done here. Again, I may be off on this given my lack of Terrform knowledge, but that's how I read it.

Just some food for thought, but the TLDR is - does making the proposed change simply solve a problem with one implementation at the expense of creating a problem in another? And if so, is there a way we can word this to explain why we should ever use optional and address the needs of all consumers regardless of their choice of implementation?

@rileykarson
Copy link

I'm not familiar with Terraform, but in reading through the real-world case you brought up...

It's a little messier than that! The field can be unset meaning match both live and noncurrent objects, true meaning match only live objects, and false meaning match only noncurrent objects. However, if a bucket doesn't have versioning turned on, all objects are live so unset and true are equivalent. The user thought they'd specified nothing, our client didn't know the field was optional, and sent false instead of nothing. Treating nothing as false is the behaviour of non-optional fields that are semantically optional in the API.

Just some food for thought, but the TLDR is - does making the proposed change simply solve a problem with one implementation at the expense of creating a problem in another?

I suspect we have to consider how users are using these APIs. I believe it to be the case that the majority of GCP users interact with the REST API indirectly (gcloud, Terraform, Config Connector, and other orchestration tools are all REST-based) and some advanced users will interact with the API directly (i.e. through REST client libraries). That may not be as true for other Google APIs, though.

That may mean a Cloud-specific AIP would be appropriate, for example, or a relatively strong should/should not general recommendation based on usage expectations for public APIs in general.

I don't know if there are other implementations similar to the proto3 JSON mapping mentioned, but it sounds to me like the proposed change is specifically meant to cater to that implementation. Are there other, similar tools that fall victim to the same problem?

I want to stress that all Cloud APIs (and other public Google APIs, the most accessible list I know is https://github.com/googleapis/google-api-go-client's service list) are using this proto-to-JSON mapping to create REST APIs. optional presents an issue for Google-wide and not just by a single service or product.

@shwoodard shwoodard changed the title Change AIP-149 guidance from "should" to "should not" Change AIP-149 guidance from "should" to "should not" for proto3 optional on primitives Apr 18, 2022
@AnashOommen
Copy link

Re: user cannot figure out optional: There was no interest from the OnePlatform team to fix the autogenerated documentation to show that a field was optional (either in the REST or the grpc documentation). You cannot figure out this information from the discovery JSON. So this is not new to Terraform.

@AnashOommen
Copy link

AnashOommen commented Apr 18, 2022

I guess in some sense this is a problem with FieldMask being a poorly designed feature.

Today, the user has to do:

var book = new Book();
book.rating = 3;
book.is_rated = true;
var fieldMask = new FieldMask();
fieldMask.mask = ["rating", "is_rated"];
var req = new UpdateBookRequest();
req.book = book;
req.update_mask = fieldMask;

This gets increasingly worse when Book has a dozen fields to update. I mean, the user just set a dozen fields on Book, why should they now create an array with a dozen strings to tell us their intentions?

Now, you could say that you should get() and then overwrite, but that wouldn't work if Book is an expensive resource to retrieve.

Which is why we have a FieldMasks class to examine a proto object / do a proto diff to figure out what the user's FieldMask should be.

var book = new Book();
book.name = "book 1";
var fieldMask = FieldMasks.AllSetFieldsOf(book);

But then, we can't distinguish between

var book = new Book();
book.is_rated = false;
var fieldMask = FieldMasks.AllSetFieldsOf(book);

And

var book = new Book();
var fieldMask = FieldMasks.AllSetFieldsOf(book);

Now, if is_rated is an optional field and we ignore it's tristate nature on the server side, and instead use it only as a presence/dirty flag on the client side, then we can do a proto diff without any issues. Which is how we are using it in Ads today.

@devchas
Copy link

devchas commented Apr 18, 2022

To elaborate on the point that @AnashOommen made, the FieldMask design is an issue in the Ads API because its resources have many fields that are frequently updated, so it is important that users only perform partial updates on the fields they actually want to change.

Generally speaking, if APIs have similar characteristics (resources with many fields that are frequently updated), I would suspect those APIs will encounter similar issues, and as a result, similar issues with removing the optional keyword.

@rileykarson
Copy link

rileykarson commented Apr 18, 2022

Now, if is_rated is an optional field and we ignore it's tristate nature on the server side, and instead use it only as a presence/dirty flag on the client side, then we can do a proto diff without any issues. Which is how we are using it in Ads today.

This sounds like what we've done clientside in DCL and KCC, who represent all types as indirect types (i.e. optional type or *type) clientside, and produce a correct update mask as part of the API side. That's dependent on API types not being optional, as otherwise they need to represent the value as **type in the client.

Representing API types as optional type where the storage type is just type makes client generation moderately more complicated- we've had to contend with fields whose API frontend and storage use unset/true/false as distinct values.

@jskeet
Copy link
Contributor

jskeet commented Oct 27, 2022

A lot of the discussion on this issue seems to be based on the public proto3 documentation, which has not been updated for optional. There's work going into making a more precise public spec, but the gist of it will be this (from an internal doc):

Fields that have explicit presence will be serialized in JSON if and only if they are present.

... where proto3 optional fields do have explicit presence.

How much does that change the objection to using optional? (Basically field presence in the message corresponds exactly to presence in the JSON, which I'd expect to be easy to handle in tooling.)

@jskeet
Copy link
Contributor

jskeet commented Nov 10, 2022

Ping on this, @SirGitsalot, @rileykarson, @AnashOommen, @devchas

@rileykarson
Copy link

Most of my thoughts are based on the difficulty wrt. implementing clients- lack of metadata in GCP specs (c.g.c. docs, discovery docs), lack of pointer types (needed to represent optional) in some clients, and the fact that getting the type wrong forces clients take on a breaking change. This isn't exclusive to a single client- we've seen google-api-go-client break itself numerous times because of similar reasons multiple times, most recently with googleapis/google-api-go-client#1598 a few months ago (addtl discussion in googleapis/google-cloud-go#6204).

fyi @toumorokoshi @slevenick in case you're interested!

@jskeet
Copy link
Contributor

jskeet commented Nov 15, 2022

@rileykarson: Thanks for the extra detail. (I think the choice of whether optional is represented via a pointer or not varies immensely by language, and that affects whether or not changing a field from default to optional is a breaking change.)

If there are concrete aspects (like Discovery, docs and the Go implementation) that can be improved, and additional rules (like considering it to be a breaking change), that feels like a more constructive set of things to look at than a blanket ban based on initially-incorrect expectations around JSON representations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants