Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

any of being used where it should just be an enum of strings #126

Closed
jessfraz opened this issue Aug 18, 2021 · 21 comments
Closed

any of being used where it should just be an enum of strings #126

jessfraz opened this issue Aug 18, 2021 · 21 comments

Comments

@jessfraz
Copy link
Contributor

So I plan on fixing this.

But I've generated a bunch of API clients now and seen a lot of specs and as a matter of making our future life easier we need to fix how we are generating these. We are using AnyOf when what we really want is just a simple enum of strings. We can still have descriptions we can still make things uuids, but now we won't have the weird "objects" with one property in it. That's just noise and its going to be hard to make any nice client from it.

Expect a PR.

@jessfraz jessfraz changed the title any of being usedd where it should just be an enum of strings any of being used where it should just be an enum of strings Aug 18, 2021
@ahl
Copy link
Collaborator

ahl commented Aug 18, 2021

I'm not sure if I understand this precisely, but I think we're talking about types such as this one:

/**
 * State of a Disk (primarily: attached or not)
 */
#[derive(JsonSchema)]
#[serde(rename_all = "lowercase")]
pub enum DiskState {
    /** Disk is being initialized */
    Creating,
    /** Disk is ready but detached from any Instance */
    Detached,
    /** Disk is being attached to the given Instance */
    Attaching(Uuid), /* attached Instance id */
    /** Disk is attached to the given Instance */
    Attached(Uuid), /* attached Instance id */
    /** Disk is being detached from the given Instance */
    Detaching(Uuid), /* attached Instance id */
    /** Disk has been destroyed */
    Destroyed,
    /** Disk is unavailable */
    Faulted,
}

Which schemars turns into this:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "DiskState",
  "description": "State of a Disk (primarily: attached or not)",
  "anyOf": [
    {
      "type": "string",
      "enum": [
        "creating",
        "detached",
        "destroyed",
        "faulted"
      ]
    },
    {
      "description": "Disk is being attached to the given Instance",
      "type": "object",
      "required": [
        "attaching"
      ],
      "properties": {
        "attaching": {
          "type": "string",
          "format": "uuid"
        }
      },
      "additionalProperties": false
    },
    {
      "description": "Disk is attached to the given Instance",
      "type": "object",
      "required": [
        "attached"
      ],
      "properties": {
        "attached": {
          "type": "string",
          "format": "uuid"
        }
      },
      "additionalProperties": false
    },
    {
      "description": "Disk is being detached from the given Instance",
      "type": "object",
      "required": [
        "detaching"
      ],
      "properties": {
        "detaching": {
          "type": "string",
          "format": "uuid"
        }
      },
      "additionalProperties": false
    }
  ]
}

In this case, the anyOf seems important in that different enum values have associated types (i.e. the Uuids), and some of the enum values have additional descriptions (although it does seem like a bug that schemars omits those descriptions for simple values. Do you have a proposal for how we could simplify this type? Or are you talking about a different sort of type?

FWIW I once found some odd output from schemars only to discover it was a shortcoming of the spec: GREsau/schemars#42

@jessfraz
Copy link
Contributor Author

jessfraz commented Aug 18, 2021 via email

@jessfraz
Copy link
Contributor Author

jessfraz commented Aug 18, 2021 via email

@ahl
Copy link
Collaborator

ahl commented Aug 18, 2021

I haven't looked at that many cases, but I have found that cases that can be denoted strictly as an enum seem to be represented that way already (at least with schemars 0.8.3). Is DiskState above one of the types you'd like to see emitted without the anyOf? How would we represent it?

@jessfraz
Copy link
Contributor Author

jessfraz commented Aug 18, 2021 via email

@ahl
Copy link
Collaborator

ahl commented Aug 18, 2021

Well, schemars just does JSON Schema. But OpenAPI hands off most of the definition of their schema object to JSON Schema:

https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.3.md#schemaObject

And they've gotten even closer in 3.1 (not that that matters yet / ever)

https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#schemaObject

I don't see how enum can be used to address these enum variants that include additional data, but I'd love to see it simplify of course!

@jessfraz
Copy link
Contributor Author

jessfraz commented Aug 18, 2021 via email

@jessfraz
Copy link
Contributor Author

And I have no way of knowing from the spec that these should be untagged, otherwise the API when called will never match on anything. And if I set them all as untagged enums well that would fuck up all the other specs I generate, and sure I can make a special case for oxide but I just feel like this is going to lead us to a world of pain with every other language we want to generate as well.

@jessfraz
Copy link
Contributor Author

Honestly just getting rid of the weird object types with a single property would help a lot because then at least this would parse as strings.

@jessfraz
Copy link
Contributor Author

hmm it might work because I do have anyof and allof as untagged but its just gross. and they need titles for the objects or else I have to generate one and it comes out like AllOfObject1

@jessfraz
Copy link
Contributor Author

mostly unless you have better ideas its nearly impossible for me to go from that schema back to the original unless I do a bunch of weirdo logic to make sure all these objects only have one value...

@jessfraz
Copy link
Contributor Author

let me try it and see if that breaks others, in theory it shouldnt

@smklein
Copy link
Contributor

smklein commented Aug 19, 2021

Would it help us to constrain the enum types which could be exposed over this interface? That might limit the number of cases a potential generator would need to handle.

Rust enums can be both named and unnamed:

enum MyEnum {
  A,                // Unit variant
  B(i32),           // Unnamed field
  C { value: i32 }, // Named field
}

I care pretty strongly that we are able to associate data with a single variant, but there is definitely some flexibility in how we go about doing so.

@ahl
Copy link
Collaborator

ahl commented Aug 21, 2021

@smklein this might be a good one to discuss. Without any serde annotations related to enum tags, the enum above would serialize like this:

{
  "my_enum1": "a",
  "my_enum2": { "b": 123 },
  "my_enum3": { "c": { "value": 123 } }
}

(This is referred to in the serde docs as "externally tagged")

I'll argue that that's kind of janky in json and in other, non-rust, languages. In particular, I'd suggest that a given value have a single valid data type whereas A serializes to a string and B serializes to an object.

I think that serde adjacent tagging provides a better interface:

{
  "my_enum1": { "tag": "a" },
  "my_enum2": { "tag": "b", "content": 123 },
  "my_enum3": { "tag": "c", "content": { "value": 123 } }
}

I think that's a nicer interface.

In the case of DiskState above, however, we might just want to have the state be a simple enum with no data payload and have a property of the disk for the associated instance (Option<Uuid>).

A related topic that Jess and I discussed was an openapi-lint crate that we could invoke from tests. The only two lints that I've got written down are both to validate type consistency:

  1. values should have a single valid type i.e. type should not be an array
  2. subschemas (e.g. for allOf, oneOf or anyOf) should not be of different types (e.g. not array and object, but two objects would be fine)

@smklein
Copy link
Contributor

smklein commented Aug 23, 2021

I think that serde adjacent tagging provides a better interface:

{
  "my_enum1": { "tag": "a" },
  "my_enum2": { "tag": "b", "content": 123 },
  "my_enum3": { "tag": "c", "content": { "value": 123 } }
}

I think that's a nicer interface.

Yeah, definitely agreed - the distinction of name / data is much more explicit with the tagging, and adding this tagging would be non-invasive for our Rust backend, so it's pretty much "free".

In the case of DiskState above, however, we might just want to have the state be a simple enum with no data payload and have a property of the disk for the associated instance (Option<Uuid>).

We... can. If there aren't any good solutions in this space, I'd relent, but this comes with a cost.

In the case of DiskState, we have the benefit that the state is just a "Uuid", and that it's the same between variants, which makes it easy to make that conversion.

Consider another enum (full caveat, I made this one up):

pub enum InstanceState {
  Creating(Progress)
  Starting,
  Running(RunState),
  Stopping,
  Failed(Reason)
  ...
}

In this case, would we need Option<Progress>, Option<RunState>, and Option<FailedReason>? It gets grosser when variant-specific state needs to potentially exist in all states.

@ahl
Copy link
Collaborator

ahl commented Aug 23, 2021

Yeah, definitely agreed - the distinction of name / data is much more explicit with the tagging, and adding this tagging would be non-invasive for our Rust backend, so it's pretty much "free".

Cool. I imagine this lint tool would catch this and could advise that adjacent tagging is a good solution.

In this case, would we need Option<Progress>, Option<RunState>, and Option<FailedReason>? It gets grosser when variant-specific state needs to potentially exist in all states.

Totally fair. I don't feel strongly that one way is clearly always correct. We might just keep an eye out for situations that make clients awkward to write. For example, when the console displays information about a disk, will the associated instance (via a UUID, link, or some other info) be a field that's always visible or will it be part of the status.

@david-crespo
Copy link
Contributor

david-crespo commented Sep 2, 2021

The new lint is now enforcing that the arms of an anyOf or oneOf have to be the same shape, but there are spots we're getting anyOf that I think should be oneOf. I think this is related to the above because oneOf is a middle ground between anyOf and a simple string enum in terms of complexity for the generator.

anyOf vs. oneOf makes a big difference to the way the TypeScript generator behaves. By changing DiskState's anyOf to oneOf in the spec, I get the following, which is what I would expect for this type.

export type DiskState =
  | DiskStateOneOf
  | DiskStateOneOf1
  | DiskStateOneOf2
  | DiskStateOneOf3
  | DiskStateOneOf4
  | DiskStateOneOf5
  | DiskStateOneOf6

With anyOf, the generator produces nonsense. You can see the before and after here: https://github.com/oxidecomputer/console/compare/one-of

@ahl
Copy link
Collaborator

ahl commented Sep 8, 2021

@david-crespo apologies for the delayed response on this. Agreed that that's the case. I would have hoped that schemars would generate oneOf in these situations rather than anyOf:

GREsau/schemars#98

I could imagine a few fixes:

  1. PR for schemars; I've contacted the maintainer to see if he'd welcome some assistance
  2. fork schemars
  3. Do some fix up in dropshot i.e. detect and correct this scenario
  4. Build a stand-alone tool to do this fix up
  5. Abandon schemars and build a openapiv3_derive crate to use instead

Number 1 is my preferred choice. I think I'd prefer 5 to 2 i.e. if we're going to maintain our own thing I'd prefer to maintain the thing that's closer to what we actually want and avoid the conversion from schemars to openapiv3::Schema. And I prefer 4 to 3 in that I don't really feel like this should be an intrinsic part of Dropshot, but I don't feel that strongly.

If you have preferences or other options please let me know.

@david-crespo
Copy link
Contributor

I agree that 1 is best! 2 and 5 sound like a lot of work. 3 and 4 seem about equal to me — if it's a 20 line function to post-process the generated schema, it doesn't seem too important to me whether it goes in Dropshot or Nexus, though it does seem pretty useful to users of Dropshot to have oneOfs in there when they're strictly more correct (unless they're not, or some people don't want them).

@ahl
Copy link
Collaborator

ahl commented Sep 8, 2021

    1. is a bunch of annoying work serde-like derives require trait definition for a lot of primitive types
    1. I don't love, but I guess that would be a reasonable kludge
  • Let's give 1 at least a few days

@ahl
Copy link
Collaborator

ahl commented Oct 16, 2021

this is fixed in schemars 0.8.6

GREsau/schemars#108

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants