Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding serializable enums (or enums with backing types and specified representations). #615

Merged
merged 7 commits into from
May 25, 2018
111 changes: 106 additions & 5 deletions p4-16/spec/P4-16-spec.mdk
Original file line number Diff line number Diff line change
Expand Up @@ -2026,12 +2026,22 @@ An enumeration type is defined using the following syntax:
~ Begin P4Grammar
enumDeclaration
: optAnnotations ENUM name '{' identifierList '}'
| optAnnotations ENUM typeRef name '{' specifiedIdentifierList '}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this even more specific:

optAnnotations ENUM BIT '<' INTEGER '>' name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You noted this here, but then noted later that the typeRef here worked, do you have a preference?

;

identifierList
: name
| identifierList ',' name
;

specifiedIdentifierList
: specifiedIdentifier
| specifiedIdentifierList ', ' specifiedIdentifier
;

specifiedIdentifier
: name '=' initializer
;
~ End P4Grammar

For example, the declaration
Expand All @@ -2043,10 +2053,47 @@ enum Suits { Clubs, Diamonds, Hearths, Spades }
introduces a new enumeration type, which contains four
constants---e.g., `Suits.Clubs`. An `enum` declaration
introduces a new identifier in the current scope for naming the
created type. The underlying representation of such values is not
created type. The underlying representation of the `Suits` enum is not
specified, so their "size" in bits is not specified (it is
target-specific).

It is also possible to specify an `enum` with an underlying representation.
This requires the programmer provide both the numeric type and an associated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "the numeric type" be "a fixed-width integer type"?

numeric value for each symbolic entry in the enumeration. For example, the
declaration

~ Begin P4Example
enum bit<16> EtherType {
VLAN = 0x8100,
QINQ = 0x9100,
MPLS = 0x8847,
IPV4 = 0x0800,
IPV6 = 0x86dd
// ...
}
~ End P4Example

introduces a new enumeration type, which contains five constants---e.g.,
`EtherType.IPV4`. This `enum` declaration specifies the numeric representation
for each entry in the `enum` and provides a backing type: `bit<16>`.
Implementations are expected to raise an error if the numeric representation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explicitly say "Compiler implementations"

for an enumeration entry falls outside the representation range of the backing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe you should use "underlying type" consistently instead of "backing type"?

type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the issue of multiple entries having the same representation should be discussed somewhere in here. For example, whether this is allowed:

enum bit<16> SomeEnum {
  Case1 = 0x0001,
  Case2 = 0x0001, // related, can I say instead "= Case1"?
  Case3 = 0x0002
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Given the contents of the rest of the proposal, it seems that this should not be allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to disallow this


For example, the declaration

~ Begin P4Example
enum bit<8> FailingExample {
first = 1,
second = 2,
third = 3,
unrepresentable = 300
}
~ End P4Example

would result in an error because the numeric value associated with
`FailingExample.unrepresentable` cannot be represented as a `bit<8>` value.

Annotations, represented by the non-terminal `optAnnotations`, are
described in Section [#sec-annotations].

Expand Down Expand Up @@ -2278,14 +2325,17 @@ infinite-precision integer, without a width specified.
| `error` | error | error | allowed |
| `match_kind` | error | error | error |
| `bool` | error | error | allowed |
| `enum` | error | error | allowed |
| `enum` | allowed[^enum_header] | error | allowed |
| `header` | error | allowed | allowed |
| header stack | error | error | allowed |
| `header_union` | error | error | allowed |
| `struct` | error | error | allowed |
| `tuple` | error | error | allowed |
|----------------|-----------|---------|----------|

[^enum_header]: an `enum` type used as a field in a `header` must specify a
backing type and representation for `enum` elements.

Rationale: `int` does not have precise storage requirements,
unlike `bit<>` or `int<>` types. `match_kind`
values are not useful to store in a variable, as they
Expand Down Expand Up @@ -2717,13 +2767,64 @@ X.v1 // reference to v1
v1 // error - v1 is not in the top-level namespace
~ End P4Example

Similar to errors, `enum` expressions only support equality (`==`)
and inequality (`!=`) comparisons. Expressions whose type is an `enum`
Similar to errors, `enum` expressions without a specified backing type only support equality (`==`)
and inequality (`!=`) comparisons. Expressions whose type is an `enum` without a specified backing type
cannot be cast to or from any other type.

Note that if an `enum` value appears in the control-plane API, the
An `enum` that specifies a backing type, such as the following:

~ Begin P4Example
enum bit<8> E {
e1 = 0,
e2 = 1,
e3 = 2
}
~ End P4Example

also supports explicit casts to and from the backing type. For instance, the
following code:

~ Begin P4Example
bit<8> x;
E e1 = E.e2;
E e2;

x = (bit<8>) e1; // sets x to 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be "sets x to 0"?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be wrong in my comment, but note that it is fairly confusing to have variables with the exact same names as the enum member names, e.g. e1, e2 are used as both in this example code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, e1 = E.e2, which is 1. I should, however, chose different names for variables e1 and e2 (that was a really confusing choice on my part!)

e2 = (E) x; // sets e to E.e2
~ End P4Example

casts `e1`, which was initialized to `E.e2` to a `bit<8>`, using the specified
numeric representation for `E.e2`, `1`. The variable `e2` is then set to the
symbolic value `E.e2`, which corresponds to the numeric value `1`.

Note that while it is always safe to cast from an `enum` to its numeric type,
it may not be th case that every numeric value specified by the backing type
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: "th" -> "the" ?

has a related symbol in the `enum`. For instance, the following code:

~ Begin P4Example
bit<8> x = 5;
E e = (E) x; // sets e to an unspecified value
~ End P4Example

sets `e` to an unknown value, since there is no symbol corresponding to the
numeric value `5`. While the value of `e` is unspecified, it must not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is simpler to keep this to be unspecified, and make no such promises.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly disagree. I believe that would greatly limit how these enums can be used. Please see the parser example I gave below in a different comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This property is important for programmers expectations. Without it the example that @antoninbas showed in the comments above is not something that a programmer could rely on working across P4 implementations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch, I missed the unspecified in this one. That said, the While the value ... has now been removed.

correspond to any symbol defined within enum `E`. This is because casting from
a numeric type to an `enum` type is required obey the following semantics:
Copy link
Contributor

@jamescoole-cisco jamescoole-cisco May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this has been discussed before and was in the proposal, but I'm not sure where it ended up.

It seems that providing something like bool is_valid<Enum, Back>(Back b) (or bool is_valid<Enum>(Enum e)), either as part of the language or an extern, would be useful.

If a user wants to check that an unsafe cast will (or did) succeed, they're still going to even if these aren't provided, but the code they write will need to be updated whenever the enum definition changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/required obey/required to obey/ ?


Given an enum type $E$ with underlying type `bit<W>` and the set of mappings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer requirements, they are consequences of the definition. They will always hold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all of the language around this.

of $(s, n)$ where $S$ is the set of symbols ($s$) and $N$ is the set of numeric
representations ($n$), the following properties must hold:

1. if an integer $x \notin N$ then $\forall e \in S$, `(E)x != e`
1. $\forall x \in [0, 2^W]$, `(bit<W>)(E)x == x`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don' t know why you want this property to hold. Who cares about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to say the property holds by definition of an enum with underlying type and doesn't need to be spelled out explicitly?

Copy link
Collaborator

@jafingerhut jafingerhut May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbudiu-vmw If we are going to the trouble of specifying an underlying type that is bit<W>, are you saying you think a compiler would gain some kind of efficiency advantage in a target by violating that property?

More directly related to your question, if implementations guaranteed the property, it certainly makes explaining exactly how an enum with an underlying type works -- it is really a bit<W> "under the hood", and casts between bit<W> and the enum type never change the underlying bit pattern. Done. End of story. No weird corner cases to think about, except that such an enum value might not be equal to any of the named enum member values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the model, then we should say so explicitly in the spec.
This will preclude other implementations, and will also make it easier for people to understand what is going on. Do not leave freedom to the compiler writers if you think you will be sorry about it later. Like people designing C did with respect to integer overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was requested in the issue #611 related to this and I basically pulled the text from there (with a lite-rewording & turning some text in math). Where do we stand on this?

Personally, I don't feel strongly about it one way or the other.

1. $\forall e \in S$, `(E)(bit<W>)e == e`
Copy link
Collaborator

@jafingerhut jafingerhut May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I roughly translate this into English, if an enum type with a backing type bit<W>, where the numeric values defined by the enum definition do not include all possible 2^W values of the backing type, then it is guaranteed that variables of that enum type might not be equal to any of the named values in the enum type?

Is the comment in the last 'else' branch below correct, given this proposal?

enum bit<2> MyPartialEnum_t {
    VALUE_A = 2w0,
    VALUE_B = 2w1,
    VALUE_C = 2w2
}

MyPartialEnum_t x;

if (x == MyPartialEnum_t.VALUE_A) {
    // some code here
} else if (x == MyPartialEnum_t.VALUE_B) {
    // some code here
} else if (x == MyPartialEnum_t.VALUE_C) {
    // some code here
} else {
    // A P4 compiler MUST ASSUME that this branch can be executed
    // some code here
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the idea. Although a more realistic scenario would probably look like this:

enum bit<16> EtherType {
  VLAN      = 0x8100,
  IPV4      = 0x0800,
  IPV6      = 0x86dd
}

header ethernet {
  ...
  EtherType etherType;
  ...
}

parser my_parser(...) {
  state parse_ethernet {
    packet.extract(hdr.ethernet);
    transition select(hdr.ethernet.etherType) {
      EtherType.VLAN : parse_vlan;
      EtherType.IPV4 : parse_ipv4;
      EtherType.IPV6: parse_ipv6;
      default: reject;
  }
}

I think it's important to have the guarantee that if the etherType extracted from the packet is not one of 0x8100 (VLAN), 0x0800 (IPV4) or 0x86dd (IPV6), then the default transition will be taken.

All of the guarantees listed here are achieved by a "trivial" implementation where an enum with an underlying type is completely taken care of at compile time and doesn't exist at runtime. I think that's what makes the enum useful: it makes it more convenient to write programs and generate runtime APIs, and it doesn't have any packet-processing runtime cost.

That being said I'm fine with introducing extern methods to safely cast enums or to check if a value is valid, as @jamescoole-cisco suggested, as long as they are optional when it comes to actually using enums in a P4 program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should one (or both) of these examples be added as explanation to the math?

Copy link
Contributor

@jamescoole-cisco jamescoole-cisco May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that the first example is unique to enums with backing types. If uninitialized enum variables are intended to be "undefined" only within valid entries of the enum, it might be good to state that explicitly.

However, if that's the case, (compiler) implementations will already need to initialize non-backed enum variable storage in most cases, so why not require the same for enums with backing types?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamescoole-cisco I actually wrote a change to the P4_16 language spec, committed after v1.0.0 was published, that says something about exactly this case, but had forgotten I wrote that part. See #611 (comment) for details. Unless that text is changed further in the spec, P4_16 compilers are already free to make uninitialized enum and error values unequal to any of their named members.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given my previous comment, I am not against having example code like these in the spec, but if so, they should be clear that the examples apply for enum's with or without underlying types.

I think having no code examples would also be fine, but adding a sentence that mentions that all values with type enum can be not-equal to any of their named members, with a cross-reference to the section "## Operations on headers { #sec-ops-on-hdrs }" (Madoko syntax copied from the .mdk file)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be good to copy or move the discussion of how undefined both types of enums can be into section, since it applies in situations besides header fields. But IDK if the example is necessary to get the point across.


Note that if an `enum` value lacking a backing representation appears in the control-plane API, the
compiler must select a suitable serialization data type and
representation.
For `enum` values with a backing type and representations, the compiler should
use the specified backing type as the serialization data type and
representation.

## Expressions on Booleans { #sec-bool-exprs }

Expand Down