Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding serializable enums (or enums with backing types and specified representations). #615

Merged
merged 7 commits into from
May 25, 2018
220 changes: 214 additions & 6 deletions p4-16/spec/P4-16-spec.mdk
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ Author: The P4 Language Consortium
Heading depth: 5
Pdf Latex: xelatex
Document Class: [11pt]article
Package: [top=1in, bottom=1.25in, left=1in, right=1in]{geometry}
Package: [top=1in, bottom=1.25in, left=1in, right=1in]geometry
Package: fancyhdr

Tex Header:
Expand Down Expand Up @@ -2026,12 +2026,22 @@ An enumeration type is defined using the following syntax:
~ Begin P4Grammar
enumDeclaration
: optAnnotations ENUM name '{' identifierList '}'
| optAnnotations ENUM typeRef name '{' specifiedIdentifierList '}'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could make this even more specific:

optAnnotations ENUM BIT '<' INTEGER '>' name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You noted this here, but then noted later that the typeRef here worked, do you have a preference?

;

identifierList
: name
| identifierList ',' name
;

specifiedIdentifierList
: specifiedIdentifier
| specifiedIdentifierList ', ' specifiedIdentifier
;

specifiedIdentifier
: name '=' initializer
;
~ End P4Grammar

For example, the declaration
Expand All @@ -2043,10 +2053,55 @@ enum Suits { Clubs, Diamonds, Hearths, Spades }
introduces a new enumeration type, which contains four
constants---e.g., `Suits.Clubs`. An `enum` declaration
introduces a new identifier in the current scope for naming the
created type. The underlying representation of such values is not
created type. The underlying representation of the `Suits` enum is not
specified, so their "size" in bits is not specified (it is
target-specific).

It is also possible to specify an `enum` with an underlying representation.
This requires the programmer provide both the fixed-width integer type and an associated
fixed-width integer value for each symbolic entry in the enumeration. For example, the
declaration

~ Begin P4Example
enum bit<16> EtherType {
VLAN = 0x8100,
QINQ = 0x9100,
MPLS = 0x8847,
IPV4 = 0x0800,
IPV6 = 0x86dd
// ...
}
~ End P4Example

introduces a new enumeration type, which contains five constants---e.g.,
`EtherType.IPV4`. This `enum` declaration specifies the fixed-width integer representation
for each entry in the `enum` and provides an underlying type: `bit<16>`.
This type of `enum` declaration can be thought of as declaring a new `bit<16>`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow signed enum values as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've used unsigned in all of the examples, but I don't think there is any particular reason to restrict it to this. Do you have an opinion? (I'm happy to add a case for it.)

type, where variables or fields of this type are expected to be unsigned 16-bit
integer values, and the mapping of symbolic to numeric values defined by the
`enum` are effectively constants defined as a part of this type. In this way,
an `enum` with an underlying type can be thought of as being a type derived
from the underlying type carrying equality, assignment, and casts to/from the
underlying type.

Compiler implementations are expected to raise an error if the fixed-width integer representation
for an enumeration entry falls outside the representation range of the underlying
type.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the issue of multiple entries having the same representation should be discussed somewhere in here. For example, whether this is allowed:

enum bit<16> SomeEnum {
  Case1 = 0x0001,
  Case2 = 0x0001, // related, can I say instead "= Case1"?
  Case3 = 0x0002
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Given the contents of the rest of the proposal, it seems that this should not be allowed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to disallow this


For example, the declaration

~ Begin P4Example
enum bit<8> FailingExample {
first = 1,
second = 2,
third = 3,
unrepresentable = 300
}
~ End P4Example

would result in an error because the fixed-width integer value associated with
`FailingExample.unrepresentable` cannot be represented as a `bit<8>` value.

Annotations, represented by the non-terminal `optAnnotations`, are
described in Section [#sec-annotations].

Expand Down Expand Up @@ -2278,14 +2333,17 @@ infinite-precision integer, without a width specified.
| `error` | error | error | allowed |
| `match_kind` | error | error | error |
| `bool` | error | error | allowed |
| `enum` | error | error | allowed |
| `enum` | allowed[^enum_header] | error | allowed |
| `header` | error | allowed | allowed |
| header stack | error | error | allowed |
| `header_union` | error | error | allowed |
| `struct` | error | error | allowed |
| `tuple` | error | error | allowed |
|----------------|-----------|---------|----------|

[^enum_header]: an `enum` type used as a field in a `header` must specify a
underlying type and representation for `enum` elements.

Rationale: `int` does not have precise storage requirements,
unlike `bit<>` or `int<>` types. `match_kind`
values are not useful to store in a variable, as they
Expand Down Expand Up @@ -2717,13 +2775,163 @@ X.v1 // reference to v1
v1 // error - v1 is not in the top-level namespace
~ End P4Example

Similar to errors, `enum` expressions only support equality (`==`)
and inequality (`!=`) comparisons. Expressions whose type is an `enum`
Similar to errors, `enum` expressions without a specified underlying type only support equality (`==`)
and inequality (`!=`) comparisons. Expressions whose type is an `enum` without a specified underlying type
cannot be cast to or from any other type.

Note that if an `enum` value appears in the control-plane API, the
An `enum` may also specify an underlying type, such as the following:

~ Begin P4Example
enum bit<8> E {
e1 = 0,
e2 = 1,
e3 = 2
}
~ End P4Example

Note that each symbolic value in the enum must map to a unique fixed-width integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we decide to keep the "unique" requirement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can case names be used while defining the enum?

This is particularly useful if aliases are allowed:

enum bit<32> Authors {
  Clemens = 0,
  Twain = Clemens // is Clemens defined here? is Authors.Clemens?
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear, a declaration is available only after it has been "completed". Not clear whether the Clemens declaration ends with the comma or with the closing brace.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that since it's useful, it should be allowed. But there should be some language to say one way or another.

representation, for example the `enum` in the following example would result in
error:

~ Begin P4Example
enum bit<8> Bad {
b1 = 0,
b2 = 1, // Error: b2 and b3 both map to 1
b3 = 1, // one must be changed.
b4 = 2
}
~ End P4Example

An `enum` with an underlying type also supports explicit casts to and from the
underlying type. For instance, the following code:

~ Begin P4Example
bit<8> x;
E a = E.e2;
E b;

x = (bit<8>) a; // sets x to 1
b = (E) x; // sets b to E.e2
~ End P4Example

casts `a`, which was initialized to `E.e2` to a `bit<8>`, using the specified
fixed-width integer representation for `E.e2`, `1`. The variable `b` is then set to the
symbolic value `E.e2`, which corresponds to the fixed-width integer value `1`.

Note that while it is always safe to cast from an `enum` to its fixed-width integer type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "safe" is not the right word here; with our current definition both casts are actually "safe", but the second one does not create a "named" value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

it may not be the case that every fixed-width integer value specified by the underlying type
has a related symbol in the `enum`. For instance, the following code:

~ Begin P4Example
bit<8> x = 5;
E e = (E) x; // sets e to an unnamed value
~ End P4Example

sets `e` to an unnamed value, since there is no symbol corresponding to the
fixed-width integer value `5`. While the value of `e` is unspecified, it must not
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say here that the value of e is unnamed. It certainly is not unspecified, in fact it is 5. The phrase about "must not correspond" is no longer necessary.

correspond to any symbol defined within enum `E`. This is because casting from
a fixed-width integer type to an `enum` type is required to obey the following semantics:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you have the later condition that (bit)(E)x == x for all possible 2^W values of x of type bit<W>. Assuming we keep that condition (seems reasonable to me), it isn't really true that the assignment to e in the example code above assigns e an "unknown" value? I mean, it must be a value not equal to any named member of type E, I agree, but it must also be able to be cast back to a bit<8> type value of 5, and no other, so we know a lot more about it than that it is simply unknown.

I mention this simply because uninitialized values in P4_16 are, I believe, "more unknown" than the value of e in this sample code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you refer to this value as, if not unknown?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The phrase starting with "This is because" is probably unnecessary now.


Given an enum type $E$ with underlying type `bit<W>` and the set of mappings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are no longer requirements, they are consequences of the definition. They will always hold.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed all of the language around this.

of $(s, n)$ where $S$ is the set of symbols ($s$) and $N$ is the set of fixed-width integer
representations ($n$), the following properties must hold:

1. if an integer $x \notin N$ then $\forall e \in S$, `(E)x != e`
1. $\forall x \in [0, 2^W]$, `(bit<W>)(E)x == x`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don' t know why you want this property to hold. Who cares about it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to say the property holds by definition of an enum with underlying type and doesn't need to be spelled out explicitly?

Copy link
Collaborator

@jafingerhut jafingerhut May 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbudiu-vmw If we are going to the trouble of specifying an underlying type that is bit<W>, are you saying you think a compiler would gain some kind of efficiency advantage in a target by violating that property?

More directly related to your question, if implementations guaranteed the property, it certainly makes explaining exactly how an enum with an underlying type works -- it is really a bit<W> "under the hood", and casts between bit<W> and the enum type never change the underlying bit pattern. Done. End of story. No weird corner cases to think about, except that such an enum value might not be equal to any of the named enum member values.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that is the model, then we should say so explicitly in the spec.
This will preclude other implementations, and will also make it easier for people to understand what is going on. Do not leave freedom to the compiler writers if you think you will be sorry about it later. Like people designing C did with respect to integer overflow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was requested in the issue #611 related to this and I basically pulled the text from there (with a lite-rewording & turning some text in math). Where do we stand on this?

Personally, I don't feel strongly about it one way or the other.

1. $\forall e \in S$, `(E)(bit<W>)e == e`
Copy link
Collaborator

@jafingerhut jafingerhut May 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I roughly translate this into English, if an enum type with a backing type bit<W>, where the numeric values defined by the enum definition do not include all possible 2^W values of the backing type, then it is guaranteed that variables of that enum type might not be equal to any of the named values in the enum type?

Is the comment in the last 'else' branch below correct, given this proposal?

enum bit<2> MyPartialEnum_t {
    VALUE_A = 2w0,
    VALUE_B = 2w1,
    VALUE_C = 2w2
}

MyPartialEnum_t x;

if (x == MyPartialEnum_t.VALUE_A) {
    // some code here
} else if (x == MyPartialEnum_t.VALUE_B) {
    // some code here
} else if (x == MyPartialEnum_t.VALUE_C) {
    // some code here
} else {
    // A P4 compiler MUST ASSUME that this branch can be executed
    // some code here
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that's the idea. Although a more realistic scenario would probably look like this:

enum bit<16> EtherType {
  VLAN      = 0x8100,
  IPV4      = 0x0800,
  IPV6      = 0x86dd
}

header ethernet {
  ...
  EtherType etherType;
  ...
}

parser my_parser(...) {
  state parse_ethernet {
    packet.extract(hdr.ethernet);
    transition select(hdr.ethernet.etherType) {
      EtherType.VLAN : parse_vlan;
      EtherType.IPV4 : parse_ipv4;
      EtherType.IPV6: parse_ipv6;
      default: reject;
  }
}

I think it's important to have the guarantee that if the etherType extracted from the packet is not one of 0x8100 (VLAN), 0x0800 (IPV4) or 0x86dd (IPV6), then the default transition will be taken.

All of the guarantees listed here are achieved by a "trivial" implementation where an enum with an underlying type is completely taken care of at compile time and doesn't exist at runtime. I think that's what makes the enum useful: it makes it more convenient to write programs and generate runtime APIs, and it doesn't have any packet-processing runtime cost.

That being said I'm fine with introducing extern methods to safely cast enums or to check if a value is valid, as @jamescoole-cisco suggested, as long as they are optional when it comes to actually using enums in a P4 program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should one (or both) of these examples be added as explanation to the math?

Copy link
Contributor

@jamescoole-cisco jamescoole-cisco May 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that the first example is unique to enums with backing types. If uninitialized enum variables are intended to be "undefined" only within valid entries of the enum, it might be good to state that explicitly.

However, if that's the case, (compiler) implementations will already need to initialize non-backed enum variable storage in most cases, so why not require the same for enums with backing types?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jamescoole-cisco I actually wrote a change to the P4_16 language spec, committed after v1.0.0 was published, that says something about exactly this case, but had forgotten I wrote that part. See #611 (comment) for details. Unless that text is changed further in the spec, P4_16 compilers are already free to make uninitialized enum and error values unequal to any of their named members.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given my previous comment, I am not against having example code like these in the spec, but if so, they should be clear that the examples apply for enum's with or without underlying types.

I think having no code examples would also be fine, but adding a sentence that mentions that all values with type enum can be not-equal to any of their named members, with a cross-reference to the section "## Operations on headers { #sec-ops-on-hdrs }" (Madoko syntax copied from the .mdk file)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be good to copy or move the discussion of how undefined both types of enums can be into section, since it applies in situations besides header fields. But IDK if the example is necessary to get the point across.


For example, in the following code, the `else` clause of the `if/else if/else`
block can be reached even though the matches on `x` are complete with respect
to the symbols defined in `MyPartialEnum_t`:

~ Begin P4Example
enum bit<2> MyPartialEnum_t {
VALUE_A = 2w0,
VALUE_B = 2w1,
VALUE_C = 2w2
}

bit<2> y = < some value >;
MyPartialEnum_t x = (MyPartialEnum_t)y;

if (x == MyPartialEnum_t.VALUE_A) {
// some code here
} else if (x == MyPartialEnum_t.VALUE_B) {
// some code here
} else if (x == MyPartialEnum_t.VALUE_C) {
// some code here
} else {
// A P4 compiler MUST ASSUME that this branch can be executed
// some code here
}
~ End P4Example

Additionally, if an enumeration is used as a field of a header, we would expect
the `transition select` to match `default` when the parsed integer value does
not match one of the symbolic values of `EtherType` in the following example:

~ Begin P4Example
enum bit<16> EtherType {
VLAN = 0x8100,
IPV4 = 0x0800,
IPV6 = 0x86dd
}

header ethernet {
...
EtherType etherType;
...
}

parser my_parser(...) {
state parse_ethernet {
packet.extract(hdr.ethernet);
transition select(hdr.ethernet.etherType) {
EtherType.VLAN : parse_vlan;
EtherType.IPV4 : parse_ipv4;
EtherType.IPV6: parse_ipv6;
default: reject;
}
}
~ End P4Example

Any `enum` that contains an unspecifed value, whether as the result of a cast
to an `enum` with an underlying type, parse into the field of an `enum` with an
underlying type, or simply the declaration of any `enum` without a specified
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this explanation can now be also simplified by reducing it to the case of bitstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be changed to "unnamed" as well. This whole paragraph is really unnecessary.
A question is whether we allow matching against raw constants:

select (hdr.ethernet.etherType) { 
     0x8000: something
}

initial value might not be equal to any of the values defined for that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think that "might" is too weak: it "will not" be equal for sure.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original intent of the sentence is that a variable of an enum type that contains at least one unnamed value might not be equal to any of the named values.

We could strengthen the sentence to "When a variable of an enum type with unnamed values is currently equal to one of those unnamed values, it will not be equal to any of the named values." But perhaps that would be redundant at this point in the text (I haven't carefully looked at the rest of the changes compared to this sentence.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just changed "might" to "will", I think that is accurate.

type. Such an unspecified value should still lead to predictable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unspecified -> unnamed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

behavior in cases where any legal value would match, e.g. it should
match in any of these situations:

- If used in a `select` expression, it should match `default` or `_`
in a key set expression.
- If used as a key with `match_kind` `ternary` in a table, it should
match a table entry where the field has all bit positions "don't
care".
- If used as a key with `match_kind` `lpm` in a table, it should match
a table entry where the field has a prefix length of 0.

Note that if an `enum` value lacking an underlying type appears in the control-plane API, the
compiler must select a suitable serialization data type and
representation.
For `enum` values with an underlying type and representations, the compiler should
use the specified underlying type as the serialization data type and
representation.

In addition to these operations a P4 implementation may choose to implement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we need to put this into the spec; it can be in the architecture definition file of the targets which do implement such functions. Also, "is_valid" should be probably called "is_named".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and removed these, I agree having them as part of an architecture description is probably right.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why I cannot separately reply to the question about matching against raw constants, but I would say this:

  1. I don't think we should allow this because I think the enum should be a distinct type that can be cast to the underlying type, but not treated exactly the same as the underlying type. I think if you really want to use the integer value (which really defeats the whole purpose of putting in an enum in the first place to me), it would be:

     select ((bit<16>)hdr.ethernet.etherType) {
        0x8000: something
     }
  2. I think forcing the programmer to be explicit about this is in keeping with the requirement for explicit casts in other places in the language.

  3. I've not yet written anything up about this, I think that is the one outstanding issue.

externs for explicitly checking that a fixed-width integer value corresponds to
a symbolic representation within an `enum` or provide a checked conversion
extern.

~ Begin P4Example
extern bool is_valid<Enum, Int>(Int x);
extern Enum convert<Enum, Int>(Int x, Enum default);
~ End P4Example

The `is_valid` extern returns `true` for integer values that correspond to
symbols in the supplied `Enum` type parameter. The `convert` function returns
either the symbolic value associated with `x` in `Enum` when one exists, or the
provided `default`.

## Expressions on Booleans { #sec-bool-exprs }

Expand Down