Use different keywords for declaring tagged unions and C-style enums. #27

Closed
wants to merge 4 commits into from

brendanzab
Member

For example:

~~~rust
enum AB { A = 1, B }
enum CD { C, D(int) }
~~~

would become:

~~~rust
enum AB { A = 1, B }
union CD { C, D(int) }
~~~
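
To make the distinction concrete, here is a minimal sketch in present-day Rust, where `i32` has replaced `int` and a single `enum` keyword still covers both forms. The helper names `ab_code` and `describe` are invented for illustration and are not part of the proposal.

```rust
// C-like enum: every variant is a unit, so it can be cast with `as`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AB {
    A = 1,
    B, // implicitly 2, one more than the previous discriminant
}

// Data-carrying enum (the form the proposal would spell `union`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum CD {
    C,
    D(i32),
}

// Hypothetical helper: the integer cast is available because `AB`
// has no data-carrying variants.
fn ab_code(x: AB) -> i32 {
    x as i32
}

// Hypothetical helper: both forms are consumed with an exhaustive `match`.
fn describe(x: CD) -> String {
    match x {
        CD::C => "C".to_string(),
        CD::D(n) => format!("D({})", n),
    }
}
```

With these definitions, `ab_code(AB::B)` evaluates to `2` and `describe(CD::D(7))` to `"D(7)"`.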
@emberian
Member

emberian commented Apr 1, 2014

I am in favor.

@pnkfelix
Member

pnkfelix commented Apr 1, 2014

I am opposed, but I won't fight if a tide forms.

@esummers

esummers commented Apr 1, 2014

Should C-Style enums be enum! { A = 1, B = 2 } to show it isn't part of the core language?

@brendanzab
Member Author

I feel very bad for bringing up this bikeshed, but it's been irritating me for a while now. I have tried my best to be as comprehensive as possible in order to encourage a focused debate. Let me know if I have missed anything, or whether there are any gaps in the Alternatives section that you feel need to be addressed.

@SimonSapin
Contributor

FWIW, I’m in favor of moving away from the name "enum" for things that are not what everyone else calls "enum" (sums of unit types.) I’m less convinced about the separation.

@brendanzab
Member Author

@esummers If you have a more comprehensive explanation, I can add it to the Alternatives section. Do note that:

enum, like in C, provides support for groups of constants discriminated by
integer values. Unlike C, the resulting type name is not an alias to an
integer type, rather it is a new type that can only be equal to one of the
declared variants.

Were you arguing to change that?

@esummers

esummers commented Apr 1, 2014

@bjz Not necessarily in favor of changing, I just think that interoperability with other languages should not be part of the core language syntax. It was my impression that assigning values to the tags was not recommended for pure Rust code. Maybe I am off. I suppose assigning tags may be useful for persistent storage of serialized data.

Another thing: Splitting these may help with unifying with structs. For instance, I think it would be nice to nest a union inside a struct without having to declare it separately. C-style enums muddle that possibility. I often want some shared fields mixed with the variants.

@japaric
Member

japaric commented Apr 1, 2014

I'm in favor of the separation, and like the current names (enum and union), but wouldn't mind if union was renamed to something else.

@brendanzab
Member Author

@esummers I agree with the sentiment that it is slightly unsavory that we are beholden to C in providing this functionality in the core language. A macro could very well be the solution, with the loss of being able to statically check that all cases are covered.

@emberian
Member

emberian commented Apr 1, 2014

@esummers it's useful for FFI.

@esummers

esummers commented Apr 1, 2014

I had another idea. Maybe assigning integer tags to variants should be an attribute since it is mainly for C interoperability.

union CStyleEnum {
    #[tag=1] Foo,
    #[tag=2] Bar
}

EDIT: @bjz Good point about as. I think I like enum and union separate though. enum is basically creating a new type of integer-like literal. That is a different concept from union.

@brendanzab
Member Author

But then you can't cast using as, which is very useful for FFI.

@bstrie
Contributor

bstrie commented Apr 1, 2014

I could be in favor of this. I myself have experience with introducing C++ friends to Rust, and mild grievance with using the word enum for something so foreign is definitely common. I can't say how representative my friend group is of the general population, though.

I'm not entirely sold on the union keyword. From the perspective of C/C++ devs, you're just trading one overloaded term ("our enums aren't actually enums") for another ("our unions aren't actually unions"). Even though I do prefer union to enum, the difference between tagged and untagged unions is pretty important.

In ancient Rust, the enum keyword was tag (in fact, changing this keyword was one of the first times I ever grumbled at a design decision :P ). Perhaps we could consider that as well?

@bstrie
Contributor

bstrie commented Apr 1, 2014

Following up on my last comment, here's the original mailing list thread where enum was originally chosen over union:

http://article.gmane.org/gmane.comp.lang.rust.devel/823/match=renaming+tag

@esummers

esummers commented Apr 1, 2014

I haven't thought this through, but is there any reason we couldn't just use a variation of structs for enums/unions? Just specify variant constructors instead of fields when that is what you want.

struct MyUnion { // not a constructor when variants present
    Foo { // variant
         a: int,
         b: int,
    },
    Bar, // variant
    c: int, // field shared between both variants
}

EDIT: @bjz Clarified. I guess like #24 except not necessarily tied to virtual inheritance.
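
For comparison, the "shared fields plus variants" shape can be approximated in Rust as it stands by nesting an enum inside a struct. This is only a rough sketch of that workaround, not the syntax suggested above; `Kind`, the `kind` field, and `sum` are invented names, and `i32` stands in for `int`.

```rust
// The variant-specific part.
#[derive(Debug, PartialEq)]
enum Kind {
    Foo { a: i32, b: i32 },
    Bar,
}

// The shared field lives next to the variant-specific data.
#[derive(Debug, PartialEq)]
struct MyUnion {
    kind: Kind,
    c: i32, // shared between both variants
}

// Hypothetical helper: `c` is reachable without matching,
// while `a` and `b` are only reachable after matching on `kind`.
fn sum(u: &MyUnion) -> i32 {
    match u.kind {
        Kind::Foo { a, b } => a + b + u.c,
        Kind::Bar => u.c,
    }
}
```

The cost of this encoding is the extra type and the extra `kind` field, which is what the struct-with-variants idea would remove.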

@brendanzab
Member Author

@esummers That sounds suspiciously like #24. In that case I would prefer data or type because it would support generalized ADTs. The good thing is that it unifies some language constructs really nicely. I would still say it might be worthwhile keeping enum separate from that construct though.

@brendanzab
Member Author

@esummers Why did you leave off the type identifier in that example? Your comment about there being 'no constructor' makes no sense either - Foo and Bar are the constructors, no?

@nikomatsakis
Contributor

@bjz thanks for the careful explanation. I am trending negative on this proposal, let me explain why:

  1. It's a BIG change that affects virtually every Rust program, with relatively little benefit from my point of view.
  2. I think that, generally speaking, C-like enums and algebraic data types are the same thing, and the keyword choice is very astute in that regard. As evidence, I find that it very frequently happens that types begin their life as a simple enumerated list and later grow to include data. For example, I start out with a small set of error codes enum { Error1, Error2 } and then come across a case that needs some details enum { Error1, Error2, Error3(int) }. I am always so happy in Rust because I can accommodate this case so easily, and so miserable in C++ because enum doesn't scale to handle it.
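
The growth path described in point 2 can be sketched in current Rust as follows. The type is called `MyError` here only to keep the example self-contained, and the `message` helper and its strings are made up.

```rust
// Stage 1 was `enum MyError { Error1, Error2 }`.
// Stage 2 adds a variant that carries some detail.
#[derive(Debug, PartialEq)]
enum MyError {
    Error1,
    Error2,
    Error3(i32), // added later
}

// Hypothetical helper that renders each error.
fn message(e: &MyError) -> String {
    match e {
        MyError::Error1 => "error 1".to_string(),
        MyError::Error2 => "error 2".to_string(),
        // Without this arm the match is non-exhaustive and does not compile.
        MyError::Error3(detail) => format!("error 3 (detail {})", detail),
    }
}
```

Existing arms for `Error1` and `Error2` stay untouched when the third variant is added; only matches that lack a catch-all arm need a new case.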

As to the special treatment of C-like enums, I don't see it as anything special. Different types are good for different things. Just like not all enums can implement Clone, not all enums can be cast to integers. If this different behavior is considered such a wart, I'd rather address it by removing the special treatment of C-like enums altogether. It can easily be replaced with other language features:

  • Custom discriminants could be handled via annotations on the variants (like repr).
  • Casting ought to just be a trait with deriving -- I think this trait may even currently exist? I can never remember. Using a trait also allows for reverse casts (integer -> Option).

The main objections were always (1) people like custom discriminants and (2) using a trait with deriving interferes with the ability to cast variants in constant expressions. Oh, constexpr, how tempting you are in your own way.
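
A hedged sketch of the trait-based alternative, written against today's standard library rather than anything that existed in 2014: the forward cast uses `From`, the reverse cast uses `TryFrom`, and both impls are hand-written because the standard library provides no derive for them. `Color`, `RED_CODE`, and `color_from` are illustrative names.

```rust
use std::convert::TryFrom;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red = 1,
    Green = 2,
}

// Forward cast expressed as a trait impl.
impl From<Color> for u8 {
    fn from(c: Color) -> u8 {
        c as u8
    }
}

// Reverse cast: not every integer names a variant, so it can fail.
impl TryFrom<u8> for Color {
    type Error = ();

    fn try_from(n: u8) -> Result<Color, ()> {
        match n {
            1 => Ok(Color::Red),
            2 => Ok(Color::Green),
            _ => Err(()),
        }
    }
}

// The built-in `as` cast is accepted in a constant expression, which is
// the property the second objection is about.
const RED_CODE: u8 = Color::Red as u8;

// Hypothetical helper giving the "integer -> Option" shape.
fn color_from(n: u8) -> Option<Color> {
    Color::try_from(n).ok()
}
```

Here `color_from(2)` yields `Some(Color::Green)` and `color_from(9)` yields `None`, while `RED_CODE` is computed at compile time.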

P.S. For historical note, enums were once defined with the keyword tag. There was a big mailing list discussion where both enum and union were considered and enum won. I don't recall the precise arguments but they might be relevant.

@brendanzab
Member Author

with relatively little benefit from my point of view.

Being mindful of where we sit in the canon of programming languages, and trying not to muddy terminology too much is very important. If we hope that Rust makes as big of an impact as we want, then we don't want to be cursed for years to come for confusing terminology.

Also being accessible to systems programmers, who find the use of enum confusing at first, is important too (see @bstrie's anecdotal evidence). I actually remember that I was really confused at first too (I came in after the tag -> enum change), because I was familiar with Java enums. We might be comfortable with it, but that's probably just because we work with the language every day. It is important to remember to try on the hat of the outsider once in a while.

As evidence, I find that it very frequently happens that types begin their life as a simple enumerated list and later grow to include data.

That workflow would not change. union would be the default thing that you would reach for. So your workflow would become union { Error1, Error2 } -> union { Error1, Error2, Error3(int) }.

I didn't want to muddy the RFC with a further potential change, but I could see enum perhaps being augmented to have:

enum People: (&'static str, &'static str) {
    Ann = ("Ann", "Bennett"),
    Robert = ("Robert", "Brown"),
}

Apologies for the pretty horrible example. I have seen it used in the D community quite often, but I don't have any examples on hand. If enums were generalised in that way then I would be more comfortable with them being a language construct.

That would also allow you to use type aliases for your reprs (something that is annoyingly not possible today).
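
The typed-enum syntax above is hypothetical. A rough approximation in current Rust, offered only as a sketch, keeps a plain C-like enum and maps each variant to its value through a `const fn`; the method name `names` and the constant `ANN` are invented.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum People {
    Ann,
    Robert,
}

impl People {
    // Maps each variant to its (first name, last name) pair.
    const fn names(self) -> (&'static str, &'static str) {
        match self {
            People::Ann => ("Ann", "Bennett"),
            People::Robert => ("Robert", "Brown"),
        }
    }
}

// Because `names` is a `const fn`, the pair is usable in a constant.
const ANN: (&'static str, &'static str) = People::Ann.names();
```

Unlike the proposed form, the values live in a method body rather than in the declaration, so the enum's own representation is still a small integer tag.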

@brendanzab
Member Author

Also, in regards to C folks being confused by union being unsafe, I am less convinced that will be a problem. They are already coming to Rust with the hope that they can do most of what they already do, but in a safe, statically checked way. If we say that union and enum both have slightly different semantics to their C counterparts, then the question becomes, "which is closer to the semantics they are familiar with?"

@bstrie
Contributor

bstrie commented Apr 1, 2014

That workflow would not change. union would be the default thing that you would reach for. So your workflow would become union { Error1, Error2 } -> union { Error1, Error2, Error3(int) }.

That's not the impression that I get from this proposal. If you leave enum around, you're just as likely to have fresh-faced C and C++ programmers reaching for enums here merely out of familiarity, and then only later realizing that they have to switch to unions for attaching tagged data.

Thinking about it as well, it feels really gross that both these would be possible:

enum Foo {
    Bar = 1,
    Qux = 2,
}

union Foo {
    Bar,
    Qux
}

I can already imagine the endless questions of when to use one or the other. The variant = integer form was always intended merely as a bone to C interop, and I'm not convinced that it deserves its own construct. I'm ambivalent about changing the keyword, but I do think these should remain as a special case of tagged unions (or maybe become a macro).

My anecdotal evidence that my friends don't like the enum keyword was also not intended to sway discussion, because we need a bigger sample size. In the wild, I haven't actually heard very many people complain about it. Perhaps it's simply an initial toe-stub that people put behind them in practice.

@bstrie
Contributor

bstrie commented Apr 1, 2014

For posterity, Graydon's original rationale for enum over union:

Between "enum" and "union", I tend to favor "enum", for a simple
reason:

  • attempting to use a variant as a C-style or Java-style enum will work
    flawlessly;
  • by opposition, attempting to use a variant as a C-style union will
    fail for reasons that will be very unclear for C programmers.

If we have to rename it, I'd go with enum as well.

Also for mapping C libs into rust-ese, it'll be nice to support
providing-the-numbers style, as in: "enum x { foo=2; bar=16; }", which
we currently don't support, but should.

I'm not sure it's that weird for "enum" to be overloaded to mean
"newtype" as well. People regularly use enum-in-C++ for just that
reason: to make a type-disjoint int-like-thing. Let's try just renaming
it and see how it goes.

(Note that newtype enums were dropped aeons ago in favor of newtype structs.)

@brendanzab
Member Author

That's not the impression that I get from this proposal. If you leave enum around, you're just as likely to have fresh-faced C and C++ programmers reaching for enums here merely out of familiarity, and then only later realizing that they have to switch to unions for attaching tagged data.

That is a concern. Thanks for putting it that way.

Thinking about it as well, it feels really gross that both these would be possible:

Hmm yeah, it does. :(

My anecdotal evidence that my friends don't like the enum keyword was also not intended to sway discussion, because we need a bigger sample size. In the wild, I haven't actually heard very many people complain about it.

You have a far better gauge on how Rust is perceived in the outside of the core community, so I would trust your judgement far more than my own. Perhaps I'm making much ado about nothing.

@brendanzab
Member Author

I do think these should remain as a special case of tagged unions (or maybe become a macro).

Using a macro/syntax extension like enum! would signal that discriminated tagged unions are less recommended for general usage, and would simplify the core language. But then the constexpr issue that @nikomatsakis alluded to rears its head.

@bstrie
Contributor

bstrie commented Apr 1, 2014

Other considerations aside, I don't think it would be a bad thing exactly to rename enum to union, it might just be more-or-less a lateral move. Once you know the language, union certainly matches what you'd expect a bit better than enum. But given that we want to support C-style enums as well, the following does look a little weirder with union than with enum:

union Foo {
    Bar = 1,
    Baz = 2
}

Maybe it could look less weird with a different syntax:

union Foo {
    Bar(1),
    Qux(2)
}

...although I have no idea if this would cause problems if we later extended the type system to include numerics.

@dobkeratops

I've used C/C++ since about 1994 and it took me about 20 seconds to get used to the use of the name; "enum variants" are still values, just with more internal structure (ok, so the tag is separate). There are things that take time to get used to; that's nowhere near one of them.

I think the current naming is fine, because it emphasises that it is NOT a C union, which has the ability to unsafely access the wrong data; and you could keep union available if you decided you wanted to embed more direct overlap with C/C++, like C/C++ unions available in unsafe code, for library data.
I dream of a common compiler middle, a single AST fed by c++ or rust front end. Might sound far-fetched, but look how C++14 is getting polymorphic lambdas.

Still, it wouldn't be a disaster if it was changed.

@nrc
Member

nrc commented Apr 1, 2014

-1 from me. Union seems like a bad choice since it also has an existing meaning that is different from the Rust use. Importantly, I don't think we should add more (or split existing) data structures in Rust - it already has a lot. As a matter of opinion I didn't find the use of enum confusing, it is just a different abstraction from other languages. Of course on that point, YMMV.

@SimonSapin
Contributor

I dream of a common compiler middle, a single AST fed by c++ or rust front end.

(Like LLVM’s intermediate representation (IR)? clang and rustc are front ends for it.)

@dobkeratops

(Like LLVM’s intermediate representation (IR)? clang and rustc are front ends for it.)

I mean middle - a single AST that is a superset of what rust and C++ can represent.. add borrow check warnings to your C++ compiler.. emit bindings between rust&c++ libraries translating everything .. unify concepts/traits where possible. I'm not volunteering to write this though :)

@nielsle

nielsle commented Apr 2, 2014

Just a thought: It makes sense to iterate over the variants of a c-style "enum", but it doesn't necessarily make sense to iterate over the variants of a "union". (Inspired by rust-lang/rust#5417 )

enum Directions { North, East, South, West }
for dir in Directions::iter() { /* ... do stuff ... */ }
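
No built-in `iter()` exists on enums, so the snippet above is wishful syntax. One way to approximate it in current Rust, sketched here with invented names (`Direction`, `ALL`, `iter`), is a hand-maintained constant listing every variant.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Direction {
    North,
    East,
    South,
    West,
}

impl Direction {
    // Hand-maintained list: it has to be updated whenever a variant is added.
    const ALL: &'static [Direction] = &[
        Direction::North,
        Direction::East,
        Direction::South,
        Direction::West,
    ];

    // Yields every variant by value, in declaration order.
    fn iter() -> impl Iterator<Item = Direction> {
        Self::ALL.iter().copied()
    }
}
```

A loop then reads `for dir in Direction::iter() { ... }`. The same trick has no obvious meaning for a data-carrying variant, since there is no single value to yield for it, which is the asymmetry the comment points at.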

@bill-myers

The equivalent to Rust enums in popular languages are class hierarchies, not C unions.

The problem is however that the syntax to define class hierarchy is repetitive, since each variant needs to specify its superclass, so we can't just adopt it as our only syntax for enums.

So, I think there is really no perfect solution to this problem.

On the other hand, mathematically, the obvious name is "disjoint union", while in computer science it's called either a tagged/discriminated union or a sum type.

Maybe we could invent a keyword related to either class hierarchies ("hier"?), disjoint unions ("disj"?) or some of the terms used in CS ("tag", "tagged", "disc", "sum", ...), but not sure if that's better.

Also, if we make enum variants first class like #11 and #24 propose, even enums of singletons are not really like C enums, so we may want to use a single keyword different than "enum" for all cases.

@dobkeratops

who is the target audience, if it's about clarity?
systems programmers are still likely to have no choice but to learn C and C++, so keeping the vocabulary compatible is nice, and i've had no problem at all with the idea that the enum has data added;
however I have not dealt with a variety of languages 'in anger', or even studied compsci formally.

i gather scala has 'case class', or perhaps you could clarify with "enum union {..}", but it would be a shame to lose the 'one keyword introduces a definition' property that rust currently has: so easy to grep all the definitions out of a file and so on.

is there any way to poll this scientifically?

@bstrie
Contributor

bstrie commented Apr 3, 2014

In all honesty, my personal scientific poll was going to be "go ask dobkeratops, he knows a thing about C++". But then you somehow found your way here on your own. :P

@brendanzab
Member Author

@bill-myers Re. #24, if that was the case I would reiterate what I said before: it seems like it would unify ADTs into one construct, so maybe data might be a better term to use.

Anyway, I think there have been lots of good arguments in favor of the status quo. I am pretty much convinced that it is probably not worth the change. At least we now have a good, documented discussion on the matter that we can point folks to in the future. Let me know if you would like me to close this.

@bstrie
Contributor

bstrie commented Apr 3, 2014

I don't think there's any reason to leave this open for now, we're pretty much just bikeshedding now. You're right in that it's good to have this discussion documented to avert future bikesheds.

@thestinger

I don't really like either enum or union for this. It's certainly not an enumerated type in all cases, and works nothing like a C enum even when it is since you can't even XOR them together.

It's a tagged union rather than equivalent to a plain C union so while I think it would be a better name, I don't think it's going to completely avoid confusion and I don't want two names for what is essentially the same concept.

@lilyball
Contributor

lilyball commented Apr 4, 2014

It appears as though most people are not in favor of this change. I too am not in favor of this change (I think @bstrie described very well the reasons why). But on the off chance that the tide turns again, renaming enum would not be the end of the world. But I very strongly disagree with splitting it into two different keywords. If it's going to be renamed, just rename it, don't duplicate it. And if it's going to be renamed, I personally think either data or maybe something like varying (to go with the term "variant") makes sense. But the best option is to just leave it as enum.

@dobkeratops

I suppose one argument for changing it is for (the more stubborn half of)* C++ people who might just assume, at a glance, it's nothing special. "match is just switch++" (no it isn't, it's key to how the language avoids null pointers). psychological/political effect. (* i, personally, am 100% happy with enum)

@lilyball
Contributor

lilyball commented Apr 4, 2014

@dobkeratops If C++ folks are ok saying "match is just switch", then they'd probably be ok saying "data is just enum" as well.

I think the more important thing to do is just to change the tutorial/reference to de-emphasize C-like enums and emphasize non-C-like enums instead. And don't even talk about explicit discriminants or casting to uint until later.

@xgalaxy

xgalaxy commented Apr 4, 2014

I'm a nobody that just happened on this discussion. For what it's worth I agree 100% with kballard. This seems like an education issue. In the tutorial, don't teach enum like it is a C/C++ enum. Slap them in the face with the difference.

@brendanzab
Member Author

@xgalaxy indeed. I think the tutorial makes a mistake in introducing discriminated enums first. They should be pushed further down, perhaps even out into the FFI tutorial.

@brendanzab
Member Author

Perhaps we are wrong even in using enum, as others suggested. Here is a post on reddit showing a new user confused by the behaviour. Perhaps we could just rename enum to data, then have an enum! macro. Maybe something like this? rust-lang/rust#13072

@brendanzab
Member Author

Another post on /r/rust: Enum type confusion [Beginner]

@SimonSapin
Contributor

Why should data be a sum (tagged union) rather than a product (struct) in our type algebra?

@brendanzab
Member Author

struct Foo { x: T }

Is almost the same as:

enum Bar {
    Bar { x: T }
}

The differences being that Foo allows for direct field access, like foo.x, and that destructuring against the variant is irrefutable.
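
The comparison can be tried out in current Rust with a concrete field type. In this sketch `i32` replaces the generic `T`, and `foo_x` and `bar_x` are invented helper names.

```rust
struct Foo {
    x: i32,
}

enum Bar {
    Bar { x: i32 },
}

// Direct field access works on the struct.
fn foo_x(foo: &Foo) -> i32 {
    foo.x
}

// There is no `bar.x`: the field is only reachable through a pattern.
// With a single variant the pattern cannot fail, so a plain `let` is accepted.
fn bar_x(bar: &Bar) -> i32 {
    let Bar::Bar { x } = bar;
    *x
}
```

Adding a second variant to `Bar` would turn the `let` in `bar_x` into a compile error, whereas `foo_x` has no equivalent failure mode.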

Why should data be a sum (tagged union) rather than a product (struct) in our type algebra?

It would be nice to have a unified data keyword for declaring all ADTs, but I don't know a nice syntax for it and how to resolve the slight semantic differences.

@brendanzab
Member Author

I wonder if we could think of:

struct Foo { x: T }

As something like:

#[deriving(Deref, DerefMut)]
data Foo({ x: T })

:/

@brendanzab brendanzab closed this Apr 30, 2014
@brendanzab brendanzab reopened this Apr 30, 2014
@noct

noct commented May 7, 2014

I'm not a contributor or anything, but as a C/C++ developer, I'm very surprised at the general support towards keeping enum, given how dissimilar Rust's enums are from C enums, and how close they are to unions, particularly in practical use.

I find the concern about the difference between tagged/untagged especially confusing given that the majority of union use I see are sum types. The remainder are generally type conversions to avoid issues with strict aliasing; irrelevant to Rust.

It follows that Rust being a safe language would have a safe union.

In contrast, enums are generally used as associated sets of constant integers, quite often not default ordered, either as flags or as specified by an external protocol, and occasionally containing duplicate values.

Given that Rust's tagged unions only satisfy a subset of those cases, enum could probably just go away. C developers learning Rust could be advised to use tagged unions where default ordered enums were used, and macros like bitflags! for the others.
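
As an illustration of the flag case, here is a hand-rolled sketch of the kind of type a flags macro could generate. It is not the `bitflags!` API; `Perms`, its constants, and `contains` are all invented for this example.

```rust
use std::ops::{BitOr, BitXor};

// A newtype over an integer: unlike enum variants, values can be combined.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Perms(u8);

impl Perms {
    const READ: Perms = Perms(0b001);
    const WRITE: Perms = Perms(0b010);
    const EXEC: Perms = Perms(0b100);

    // True when every bit set in `other` is also set in `self`.
    fn contains(self, other: Perms) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for Perms {
    type Output = Perms;
    fn bitor(self, rhs: Perms) -> Perms {
        Perms(self.0 | rhs.0)
    }
}

impl BitXor for Perms {
    type Output = Perms;
    fn bitxor(self, rhs: Perms) -> Perms {
        Perms(self.0 ^ rhs.0)
    }
}
```

With this, `Perms::READ | Perms::WRITE` is a valid value that contains `READ` but not `EXEC`, something a tagged union cannot express because each value is exactly one variant.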

@brendanzab
Member Author

I find the concern about the difference between tagged/untagged especially confusing

Me too.

enum could probably just go away

Yeah, I think @nikomatsakis is talking about removing C-style enums, which would reduce the number of arguments in favor of the enum keyword.

Ultimately, I am one voice amongst many. And I am growing a little tired of the fight. However this RFC has shown up on the weekly meeting agenda a few times (I think from @alexcrichton?), so it seems that there is interest in discussing it further.

@alexcrichton
Member

We discussed this at a meeting today, and we're going to close this for now. We may be able to revisit this in the near future if the need becomes pressing again, but at this time we think that closing is the best way to go.

@brendanzab
Member Author

Cool, thanks for the info Alex!

withoutboats pushed a commit to withoutboats/rfcs that referenced this pull request Jan 15, 2017
Fix period in the middle of a sentence in TUTORIAL.md