Unify and nest structs and enums #24
@nick29581 you didn't modify the template |
Another alternative to RFC rust-lang#5 and an extension/variant of RFC rust-lang#11. Unify enums and structs by allowing enums to have fields, and structs to have variants. Allow nested enums/structs. Virtual dispatch of methods on struct/enum pointers. Remove struct variants. Treat enum variants as first class. Possibly remove nullary structs and tuple structs.
Whoops! Thanks @cmr , fixed now. |
Not sure if this is the right place to express this, but I really think our sum types should use a different keyword to C-style
union Foo<T, U> {
    A(T, U),
    B { t: T },
    C,
}
enum Foo: c_uint {
    A = 0x01,
    B,
    C = 0x10,
    D,
}
Whilst the intention behind using
Below is an exchange on #ada I had a couple of days ago. Whilst I think the person in question was being overly antagonistic and somewhat close minded, it is a good example of the constant confusion I am greeted with when outsiders first encounter Rust's
Edit: Perhaps I should make a separate RFC for this - sorry if I am derailing things. |
On first (and second) read-through, it sits really unwell with me, and the amount of open questions is worrying. I feel this is a drastic increase in complexity for a feature that honestly should be rare. |
@cmr Yeah, I am rather confused by the RFC :/ I understand if structs and enums were unified because structs are basically just single-variant enums, but I am still unclear about the motivation for this specific proposal. |
@bjz I agree that the name "enum" is not great for full sum types (since almost everybody else is using that name for (roughly) sums of unit types), but that discussion does not belong here. I think you should write another RFC. |
@SimonSapin Yeah, sorry. Glad to hear that at least somebody feels the same way though. |
@cmr - yeah, it is a pretty big change, and the change is complex, but I'm not sure it adds complexity - it certainly removes features from the language and in that way removes complexity. Although I can see that it does make enums harder to grok. The number of open questions is indicative that it is a big change and I wanted to get early feedback (I could have sat on this for a week and not had any questions, but I don't think that is a good approach). I hope that as well as addressing the inheritance motivation we also address the motivation for refinement types, so we are killing more than one bird here. Also, I think that if it is not used a lot, that is more motivation to fit into existing structures, which I think this does (somewhat), rather than adding new structures, even if the new ones are simpler. |
This only partially fulfills refinement types though, doesn't it? You can [...] |
@cmr - no, by using nested enums you can specify a subset, albeit only subsets specified when defining the enum, not any subset (so it is not a total replacement, but I hope can be used to satisfy the common case). |
I think this is better than #5, but I must say I like my own #11 better, which shares the core ideas of using enums, having first class variants, etc. An issue I see in this RFC (and not in #11) is that structs can be both instantiated and inherited. This makes the language less expressive because there is no easy way to distinguish the types "exactly struct S1" and "struct S1 or any derived struct". My proposal in #11 is to instead make structs non-inheritable, which means that one has to create an enum with an empty variant instead of a struct with a struct variant as in this RFC, which makes it possible to naturally distinguish the types above (the former is denoted by the empty variant name, and the latter by the enum name).
The other major difference from #11 is that this RFC uses virtual methods and allows overriding non-abstract methods, while #11 exclusively uses traits for inheritance and only allows overriding abstract methods.
To sum it up, the idea of #11 is that traits can be implemented using the "impl as match" syntax, which means "derive a trait implementation whose methods match on all variants and call the corresponding method in the impl for the variant" (where the match is likely implemented as vtable dispatch) or explicitly, but you cannot implement a trait explicitly if a base enum also implements it explicitly. I think that leads to an easier to understand and cleaner language, because it forces one to give names to sets of virtual methods, unifies virtual dispatch and enum matching, allows external implementations of virtual methods, and, by only allowing abstract functions to be overridden, makes method lookup far simpler.
A key insight in this area is that the compiler can convert a match in the same crate as the type into a virtual method dispatch by extracting match arms into functions and assigning a vtable slot, and vice versa can implement virtual functions by matching on type tags; aside from external ABI interoperability constraints, which of the two to use is in fact an implementation detail. Thus, we should unify those notions. |
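The equivalence described in the comment above can be sketched in present-day Rust. This is only an illustration under assumptions: the types `Square` and `Rect`, the enum `Shape`, and the trait `Area` are hypothetical names that do not appear in either RFC, and `&dyn Area` is modern trait-object syntax. The match arms are pulled out into free functions, and the same functions are then reused as the bodies behind a vtable.

```rust
// Hypothetical types, used only to compare the two dispatch styles.
struct Square { side: u32 }
struct Rect { w: u32, h: u32 }

// The bodies of the match arms, extracted into plain functions.
fn square_area(s: &Square) -> u32 { s.side * s.side }
fn rect_area(r: &Rect) -> u32 { r.w * r.h }

// Style 1: a closed set of variants, dispatched by matching on the tag.
enum Shape {
    Square(Square),
    Rect(Rect),
}

fn area_by_match(shape: &Shape) -> u32 {
    match shape {
        Shape::Square(s) => square_area(s),
        Shape::Rect(r) => rect_area(r),
    }
}

// Style 2: the same functions reached through a vtable slot.
trait Area {
    fn area(&self) -> u32;
}

impl Area for Square {
    fn area(&self) -> u32 { square_area(self) }
}

impl Area for Rect {
    fn area(&self) -> u32 { rect_area(self) }
}

fn area_by_vtable(shape: &dyn Area) -> u32 {
    shape.area()
}
```

Both `area_by_match` and `area_by_vtable` end up running the same extracted function for a given value; only the mechanism that selects it differs.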
@bill-myers re inheritable structs - you don't need a struct with a struct variant, just a struct - in fact struct variants would disappear. We could tweak this so that structs were not inheritable, but I think there is value in being able to instantiate non-leaf 'classes' in an inheritance hierarchy - we definitely need this for the DOM. In fact, if we take #11, I think we would have to change this. re virtual methods, again, I think it is a requirement to allow overriding of non-abstract methods. The rest is just a different syntax really. I'm not really sure if involving traits is an advantage or disadvantage - in particular, it is not clear to me where we would get thin pointers and where fat pointers. I think it is important for it to be clear from the syntax when you fall off a fast path. I agree that match and virtual dispatch are the same from the implementation point of view. I toyed with the idea of only doing dispatch via match, but I think the syntax would be cumbersome. Unless I misunderstand your proposal, #11 does not really unify since you still have separate match statements and |
To put it with an example, regarding struct inheritance:
Is equivalent to:
So you don't need to be able to override structs. But in the former syntax there is no way to distinguish between "exactly A" and "A or B", making the language less expressive. In the latter syntax, the former is called AStruct, and the latter is called A. Now of course you could introduce syntax like "&struct A" to make the distinction, but that complicates the language unnecessarily. That's why I think allowing to inherit structs is bad. |
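The snippets this comment refers to are not preserved in this copy of the thread. As a rough sketch of the "latter syntax" only, written in present-day Rust and assuming the names `AStruct`, `B`, and `A` from the comment (the fields `x` and `y` and both functions are invented for illustration), the distinction between "exactly A" and "A or B" could look like this:

```rust
// "Exactly A": a plain, non-inheritable struct.
struct AStruct { x: i32 }

// The "derived" type. Present-day Rust has no field inheritance,
// so the shared field `x` is simply repeated here.
struct B { x: i32, y: i32 }

// "A or B": the enum names the whole hierarchy.
enum A {
    AStruct(AStruct),
    B(B),
}

// Accepts only the base type; a `B` cannot be passed here.
fn exactly_a(a: &AStruct) -> i32 {
    a.x
}

// Accepts the base type or the derived one.
fn a_or_b(a: &A) -> i32 {
    match a {
        A::AStruct(s) => s.x,
        A::B(b) => b.x + b.y,
    }
}
```

The two function signatures carry the distinction in their types, without extra syntax such as "&struct A".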
Regarding overriding methods, the pseudocode:
Is equivalent to:
So there is no need to allow overriding non-abstract methods. In the first snippet, calling "foo" on B could technically refer to both the "foo" on A and the "foo" on B and you now need an explicit notion of virtual dispatch to distinguish between them, while in the second it can only refer to the "foo" on B because the one on "A" is abstract. The implication here is that a human reader cannot get confused and think that the "foo" on A is being called rather than the "foo" on B, because the one on A is abstract. Plus, you need an "override" keyword and concept. Calling the version of foo in A from B is easily done with "a_foo" in the second snippet without needing to introduce "super.foo()" or "A::foo()". The second snippet is more verbose, but one could add some syntax sugar to make it less verbose (namely, allowing to implement A1 and A2 at once). This is to some extent a matter of taste, but I think the second snippet makes a simpler language and fits more with current Rust. |
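The pseudocode for this comment is also missing from this copy. A minimal sketch of the second style it describes, in present-day Rust, might look as follows. The trait name `A1` and the helper `a_foo` come from the comment; the struct fields and the strings returned are assumptions made up for the example.

```rust
// Hypothetical types standing in for the base and derived "classes".
struct AStruct { name: String }
struct B { name: String, extra: u32 }

// The "virtual" method has no default body, so every concrete type
// must supply its own implementation.
trait A1 {
    fn foo(&self) -> String;
}

// The behaviour a base class would have provided. A derived type calls
// it by name instead of using `super.foo()` or `A::foo()`.
fn a_foo(name: &str) -> String {
    format!("A({})", name)
}

impl A1 for AStruct {
    fn foo(&self) -> String {
        a_foo(&self.name)
    }
}

impl A1 for B {
    fn foo(&self) -> String {
        // Reuses the base behaviour explicitly, then extends it.
        format!("B[{}] over {}", self.extra, a_foo(&self.name))
    }
}
```

Since `foo` is abstract in the trait, a call to `foo` on a `B` can only mean the implementation written for `B`.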
"impl as match" is proposed to be syntax sugar for implementing each method by doing a match on all variants and calling the corresponding method, plus the exception allowing you to override a trait implemented as "impl as match". I must say I don't like the exception, but I'm not sure how to do it otherwise; the idea is that the exception is fine, because "impl as match" guarantees that there is no difference between calling the function on the parent or on the derived class, since the one in the parent just redirects using match to the one on the derived class. I suppose we could instead specify that the compiler detects when a trait is implemented using a straight redirecting match, and treats it as "impl as match", although that's not so great either. [of course, the idea is that the compiler then optimizes the matches to use a vtable in most cases]
There's no difference: enum pointers are thin, and trait object pointers are fat. The idea of invoking traits is that "virtual methods" are put in a trait instead, which is separately implemented on each variant, and where the implementation on the enum does "virtual dispatch" to the impl for the variant corresponding to the dynamic type (either as a built-in language concept of virtual dispatch, or using "impl as match" syntax sugar). This allows to give a name to sets of virtual methods that must be implemented or overridden together, and makes it naturally possible to define things like "impl as match" that would otherwise have to take raw method names. |
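The thin versus fat pointer distinction can be checked directly in present-day Rust. The sketch below assumes a hypothetical trait `Render` and enum `Node`; it measures pointer sizes in machine words using `std::mem::size_of`. The results reflect how current rustc lays out references, which is an observation about the implementation rather than something either RFC specifies.

```rust
// Hypothetical trait and types, used only to measure pointer sizes.
trait Render {
    fn render(&self) -> u32;
}

#[allow(dead_code)]
struct Leaf { v: u32 }

impl Render for Leaf {
    fn render(&self) -> u32 { self.v }
}

#[allow(dead_code)]
enum Node {
    Leaf(Leaf),
    Empty,
}

// A reference to a sized enum is a single address.
fn thin_ptr_words() -> usize {
    std::mem::size_of::<&Node>() / std::mem::size_of::<usize>()
}

// A trait object reference carries a data pointer plus a vtable pointer.
fn fat_ptr_words() -> usize {
    std::mem::size_of::<&dyn Render>() / std::mem::size_of::<usize>()
}
```

On current compilers `thin_ptr_words()` is 1 and `fat_ptr_words()` is 2, which is the cost difference the thread keeps returning to.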
Here is a motivating example for forcing virtual methods to be in traits and not allowing to override them. Let's say you have a web browser with an object hierarchy that supports renderToOpenGL and renderToPixmap, which are supposed to render the same image, but one as an OpenGL texture, and the other as an array of bytes. You are currently printing by printing the pixmap, but that sucks, so you add a renderToPostscript function, hook it so the postscript is sent to the printer, and implement it for a base class. With this RFC, or if you were using C++ or Java, your application now compiles, but it is totally broken, because you forgot to implement renderToPostscript for derived classes, so printing a document now no longer looks the same as the on-screen document (since you are instead overriding renderToPixmap). If instead one were forced to put those methods in a Render trait, then you will be immediately faced with the prospect of changing a trait, and if you do so, all impls will fail to compile until you provide an implementation of the new method. Let's say you decide instead to add a new RenderToPostscript trait and implement it for the base class. If overriding trait impls is allowed, then your application will once again compile, and once again be totally broken, since you forgot to implement it for derived classes. If overriding concrete impls is not allowed, then your implementation will only be for one concrete class, and your program will not compile because you forgot to implement it for the other concrete classes. |
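The rendering scenario can be sketched in present-day Rust. The trait name `Render` comes from the comment; the types `Paragraph` and `Image`, the snake_case method names, and the string outputs are assumptions for illustration. Every render target is a required method, so there is no inherited body for a type to fall back on silently.

```rust
// Hypothetical document elements.
struct Paragraph { text: String }
struct Image { width: u32, height: u32 }

// All render targets are required methods without default bodies.
trait Render {
    fn render_to_pixmap(&self) -> String;
    fn render_to_postscript(&self) -> String;
}

impl Render for Paragraph {
    fn render_to_pixmap(&self) -> String {
        format!("pixmap:{}", self.text)
    }
    fn render_to_postscript(&self) -> String {
        format!("ps:{}", self.text)
    }
}

impl Render for Image {
    fn render_to_pixmap(&self) -> String {
        format!("pixmap:{}x{}", self.width, self.height)
    }
    // Deleting this method makes the impl fail to compile
    // (error E0046, missing trait items), so a forgotten
    // implementation cannot slip through unnoticed.
    fn render_to_postscript(&self) -> String {
        format!("ps:{}x{}", self.width, self.height)
    }
}
```

Adding a third required method to `Render` would break both impls at compile time until each one provides it, which is the behaviour the comment argues for.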
If a method is overridden, we should still be able to call it. C++ uses `::`
syntax to allow this. In the example above we use `Foo::bar(self)` to indicate
static dispatch of an overridden method. I'm not sure if this is currently
valid Rust or if it is the optimal tsolution. But it looks nice to me and we
typo tsolution
Hmm, I fear I have missed something here - what I propose adds behaviour for pointer-to-enum that did not previously exist, but I neglected the enum value case. We certainly don't want all the variants of an enum like this to be the same size, so then matching on an enum value could not be supported. That seems bad. I'm not sure if there is a solution. I guess that is a distinction between enums and inheritance, and perhaps makes me feel less bad about the duplication of behaviour there with an approach like #5. |
@nick29581 I haven't followed the entire conversation. I had a hard time understanding the proposal, I fear, but my biggest fear was precisely what you seemed to be hitting on here -- I didn't quite get how the by-value enum case fits in. |
@nick29581 Maybe enums could be fixed size or unsized based on context. A pointer to an enum that doesn't allow the variant to be changed could be unsized. All other instances could be sized. Basically the variant tag is immutable, but the fields inside could be mutable or immutable. |
Interesting to see all this.
[1] Would any of this facilitate a future enhancement (or unsafe hacks) where immutable enums could be compacted (the tag implies the size of the type, by type-specific lookup; different sized variants can be placed back to back, e.g. tree leaves and nodes, reducing the number of pointers required to do that sort of thing)? Of course I can do that now in C or in Rust unsafe code.
[2] If you lose 'tuple-structs' (I don't mind, tuples and real structs are more useful), are the enum variants still going to be able to look the same, i.e. a tag/variant name and a tuple? I think those are very handy, even though I haven't wanted to use 'tuple-structs' elsewhere.
[3] Are you going to be declaring actual virtual functions like in C++ classes? It looked to me like you could keep the idea of traits describing vtables, and just use inheritance of structs to say a trait can assume some fields (and vice versa, perhaps inherit or embed the trait in a struct and it would check what's compatible).
[4] I liked the idea of keeping vtables more general, e.g. adding sugar for accessing them and composing with a struct pointer for a specific call, allowing for layouts and uses beyond what's been formalized in various languages (like 'class-objects' that hold a vtable and metadata applied to a collection of other objects, or using vtable swaps as state machines; that could all be done safely if you had proper types for them). I was pleased with the hacks one seemed to be able to do already with transmute. (I saw eddyb's Many and had a bash at emulating C++ layout myself.) I'm definitely keen on plain struct single inheritance, that's just shortening the paths to the most common data. |
@esummers I don't think that addresses the issue - the problem is with values only (pointer-to-variant is not really an issue). The problem is that some variants might be very small and some very large and we don't want to pad the smaller ones to the size of the larger ones. We must have a size for values to be able to compile, so unsized there isn't an option. |
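The padding concern is easy to observe in present-day Rust. This is a small sketch with hypothetical enums `Padded` and `Boxed`; it only reports sizes via `std::mem::size_of` and makes no claim about the exact layout the compiler picks.

```rust
// Hypothetical enum with one tiny and one very large variant. Every
// value of this type occupies at least as much space as `Large`.
#[allow(dead_code)]
enum Padded {
    Small(u8),
    Large([u8; 1024]),
}

// The same data, but the large payload lives behind a pointer, so the
// enum value itself stays small at the cost of an indirection.
#[allow(dead_code)]
enum Boxed {
    Small(u8),
    Large(Box<[u8; 1024]>),
}

fn padded_size() -> usize {
    std::mem::size_of::<Padded>()
}

fn boxed_size() -> usize {
    std::mem::size_of::<Boxed>()
}
```

A `Padded::Small` value pays for more than a kilobyte it never uses, which is why padding by-value enums to the largest variant is treated as a problem for hierarchies such as the DOM.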
@dobkeratops
1 - I don't think this would facilitate that, but we would probably need something like that to enable this; that is the downside I noted a few comments up and which I didn't think about initially.
2 - Yes, enum variants could still be a name + tuple combo. Struct variants could still be used, they would just be the same as regular structs, so it's not that the idea disappears, only that it is redundant. The syntactic change to a program would just be adding the
3 - Yes, we add virtual fns to impls. We explicitly wanted to avoid traits for this because using traits requires a fat pointer and we want thin ones here. Having this as an optimisation is against Rust's guiding principle of predictable performance. Also, having fields in traits (in any way) further blurs the distinction between data and behaviour. Since we already allow functions (behaviour) in impls for structs, we don't make things worse this way.
4 - This is probably a matter of taste. It is certainly flexible and in some ways elegant. But I am not a fan; I prefer a language to be easier to use and to present abstractions for that kind of thing. Having to use unsafe code/transmute for a relatively common and safe use case seems bad to me. It's not clear to me if you can guarantee the performance characteristics we require that way either, but perhaps you can. |
4- well with just a bit of safe sugar - it wouldn't be an unsafe codepath to do this. Some intrinsic functions..
maybe sleeker syntax is possible (... for .. ) gets the vtable - symmetrical with 'impl for..' .... and could any tuple (&St, &VTable<St,T>) be vcallable, '&Ttrait' is just something that coerces to.. I would see adding this type of sugar as leveraging more of Rust's existing character rather than retrofitting a completely different vtable system centred on structs |
@nick29581 I guess I didn't really mean unsized. I meant sized to the variant instead of sized to the enum. A pointer to an enum could be sized to the variant and everything else sized with padding to the largest variant. I think that once we just have a reference we don't care about the other variants because we can never become one of those. Basically the size is statically determined when it is constructed based on the size of the variant (but only when it is a pointer). When using virtual inheritance, you will always pass by reference. Maybe I have a flaw in my reasoning somewhere, but I mean sized to variant when passed by reference. EDIT: I was assuming heap allocations when using virtual inheritance (so size on stack doesn't matter), but maybe that is a bad assumption. |
@esummers - the problematic case is given an enum |
@dobkeratops I'm afraid this is just going to come down to taste. You are right that we can avoid adding a language feature this way, but I don't think it is worth it in exchange for lots of ugly boilerplate all over the place. If I understand your example correctly, you are still passing a tuple - so it is two words per pointer, not one. |
then a temporary 'trait object' is created for a vcall, by the member function '.as_trait_obj()'. I'm assuming that will inline. (I should add #[inline]). TBAA would cover opt. eddyb's sample is more interesting, it creates a type "Many" along similar lines that has multiple vtable interfaces carried for one struct .. mine could be seen as a special case of that. Well I don't know what's easier to implement in the compiler, or what would get more demand. I guess people are familiar with C++ behaviour, and virtuals, single-inheritance + traits wouldn't be so different to virtuals + multiple-inheritance... but this method would keep one vtable concept and make it more versatile |
Regarding sizing, the simplest and default approach should be to have a fixed size like current enums (and thus pad the smaller variants). As an extension, one can add an "unsized" keyword that makes enums unsized (which of course requires DST to have been implemented first). However, note that with unsized enums, you must either disallow assigning to an &mut or ~ of an enum, or throw a run-time error if the run-time variant is different (since assignment is impossible if the new variant is of a different size). This is the same restriction that languages like Java or C# have (note that Java or C# also disallow assigning non-overridable classes, which is unnecessary and a bad idea in Rust). You can allow passing unsized enums by value by padding them, if inheritance is closed; if inheritance is open, then you cannot pass them by value (unless you autobox, but I guess we don't want that). |
Ah, that might be nice. I think we would indeed prevent dereferencing of DST and pointers to struct objects, so that side of things would all work. We would just need to add the keyword as you suggest to indicate the unsized-ness and forbid referring to such values by their enum (as opposed to variant) type. Padding (even for closed inheritance, as here) is a non-starter in general, since some variants might be hugely bigger than others (e.g, in the DOM). |
As a note (which I'll incorporate into the RFC later), JDM pointed out that having all 'classes' in one lexically nested block is impractical. We also need to allow specifying 'classes' in sub-modules (so they can be in different files). Both problems are solvable, but need to be addressed. |
Superseded by #142 |