
[Feature Request] Add bitfields to C structs #3898

Closed · SplittyDev opened this issue Jan 14, 2017 · 41 comments

@SplittyDev commented Jan 14, 2017

Summary

This is an RFC to add bit-fields to structures. This only applies to C structs (i.e. structs defined within a lib).

Motivation

I'm writing an Operating System kernel in Crystal and had to resort to ugly bit twiddling to simulate bit-fields, which significantly reduces readability and maintainability.

I'm aware that this is a very specific use case, but I'm positive that the vast majority of Crystal users, especially those who are used to working with C code, could benefit from this.

Detailed design

Currently, C structs look somewhat like this:

lib C
  struct Foo
    foo : UInt32
    bar : UInt32
  end
  struct Bar
    foo, bar : UInt32
  end
end

Bit-fields could be added in a backward-compatible way by adding another colon after the type, followed by the size in bits. This syntax is currently illegal, so it can't appear in any existing codebase, and introducing it should be safe. The approach would look like the following:

lib C
  struct Foo
    hi : UInt32 : 16
    lo : UInt32 : 16
  end
  struct Bar
    hi, lo : UInt32 : 16
  end
end

Another way would be to prepend the size in bits to the field, followed by a colon and the usual field definition. Since identifiers can't start with a number, this syntax is also currently rejected by the compiler and therefore wouldn't break any existing codebase either. That approach would look like the following:

lib C
  struct Foo
    16: hi : UInt32
    16: lo : UInt32
  end
  struct Bar
    16: hi, lo : UInt32
  end
end

I personally prefer the second syntax, because I think that it's more readable, especially when multiple fields are defined on the same line.

Update

@RX14 proposed the following syntax:

lib C
  @[Packed]
  struct Foo
    @[BitSize(16)]
    hi : UInt32
    lo : UInt32
  end

  @[Packed]
  struct Bar
    @[BitSize(16)]
    hi, lo : UInt32
  end
end

All of the aforementioned approaches would roughly translate to the following C code:

typedef struct Foo {
  uint32_t hi : 16;
  uint32_t lo : 16;
} foo_t;

Implementation details

Bit-fields should only work with integer primitives for reliability.

The size of the struct should equal the sum of all field widths, converted to bytes (as usual).
That's a little more complicated with bit-fields and can be confusing if you're not familiar with the concept. The following example shows how sizes should be calculated:

lib C
  @[Packed]
  struct Foo
    @[BitSize(16)]
    foo_hi, foo_lo : UInt32 # 16 bits + 16 bits = 4 bytes
    bar : UInt32 # 32 bits = 4 bytes
  end # 8 bytes
end
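
For reference, here is a rough C equivalent of the example above (a sketch only: the struct and field names simply mirror the Crystal code, and the static_assert assumes a C11 compiler with GCC/Clang attribute syntax), which makes the size arithmetic checkable:

#include <assert.h>
#include <stdint.h>

struct foo {
  uint32_t foo_hi : 16; /* 16 bits */
  uint32_t foo_lo : 16; /* 16 bits -> together one 4-byte unit */
  uint32_t bar;         /* 32 bits -> 4 bytes */
} __attribute__((packed));

/* 4 + 4 = 8 bytes, matching the comment in the Crystal example */
static_assert(sizeof(struct foo) == 8, "Foo should be 8 bytes");
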
@Papierkorb (Contributor) commented Jan 14, 2017

The first notation could work; I don't like the second one. I want to propose a third option, which looks more like a generic type:

lib C
  struct Foo
    foo : Int32(2)
    bar : Int32(2)
  end
  struct Bar
    foo, bar : Int32(2)
  end
end

Many data structures in networking and in on-disk file formats use this to save some bytes here and there. And just like in C, this is not an everyday feature, but it makes code that has to do this much more readable and safer. Even if none of our notation proposals is accepted, I'd like to see this feature.

@SplittyDev (Author) commented Jan 15, 2017

@Papierkorb I thought about doing it like that too, but I think there's too much ambiguity between that and actual generics, especially since numbers are allowed in real generics (e.g. StaticArray(T, N)).
I guess it wouldn't cause any issues, since generics aren't allowed in lib declarations anyway, but it would still be easy to confuse this with actual generics.

@Papierkorb (Contributor):

Especially since numbers are allowed in real generics

Which was exactly my point:

which may look more like a generic type

@SplittyDev (Author):

@Papierkorb Ah yeah, I see. Well I wouldn't mind either way, I just think it's kinda confusing.

@konovod (Contributor) commented Jan 15, 2017

Using Int32(2) doesn't make much sense except for C compatibility. It should be BitInt(2).

@SplittyDev (Author):

@konovod How would BitInt(N) make more sense?
BitInt would be a completely new type with the same functionality as a normal integer. So why add unnecessary complexity?

Neither field : T : N nor N: field : T requires new types.
Also, you really don't want to access bit-fields as actual bit arrays. You want to be able to work with them the same way you'd work with a normal integer. That's the whole point of having bit-fields.
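
For comparison, this is how it already works in C: bit-field members read and write like ordinary integers, and the compiler inserts the shifts and masks behind the scenes (a standalone sketch; the flags struct here is made up purely for illustration):

#include <stdint.h>
#include <stdio.h>

struct flags {
  uint32_t mode  : 2;
  uint32_t ready : 1;
  uint32_t count : 5;
};

int main(void) {
  struct flags f = {0};
  f.mode  = 3;            /* plain assignment, no manual masking */
  f.count = f.count + 1;  /* plain arithmetic */
  if (f.mode == 3 && !f.ready)
    printf("count = %u\n", (unsigned)f.count);
  return 0;
}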

@RX14 (Contributor) commented Jan 15, 2017

How about making this an attribute like @[Packed]? It's not such a common feature, so verbosity should be fine.

@SplittyDev (Author):

@RX14 sounds good, but wouldn't the attribute be superfluous?
Let's say we have the following struct:

lib C
  @[PackedBits]
  struct Foo
    2 : foo, bar, baz : UInt32
  end
end

Wouldn't just using @[Packed] be sufficient? The compiler should be able to easily detect the presence of a bit-field by just matching against a new rule. So even though an attribute would clearly indicate the presence of bit-fields, I don't know whether it's actually necessary.

@RX14 (Contributor) commented Jan 15, 2017

@SplittyDev no, I meant like this:

struct Foo
  @[BitSize(2)]
  foo, bar, baz : UInt32
end

@SplittyDev (Author):

@RX14 Oh, yeah I think that's a good idea.
Would that cause any issues with the way multiple fields are handled?
I'm not sure how Crystal handles that, but in other languages I've used, attributes only apply to the first element immediately following the attribute, so would it even be possible to make one attribute work with multiple fields?

@konovod (Contributor) commented Jan 15, 2017

@SplittyDev Well, BitInt isn't ideal, but why UInt32? What would change if it were UInt64 or UInt16? And what would change if it were a signed Int32, as in the first post? I think the name should show that it is a special field, something like UInt(N), but Int is already taken and says nothing about N being bits rather than bytes.
On the other hand, I like the idea of the feature, and any syntax is fine, because the usages will be in low-level implementations, not everyday code.

@SplittyDev (Author) commented Jan 15, 2017

@konovod There's a reason for that. Consider the following code:

lib C
  @[Packed]
  struct Foo
    @[BitSize(8)]
    high, low : UInt16
  end
end

Now, both high and low are one byte wide, but their type is still UInt16, which is two bytes wide.
The important part here is that sizeof(Foo) is actually equal to sizeof(UInt16).
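
A C counterpart shows the same effect (a sketch; uint16_t as a bit-field base type is a GCC/Clang extension, which the C examples in this thread already rely on):

#include <assert.h>
#include <stdint.h>

struct foo16 {
  uint16_t high : 8;
  uint16_t low  : 8;
} __attribute__((packed));

/* both 8-bit fields share a single 16-bit unit */
static_assert(sizeof(struct foo16) == sizeof(uint16_t), "Foo fits in a UInt16");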

Without bit-fields you'd do something like this to get the same effect:

struct Foo
  def initialize(@value : UInt16)
  end

  def high
    (@value >> 8) & 0xFF
  end

  def low
    @value & 0xFF
  end
end

@konovod (Contributor) commented Jan 15, 2017

I know what bitfields are for; I've used them. But in this example, what is the point of declaring high and low as UInt16 rather than UInt8 or UInt32? It would be better (both for the compiler and for the reader of the code) if they had a specific type with a size of N bits. The 16 is just misleading.

@SplittyDev (Author) commented Jan 15, 2017

But in this example what is a point of naming high and low UInt16, not UInt8 or UInt32?

I'm afraid I don't quite understand what you mean by that.
Of course, in that trivial example I could've used two UInt8 fields instead of two UInt16 fields with a width of 8 bits. And in that case it would've also been a lot cleaner to do it that way.

That was only a very basic example though. Here's a more complex one:

lib C
  @[Packed]
  struct Page
    @[BitSize(1)] present, rw, user, accessed, dirty : UInt32
    @[BitSize(7)] unused : UInt32
    @[BitSize(20)] frame : UInt32
  end
end

That corresponds to the following C code:

struct page {
    uint32_t    present    : 1;
    uint32_t    rw         : 1;
    uint32_t    user       : 1;
    uint32_t    accessed   : 1;
    uint32_t    dirty      : 1;
    uint32_t    unused     : 7;
    uint32_t    frame      : 20;
} __attribute__((packed));

The size of the Page structure equals (5 + 7 + 20) = 32 bits (= 4 bytes).

@konovod (Contributor) commented Jan 15, 2017

Compare this with:

lib C
  @[Packed]
  struct Page
    @[BitSize(0x01)] present, rw, user, accessed, dirty : UInt16
    @[BitSize(0x07)] unused : UInt16
    @[BitSize(0x14)] frame : UInt16
  end
end

The size is still 32 bits = 4 bytes. What has changed for the compiler? Nothing. What does UInt16 mean in this context? Just nothing. So it would be better if it were some UIntGeneric type.

@SplittyDev (Author) commented Jan 15, 2017

@konovod By adding a generic type to be used with bit-fields, you're basically creating an integer that could hold arbitrarily many bits. Code like that would break on a wide range of CPU models and architectures. Usually, the maximum size of a bit-field is limited by its type.

@konovod (Contributor) commented Jan 15, 2017

@SplittyDev Well, that depends on the implementation. It could, for example, always be 32 (64?) bits, or automatically widen to the minimal size that can hold the given number of bits (and fail to compile on @[BitSize(65)]).

@SplittyDev (Author):

@konovod In that case you'd still have the problem that you couldn't work with the generic integer easily. What if a function expects an Int32 and you pass it a UIntGeneric? I think adding something like that just overcomplicates everything.

@konovod (Contributor) commented Jan 15, 2017

@SplittyDev That problem already exists: if something expects an Int16 and you pass 0 there, it fails, because you must type 0i16. I hope this will be solved eventually, so that you can pass any integer to a function that expects an integer without unnecessary conversions. Anyway, right now you still have to convert an Int16 to pass it to a function that expects an Int32; I don't think that also converting a UIntGeneric will add much trouble.

@RX14 (Contributor) commented Jan 15, 2017

I guess we could fully embrace LLVM's arbitrary-sized integers with stuff like UInt(7).

@SplittyDev (Author):

@konovod Crystal doesn't support implicit type casts. Why would you add a type that you'd have to convert to another type every time you want to use it, instead of just using the actual type?

I just don't see how that would be helpful, at all.

@SplittyDev (Author):

@RX14 Yeah, that would be possible, although I still don't get how one would benefit from doing it that way.

@RX14 (Contributor) commented Jan 15, 2017

@SplittyDev Well, I'm guessing that the underlying struct will be represented as such in the IR, even by Clang.

@SplittyDev (Author) commented Jan 15, 2017

@RX14 it probably will.

But I think the compiler should handle this, and not the user.
If the user wants the type to behave like a UInt16, then it should behave that way, even if the actual width (sign + value bits) is less than the size of the type.

C does it that way for a reason, and I think that it would be better to be consistent with C there.

@RX14 (Contributor) commented Jan 15, 2017

@SplittyDev Looks like it isn't, actually: https://godbolt.org/g/KVQTxJ

@konovod (Contributor) commented Jan 15, 2017

@SplittyDev The behavior would be the same as with Int16/Int8/Float32 - you have to convert it if you want to pass it somewhere another type is expected. Completely consistent with the rest of the language. And even if you declare the field as UInt32, you'd have to use an explicit conversion to, e.g., write a 0 or 1 there or do anything useful.
Sign is another problem, btw - I think signed bitfields should just be forbidden. They can create a lot of unexpected behavior and I can't imagine valid uses for them.

@SplittyDev (Author) commented Jan 15, 2017

@konovod I've updated the 'implementation details' section of the RFC. Only allowing unsigned integers seems sensible.

@RX14, that's very interesting. I'm not very good at reading LLVM code, but it looks like this just allocates a chunk of memory big enough to hold the struct, and uses bitwise arithmetic to do the trick.
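
That reading matches how bit-field stores are typically lowered. Roughly, a store to the frame field of the earlier page struct becomes a load/clear/merge/store sequence on the containing 32-bit word, as in this hand-written C sketch (for illustration only; actual codegen differs between compilers):

#include <stdint.h>

/* frame occupies bits 12..31: the 5 one-bit flags and 7 unused bits come first */
#define FRAME_SHIFT 12u
#define FRAME_MASK  (0xFFFFFu << FRAME_SHIFT) /* 20 bits wide */

static void set_frame(uint32_t *raw, uint32_t value) {
  uint32_t word = *raw;                        /* load the whole word  */
  word &= ~FRAME_MASK;                         /* clear the field bits */
  word |= (value << FRAME_SHIFT) & FRAME_MASK; /* merge the new value  */
  *raw = word;                                 /* store the word back  */
}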

@SplittyDev (Author) commented Jan 15, 2017

@konovod actually, thinking about it, I don't think that allowing signed values would cause any issues.

@konovod (Contributor) commented Jan 15, 2017

@SplittyDev After some thought I agree, they aren't that evil.

@asterite (Member) commented Jan 15, 2017

I'm pretty sure it's possible to write a macro that allows bitfield access to a property. Something like:

@[Extern]
struct Foo
  bitfields @raw : Int32, [
    hi: {Int32, 16},
    low: {Int32, 16},
  ]
end

The bitfields macro would define getters and setters that use bit manipulation to get/set values through the underlying field.
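
For a concrete picture, the bit manipulation such generated accessors would boil down to looks like this, written out by hand in C (illustrative only; an unsigned backing word is used here to keep the shifts well-defined, and the hypothetical foo_* helpers just mirror the hi/low layout above):

#include <stdint.h>

/* hi occupies bits 16..31 of the backing field, low occupies bits 0..15 */
static uint32_t foo_get_hi(uint32_t raw)  { return (raw >> 16) & 0xFFFFu; }
static uint32_t foo_get_low(uint32_t raw) { return raw & 0xFFFFu; }

static uint32_t foo_set_hi(uint32_t raw, uint32_t v) {
  return (raw & 0x0000FFFFu) | ((v & 0xFFFFu) << 16);
}
static uint32_t foo_set_low(uint32_t raw, uint32_t v) {
  return (raw & 0xFFFF0000u) | (v & 0xFFFFu);
}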

@ysbaddaden (Contributor):

I was thinking the same as @asterite.

We can move the struct out of the lib, adding the @[Extern] attribute so it continues to behave as a C struct. We can then add methods, and thus getter and setter abstractions. It's not as declarative as C bitfields, but it's more expressive and not particularly harder to read or write.

@[Extern]
struct Foo
  @foo : Int32

  def initialize(@foo : Int32)
  end

  def hi
    (@foo >> 16) & 0xffff
  end

  def lo
    @foo & 0xffff
  end
end

@SplittyDev (Author) commented Jan 15, 2017

@asterite, @ysbaddaden Let's say I have a struct with a size of 128 bits. I'd need two UInt64 fields to represent that, and a macro supporting that would quickly become extremely complicated. Also, what about packing? Would the struct really look like a C struct in memory?

Imho, a macro is only a temporary solution. Having support for this in the compiler would be a permanent solution, and I'm sure it would be useful for a lot of low-level developers.

@asterite (Member) commented Jan 15, 2017

@SplittyDev You can mark a struct with @[Packed] to make it packed. It's possible to do this with arbitrary sizes; my macro was just an example of a possible interface. For example, we could have this:

@[Extern]
@[Packed]
struct Foo
  bitfields [
    hi: {Int32, 16},
    low: {Int32, 16},
  ]
end

Here we don't specify the backing field; it can probably be generated by the macro (or we can pass a name for it). Then, for the total size, we can use a UInt8[total_size] and read/write from that.
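
Reading an arbitrary field out of such a byte array is the same shift-and-mask work as before, just spread across bytes. Here is a sketch of the kind of helper the generated getters would reduce to (hypothetical, assuming little-endian bit order and fields of 1 to 32 bits):

#include <stddef.h>
#include <stdint.h>

static uint32_t read_bits(const uint8_t *buf, size_t bit, unsigned width) {
  uint64_t acc = 0;
  size_t first = bit / 8;            /* first byte that contains the field */
  unsigned shift = (unsigned)(bit % 8);
  /* gather every byte the field touches, least significant byte first */
  for (unsigned i = 0; i * 8 < shift + width; i++)
    acc |= (uint64_t)buf[first + i] << (8 * i);
  uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (uint32_t)(acc >> shift) & mask;
}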

I personally think that a general solution that doesn't need to modify the core language, if readable/usable enough, is always superior to modifying the language, especially for functionality that is very rarely needed.

@SplittyDev (Author) commented Jan 15, 2017

@asterite That would only work for getting and setting aligned bits though.
Consider this:

struct {
  uint32_t a : 3;
  uint32_t b : 3;
  uint32_t c : 2;
} foo_t;

Something like that would be completely impossible with macros.
Update: Forget what I said, it's wrong.

@asterite (Member):

@SplittyDev I don't understand why.

@SplittyDev (Author):

Oh, I'm sorry @asterite, I wrote that from my phone and didn't pay enough attention. What I said is wrong.

@bew (Contributor) commented May 6, 2018

Is there something to do here? Do we define a macro in the stdlib, or do we leave this to a shard?

@Sija (Contributor) commented May 6, 2018

That might be solved by using the newly introduced Annotation type (#6063), if it gets merged.

@elorest (Contributor) commented Dec 9, 2018

Here's something I wrote, inspired by @asterite's suggestion above. It doesn't actually store the data in packed bytes like C does, but it allows reading in bitfields and outputting them again with changed values. This is what I needed for my use case of reading in 416 bits with randomly sized values. I'd appreciate any useful suggestions.

https://github.com/elorest/bitfields

@jhass (Member) commented May 22, 2019

Anybody here opposed to closing this? It seems like the use cases can be solved through a shard, which we can consider including in the standard library if it becomes really popular or otherwise necessary.

@asterite (Member):

Yes, let's close this.
