Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CT sizeof(T) is wrong if T or any member is empty object #13945

Closed
timotheecour opened this issue Apr 10, 2020 · 5 comments · Fixed by #24469
Closed

CT sizeof(T) is wrong if T or any member is empty object #13945

timotheecour opened this issue Apr 10, 2020 · 5 comments · Fixed by #24469

Comments

@timotheecour
Copy link
Member

timotheecour commented Apr 10, 2020

CT sizeof(T) is wrong if T or any member is empty object.

  • empty objects are (kind of) illegal in C, but made legal using some common C extension; and the size of an empty C object is 0
  • empty objects are legal in C++, and the size is (typically) 1; rationale is discussed in many places, see for eg http://www.stroustrup.com/bs_faq2.html#sizeof-empty and http://www.cantrip.org/emptyopt.html
  • in nim, codegen inserts a dummy char, make the object non-empty (for both C and C++)
  • but CT sizeof doesn't take that into account currently, and as a result, is wrong for both C and C++ targets
  • this is also the case if T is any type that contains an empty object as a member (at any depth)
  • empty objects can definitely occur in Nim, eg:
type Foo = object
  when bar:
    x: int

Example

{.emit:"""
#define c_astToStrImpl(T) #T
"""
.}

import macros
import strutils

template c_astToStr(T: typedesc): string =
  block:
    # proc needed pending https://github.com/nim-lang/Nim/issues/13943 D20200409T215527
    proc fun2(): string =
      var s: cstring
      {.emit:[s, "= c_astToStrImpl(", T, ");"].}
      $s
    fun2()

template c_sizeof(T: typedesc): int =
  block:
    # proc needed pending https://github.com/nim-lang/Nim/issues/13943 D20200409T215527
    proc fun2(): int =
      {.emit:[result," = sizeof(", T, ");"].}
    fun2()

macro test(body: untyped): untyped =
  result = newStmtList()
  for T in body:
    result.add quote do:
      let s1 = `T`.sizeof
      let s2 = `T`.c_sizeof
      if s1 != s2:
        echo "$1 $2 => sizeof: $3 vs c_sizeof: $4" % [$astToStr(`T`), c_astToStr(`T`), $s1, $s2]

type FooEmpty = object
const N = 10
type FooEmptyArr = array[N, FooEmpty]
type Bar = object
  x: FooEmpty
  y: cint

type BarTup = (FooEmpty, cint)
type BarTup2 = (().type, cint)

test:
  FooEmpty
  FooEmptyArr
  Bar
  BarTup
  BarTup2

Current Output

for both C and C++:

FooEmpty tyObject_FooEmpty__kXU9cynb9cneQT6vnueBAtBQ => sizeof: 0 vs c_sizeof: 1
FooEmptyArr tyArray__KKWuQkywUut4wZvNvkBRGQ => sizeof: 0 vs c_sizeof: 10
Bar tyObject_Bar__xS1Fg6ksZzbGcsYohVTKiA => sizeof: 4 vs c_sizeof: 8
BarTup tyTuple__1T8p2WwWr9crLY8GYQT9akBA => sizeof: 4 vs c_sizeof: 8
BarTup2 tyTuple__XGGg1mqyToR7coeXox9aGQA => sizeof: 4 vs c_sizeof: 8

Expected Output

sizeof should match c_sizeof

Possible Solution

  • cgtypes.nim inserts result.addf("char dummy;$n", []) for empty objects, but CT sizeof doesn't take that into account
  • inserting dummy is probably not a good idea because it prevents optimization for empty base class: see http://www.stroustrup.com/bs_faq2.html#sizeof-empty

There is an interesting rule that says that an empty base class need not be represented by a separate byte:
This optimization is safe and can be most useful. It allows a programmer to use empty classes to represent very simple concepts without overhead. Some current compilers provide this "empty base class optimization".

See also http://www.cantrip.org/emptyopt.html which explains how stdlib is implemented to eliminate the corresponding bloat

so probably this dummy should only be inserted for C backend, not C++ backend.

  • if possible, cgen shouldn't even emit empty structs in codegen when safe to do so; but this may not be always possible

Additional Information

@krux02
Copy link
Contributor

krux02 commented Apr 10, 2020

Here is some information that is important when working on a solution here. In C an empty struct has size 0 and in C++ it has size 1. I can only imagine this to cause problems at some point.

#include <stdio.h>

struct MyEmptyStruct{};

int main() {
  printf("%d\n", sizeof(struct MyEmptyStruct)); // output: 0 in C but 1 in C++
}

Maybe it is best to set the compile time size of an empty object to unknown for Nim. As far as I know they are useful for wrapper code only, and here they are already marked as unknown in size by the importc flag (I think that is the reason why they didn't cause any problem yet). It will also allow to play with the dummy field (generate it, don't generate it), without risking to break anything.

@timotheecour
Copy link
Member Author

Maybe it is best to set the compile time size of an empty object to unknown for Nim

that's too pessimistic (and propagates to any containing type that have Empty as a sub-member). We should be able to infer sizeof with 100% reliability in this case (even if it means a when defined(cpp): 1 else: 0)

As far as I know they are useful for wrapper code only

no, see examples I showed above, they don't even involve importc.

This issue shouldnt' be hard to fix, IMO.

@krux02
Copy link
Contributor

krux02 commented Apr 10, 2020

that's too pessimistic (and propagates to any containing type that have Empty as a sub-member). We should be able to infer sizeof with 100% reliability in this case (even if it means a when defined(cpp): 1 else: 0).

Well, I looked it up again. An empty struct in C seems to be undefined behavior. In this case what gcc does isn't necessary what other C compilers do here. So it is correct that the Nim compiler injects a dummy char. In C++ though the size of an empty struct is defined to be 1 byte at least.

I prefer a pessimistic compiler that is correct over a compiler that always has an answer that isn't always correct.

@krux02
Copy link
Contributor

krux02 commented Apr 10, 2020

Another thing that should be mentioned. Inheritance is done differently for C and C++ backends. In C++ inheritance maps to C++ inheritance. In C inheritance puts a value of the inherited type at the top level of the object. This has an affect on the size of the object. Make sure you test inheriting from an empty object in both C and C++, if you implement a fix for this.

Or as you linked this optimization of empty structs causes a big mess and special cases in computing sizes of objects:

#include <stdio.h>
#include <stdint.h>

struct BaseA {
  uint8_t dummy;
};

struct BaseB {
};

struct InheritanceA: public BaseA {
   int64_t data;
};

struct InheritanceB: public BaseB {
   int64_t data;
};

int main() {
  printf("%d\n", sizeof(BaseA));         // 1
  printf("%d\n", sizeof(BaseB));         // 1
  printf("%d\n", sizeof(InheritanceA));  // 16
  printf("%d\n", sizeof(InheritanceB));  // 8
}

Even though InheritanceA and InheritanceB share the same members, and their base class has the exact same size and alignment, They have in the end different sizes at the end. After all I don't even know if this optimization is a requirement in C++ compilers or not. Maybe some c++ won't do the optimization and InheritanceB will have 16 bytes as well.

My proposed solution is still, set the size of empty objects to unknown for the Nim compiler and let the C and C++ compilers do whatever they want to do. And yes, stop emitting this dummy field for the C++ backend.

@metagn
Copy link
Collaborator

metagn commented Nov 3, 2024

Works since 1.4.0

ringabout added a commit that referenced this issue Nov 22, 2024
Araq pushed a commit that referenced this issue Nov 23, 2024
narimiran pushed a commit that referenced this issue Jan 14, 2025
closes #13945

(cherry picked from commit af3181e)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants