Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec: Define fixed behavior with null/empty arrays #6707

Closed
stephentoub opened this issue Nov 11, 2015 · 8 comments
Closed

Spec: Define fixed behavior with null/empty arrays #6707

stephentoub opened this issue Nov 11, 2015 · 8 comments
Assignees
Labels
4 - In Review A fix for the issue is submitted for review. Area-Language Design

Comments

@stephentoub
Copy link
Member

The C# spec section 18.6 about the fixed statement has the following sentence:

The behavior of the fixed statement is implementation-defined if the array expression is null or if the array has zero elements.

I propose we make this behavior well-defined rather than leaving it up to the implementation.

Both the previous native C# compiler and Roslyn set the pointer to null for a null or empty array, e.g. with Roslyn this program prints out 0:

class Test {
    static unsafe void Main() {
        byte[] arr = new byte[0];
        fixed (byte* p = arr) {
            System.Console.WriteLine((long)p);
        }
    }
}

with IL that does the check for null/empty explicitly:

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (
        [0] uint8& pinned p,
        [1] uint8[] buffer)
    L_0000: ldc.i4.0 
    L_0001: newarr uint8
    L_0006: dup 
    L_0007: stloc.1 
    L_0008: brfalse.s L_000f
    L_000a: ldloc.1 
    L_000b: ldlen 
    L_000c: conv.i4 
    L_000d: brtrue.s L_0014
    L_000f: ldc.i4.0 
    L_0010: conv.u
    L_0011: stloc.0 
    L_0012: br.s L_001c
    L_0014: ldloc.1 
    L_0015: ldc.i4.0 
    L_0016: ldelema uint8
    L_001b: stloc.0 
    L_001c: ldloc.0 
    L_001d: conv.i 
    L_001e: conv.u8 
    L_001f: call void [mscorlib]System.Console::WriteLine(int64)
    L_0024: ldc.i4.0 
    L_0025: conv.u 
    L_0026: stloc.0 
    L_0027: ret 
}

The Mono mcs compiler looks to have the same behavior, using a null pointer for a null or empty array:

.method private hidebysig static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    .locals init (
        [0] uint8[] buffer,
        [1] uint8* pinned numPtr)
    L_0000: ldc.i4.0 
    L_0001: newarr uint8
    L_0006: stloc.0 
    L_0007: ldloc.0 
    L_0008: brfalse L_0015
    L_000d: ldloc.0 
    L_000e: ldlen 
    L_000f: conv.i4
    L_0010: brtrue L_001c
    L_0015: ldc.i4.0 
    L_0016: conv.u
    L_0017: br L_0023
    L_001c: ldloc.0 
    L_001d: ldc.i4.0 
    L_001e: ldelema uint8
    L_0023: stloc.1 
    L_0024: ldloc.1 
    L_0025: conv.u8 
    L_0026: call void [mscorlib]System.Console::WriteLine(int64)
    L_002b: ldc.i4.0 
    L_002c: conv.u 
    L_002d: stloc.1 
    L_002e: ret 
}

However, because the spec states that the behavior isn't well-defined, we can't rely on it in code we write, and as such there are many instances of code actively defending against the possibility, doing a variety of awkward workarounds, e.g.

  • Using GCHandle.Alloc(arr, GCHandleType.Pinned) with AddressOfPinnedObject.
  • Explicitly checking for empty and then having two different code paths, one for empty, one not.
  • Explicitly checking for empty and then substituting a cached, length-1 array that gets fixed.
  • Etc.
    All of these lead to more expensive and more complicated code, none of which is necessary due to the compiler(s) already doing the right thing.

The spec language currently reads:

An expression of an array-type with elements of an unmanaged type T, provided the type T* is implicitly convertible to the pointer type given in the fixed statement. In this case, the initializer computes the address of the first element in the array, and the entire array is guaranteed to remain at a fixed address for the duration of the fixed statement. The behavior of the fixed statement is implementation-defined if the array expression is null or if the array has zero elements.

I propose we change it to:

An expression of an array-type with elements of an unmanaged type T, provided the type T* is implicitly convertible to the pointer type given in the fixed statement. In this case, if the array has one or more elements, the initializer computes the address of the first element in the array, and the entire array is guaranteed to remain at a fixed address for the duration of the fixed statement. If the array expression is null or if the array has zero elements, the initializer computes an address equal to zero.

cc: @MadsTorgersen, @jaredpar

@whoisj
Copy link

whoisj commented Nov 11, 2015

👍

@bondsbw
Copy link

bondsbw commented Nov 12, 2015

Agreed, and perhaps cleaning up other implementation-defined behaviors is warranted as well.

@GSPP
Copy link

GSPP commented Nov 15, 2015

Is it a good idea to make the pointer zero for an empty array? This does seem like a pointless special case. Does it serve any purpose or is it for compat only?

@axel-habermaier
Copy link
Contributor

@GSPP: I guess the reason is that zero-sized arrays naturally map to NULL in C/C++ which don't support them.

In fact, I have often relied on this behavior, not knowing that it is undefined.

@stephentoub
Copy link
Member Author

There's definitely the compat issue, as folks have relied on this behavior.

But it's actually useful behavior... not necessarily the part about the address being null, but at least that it's valid to use an empty array with fixed and that you get some address (and as long as you're getting some address, null is a good choice). Code along the lines of the following (I'm simplifying) is very common:

[DllImport("SomeNativeLibrary")]
private static extern unsafe void SomeFunction(byte* bytes, int bytesLength);
...
unsafe void UseSomeFunction(byte[] bytes)
{
    fixed (byte* ptr = bytes)
    {
        SomeFunction(ptr, bytes.Length);
    }
}

Given an arbitrary bytes array, with how fixed is defined in C# today, this is invalid code. Instead developers end up writing code like the following (if they even realize the above is incorrect):

unsafe void UseSomeFunction(byte[] bytes)
{
    if (bytes.Length > 0)
    {
        fixed (byte* ptr = bytes)
        {
            SomeFunction(ptr, bytes.Length);
        }
    }
    else
    {
        SomeFunction(null, 0);
    }
}

which often ends up duplicating much more code than I have here, or:

unsafe void UseSomeFunction(byte[] bytes)
{
    GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
    try
    {
        IntPtr ptr = handle.AddressOfPinnedObject();
        SomeFunction((byte*)ptr, bytes.Length);
    }
    finally
    {
        handle.Free();
    }
}

which is both more complicated and measureably more expensive, or:

private static readonly byte[] s_nonEmptyArray = new byte[1];

unsafe void UseSomeFunction(byte[] bytes)
{
    byte[] tmp = bytes.Length > 0 ? bytes : s_nonEmptyArray;
    fixed (byte* ptr = tmp)
    {
        SomeFunction(ptr, bytes.Length);
    }
}

which is similarly more complicated with overheads. Etc.

All of that can be avoided with the C# compiler doing what it already does.

@GSPP
Copy link

GSPP commented Nov 16, 2015

@stephentoub yeah, much better to define it. The undefinedness is pointless in this case.

I just realize now that it would be hard for the compiler to obtain some address from the array in case it is zero length because ldelema throws in that case. I guess it could use 1... which is horrible. So that resolves my question, thanks.

@jaredpar
Copy link
Member

👍

I agree it's much better to have defined behavior here. Especially since all the compilers already handle this case in the same fashion.

@gafter gafter added 2 - Ready 4 - In Review A fix for the issue is submitted for review. and removed 2 - Ready labels Nov 21, 2015
@gafter
Copy link
Member

gafter commented Nov 21, 2015

This is already implemented; we just need to update the spec ;)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - In Review A fix for the issue is submitted for review. Area-Language Design
Projects
None yet
Development

No branches or pull requests

8 participants