Add Encoding.GetBytes(string, offset, count) #19574

bartonjs · 2016-12-07T00:37:44Z

The Encoding class has methods for encoding the middle of a char[], but not for the middle of a string. A caller is forced to switch to unsafe (char*) or to re-allocate as char[].

    public abstract partial class Encoding : System.ICloneable
    {
...
        public unsafe virtual int GetByteCount(char* chars, int count) { throw null; }
        public virtual int GetByteCount(char[] chars) { throw null; }
        public abstract int GetByteCount(char[] chars, int index, int count);
        public virtual int GetByteCount(string s) { throw null; }
+       public int GetByteCount(string s, int index, int count) { throw null; }
...
        public unsafe virtual int GetBytes(char* chars, int charCount, byte* bytes, int byteCount) { throw null; }
        public virtual byte[] GetBytes(char[] chars) { throw null; }
        public virtual byte[] GetBytes(char[] chars, int index, int count) { throw null; }
        public abstract int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex);
        public virtual byte[] GetBytes(string s) { throw null; }
+       public byte[] GetBytes(string s, int index, int count) { throw null; }
        public virtual int GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex) { throw null; }
...
    }

The GetBytes(string, int, int) method would allow for the typical "just give me a big enough array" while the GetByteCount(string, int, int) method would allow for ensuring that an existing buffer is sufficiently big to call the existing GetBytes(string, int, int, byte[], int) method.
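As a rough sketch of the difference at a call site, the snippet below compares the char[]-copy workaround with the proposed overload. It assumes a runtime where GetBytes(string, int, int) is available as proposed; the class name, the helper names and the sample string are made up for illustration.

```csharp
using System;
using System.Text;

public static class SubstringEncodingSketch
{
    // Workaround without the proposed overload: copy the range into a
    // temporary char[] and encode that copy.
    public static byte[] EncodeViaCopy(Encoding encoding, string s, int index, int count)
    {
        char[] copy = s.ToCharArray(index, count);
        return encoding.GetBytes(copy);
    }

    // With the proposed overload (assumed available): encode the range
    // directly, without the intermediate char[].
    public static byte[] EncodeRange(Encoding encoding, string s, int index, int count)
    {
        return encoding.GetBytes(s, index, count);
    }

    public static void Main()
    {
        string text = "key=v\u00E4rde;next";   // illustrative input
        int index = 4, count = 5;               // the range holding the value

        byte[] viaCopy = EncodeViaCopy(Encoding.UTF8, text, index, count);
        byte[] direct = EncodeRange(Encoding.UTF8, text, index, count);

        // Both paths should yield the same bytes.
        Console.WriteLine(Convert.ToBase64String(viaCopy) == Convert.ToBase64String(direct));
    }
}
```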


tarekgh · 2016-12-07T00:47:01Z

@bartonjs
We already have

public virtual int GetBytes(String s, int charIndex, int charCount, byte[] bytes, int byteIndex)

I think this addresses your request here, but it may make sense to expose the GetByteCount version you suggested.

bartonjs · 2016-12-07T06:13:55Z

No, because you don't know how big the buffer needs to be without GetByteCount. And the version that just returns a byte[] makes the char[] and string versions have the same shapes.
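The sizing problem can be sketched like this. The helper below is hypothetical (TryEncodeInto is not part of the proposal); it only shows how the proposed GetByteCount(string, int, int) would pair with the existing five-argument GetBytes to check an existing buffer before writing, assuming the proposed overload is available.

```csharp
using System;
using System.Text;

public static class BufferSizingSketch
{
    // Hypothetical helper: encodes s[index .. index + count) into buffer at
    // offset, but only if the exact encoded size fits in the remaining space.
    public static bool TryEncodeInto(Encoding encoding, string s, int index, int count,
                                     byte[] buffer, int offset, out int written)
    {
        // Proposed overload (assumed available): exact size of the range.
        int needed = encoding.GetByteCount(s, index, count);
        if (needed > buffer.Length - offset)
        {
            written = 0;
            return false;
        }

        // Existing overload: encode the range straight into the buffer.
        written = encoding.GetBytes(s, index, count, buffer, offset);
        return true;
    }

    public static void Main()
    {
        string text = "key=v\u00E4rde;next";   // illustrative input
        byte[] buffer = new byte[8];

        bool ok = TryEncodeInto(Encoding.UTF8, text, 4, 5, buffer, 0, out int written);
        Console.WriteLine(ok + " " + written);

        // Without an exact count, a caller has to fall back to a worst-case
        // estimate, which tends to over-allocate.
        Console.WriteLine(Encoding.UTF8.GetMaxByteCount(5) >= written);
    }
}
```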

svick · 2016-12-07T12:10:52Z

Should this wait for Span<T> (https://github.com/dotnet/corefx/issues/13892)? With that, the API would look something like:

public virtual byte[] GetBytes(ReadOnlySpan<char> span);

And it would cover the existing parameter shapes (string), (char[]), (char[], int, int) and (char*, int), as well as the one in question here: (string, int, int).
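For comparison, a span-based call site might look roughly like the sketch below. It does not use the exact signature suggested above; it assumes a runtime that has string.AsSpan(int, int), Encoding.GetByteCount(ReadOnlySpan<char>) and Encoding.GetBytes(ReadOnlySpan<char>, Span<byte>), none of which existed when this comment was written. The class and method names are illustrative.

```csharp
using System;
using System.Text;

public static class SpanEncodingSketch
{
    // Encodes a range of a string through a span slice. No substring or
    // char[] copy is made; only the result array is allocated.
    public static byte[] EncodeSlice(Encoding encoding, string s, int index, int count)
    {
        ReadOnlySpan<char> slice = s.AsSpan(index, count);
        byte[] bytes = new byte[encoding.GetByteCount(slice)];
        encoding.GetBytes(slice, bytes);
        return bytes;
    }

    public static void Main()
    {
        byte[] bytes = EncodeSlice(Encoding.UTF8, "key=v\u00E4rde;next", 4, 5);
        Console.WriteLine(bytes.Length);
    }
}
```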

tarekgh · 2016-12-07T16:53:55Z

@svick it makes sense to wait for Span.

karelz · 2016-12-07T17:43:25Z

Chatted with @bartonjs - this one is higher priority as there is no good safe alternative. If there's a plan to have Span for 1.2, we could wait; otherwise we might add this API.
Let's discuss the Span timeline and our plans in the API review meeting next week (keeping 'api-ready-for-review').

tarekgh · 2016-12-07T17:59:04Z

I don't see this one as urgent, so I believe we can just wait for Span. Why are we treating this as high priority? To me it is still a nice-to-have feature, as there are different ways to achieve the same result.

bartonjs · 2016-12-07T18:05:13Z

If there's a concrete plan that adds Span to Encoding, this is closable.

If there's no concrete plan, this will allow already existing over-allocating code to be cleaned up.

tarekgh · 2016-12-07T18:13:43Z

@bartonjs right, then we can wait until the Span plans in general are figured out. My point is that we don't have to rush this, as I don't see any urgency for it. Cleaning up the code is nice, but it is not urgent.

@svick any idea who is looking at Span in general or who is following up?

tarekgh · 2016-12-07T18:23:06Z

Talked offline with @bartonjs, and I am fine if we take this and discuss it with the design reviewers.

terrajobst · 2016-12-14T19:41:16Z

We shouldn't block API additions on Span<T>. For one, we don't know when it will ship stable. Secondly, we don't know which assembly it's in, and thirdly, even if we had Span<T> today, we might still want to add the string version because it represents what the customer actually wants to do, i.e. encode a substring. If anything, we could talk about a StringSegment type :-)

We think the API looks fine as proposed.

AlexRadch · 2016-12-15T15:33:07Z

@karelz I am working on this issue.

tarekgh · 2016-12-15T17:17:46Z

Thanks @AlexRadch.

Please include me in your PR.

AlexRadch · 2016-12-15T17:56:03Z

@bartonjs should the methods be virtual or not?

AlexRadch · 2016-12-16T07:00:27Z

@karelz Can you answer whether the methods should be virtual or not? Both reviewers, @jkotas and @tarekgh, think that the methods should NOT be virtual, and I do not see any disadvantage in making them non-virtual.

karelz · 2016-12-16T18:08:49Z

The answer is being worked out in the PR: dotnet/coreclr#8651 (comment)
We should update the API proposal here once a decision is made ...

karelz · 2016-12-18T02:38:17Z

For the record: The API proposal was updated by @tarekgh - see dotnet/coreclr#8651 (comment).

bartonjs · 2016-12-19T19:41:26Z

Reopening the issue since it needs a ref change and new tests in corefx.

tarekgh · 2017-01-04T01:33:28Z

@AlexRadch, did you have a chance to work on exposing the APIs on the corefx side? Thanks.

karelz · 2017-01-29T06:10:15Z

No activity for 1.5 months, unassigning - the issue is back "up for grabs" for anyone to pick up.
Next steps: Expose the API in CoreFX and add tests.

hughbe · 2017-03-05T04:57:27Z

Looks like this is already implemented in coreclr/corert. I'll take on exposing and testing this.

tarekgh · 2017-03-05T18:52:26Z

@hughbe this is great.

karelz · 2017-03-06T19:37:27Z

Assigning to @hughbe ...

hughbe · 2017-03-07T13:56:51Z

This is technically source breaking, right?

var encoding = new UTF8Encoding();
encoding.GetByteCount(null, 0, 0);

string and char[] can both be assigned from null, so this call is now ambiguous and fails to compile.

JonHanna · 2017-03-07T14:36:57Z

It's only ambiguous with a literal null, and it throws ANE when null is passed to it, so the only code that will be broken is code that would always throw. Source-breaking that code is doing the author a favour.
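A small sketch of that argument, assuming the new GetByteCount(string, int, int) overload is present: a cast removes the ambiguity, and the call then throws at run time, so only code that could never have succeeded is affected. The class and method names are illustrative.

```csharp
using System;
using System.Text;

public static class NullOverloadSketch
{
    // Returns true if passing a null string to the three-argument
    // GetByteCount throws ArgumentNullException.
    public static bool ThrowsOnNullString(Encoding encoding)
    {
        try
        {
            // The cast selects the string overload explicitly.
            encoding.GetByteCount((string)null, 0, 0);
            return false;
        }
        catch (ArgumentNullException)
        {
            return true;
        }
    }

    public static void Main()
    {
        var encoding = new UTF8Encoding();

        // With both the char[] and string overloads present, an uncast
        // literal null does not compile (ambiguous call):
        // encoding.GetByteCount(null, 0, 0);

        Console.WriteLine(ThrowsOnNullString(encoding));
    }
}
```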

hughbe · 2017-03-07T16:33:41Z

Yup as I thought - it's unlikely

tarekgh · 2017-03-07T16:57:51Z

We already have the same case elsewhere, so this is not really something new. For example, we have Encoding.GetBytes(char[]) and Encoding.GetBytes(string). I am not really worried about the null case (without casts).

bartonjs assigned AlexRadch Dec 15, 2016

jkotas closed this as completed Dec 19, 2016

bartonjs reopened this Dec 19, 2016

karelz unassigned AlexRadch Jan 29, 2017

karelz assigned hughbe Mar 6, 2017

tarekgh closed this as completed in dotnet/corefx#16810 Mar 9, 2017

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 2.0.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 27, 2020
