Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Encoding.GetBytes(string, offset, count) #19574

Closed
bartonjs opened this issue Dec 7, 2016 · 26 comments
Closed

Add Encoding.GetBytes(string, offset, count) #19574

bartonjs opened this issue Dec 7, 2016 · 26 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Encoding help wanted [up-for-grabs] Good issue for external contributors wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Milestone

Comments

@bartonjs
Copy link
Member

bartonjs commented Dec 7, 2016

The Encoding class has methods for encoding the middle of a char[], but not for the middle of a string. A caller is forced to switch to unsafe (char*) or to re-allocate as char[].

    public abstract partial class Encoding : System.ICloneable
    {
...
        public unsafe virtual int GetByteCount(char* chars, int count) { throw null; }
        public virtual int GetByteCount(char[] chars) { throw null; }
        public abstract int GetByteCount(char[] chars, int index, int count);
        public virtual int GetByteCount(string s) { throw null; }
+       public int GetByteCount(string s, int index, int count) { throw null; }
...
        public unsafe virtual int GetBytes(char* chars, int charCount, byte* bytes, int byteCount) { throw null; }
        public virtual byte[] GetBytes(char[] chars) { throw null; }
        public virtual byte[] GetBytes(char[] chars, int index, int count) { throw null; }
        public abstract int GetBytes(char[] chars, int charIndex, int charCount, byte[] bytes, int byteIndex);
        public virtual byte[] GetBytes(string s) { throw null; }
+       public byte[] GetBytes(string s, int index, int count) { throw null; }
        public virtual int GetBytes(string s, int charIndex, int charCount, byte[] bytes, int byteIndex) { throw null; }
...
}

The GetBytes(string, int, int) method would allow for the typical "just give me a big enough array" while the GetByteCount(string, int, int) method would allow for ensuring that an existing buffer is sufficiently big to call the existing GetBytes(string, int, int, byte[], int) method.

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2016

@bartonjs
we already have

public virtual int GetBytes(String s, int charIndex, int charCount,byte[] bytes, int byteIndex)

I think this address your request here. but it may make sense expose the GetByteCount version you suggested

@bartonjs
Copy link
Member Author

bartonjs commented Dec 7, 2016

No, because you don't know how big the buffer needs to be without GetByteCount. And the version that just returns a byte[] makes the char[] and string versions have the same shapes.

@svick
Copy link
Contributor

svick commented Dec 7, 2016

Should this wait for Span<T> (https://github.com/dotnet/corefx/issues/13892)? With that, the API would look something like:

public virtual byte[] GetBytes(ReadOnlySpan<char> span);

And it would cover the existing use cases of string, char[], char[], int, int, char*, int as well as the use case in question here: string, int, int.

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2016

@svick that make sense to wait for Span.

@karelz
Copy link
Member

karelz commented Dec 7, 2016

Chatted with @bartonjs - this one is higher priority as there is not good safe alternative. If there's plan to have Span for 1.2, we could wait, otherwise we might add this API.
Let's discuss the Span timeline and our plans in API review meeting next week (keeping 'api-ready-for-review').

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2016

I am not seeing this one is urgent to have so I believe we can just wait for Span. why we are seeing this high priority. for me it is still nice to have feature as there is different ways achieve same results.

@bartonjs
Copy link
Member Author

bartonjs commented Dec 7, 2016

If there's a concrete plan that adds Span to Encoding, this is closable.

If there's no concrete plan, this will allow already existing over-allocating code to be cleaned up.

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2016

@bartonjs right, then we can wait to figure the Span in general. my point is we don't have to rush this as I am not seeing any urgency for it. cleaning up the code is nice but it is not urgent.

@svick any idea who is looking at Span in general or who is following up?

@tarekgh
Copy link
Member

tarekgh commented Dec 7, 2016

talked offline with @bartonjs and I am fine if we can take this and discuss it with the design reviewers.

@terrajobst
Copy link
Member

We shouldn't block API additions on Span<T>. For one, we don't know when it will ship stable. Secondly, we don't know which assembly it's in, and thirdly, even if had Span<T> today, we might still want to add the string version because it represents what the customer actually wants to do, i.e. encode a substring. If anything, we could talk about a StringSegment type :-)

We think API looks fine as proposed.

@AlexRadch
Copy link
Contributor

@karelz I am working on this issue.

@tarekgh
Copy link
Member

tarekgh commented Dec 15, 2016

thanks @AlexRadch

please include me in your PR.

@AlexRadch
Copy link
Contributor

@bartonjs methods should be virtual or not?

@AlexRadch
Copy link
Contributor

AlexRadch commented Dec 16, 2016

@karelz Can you answer, methods should be virtual or not? Both reviewers @jkotas, @tarekgh are thinking that methods should NOT be virtual and I do not see any disadvantage to make them non virtual.

@karelz
Copy link
Member

karelz commented Dec 16, 2016

The answer is driven in the PR: dotnet/coreclr#8651 (comment)
We should update the API proposal here once decision is made ...

@karelz
Copy link
Member

karelz commented Dec 18, 2016

For the record: The API proposal was updated by @tarekgh - see dotnet/coreclr#8651 (comment).

@jkotas jkotas closed this as completed Dec 19, 2016
@bartonjs
Copy link
Member Author

Reopening the issue since it needs a ref change and new tests in corefx.

@bartonjs bartonjs reopened this Dec 19, 2016
@tarekgh
Copy link
Member

tarekgh commented Jan 4, 2017

@AlexRadch , did you have a chance to work on exposing the APIs in the corefx side? thanks.

@karelz
Copy link
Member

karelz commented Jan 29, 2017

No activity for 1.5 months, unassigning - the issue is back "up for grabs", for anyone to pick it up.
Next steps: Exposed the API in CoreFX and add tests.

@hughbe
Copy link
Contributor

hughbe commented Mar 5, 2017

Looks like this is already implemented in coreclr/corert. I'll grab exposing and testing this

@tarekgh
Copy link
Member

tarekgh commented Mar 5, 2017

@hughbe this is great.

@karelz
Copy link
Member

karelz commented Mar 6, 2017

Assigning to @hughbe ...

@hughbe
Copy link
Contributor

hughbe commented Mar 7, 2017

This is technically source breaking right?

var encoding = new UTF8Encoding();
encoding.GetByteCount(null, 0, 0)

String and char[] can both be assigned from null, so this is now ambiguous and fails to compile

@JonHanna
Copy link
Contributor

JonHanna commented Mar 7, 2017

It's only ambiguous with a literal null, and it throws ANE when null is passed to it, so the only code that will be broken is code that would always throw. Source-breaking that code is doing the author a favour.

@hughbe
Copy link
Contributor

hughbe commented Mar 7, 2017

Yup as I thought - it's unlikely

@tarekgh
Copy link
Member

tarekgh commented Mar 7, 2017

we already has the same case before so this is not really something new. for example we have Encoding.GetBytes(char[]) and Encoding.GetBytes(string). I am not really worried about the null case (without casts).

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Encoding help wanted [up-for-grabs] Good issue for external contributors wishlist Issue we would like to prioritize, but we can't commit we will get to it yet
Projects
None yet
Development

No branches or pull requests

9 participants