Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Consolidate much of the duplicated code across Encoding implementations #6114

Merged
merged 22 commits into from
Jul 7, 2016

Conversation

jamesqo
Copy link

@jamesqo jamesqo commented Jul 4, 2016

(Note: This PR is best reviewed on a commit-by-commit basis.)

Many of the unsafe methods of Encoding which take pointers were introduced after the overloads which take arrays. So the pointer-based overloads had to be defined in terms of arrays, which means allocating a new array and copying all of the data (since there is no way to reinterpret a pointer as an array). This is undesirable for performance, so all the types in mscorlib which subclass Encoding switch it around and define the array-based overloads in terms of pointers. (They also override the pointer-based ones, to call the overload with EncoderNLS/DecoderNLS.) To do this, 11 methods need to be overridden consistently in each class that subclasses Encoding.

EncodingNLS subclasses Encoding and overrides all of these methods for convenience, so any class that inherits from it doesn't have to worry about that. However, the public Encodings (ASCIIEncoding, UTF8Encoding, etc.) can't do that since it is an internal class and public classes can't derive from internal classes, so they all have to re-duplicate the logic.

Originally, all of this code (which is +1,000 lines) was just copied-and-pasted around the 6 classes. I've changed it so that they call a static method in a new class, EncodingForwarder, passing all of their parameters and this to it. The methods in EncodingForwarder do all the argument validation/checks and call the unsafe overload on the Encoding.

I've also updated/removed many of the comments in the classes that say "if you change something in here, change it in the other 5 classes as well..." since the implementation is now shared among all the classes, and added a note about EncodingForwarder to the top of each class to explain the whole thing.

cc @tarekgh @jkotas @AlexGhiondea

// All of our public Encodings that don't use EncodingNLS must have this (including EncodingNLS)
// So if you fix this, fix the others. Currently those include:
// EncodingNLS, UTF7Encoding, UTF8Encoding, UTF32Encoding, ASCIIEncoding, UnicodeEncoding
// parent method is safe

[System.Security.SecuritySafeCritical] // auto-generated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SecuritySafeCritical annotations should be removed here as well. Instead, they should be added to the shared impl.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas I'm assuming this applies to SecurityCritical as well?

Copy link
Member

@jkotas jkotas Jul 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SecurityCritical should stay here. But SecurityCritical should be also added on the new methods that take pointers.

@jkotas
Copy link
Member

jkotas commented Jul 4, 2016

Could you please run the applicable tests from corefx repo on this, if you have not done it already?

@jamesqo
Copy link
Author

jamesqo commented Jul 4, 2016

@jkotas I have run the tests beforehand. There were a few failures, specifically those testing what parameter name was used in the exceptions thrown from certain ASCIIEncoding methods (see here, here, and here). I'm going to submit a pull request to CoreFX shortly to fix those failures by changing the expected parameter names.

(Another alternative would be to alter this PR so it tells EncodingForwarder the name of the parameter, e.g.

return EncodingForwarder.GetString(this, bytes, byteIndex, byteCount, "byteIndex", "byteCount");

but IMHO I doubt that would be worth it. The only reason I can see to do something like that is when someone does new ASCIIEncoding().GetString(foo, bar, byteCount: -1) and is confused when he/she gets an exception for count, but even then most people call it typed as Encoding, like Encoding.ASCII.GetString(foo, bar, count: -1) so this may actually be helping in that case.)

edit: Closed the CoreFX pull request, see the comments there. Will fix test failures.

{
// Validate parameters

Contract.Assert(encoding != null); // this parameter should only be affected internally, so just do a debug check here
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it would look better if these forwarders were internal instance methods on the Encoding class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jkotas IMHO that class is big enough already (2000+ lines of code), plus we'd have to name the methods something else (e.g. DefaultGetByteCount, NLSGetByteCount) since the overloads would conflict with the virtual overloads that were already there. I don't think taking it as a parameter and doing a null check is too big of a deal (that's what all extension methods have to do anyway, basically).

Contract.Assert(encoding != null);
if (s == null)
{
string paramName = encoding is ASCIIEncoding ? "chars" : "s"; // ASCIIEncoding calls the string chars
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to pass the param name to this method instead? this can give flexibility for anyone in the future calling this method will pass the right info. I think we can do that for all methods throw similar exception here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be slowing down the main non-exceptional path...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I thought the cost would be really negligible


In reply to: 69593641 [](ancestors = 69593641)

@tarekgh
Copy link
Member

tarekgh commented Jul 5, 2016

LGTM. thanks @jamesqo

@jkotas
Copy link
Member

jkotas commented Jul 7, 2016

@tarekgh Would you like to merge this? There is a chance that this change may need follow up on the TFS side ... I am not on corpnet these days, and so I won't be able to take care of it.

@tarekgh
Copy link
Member

tarekgh commented Jul 7, 2016

@jkotas I'll merge this one today.

@tarekgh tarekgh merged commit 00eca9a into dotnet:master Jul 7, 2016
@tarekgh
Copy link
Member

tarekgh commented Jul 7, 2016

@jkotas I have submitted the following change in TFS side.

Changeset #1616618 checked in.

@adityamandaleeka
Copy link
Member

@jamesqo Thanks for doing this!

@tarekgh Looks like this is already merged, but in the future, please use the option to squash when merging PRs like this which are broken down into many (in this case 22) small incremental commits all working towards the same goal. It helps keep the commit history clean.

@tarekgh
Copy link
Member

tarekgh commented Jul 8, 2016

@adityamandaleeka I'll do that. sorry for this one.

@janvorli
Copy link
Member

janvorli commented Jul 8, 2016

Some time ago, we have ported the UTF-8 encoder to C++ (used in PAL) in a way so that the source code of both the managed and native versions match except for the necessary syntax differences between C++ and C#. This change breaks the match, so we should update the native version to restore it.

@jkotas
Copy link
Member

jkotas commented Jul 8, 2016

This change have not touched the worker method that we have ported to C++. It just touched the wrappers over the worker method. The wrappers were not ported to C++ because of they are very C# specific. I do not see anything that needs updating. Is there anything particular you have in mind?

@janvorli
Copy link
Member

janvorli commented Jul 8, 2016

Ah, ok then. I have seen changes in the Utf8Encoding.cs, so I have just assumed it has touched the ported parts.

picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Consolidate much of the duplicated code across Encoding implementations

Commit migrated from dotnet/coreclr@00eca9a
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants