Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C#: Cleanup and sync StringExtensions with core #67031

Merged
merged 4 commits into from
Nov 28, 2022

Conversation

raulsntos
Copy link
Member

@raulsntos raulsntos commented Oct 7, 2022

I started with the intention of creating separate commits and maybe creating separate PRs but I kind of gave up half way because all of these changes end up being related to each other in one way or another and I don't think it was useful to keep them separated but I can extract some of the simpler less-controversial changes into smaller PRs if that's preferred.

Changes

Methods to consider removing

Adding extension methods can pollute the type so we should consider if the methods we add are really useful or necessary. Many of the existing methods don't add much and their behavior can be achieved with a similar one-liner using the methods provided by the BCL. Often we add methods that already exist in the BCL with a different name, this means IntelliSense will show multiple methods with very similar names and that can be confusing to users. I think it would be better to avoid providing those methods and instead recommend users to use the existing APIs which would also benefit from existing analyzers provided by Microsoft or third-party libraries that understand the existing BCL APIs and can warn users of potential incorrect or non-optimal usage.

  • UnicodeAt since it's just a wrapper over the indexer string[int index].
  • BeginsWith and EndsWith since they're just a wrapper over string.StartsWith and string.EndsWith.
    • It can be confusing for users when there are multiple methods with a very similar name.
    • Also, EndsWith has the same signature as string.EndsWith and the instance method takes precedence so it's already unused.
  • Split since it's just a wrapper over string.Split.
  • Substr since it's just a wrapper over string.Substring.
  • Hash since users should probably use string.GetHashCode instead.
  • Find, FindN, RFind and RFindN since they're just wrappers over string.IndexOf and string.LastIndexOf.
  • CasecmpTo, NocasecmpTo and CompareTo, users should probably use string.Equals and string.Compare both of which take a StringComparison parameter that allows specifying the culture and case sensitivity.
  • LPad and RPad since they are just wrappers over string.PadLeft and string.PadRight.
  • LStrip and RStrip since they are just wrappers over string.TrimStart and string.TrimEnd.
  • TrimPrefix and TrimSuffix in a future .NET version where RemovePrefix and RemoveSuffix may be added (Add overloads to string trimming dotnet/runtime#14386).
  • IsValidFloat and IsValidInt since they're just wrappers over float.TryParse and int.TryParse.
  • ToUTF8Buffer, ToUTF16Buffer, ToUTF32Buffer, ToASCIIBuffer, GetStringFromUTF8, GetStringFromUTF16, GetStringFromUTF32 and GetStringFromASCII since they are just wrappers over System.Text.Encoding.
  • SHA1Buffer, SHA1Text, SHA256Buffer, SHA256Text, MD5Buffer and MD5Text since they are just wrappers over System.Security.Cryptography classes (and using our methods hides CA5350 and CA5351).

Methods not added

These methods are exposed in GDScript but don't currently exist in StringExtensions and haven't been added/exposed in this PR, we could add them in a future PR if we consider them useful.

  • humanize_size because it's a static method and it takes an int. (Bind the String::humanize_size method #32546).
  • get_slice, get_slice_count and get_slicec.
    • GetSliceCount is already implemented because it's used in Capitalize but not exposed.
    • GetSliceCharacter implements get_slicec because it's used in Capitalize but it's not exposed.
  • num, num_int64, num_scientific and num_uint64 because users should use ToString and/or IFormattable.
  • naturalnocasecmp_to
  • rsplit because we also don't really implement split with the same behavior as GDScript.
  • repeat because users should probably use the string constructor or StringBuilder.

Copy link
Contributor

@neikeq neikeq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding names like UTF8, I would prefer to use PascalCase as I explain here. UPPERCASE case makes names harder to read when used as part of PascalCase identifier.

This seems to be one of the cases where the .NET API is inconsistent. The properties in the Encoding class use UPPERCASE, while other methods seem to use PascalCase (e.g., Char.ConvertToUtf32 and Char.IsAscii).

// TODO: Could be more efficient if we get a char version of `IndexOf`.
// See https://github.com/dotnet/runtime/issues/44116
return instance.IndexOf(what.ToString(), from,
caseSensitive ? StringComparison.Ordinal : StringComparison.OrdinalIgnoreCase);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do something like this:

if (caseSensitive)
    return instance.IndexOf(what, from); // Ordinal, case sensitive

 return CultureInfo.InvariantCulture.CompareInfo.IndexOf(instance, what, from, CompareOptions.OrdinalIgnoreCase);

NativeFuncs.godotsharp_string_md5_buffer(instanceStr, out var md5Buffer);
using (md5Buffer)
return Marshaling.ConvertNativePackedByteArrayToSystemArray(md5Buffer);
#pragma warning disable CA5351 // Do Not Use Broken Cryptographic Algorithms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great if our method could be annotated in some way so that callers would get this warning, but it doesn't seem to be possible :(

@neikeq
Copy link
Contributor

neikeq commented Nov 3, 2022

Regarding the removal of methods. I think we should have some kind of table that could be used as reference to look for the equivalent if the method is removed.

The removal of the following methods would be harmless:

  • UnicodeAt
  • BeginsWith and EndsWith
  • LPad and RPad
  • LStrip and RStrip

I have some comments about the other suggestions:

  • Split since it's just a wrapper over string.Split.

string.Split would be more verbose for removing empty entries, and may not be as easy to figure out:

str.Split(",", allowEmpty: false);

str.Split(",", StringSplitOptions.RemoveEmptyEntries);
  • Substr since it's just a wrapper over string.Substring.

Doesn't seem to be just a wrapper, unless our implementation is doing unnecessary things.

  • Hash since users should probably use string.GetHashCode instead.

The purpose of this method is to return the same hash as the GDScript hash function. Not sure if it has any uses, though.

Find, FindN, RFind and RFindN since they're just wrappers over string.IndexOf and string.LastIndexOf.

Just like with Split, it's more verbose. Plus, in this case, it's too easy to forget to use StringComparison.Ordinal. When omitted, it uses CurrentCulture.

CasecmpTo, NocasecmpTo and CompareTo, users should probably use string.Equals and string.Compare both of which take a StringComparison parameter that allows specifying the culture and case sensitivity.

I agree about CasecmpTo and NocasecmpTo. But I disagree with removing CompareTo for the same reason regarding StringComparison and CurrentCulture.

IsValidFloat and IsValidInt since they're just wrappers over float.TryParse and int.TryParse.

Slightly agree. TryParse is very well-known, but the need for out _ is kinda ugly.

  • ToUTF8Buffer, ToUTF16Buffer, ToUTF32Buffer, ToASCIIBuffer, GetStringFromUTF8, GetStringFromUTF16, GetStringFromUTF32 and GetStringFromASCII since they are just wrappers over System.Text.Encoding.

  • SHA1Buffer, SHA1Text, SHA256Buffer, SHA256Text, MD5Buffer and MD5Text since they are just wrappers over System.Security.Cryptography classes (and using our methods hides CA5350 and CA5351).

Our methods are less verbose, and it would be harder for a user to know about the replacement if they were removed, unless they are already very familiar with the .NET API. I agree that hiding these warnings is bad.

@raulsntos
Copy link
Member Author

Regarding names like UTF8, I would prefer to use PascalCase

Agreed, but for this PR I'm following the current naming, since I thought naming changes would grow the scope of this PR too much, but I can rename them if you prefer.

Regarding the removal of methods. I think we should have some kind of table that could be used as reference to look for the equivalent if the method is removed.

I think that's a good idea, should I add it to https://docs.godotengine.org/en/stable/tutorials/scripting/c_sharp/c_sharp_differences.html?

string.Split would be more verbose for removing empty entries, and may not be as easy to figure out

Yes, I'm not sure how common that is though. Would it be enough to add documentation about this?

Substr since it's just a wrapper over string.Substring

Doesn't seem to be just a wrapper, unless our implementation is doing unnecessary things.

Our implementation just clamps the length to the length of the array to avoid ArgumentOutOfRangeException but I think that could easily be done by the user (e.g.: myString.Substring(0, Math.Max(10, myString.Length)) although it is more verbose.

Hash since users should probably use string.GetHashCode instead.

The purpose of this method is to return the same hash as the GDScript hash function. Not sure if it has any uses, though.

I don't know, it doesn't seem useful to me. Would love to see usages of this.

Just like with Split, it's more verbose. Plus, in this case, it's too easy to forget to use StringComparison.Ordinal. When omitted, it uses CurrentCulture.

This is reported by the .NET analyzers (see CA1304 and/or CA1307).

Also, I personally think the R and N preffix and suffix is difficult to understand and not very idiomatic for C# where it's common to avoid abbreviations.

TryParse is very well-known, but the need for out _ is kinda ugly.

I guess it is, I personally don't have an issue with it. Wouldn't it be more common for users to want the parsed value anyway?

ToUTF8Buffer, ToUTF16Buffer, ...

Our methods are less verbose, and it would be harder for a user to know about the replacement if they were removed, unless they are already very familiar with the .NET API. I agree that hiding these warnings is bad.

I agree that these ones are more difficult to find for a less-experienced .NET user, adding documentation could help here though and I'd prefer if users learned about existing .NET APIs that they can use for other non-Godot .NET applications.

@akien-mga
Copy link
Member

Bump, would be good to find consensus and merge.

@raulsntos
Copy link
Member Author

Just to be clear, the discussion is about methods that were not touched by this PR, I tried to keep this PR uncontroversial so it could be merged quickly since it breaks compatibility so I'd want to get this merged as early as possible during the betas.

However, I can update the PR to include the changes that we've already agreed on in the discussion:

The removal of the following methods would be harmless:

  • UnicodeAt
  • BeginsWith and EndsWith
  • LPad and RPad
  • LStrip and RStrip

And:

Regarding names like UTF8, I would prefer to use PascalCase

@neikeq
Copy link
Contributor

neikeq commented Nov 25, 2022

The only pending change is my suggestion for Find(char). I would prefer not to remove that method.

The UTF8 word case is not required to merge this, especially considering some of these were already named that way. It's something I would like to change in the future though.

Lastly, it's fine if you want to include the extra methods that can be trivially removed. But I'm wondering if it would be better to mark them as obsolete (with the error flag, instead of a warning). That could be helpful for users upgrading from 3.x. Then we remove them entirely in 4.1 or later.
EDIT: Or how about marking them obsolete with a warning in 3.x, and removing them in 4.0?

@raulsntos raulsntos force-pushed the dotnet/string-extensions branch from 5abd16d to 19ccfa8 Compare November 25, 2022 16:30
- Moved `GetBaseName` to keep methods alphabetically sorted.
- Removed `Length`, users should just use the Length property.
- Removed `Insert`, string already has a method with the same signature that takes precedence.
- Removed `Erase`.
- Removed `ToLower` and `ToUpper`, string already has methods with the same signature that take precedence.
- Removed `FindLast` in favor of `RFind`.
- Replaced `RFind` and `RFindN` implemenation with a ca ll to `string.LastIndexOf` to avoid marshaling.
- Added `LPad` and `RPad`.
- Added `StripEscapes`.
- Replaced `LStrip` and `RStrip` implementation with a call to `string.TrimStart` and `string.TrimEnd`.
- Added `TrimPrefix` and `TrimSuffix`.
- Renamed `OrdAt` to `UnicodeAt`.
- Added `CountN` and move the `caseSensitive` parameter of `Count` to the end.
- Added `Indent` and `Dedent`.
- Renamed `IsValidInteger` to `IsValidInt`.
- Added `IsValidFileName`.
- Added `IsValidHexNumber`.
- Added support for IPv6 to `IsValidIPAddress`.
- Added `ValidateNodeName`.
- Updated the documentation of the `IsValid*` methods.
- Replaced `MD5Buffer`, `MD5Text`, `SHA256Buffer` and `SHA256Text` implementation to use the `System.Security.Cryptography` classes and avoid marshaling.
- Added `SHA1Buffer` and `SHA1Text`.
- Renamed `ToUTF8` to `ToUTF8Buffer`.
- Renamed `ToAscii` to `ToASCIIBuffer`.
- Added `ToUTF16Buffer` and `ToUTF32Buffer`.
- Added `GetStringFromUTF16` and `GetStringFromUTF32`.
@raulsntos raulsntos force-pushed the dotnet/string-extensions branch from 19ccfa8 to d0b166d Compare November 25, 2022 16:30
@raulsntos
Copy link
Member Author

Rebased and added Find(char) back.

Or how about marking them obsolete with a warning in 3.x, and removing them in 4.0?

I'm fine with this option but I'm worried about users that have already upgraded to 4.0 and may miss the deprecation notice, although maybe that's to be expected of using beta software.

@neikeq
Copy link
Contributor

neikeq commented Nov 26, 2022

Or how about marking them obsolete with a warning in 3.x, and removing them in 4.0?

I'm fine with this option but I'm worried about users that have already upgraded to 4.0 and may miss the deprecation notice, although maybe that's to be expected of using beta software.

I'm fine with either of the two options, or even a mix of both. It doesn't really do any harm.

- Removed `UnicodeAt`
- Removed `EndsWith`
- Removed `LPad` and `RPad`
- Deprecated `BeginsWith` in favor of `string.StartsWith`
- Deprecated `LStrip` and `RStrip` in favor of `string.TrimStart` and `string.TrimEnd`
@raulsntos
Copy link
Member Author

I have added a commit that:

  • Removes:
    • UnicodeAt: I think it's unlikely users would be using this over the indexer, also they would have been using OrdAt which is the old name.
    • EndsWith: Since a method with the same signature already exists in string and since instance methods take precedence over extension methods it's unlikely users would have been using it.
    • LPad and RPad: Since these methods were added in this PR so nobody could be using them.
  • Deprecates:
    • BeginsWith in favor of string.StartsWith.
    • LStrip and RStrip in favor of string.TrimStart and string.TrimEnd.

@akien-mga akien-mga merged commit 8253c28 into godotengine:master Nov 28, 2022
@akien-mga
Copy link
Member

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants