Unless /codepage is specified, the build output might depend on the OS code page #44452
Comments
@ryzngard @clairernovotny FYI. For our deterministic rebuild scenario, we could capture the OS code page in the PDB. However, that doesn't completely solve the problem, since the code page might not be available on the machine doing the rebuild. In that case (if this were actually a real problem), we could in theory have a service (database) that provides all existing code pages and download the needed one from there. Another problem with capturing the OS code page is that doing so would make the PDB non-deterministic even when all source files are UTF-8 encoded but …
We should capture the fallback encoding only if the compiler actually used it, in which case the compilation is already non-deterministic.
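A minimal Python sketch of the "record the fallback only when it was used" idea, assuming a hypothetical decode step that reports which encoding it ended up using and a plain dict standing in for the PDB compilation metadata. The names decode_source and build_metadata are illustrative only; this is not Roslyn's implementation. With all-UTF-8 sources the metadata is empty on every host, so it stays deterministic.

```python
from __future__ import annotations

# Hypothetical sketch (not Roslyn's code): decode each source file as strict
# UTF-8 first and fall back to the machine's ANSI code page only on failure.
# Only a fallback that actually happened is recorded in the build metadata.


def decode_source(data: bytes, ansi_codepage: str) -> tuple[str, str | None]:
    """Return (text, fallback_encoding); fallback_encoding is None if UTF-8 worked."""
    try:
        return data.decode("utf-8"), None
    except UnicodeDecodeError:
        # Host-dependent step: the result depends on the OS code page.
        return data.decode(ansi_codepage, errors="replace"), ansi_codepage


def build_metadata(files: dict[str, bytes], ansi_codepage: str) -> dict[str, str]:
    """Stand-in for PDB metadata: file name -> fallback encoding, only if one was used."""
    metadata = {}
    for name, data in sorted(files.items()):
        _, fallback = decode_source(data, ansi_codepage)
        if fallback is not None:
            metadata[name] = fallback
    return metadata


utf8_only = {"a.cs": "class C { }".encode("utf-8")}
mixed = {"a.cs": b"// caf\xe9\nclass C { }"}  # lone 0xE9 is not valid UTF-8

# All-UTF-8 sources: metadata is empty regardless of the host code page.
print(build_metadata(utf8_only, "cp1252"), build_metadata(utf8_only, "cp1251"))  # {} {}
# Fallback used: the host code page ends up in the metadata.
print(build_metadata(mixed, "cp1252"))  # {'a.cs': 'cp1252'}
```

The design choice here is that the fallback encoding becomes part of the recorded inputs only in the case where the output already depends on the host.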
Slight change:
Curious: why do we feel this is more severe than, say, the non-determinism that comes from floating-point constant folding? Both are host-specific forms of non-determinism, and it's possible, to some degree, to control both of them. For instance, we could force a …
Isn't floating-point constant folding deterministic as long as we use the same version of the runtime? Or is there a dependency on the OS/CPU? Adding the OS encoding would unnecessarily make the build dependent on OS configuration, where it may not be today (say, all files are UTF-8 encoded and the OS encoding is never used by the compiler).
I'm a little confused by the problem described in the issue. If we failed to decode the file as UTF-8, how can we reasonably fall back to UTF-8? I am not very familiar with code pages, so apologies if I'm missing something obvious.
We already do force a …
@RikkiGibson We are falling back to the OS ANSI code page (unless …
@gafter So the decimal operations would only depend on the version of corlib (that the compiler uses), correct? |
Our encoding story is complex and strongly influenced by how the native compiler handled encoding. Essentially, though, in the absence of an explicit encoding we do the following:
In the presence of an explicit …
What @tmat is asking for here is essentially that when no …
@tmat Yes
Version Used:
Version 16.7.0 Preview 2.0 [30112.204.master]
Steps to Reproduce:
where a.cs is a file that contains bytes that are not valid UTF-8. The compiler attempts to use UTF-8 encoding, fails, and falls back to the default OS ANSI code page.
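To illustrate why this fallback makes the output host-dependent, here is a small Python sketch (not the compiler's code) that follows the strategy described above: strict UTF-8 first, then an ANSI code page. Python's codec names cp1252 (Western European) and cp1251 (Cyrillic) stand in for the OS ANSI code page on two differently configured machines, and the byte string is made up for illustration.

```python
from __future__ import annotations

# Illustration only: the same bytes, decoded with the "UTF-8, else ANSI code
# page" strategy, yield different source text on differently configured hosts.


def decode_like_fallback(data: bytes, ansi_codepage: str) -> str:
    try:
        return data.decode("utf-8")          # preferred: strict UTF-8
    except UnicodeDecodeError:
        return data.decode(ansi_codepage)    # fallback: host-specific code page


def non_ascii(text: str) -> list[str]:
    """List the non-ASCII characters in text as code points, e.g. ['U+00E9']."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127]


# A string literal saved in a legacy 8-bit encoding; 0xE9 here is not valid UTF-8.
source_bytes = b'var s = "caf\xe9";'

on_western_host = decode_like_fallback(source_bytes, "cp1252")
on_cyrillic_host = decode_like_fallback(source_bytes, "cp1251")

print(non_ascii(on_western_host))           # ['U+00E9'] (é)
print(non_ascii(on_cyrillic_host))          # ['U+0439'] (й)
print(on_western_host == on_cyrillic_host)  # False: the string constant differs
```

The compiled string constant, and therefore the output assembly, differs between the two hosts even though the input file and compiler are identical.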
https://github.com/dotnet/roslyn/blob/master/src/Compilers/Core/Portable/EncodedStringText.cs#L24-L52
Proposal
One option is to set CodePage to UTF-8 by default when the project targets net5.
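A hypothetical Python sketch of how such a default would remove the host dependence. The function choose_text, its target parameter, and the use of codec names instead of numeric /codepage values are all assumptions for illustration. It also assumes invalid bytes become U+FFFD replacement characters under the UTF-8 default; the proposal does not say whether they should instead produce an error.

```python
from __future__ import annotations

# Hypothetical sketch of the proposal: with no explicit code page and a net5
# target, always decode as UTF-8 instead of falling back to the host's ANSI
# code page. Assumption: invalid bytes become U+FFFD replacement characters.


def choose_text(data: bytes, *, codepage: str | None, target: str,
                ansi_codepage: str) -> str:
    if codepage is not None:
        # Explicit /codepage (modeled as a codec name): host-independent.
        return data.decode(codepage)
    if target == "net5":
        # Proposed default: UTF-8 everywhere, independent of OS configuration.
        return data.decode("utf-8", errors="replace")
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Current behavior: host-specific fallback.
        return data.decode(ansi_codepage)


source_bytes = b'var s = "caf\xe9";'  # not valid UTF-8
hosts = ["cp1252", "cp1251"]

net5_results = {choose_text(source_bytes, codepage=None, target="net5",
                            ansi_codepage=h) for h in hosts}
legacy_results = {choose_text(source_bytes, codepage=None, target="netcoreapp3.1",
                              ansi_codepage=h) for h in hosts}

print(len(net5_results))    # 1: identical text on both hosts
print(len(legacy_results))  # 2: text differs between hosts
```

The trade-off is that a file genuinely saved in a legacy code page would lose its non-ASCII characters under the new default unless /codepage is passed explicitly.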