-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[POC] Force MSVC to use UTF-8 to read C++ source files #362
Conversation
Signed-off-by: Michel Hidalgo <[email protected]>
Great finding! Since the specific code using non-ASCII characters is in a header file this approach requires handling this on every usage. How about the alternative to add the BOM to the header file in question. That avoids that users of that header need to apply the logic from this patch and keeps the solution local to the header in question? |
Yeah, that'd work too, but I think both solutions, this patch and adding a BOM, fall short. This took me ~4 days to track down and it only takes a missing BOM in one file of an entire ROS2 stack to break wstring pub/sub in Windows. I wanted to discuss with you if you think there's a way we can apply to flags to all package builds at once. Maybe a manual configuration step during Windows installation? |
I think the BOM in a file which does contain such characters is sufficient. I would rather not apply "black magic" from e.g. |
For sure, that's what IMHO just that will prevent many unaware developers from falling down this (deep) rabbit hole. |
Why do we want users to need to override compilation flags? I am certainly in favor of a note about the specifics of this issue but it shouldn't recommend users to pass custom compile flags imo. |
We can add the BOM mark now to get Dashing out, for sure. I'm not advocating for a manual provisioning of our CI machines halfway API freeze. But just the BOM... I fear even ourselves may fall for this a year from now. |
As I see it, it's not override, it's addition. Though if someone were to use yet another encoding, yeah, you just opened the door to the same kind of obscure bugs. |
Ok, let's add the BOM and update Windows related documentation to make everyone aware of the pitfalls plus maybe some pointers to these charset flags for the advanced user. Does that sound good? |
Can this be closed with ros2/rcl_interfaces#79 merged? |
Yes! |
This pull request fixes #359 but it's actually a proof of concept in the sense that, even though tests pass (and as a side effect, the stack corruption went away though it's unrelated, see #360 (comment)), these compile options probably have to enforced everywhere.
For those interested in the rationale behind this, during compilation
msvc
will attempt to read source files asutf-8
but if it fails to find a BOM mark, it'll switch to whatever is the current default code page. For the VM that I've been using, that iscp1252
. These compile options exist to force it to always read files as if they were encoded inutf-8
. So, in our case, autf-8
encoded kanji character was being decoded ascp1252
during compilation and then encoded asutf-16-le
in the final executable. Thus the difference with Python, which always does the right thing.