Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[POC] Force MSVC to use UTF-8 to read C++ source files #362

Closed
wants to merge 1 commit into from

Conversation

hidmic
Copy link
Contributor

@hidmic hidmic commented May 9, 2019

This pull request fixes #359 but it's actually a proof of concept in the sense that, even though tests pass (and as a side effect, the stack corruption went away though it's unrelated, see #360 (comment)), these compile options probably have to enforced everywhere.

For those interested in the rationale behind this, during compilation msvc will attempt to read source files as utf-8 but if it fails to find a BOM mark, it'll switch to whatever is the current default code page. For the VM that I've been using, that is cp1252. These compile options exist to force it to always read files as if they were encoded in utf-8. So, in our case, a utf-8 encoded kanji character was being decoded as cp1252 during compilation and then encoded as utf-16-le in the final executable. Thus the difference with Python, which always does the right thing.

@hidmic hidmic requested a review from dirk-thomas May 9, 2019 15:17
@hidmic hidmic added the in progress Actively being worked on (Kanban column) label May 9, 2019
@hidmic
Copy link
Contributor Author

hidmic commented May 9, 2019

Windows CI (building for Release and testing up to test_communication with Connext, Fast-RTPS and Opensplice):

  • Windows Build Status

@hidmic hidmic added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels May 9, 2019
@dirk-thomas
Copy link
Member

Great finding!

Since the specific code using non-ASCII characters is in a header file this approach requires handling this on every usage.

How about the alternative to add the BOM to the header file in question. That avoids that users of that header need to apply the logic from this patch and keeps the solution local to the header in question?

@hidmic
Copy link
Contributor Author

hidmic commented May 9, 2019

Yeah, that'd work too, but I think both solutions, this patch and adding a BOM, fall short. This took me ~4 days to track down and it only takes a missing BOM in one file of an entire ROS2 stack to break wstring pub/sub in Windows.

I wanted to discuss with you if you think there's a way we can apply to flags to all package builds at once. Maybe a manual configuration step during Windows installation?

@dirk-thomas
Copy link
Member

I think the BOM in a file which does contain such characters is sufficient. I would rather not apply "black magic" from e.g. ament_cmake_ros to magically add these flags to all targets / files without the user noticing.

@hidmic
Copy link
Contributor Author

hidmic commented May 9, 2019

I would rather not apply "black magic" from e.g. ament_cmake_ros to magically add these flags to all targets / files without the user noticing.

For sure, that's what msvc is doing already and we don't want that either. But a binary, non-printable mark in a file is also somewhat obscure. That's why I think we should add a (giant) note in the documentation instead, guide Windows users to apply this compilation flags themselves and let them know of the potential pitfalls if they'd rather make their own choices w.r.t. encoding.

IMHO just that will prevent many unaware developers from falling down this (deep) rabbit hole.

@dirk-thomas
Copy link
Member

That's why I think we should add a (giant) note in the documentation instead, guide Windows users to apply this compilation flags themselves and let them know of the potential pitfalls if they'd rather make their own choices w.r.t. encoding.

Why do we want users to need to override compilation flags? I am certainly in favor of a note about the specifics of this issue but it shouldn't recommend users to pass custom compile flags imo.

@hidmic
Copy link
Contributor Author

hidmic commented May 9, 2019

We can add the BOM mark now to get Dashing out, for sure. I'm not advocating for a manual provisioning of our CI machines halfway API freeze. But just the BOM... I fear even ourselves may fall for this a year from now.

@hidmic
Copy link
Contributor Author

hidmic commented May 9, 2019

Why do we want users to need to override compilation flags?

As I see it, it's not override, it's addition. Though if someone were to use yet another encoding, yeah, you just opened the door to the same kind of obscure bugs.

@hidmic
Copy link
Contributor Author

hidmic commented May 9, 2019

Ok, let's add the BOM and update Windows related documentation to make everyone aware of the pitfalls plus maybe some pointers to these charset flags for the advanced user. Does that sound good?

@dirk-thomas
Copy link
Member

Can this be closed with ros2/rcl_interfaces#79 merged?

@hidmic
Copy link
Contributor Author

hidmic commented May 10, 2019

Yes!

@hidmic hidmic closed this May 10, 2019
@hidmic hidmic deleted the hidmic/fix-wstring-encoding branch May 10, 2019 20:46
@hidmic hidmic removed the in review Waiting for review (Kanban column) label May 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bad WStrings encoding when publishing over Connext on Windows
2 participants