[POC] Force MSVC to use UTF-8 to read C++ source files #362

hidmic · 2019-05-09T15:17:45Z

This pull request fixes #359 but it's actually a proof of concept in the sense that, even though tests pass (and as a side effect, the stack corruption went away though it's unrelated, see #360 (comment)), these compile options probably have to enforced everywhere.

For those interested in the rationale behind this, during compilation msvc will attempt to read source files as utf-8 but if it fails to find a BOM mark, it'll switch to whatever is the current default code page. For the VM that I've been using, that is cp1252. These compile options exist to force it to always read files as if they were encoded in utf-8. So, in our case, a utf-8 encoded kanji character was being decoded as cp1252 during compilation and then encoded as utf-16-le in the final executable. Thus the difference with Python, which always does the right thing.

Signed-off-by: Michel Hidalgo <[email protected]>

hidmic · 2019-05-09T16:04:39Z

Windows CI (building for Release and testing up to test_communication with Connext, Fast-RTPS and Opensplice):

Windows

dirk-thomas · 2019-05-09T17:36:52Z

Great finding!

Since the specific code using non-ASCII characters is in a header file this approach requires handling this on every usage.

How about the alternative to add the BOM to the header file in question. That avoids that users of that header need to apply the logic from this patch and keeps the solution local to the header in question?

hidmic · 2019-05-09T17:50:08Z

Yeah, that'd work too, but I think both solutions, this patch and adding a BOM, fall short. This took me ~4 days to track down and it only takes a missing BOM in one file of an entire ROS2 stack to break wstring pub/sub in Windows.

I wanted to discuss with you if you think there's a way we can apply to flags to all package builds at once. Maybe a manual configuration step during Windows installation?

dirk-thomas · 2019-05-09T17:53:49Z

I think the BOM in a file which does contain such characters is sufficient. I would rather not apply "black magic" from e.g. ament_cmake_ros to magically add these flags to all targets / files without the user noticing.

hidmic · 2019-05-09T17:58:55Z

I would rather not apply "black magic" from e.g. ament_cmake_ros to magically add these flags to all targets / files without the user noticing.

For sure, that's what msvc is doing already and we don't want that either. But a binary, non-printable mark in a file is also somewhat obscure. That's why I think we should add a (giant) note in the documentation instead, guide Windows users to apply this compilation flags themselves and let them know of the potential pitfalls if they'd rather make their own choices w.r.t. encoding.

IMHO just that will prevent many unaware developers from falling down this (deep) rabbit hole.

dirk-thomas · 2019-05-09T18:01:28Z

That's why I think we should add a (giant) note in the documentation instead, guide Windows users to apply this compilation flags themselves and let them know of the potential pitfalls if they'd rather make their own choices w.r.t. encoding.

Why do we want users to need to override compilation flags? I am certainly in favor of a note about the specifics of this issue but it shouldn't recommend users to pass custom compile flags imo.

hidmic · 2019-05-09T18:01:43Z

We can add the BOM mark now to get Dashing out, for sure. I'm not advocating for a manual provisioning of our CI machines halfway API freeze. But just the BOM... I fear even ourselves may fall for this a year from now.

hidmic · 2019-05-09T18:05:05Z

Why do we want users to need to override compilation flags?

As I see it, it's not override, it's addition. Though if someone were to use yet another encoding, yeah, you just opened the door to the same kind of obscure bugs.

hidmic · 2019-05-09T18:06:32Z

Ok, let's add the BOM and update Windows related documentation to make everyone aware of the pitfalls plus maybe some pointers to these charset flags for the advanced user. Does that sound good?

dirk-thomas · 2019-05-10T20:45:26Z

Can this be closed with ros2/rcl_interfaces#79 merged?

hidmic · 2019-05-10T20:46:50Z

Yes!

Force MSVC to use UTF-8 to read source files.

3c8c3da

Signed-off-by: Michel Hidalgo <[email protected]>

hidmic requested a review from dirk-thomas May 9, 2019 15:17

hidmic added the in progress Actively being worked on (Kanban column) label May 9, 2019

hidmic added in review Waiting for review (Kanban column) and removed in progress Actively being worked on (Kanban column) labels May 9, 2019

hidmic mentioned this pull request May 9, 2019

Add UTF-8 BOM to message_fixtures.hpp file ros2/rcl_interfaces#79

Merged

hidmic closed this May 10, 2019

hidmic deleted the hidmic/fix-wstring-encoding branch May 10, 2019 20:46

hidmic removed the in review Waiting for review (Kanban column) label May 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[POC] Force MSVC to use UTF-8 to read C++ source files #362

[POC] Force MSVC to use UTF-8 to read C++ source files #362

hidmic commented May 9, 2019 •

edited

Loading

hidmic commented May 9, 2019

dirk-thomas commented May 9, 2019

hidmic commented May 9, 2019 •

edited

Loading

dirk-thomas commented May 9, 2019

hidmic commented May 9, 2019 •

edited

Loading

dirk-thomas commented May 9, 2019

hidmic commented May 9, 2019

hidmic commented May 9, 2019

hidmic commented May 9, 2019 •

edited

Loading

dirk-thomas commented May 10, 2019

hidmic commented May 10, 2019

[POC] Force MSVC to use UTF-8 to read C++ source files #362

[POC] Force MSVC to use UTF-8 to read C++ source files #362

Conversation

hidmic commented May 9, 2019 • edited Loading

hidmic commented May 9, 2019

dirk-thomas commented May 9, 2019

hidmic commented May 9, 2019 • edited Loading

dirk-thomas commented May 9, 2019

hidmic commented May 9, 2019 • edited Loading

dirk-thomas commented May 9, 2019

hidmic commented May 9, 2019

hidmic commented May 9, 2019

hidmic commented May 9, 2019 • edited Loading

dirk-thomas commented May 10, 2019

hidmic commented May 10, 2019

hidmic commented May 9, 2019 •

edited

Loading

hidmic commented May 9, 2019 •

edited

Loading

hidmic commented May 9, 2019 •

edited

Loading

hidmic commented May 9, 2019 •

edited

Loading