-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Win32API-based encoding option to C++ runtime #61
Conversation
058cfdc
to
2939b29
Compare
Well... why not - given that
I support the idea of having a clear idea what encodings Kaitai Struct supports and what their canonical names are (kaitai-io/kaitai_struct#393 (comment)), but I think we should do it in reverse order (first have the compiler canonicalize, then we can potentially take advantage of accepting only canonical names in runtime libraries). |
@generalmimon That's exactly what I've started with, but then I've realized that implementing all that in one single branch in C++ is roughly same amount of work as implementing this in the compiler, so I went ahead and done kaitai-io/kaitai_struct_compiler#254. Please take a look too? |
…cter specific to that encoding
static std::string bytes_to_str(std::string src, std::string src_enc); | ||
static std::string bytes_to_str(const std::string src, const char *src_enc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GreyCat As evidenced by our CI dashboard (https://ci.kaitai.io/), this change breaks half of our tests for target cpp_stl_11/msvc141_windows_x64
(see kaitai-io/ci_artifacts@cf98b22#diff-ddeb848e50b284745134d0b3371b48f0cf63c52ed1d2f1134ff625c1217b57a3). The ci.json
diff starts as (I simplified it a bit and only showed the first 2 tests that regressed, out of a total of 62):
diff --git 1/ci_json-cpp_stl_11-msvc141_windows_x64-f8948e3f-good.json 2/ci_json-cpp_stl_11-msvc141_windows_x64-cf98b222-bad.json
index d9b49a7..fe807ae 100644
--- 1/ci_json-cpp_stl_11-msvc141_windows_x64-f8948e3f-good.json
+++ 2/ci_json-cpp_stl_11-msvc141_windows_x64-cf98b222-bad.json
@@ -1,13 +1,13 @@
{
"$meta": {
"lang": "cpp_stl",
- "timestamp": "2023-07-08T22:54:28Z",
+ "timestamp": "2023-07-14T19:03:17Z",
"ci": {
"ci": "appveyor",
- "build_id": "47500100",
- "job_id": "a5x62yaygwv8fuxc",
+ "build_id": "47552134",
+ "job_id": "1rj2gtnxn51dxib8",
"job_number": "3",
- "url": "https://ci.appveyor.com/project/kaitai-io/ci-targets/builds/47500100/job/a5x62yaygwv8fuxc"
+ "url": "https://ci.appveyor.com/project/kaitai-io/ci-targets/builds/47552134/job/1rj2gtnxn51dxib8"
}
},
"BcdUserTypeBe": "passed",
@@ -29,13 +29,13 @@
"BitsUnalignedB64Le": "passed",
"BufferedStruct": "passed",
"BytesPadTerm": "passed",
- "CastNested": "passed",
+ "CastNested": "format_build_failed",
"CastToImported": "passed",
"CastToTop": "passed",
"CombineBool": "passed",
"CombineBytes": "passed",
"CombineEnum": "passed",
- "CombineStr": "passed",
+ "CombineStr": "format_build_failed",
"Debug0": "passed",
"DebugArrayUser": "passed",
"DebugEnumName": "passed",
@@ ... @@
The error is the following (https://ci.appveyor.com/project/kaitai-io/ci-targets/builds/47552134/job/1rj2gtnxn51dxib8?fullLog=true#L492):
cast_nested.cpp
C:\projects\ci-targets\tests\compiled\cpp_stl_11\cast_nested.cpp(90):
error C2664: 'std::string kaitai::kstream::bytes_to_str(const std::string,const char *)':
cannot convert argument 2 from 'std::string' to 'const char *'
[C:\projects\ci-targets\tests\compiled\cpp_stl_11\bin\ks_tests.vcxproj]
C:\projects\ci-targets\tests\compiled\cpp_stl_11\cast_nested.cpp(90):
note: No user-defined-conversion operator available that can perform this conversion, or the operator cannot be called
Line compiled/cpp_stl_11/cast_nested.cpp:90
is:
m_value = kaitai::kstream::bytes_to_str(m__io->read_bytes_term(0, false, true, true), std::string("ASCII"));
But this apparently isn't a MSVC-specific issue. I've just checked cpp_stl_11/gcc4.8
and the impact looks very much the same (see kaitai-io/ci_artifacts@4ab4ae1#diff-ddeb848e50b284745134d0b3371b48f0cf63c52ed1d2f1134ff625c1217b57a3), again the same 62 regressions in total:
diff --git 1/ci_json-cpp_stl_11-gcc4.8-00689cd1-good.json 2/ci_json-cpp_stl_11-gcc4.8-4ab4ae15-bad.json
index fd80bd6..942318f 100644
--- 1/ci_json-cpp_stl_11-gcc4.8-00689cd1-good.json
+++ 2/ci_json-cpp_stl_11-gcc4.8-4ab4ae15-bad.json
@@ -1,13 +1,13 @@
{
"$meta": {
"lang": "cpp_stl",
- "timestamp": "2023-07-08T22:37:56Z",
+ "timestamp": "2023-07-14T18:46:24Z",
"ci": {
"ci": "github",
- "run_id": "5496725152",
- "run_number": "254",
- "job_id": "14883558372",
- "url": "https://github.com/kaitai-io/ci_targets/actions/runs/5496725152/jobs/10016910686"
+ "run_id": "5557284492",
+ "run_number": "255",
+ "job_id": "15054460788",
+ "url": "https://github.com/kaitai-io/ci_targets/actions/runs/5557284492/jobs/10150864944"
}
},
"BcdUserTypeBe": "passed",
@@ -29,13 +29,13 @@
"BitsUnalignedB64Le": "passed",
"BufferedStruct": "passed",
"BytesPadTerm": "passed",
- "CastNested": "passed",
+ "CastNested": "format_build_failed",
"CastToImported": "passed",
"CastToTop": "passed",
"CombineBool": "passed",
"CombineBytes": "passed",
"CombineEnum": "passed",
- "CombineStr": "passed",
+ "CombineStr": "format_build_failed",
"Debug0": "passed",
"DebugArrayUser": "passed",
"DebugEnumName": "passed",
@@ ... @@
The cpp_stl_11/gcc4.8
target isn't visible at https://ci.kaitai.io/ though, because the CI dashboard is configured to display cpp_stl_11/gcc4.8_linux
, see src/App.vue:63
:
["cpp_stl_11", "gcc4.8_linux"],
I assume pushing to cpp_stl_11/gcc4.8
was an oversight? I guess keeping the _linux
suffix makes sense - we want to test C++ on different platforms, so it's a good idea to specify the platform in all C++ target names.
This adds capability to convert string encodings in KS runtime C++ on Windows using only system-supplied Win32 APIs. This offers alternative to bringing external iconv library (which is the only way we support string conversion in C++ now).
Other changes worth mentioning:
std::string src_enc
toconst char* src_enc
, which paves the way to optimize it further.