-
Notifications
You must be signed in to change notification settings - Fork 879
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read lines with default encoding from os #12942
base: main
Are you sure you want to change the base?
Conversation
To me using |
Right, this PR is exactly a workaround for that specific case on older platforms (like z/os apparently) that don't use utf8 as a default. In other words, when we call the single-argument
🤯 You're blowing my mind with this recommendation. Just no. Maybe you have confidence in this, but I sure don't. |
Let me rephrase, your proposed fix does not work on jdk18 and newer and might not work on older jdks if user configures utf-8 as the default encoding for jvm.
An alternative would be to read the file as bytes. If it decodes as utf-8 then all is fine, if it doesn't try another encoding. Considering that we only care about the 7-bit chars that are the same in utf-8 and iso-8859-1 imo just using iso-8859-1 to read the file would be simpler. |
Can you help me to understand why you think that this won't work? In the case of newer JVMs, as you pointed out, the default is already utf8, so This PR isn't trying to work-around the case where the user misconfigures the default encoding for the jvm (to mismatch the os, for example). It only works around the case where the jvm default is not utf-8 because the system default is not utf-8.
Not only am I not sure that we only care about the chars that are the same, I have not done the charset comparison to verify that the even ARE the same. |
This is all correct, the only thing you are forgetting is that reading the file as
If you look at https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/CgroupV1ContainerIdExtractor.java and https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/instrumentation/resources/library/src/main/java/io/opentelemetry/instrumentation/resources/CgroupV2ContainerIdExtractor.java you'll see that we only search for simple characters like |
1789deb
to
7e2fcaa
Compare
Right, this is literally what is stated in the description of #12418 and what this PR addresses. I've added a test, maybe that helps to add some clarity. |
Repeat of #12419 which fixes #12418
@laurit had previously asked if a file could be provided to reproduce this. Unfortunately, it's problematic because it depends on the os encoding. The original problem was from z/OS.....which I have no reasonable way of reproducing with tests etc. It's odd that z/os wouldn't support or use UTF-8, but here we are.
This change should at least handle those edge cases.