Issue with utf-8 #7

Closed
id1945 opened this issue Jun 2, 2023 · 13 comments

Comments

@id1945

id1945 commented Jun 2, 2023

The image is confidential, so it should be removed

@undecaf
Owner

undecaf commented Jun 2, 2023

Hmm... the raw bytes of the QR code are decoded by the browser's TextDecoder according to the selected character encoding, so there is not much that I can do about incorrect decoding.

On the other hand, are you sure that the QR code is UTF-8 encoded? I googled and found several Vietnamese character encodings.

@id1945
Author

id1945 commented Jun 2, 2023

The image is confidential, so it should be removed

@undecaf
Owner

undecaf commented Jun 2, 2023

Let me explain in more detail: zbar-wasm delegates the conversion of the raw bytes contained in the QR code into the text displayed as rawValue to the TextDecoder built into each browser. By default, the TextDecoder uses UTF-8 encoding, but several other encodings are available.

If the text is not what is expected then I see two possible reasons:

  1. A bug in the browser's TextDecoder -- unlikely, but this can be verified by checking whether a different browser produces the same text.
  2. The text in the QR code is in an encoding different from UTF-8, so decoding the raw bytes as UTF-8 will likely yield a wrong result.
    Therefore I am kindly asking you again whether it can be confirmed that your QR code is actually UTF-8-encoded. The producer of the QR code should be able to provide this information.
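
To illustrate the second reason, here is a minimal sketch of how the same bytes come out differently depending on the encoding passed to TextDecoder. The byte values and the helper name isValidUtf8 are made up for illustration and are not taken from the QR code in this issue; the sketch also assumes a runtime whose TextDecoder supports the windows-1252 label (browsers do, as does Node.js when built with full ICU).

```typescript
// Bytes of "ÄÖÜ" as a windows-1252 (Latin-1 style) encoder would write them.
// Illustrative data only, not the bytes of the QR code discussed here.
const legacyBytes = new Uint8Array([0xc4, 0xd6, 0xdc]);

// Default (non-fatal) UTF-8 decoding does not fail on invalid sequences;
// it silently substitutes U+FFFD replacement characters.
const asUtf8 = new TextDecoder("utf-8").decode(legacyBytes);

// Decoding with the encoding the bytes were actually written in
// recovers the intended text.
const asLegacy = new TextDecoder("windows-1252").decode(legacyBytes);

// Hypothetical helper: with { fatal: true } the decoder throws on
// malformed input, which allows checking whether bytes are valid UTF-8.
function isValidUtf8(bytes: Uint8Array): boolean {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(JSON.stringify(asUtf8), asLegacy, isValidUtf8(legacyBytes));
```

A check like this can only show that bytes are not valid UTF-8; it cannot tell which other encoding the producer used.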

@undecaf
Owner

undecaf commented Jun 2, 2023

I copied your expected text, pasted it into an online QR generator and obtained this QR code:
QR code generated from expected text

Scanning this QR code yields the following (correct) result:
Result of scanning the generated QR code

This indicates that the text in your QR code is most likely not UTF-8-encoded.

@id1945
Author

id1945 commented Jun 2, 2023

The image is confidential, so it should be removed

@undecaf
Owner

undecaf commented Jun 2, 2023

Hmmm... the BarcodeDetector built into mobile Chrome also reads that code correctly.

Please give me some time to look into that.

@id1945
Author

id1945 commented Jun 2, 2023

@undecaf

I really hope you get this fixed soon, since I am developing a library for Angular, ngx-scanner-qrcode, based on yours.
And finally, please allow me to delete the photo of my citizenship card in the comment above.
Waiting to hear from you soon ^^!

@undecaf
Owner

undecaf commented Jun 5, 2023

@id1945 Your original QR code can be read correctly by configuring zbar-wasm to return binary data and delegating text decoding to the native TextDecoder, e.g. like so:

import { ZBarConfigType, ZBarImage, ZBarScanner, ZBarSymbolType } ...;

const scanner = await ZBarScanner.create();
scanner.setConfig(ZBarSymbolType.ZBAR_NONE, ZBarConfigType.ZBAR_CFG_BINARY, 1); // <-- important
scanner.scan(image);
const symbols = scanner.getResults();

However, in order to correctly read the code that I generated, the scanner.setConfig() statement has to be omitted.

Thus, with a particular setup, either one or the other QR code can be read correctly but not both.

This appears to be one of the peculiarities of the ZBar bar code reader on which zbar-wasm is based. Unfortunately, there is nothing that I can do about that.
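
As a rough sketch of the "delegating text decoding to the native TextDecoder" step: the interface BinarySymbol and the function decodeSymbol below are hypothetical stand-ins, assuming only that each scan result exposes its raw bytes as an Int8Array named data. The sketch does not call zbar-wasm itself, and the sample bytes are produced with TextEncoder rather than scanned.

```typescript
// Hypothetical minimal shape of a scan result in binary mode.
// Assumption: the raw payload is available as signed bytes in `data`.
interface BinarySymbol {
  data: Int8Array;
}

// Decode the raw payload with the platform TextDecoder.
// TextDecoder accepts any typed array view, including Int8Array,
// and reads the underlying bytes regardless of signedness.
function decodeSymbol(scanned: BinarySymbol, encoding: string = "utf-8"): string {
  return new TextDecoder(encoding).decode(scanned.data);
}

// Stand-in for a scanned symbol: UTF-8 bytes of a Vietnamese sample
// string, converted to signed bytes (values above 127 wrap to negatives).
const sampleUtf8 = new TextEncoder().encode("Tiếng Việt");
const sampleSymbol: BinarySymbol = { data: Int8Array.from(sampleUtf8) };

console.log(decodeSymbol(sampleSymbol));
```

Passing a different encoding label as the second argument is where a non-UTF-8 payload could be handled, provided the encoding is known.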

@id1945
Author

id1945 commented Jun 8, 2023

@undecaf Thank you very much. I have tried it as well, but it doesn't work at all. It is still a headache for me.

@id1945 id1945 closed this as completed Jun 8, 2023
@pascalschoeni

@undecaf jschardet (https://www.npmjs.com/package/jschardet) may help:

const detectedCharSet = jschardet.detect(Buffer.from(symbol.data))
const text = symbol.decode(detectedCharSet.encoding)
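
Where adding a detection library is not an option, a much cruder alternative is to try a short list of candidate encodings in order and accept the first one that decodes without error. The sketch below is only an illustration of that idea: decodeWithCandidates is a hypothetical helper, the candidate list is an assumption, and this is not the statistical detection that jschardet performs. Single-byte encodings such as windows-1252 accept almost any input, so the order of candidates matters.

```typescript
// Hypothetical helper: return the first candidate encoding that decodes
// the bytes without error, together with the decoded text.
function decodeWithCandidates(
  bytes: Uint8Array,
  candidates: string[],
): { encoding: string; text: string } | null {
  for (const encoding of candidates) {
    try {
      // { fatal: true } makes the decoder throw on malformed input
      // instead of inserting U+FFFD replacement characters.
      const text = new TextDecoder(encoding, { fatal: true }).decode(bytes);
      return { encoding, text };
    } catch {
      // Either the bytes are invalid in this encoding or the runtime
      // does not support the label; move on to the next candidate.
    }
  }
  return null;
}

// Strict UTF-8 first, then a permissive single-byte fallback.
const candidateList = ["utf-8", "windows-1252"];

const fromLegacy = decodeWithCandidates(new Uint8Array([0xc4, 0xd6, 0xdc]), candidateList);
const fromUtf8 = decodeWithCandidates(new TextEncoder().encode("ÄÖÜ"), candidateList);

console.log(fromLegacy, fromUtf8);
```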

@undecaf undecaf reopened this Jun 12, 2023
@undecaf
Owner

undecaf commented Jun 12, 2023

Thanks @pascalschoeni !
jschardet did help, although indirectly: it showed me that the raw data of one of my QR code test strings ('ÄÖÜ äöü ß ÁÉÍÓÚ áéíóú ÀÈÌÒÙ àéíóú') was not UTF-8 encoded (as it should have been) and therefore was not correctly decoded by the TextDecoder in UTF-8 mode.
I suspect that the online QR encoder that I used for the test image did not handle the test string properly. Encoding that string again here produced an image which zbar-wasm decodes correctly.

@id1945 :
The code snippet from my previous comment actually works not only for both variants of your QR code but also for all other barcodes in the unit tests of zbar-wasm.
Could you please provide me with an image of a QR code that is similar to your original one but does not contain personal data? Before closing this issue, I would like to include such a QR code in the unit tests if possible.
Also, please feel free to delete your original QR code image from this thread.

@id1945
Author

id1945 commented Jun 13, 2023

@undecaf 🥇
I will soon delete all the QR codes containing my personal data here. I will give you another QR code image that shows the above error; please wait for me. @@

@id1945
Author

id1945 commented Jun 13, 2023

@undecaf 🥇 🥇
Luckily for me! 🙏
I found these websites that generate the faulty QR code that we discussed above. Below is the QR code that I created. You can also create your own in Vietnamese.

https://qrplanet.com/#text
https://qrcode.tec-it.com/en/Raw

Capture-001


3 participants