Investigate solutions for handling encoding errors during file upload and parsing #2854
Comments
This issue came up again for a tribal file submitted this week. An audit of rejected files needs to be conducted. Suggested criteria for the audit:
We may want to explore accepting UTF-8 with BOM, which could potentially bypass the issue.
@ADPennington There are two paths I see this going.
First, if we detect that the file is not plain UTF-8, we show the user an error, the same way we let them know about file extension errors, and require them to correct it. We could also provide a KC link on how to change file encoding.
Second, if we detect that the file is not UTF-8, we warn the user that the file is not encoded as UTF-8 and that we will convert it to UTF-8 before submitting. If we go this route, it would make sense to install another frontend dependency like jschardet. That would also help us cover the case where the file is not UTF-8 and cannot be safely converted to UTF-8 without data loss or corruption. In that case, we would inform the user that they have some work to do to fix their file.
Both of these options can be done strictly in the frontend and would not require the backend. Let me know what you think. cc @reitermb
If feasible, I like the second option. In most cases I've come across, the data submitter does not know anything about encoding or how to fix it. So warning about this, fixing it if possible, and informing the user when it can't be fixed would be great.
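For reference, the detection step that both options rely on could look roughly like the sketch below. It is only an illustration under assumptions: `classifyEncoding` is a hypothetical helper name, not something in the TDP codebase, and it only answers "is this valid UTF-8?" using the standard `TextDecoder` API in strict mode. Guessing what the original encoding actually is, which the second option needs, would still require a detector such as jschardet and is not shown here.

```javascript
// Hypothetical helper (not from the TDP codebase): report whether the raw
// bytes of an uploaded file decode cleanly as UTF-8.
function classifyEncoding(bytes) {
  try {
    // With fatal: true, decode() throws a TypeError on any invalid UTF-8
    // sequence instead of silently substituting U+FFFD.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return "utf-8";
  } catch (err) {
    return "not-utf-8";
  }
}

// Example inputs: plain ASCII, and a lone Latin-1 "é" byte (0xE9),
// which is not a valid UTF-8 sequence on its own.
const asciiBytes = new Uint8Array([0x48, 0x69]);
const latin1Bytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

console.log(classifyEncoding(asciiBytes));
console.log(classifyEncoding(latin1Bytes));
```

In the browser, the bytes would come from the selected file, for example via `new Uint8Array(await file.arrayBuffer())`. A result of `"not-utf-8"` is where option one would show the error and option two would attempt a conversion.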
Background
An aggregate file was rejected in TDP due to an encoding issue, but the error messages returned were misleading, indicating a different cause:
![encodingerror](https://private-user-images.githubusercontent.com/63075587/305574316-7d1c254c-34d5-4728-848c-9bf1a134d5a5.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MDgxMzksIm5iZiI6MTczODkwNzgzOSwicGF0aCI6Ii82MzA3NTU4Ny8zMDU1NzQzMTYtN2QxYzI1NGMtMzRkNS00NzI4LTg0OGMtOWJmMWExMzRkNWE1LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDA1NTcxOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTgzMGEzYTllZTc0ODgxOWMwNDIxYWVjMDc4MzFkMzJmZDJmMjAxZTBiZmJiMWQ3ODNhYTlmNmVlZjZkNTdlM2YmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.i7f_xGY0cYXtqGEUm4_SQplc0WxCl9L_dZ00xB2kvDs)
After further investigation, it was discovered that the file failed because of its encoding format. When the file was re-encoded to UTF-8, the file was processed successfully:
![encoding](https://private-user-images.githubusercontent.com/63075587/305574448-9cb56b96-3d5d-4f71-a758-c89a7a8a1c71.jpg?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzg5MDgxMzksIm5iZiI6MTczODkwNzgzOSwicGF0aCI6Ii82MzA3NTU4Ny8zMDU1NzQ0NDgtOWNiNTZiOTYtM2Q1ZC00ZjcxLWE3NTgtYzg5YTdhOGExYzcxLmpwZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMDclMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjA3VDA1NTcxOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTg2NWMzMDNlZDRhZGZlMWE0ZTUwMTc1OTNkNjk0ODk2NjU5MThjNzJmZWE4Zjk0ZjNhNzUxODhlZDc5MmE3ZWEmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.eFtUhq1HL-m7E1IJgoskCdEfMjyhskQsMPrlwWKhL0A)
This issue raises questions about how to catch and handle encoding problems early in the process, particularly at the file upload stage, and whether more helpful error messages can be provided. Additionally, there is the possibility that accepting UTF-8 with BOM could bypass the issue, which needs to be explored further.
The purpose of this spike is to explore potential solutions to better handle encoding errors, whether at the file upload stage, pre-parsing stage, or through on-the-fly encoding adjustments.
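If accepting UTF-8 with BOM turns out to be worth pursuing, one possible pre-parsing step is to detect and drop the three-byte BOM before the content reaches the parser. The sketch below is an assumption about how that might be done, not a confirmed fix: `stripUtf8Bom` is a hypothetical name, and the issue has not established that a BOM is what caused the rejected file to fail.

```javascript
// The UTF-8 byte order mark is the byte sequence EF BB BF.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Hypothetical helper: report whether the bytes start with a UTF-8 BOM and
// return a view of the bytes with the BOM removed.
function stripUtf8Bom(bytes) {
  const hasBom =
    bytes.length >= UTF8_BOM.length &&
    UTF8_BOM.every((b, i) => bytes[i] === b);
  // subarray() creates a view over the same buffer; no data is copied.
  return { hasBom, bytes: hasBom ? bytes.subarray(UTF8_BOM.length) : bytes };
}

const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x41]);
const result = stripUtf8Bom(withBom);
console.log(result.hasBom, result.bytes.length);
```

Returning the `hasBom` flag alongside the bytes would let the upload step tell the user that the file was adjusted, rather than changing it silently.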
Tasks
FileUpload.jsx
) and if more meaningful error messages can be provided to the user.
Acceptance Criteria