Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

evmnfts, evmtokens: utf-8-clean strings from chain #568

Merged
merged 2 commits into from
Nov 17, 2023

Conversation

pro-wh
Copy link
Collaborator

@pro-wh pro-wh commented Nov 16, 2023

values that come from chain: use our SanitizeString, as they may contain postgresql-displeasing nul characters

edit: as well as values that come from servers 🙄

@pro-wh pro-wh added bug Something isn't working analysis-layer Analysis layer-related issues. labels Nov 16, 2023
Copy link
Contributor

@mitjat mitjat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Worth mentioning why we're implementing this PR:
We found invalid metadata JSON in the wild (in particular, it was invalid UTF-8 in that it contained \x00, but the Go JSON parser didn't balk at that, but Postgres does).
Hope I got that right :)

nftData.Name,
nftData.Description,
nftData.Image,
// If SanitizeString can affect JSON validity, we're doomed.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:(
It does seem low risk.
We could have another wrapper, ensureValidJSON(), that returns the input as-is if it's valid JSON, otherwise logs a warning and returns {} or null. Your call on whether that's warranted. I think you already made the call with this comment :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok turns out the round trip through json unmarshal+marshal is enough to filter out bad UTF-8 and NUL bytes. updated the comment. I'll keep the SanitizeStringP here out of principle

@pro-wh
Copy link
Collaborator Author

pro-wh commented Nov 17, 2023

e2e regression flaked, http 500 from ipfs gateway

@pro-wh
Copy link
Collaborator Author

pro-wh commented Nov 17, 2023

oh clarification about @mitjat 's explanation: \x00 NUL bytes are valid UTF-8, but they're not allowed in PostgreSQL due to a technical limitation.

incidentally, they're also not allowed in JSON, as strings aren't allowed to contain control characters (\uxxxx escapes of them are allowed), and the grammar doesn't allow them anywhere else either.

@pro-wh pro-wh merged commit 9e621ca into main Nov 17, 2023
6 checks passed
@pro-wh pro-wh deleted the pro-wh/bugfix/chainutf8 branch November 17, 2023 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis-layer Analysis layer-related issues. bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants