fix(playground): multi-byte character issue #647
Merged
Summary
In Rust we use UTF-8 byte offsets for `TextSize`, which is used in `TextRange` and ultimately passed to the JavaScript side as the numbers in `diagnostic.location.span`. However, in JavaScript, strings are indexed by UTF-16 code unit offsets.

So when a character that takes multiple bytes in UTF-8 appears before, or inside, the code that triggers a diagnostic, the playground wrongly interprets its byte offset as a code unit offset. This causes an out-of-bounds indexing error when the playground tries to highlight or select the code that triggers the diagnostic.
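To make the mismatch concrete, here is a small sketch with a hypothetical input (not taken from the playground or its tests). The byte offsets are computed by hand: `変` and `数` each take 3 bytes in UTF-8 but only 1 code unit in UTF-16, so every offset after them is 4 larger when counted in bytes.

```typescript
// Hypothetical input, used only for illustration.
const source = "let 変数 = 1; debugger;";

// Where `debugger` starts, counted in UTF-16 code units.
const codeUnitStart = source.indexOf("debugger");
const codeUnitEnd = codeUnitStart + "debugger".length;

// The same positions counted in UTF-8 bytes: two 3-byte characters
// precede the keyword, adding 2 extra bytes each. `debugger` itself
// is ASCII, so its byte length equals its code unit length.
const byteStart = codeUnitStart + 4;
const byteEnd = byteStart + "debugger".length;

// Using byte offsets as if they were code unit offsets selects the wrong text.
const wrong = source.slice(byteStart, byteEnd);
const right = source.slice(codeUnitStart, codeUnitEnd);

console.log(wrong); // "ger;"
console.log(right); // "debugger"
```

`String.prototype.slice` clamps out-of-range indices silently, so here the result is just wrong text. An API that validates positions against the document length could instead fail outright when the byte-based end offset exceeds the string length.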
To fix this problem, we need to convert the UTF-8 byte offset into a code unit offset that can be used directly by JavaScript `String` methods like `slice`. I took the solution here and adapted it to fit our use case.

This conversion inevitably introduces `O(n)` overhead, because the byte length of a UTF-8 encoded character is variable and we have to scan from the beginning of the source to find the code unit offset that corresponds to a given byte offset. However, I think this is acceptable: the code in the playground is usually not very long, and performance is not our first consideration there. The benefit is that we no longer have to worry about playground crashes triggered by non-ASCII characters, especially for users who use a second IME.

Closes biomejs/biome#1385
Closes biomejs/biome#3250
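
For readers who want to see the shape of the conversion described in the summary, here is a minimal sketch. It is not the code merged in this PR; the function names `utf8Length` and `byteOffsetToCodeUnitOffset` are hypothetical. It scans the source from the start, counting UTF-8 bytes and UTF-16 code units in parallel, which is where the `O(n)` cost comes from.

```typescript
// Number of bytes a Unicode code point occupies in UTF-8.
function utf8Length(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

// Convert a UTF-8 byte offset into a UTF-16 code unit offset by scanning
// `source` from the beginning. Offsets past the end clamp to `source.length`.
// An offset that falls inside a multi-byte character resolves to the
// position just after that character.
function byteOffsetToCodeUnitOffset(source: string, byteOffset: number): number {
  let bytes = 0;
  let units = 0;
  while (units < source.length && bytes < byteOffset) {
    const high = source.charCodeAt(units);
    let codePoint = high;
    let width = 1;
    // A high surrogate followed by a low surrogate forms one code point
    // that spans 2 code units in UTF-16 and 4 bytes in UTF-8.
    if (high >= 0xd800 && high <= 0xdbff && units + 1 < source.length) {
      const low = source.charCodeAt(units + 1);
      if (low >= 0xdc00 && low <= 0xdfff) {
        codePoint = 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00);
        width = 2;
      }
    }
    bytes += utf8Length(codePoint);
    units += width;
  }
  return units;
}

// Hypothetical usage: a span of bytes [16, 24) mapped to code units.
const sample = "let 変数 = 1; debugger;";
const start = byteOffsetToCodeUnitOffset(sample, 16);
const end = byteOffsetToCodeUnitOffset(sample, 24);
console.log(sample.slice(start, end)); // "debugger"
```

When a diagnostic carries both a start and an end offset, a single pass that resolves both would avoid scanning the prefix twice. The sketch keeps the two calls separate for clarity.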