Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix maxBuffer bug with TextDecoder() #105

Merged
merged 1 commit into from
Aug 16, 2023
Merged

Fix maxBuffer bug with TextDecoder() #105

merged 1 commit into from
Aug 16, 2023

Conversation

ehmicky
Copy link
Collaborator

@ehmicky ehmicky commented Aug 16, 2023

If the stream contains a UTF-8 string with a partial multibyte sequence at the end, a final character \ufffd () is appended. This is the standard behavior by TextDecoder and other tools.

At the moment, this final byte is not checked by the maxBuffer logic. I.e. the returned string could be one byte over maxBuffer when appending .

This PR fixes this bug. This required some refactoring.

}

return finalize(contents, length, textDecoder);
appendFinalChunk({state, convertChunk, getSize, addChunk, getFinalChunk, maxBuffer});
Copy link
Collaborator Author

@ehmicky ehmicky Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final chunk () must use the same logic as the previous chunks, including the maxBuffer logic.
To do this required two changes:

  • Adding a getFinalChunk() method that returns either undefined (no final chunk) or the final chunk, with the right type
  • Extracting the appendChunk() logic so that it can be used after getFinalChunk(). To do this, the inside of the main for loop had to be extracted to its own function appendChunk(). To do this, stateful arguments had to be put into a state object. Changing this required refactoring some additional functions.

@@ -112,6 +112,11 @@ test('get stream with truncated UTF-8 sequences', async t => {
t.is(result, `${multiByteString.slice(0, -1)}${INVALID_UTF8_MARKER}`);
});

test('handles truncated UTF-8 sequences over maxBuffer', async t => {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test succeeds with the current PR, and fails without it.


export async function getStreamAsArrayBuffer(stream, options) {
return getStreamContents(stream, arrayBufferMethods, options);
}

const initArrayBuffer = () => new Uint8Array(0);
const initArrayBuffer = () => ({contents: new Uint8Array(0)});
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file and the following ones are related to refactoring stateful variables into a state object. See comment below.

@@ -77,5 +77,6 @@ const arrayBufferMethods = {
},
getSize: getLengthProp,
addChunk: addArrayBufferChunk,
getFinalChunk: noop,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getFinalChunk() is only used by getString(). It is a noop otherwise.

@sindresorhus sindresorhus merged commit 7e191bb into sindresorhus:main Aug 16, 2023
@sindresorhus
Copy link
Owner

Feel free to do another release when you are ready. You have publish access on npm.

@ehmicky ehmicky deleted the max-buffer-string branch August 16, 2023 23:16
@ehmicky
Copy link
Collaborator Author

ehmicky commented Aug 17, 2023

Sounds good! Will do after #106.

@ehmicky
Copy link
Collaborator Author

ehmicky commented Aug 17, 2023

Done in 8.0.1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants