Use text-encoding npm package instead of integrated text-encoding dep. #50

Uzlopak · 2021-11-30T19:39:38Z

Replace the bundled encodings with an external package that provides the same features.

We should first get reliable benchmarks. The implementor of the external package states that it is a reference implementation with no focus on performance. But maybe it makes sense to use a reference implementation to avoid some security issues?

Anyway... we should maybe not give up on the idea of caching the utf-8 TextDecoder that Node provides natively and using the external TextDecoder only as a fallback.

Checklist

run npm run test and npm run benchmark
tests and/or benchmarks are included
documentation is changed or added
commit message and code follows the Developer's Certification of Origin and the Code of conduct

Uzlopak · 2021-12-01T02:59:43Z

https://github.com/samthor/fast-text-encoding

Uzlopak · 2021-12-01T11:37:54Z

@kibertoad

What I realized is that mscdex took the text encoding code from https://github.com/inexorabletash/text-encoding and modified it for better Node integration (according to his remarks in the license header).

The text-decoding package was forked from that project, brought up to spec, and given more unit tests.

According to these benchmarks:
https://github.com/samthor/fast-text-encoding/tree/master/bench
the native TextDecoder is the fastest.

So what I did is:

Added caching for the TextDecoder instances.
Always instantiate the native TextDecoder for utf-8/utf8, as Node always has the utf-8 encoding.
If we get a "new" encoding, we try to instantiate a native TextDecoder with the specified destEncoding; if that encoding is not available in the Node instance, we fall back to the external TextDecoder.

This removes our responsibility to keep the encodings up to date and the need to write tests covering that code.
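The strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `getDecoder`, the `decodeText` signature, and the `createFallback` parameter are hypothetical names, and the fallback is injected so that the sketch stays self-contained (in the PR, the fallback would be the external package's TextDecoder).

```javascript
'use strict'

const { TextDecoder } = require('util')

// Cache of decoder instances keyed by lower-cased encoding label.
const decoders = new Map()

// utf-8 is always available in Node, so create it once up front.
const utf8Decoder = new TextDecoder('utf-8')
decoders.set('utf-8', utf8Decoder)
decoders.set('utf8', utf8Decoder)

// Hypothetical helper: returns a cached decoder, trying the native
// TextDecoder first and calling createFallback(label) if Node rejects it.
function getDecoder (charset, createFallback) {
  const key = String(charset).toLowerCase()
  let decoder = decoders.get(key)
  if (decoder !== undefined) return decoder
  try {
    decoder = new TextDecoder(key)
  } catch (err) {
    // Node throws a RangeError for labels it does not support.
    decoder = createFallback(key)
  }
  decoders.set(key, decoder)
  return decoder
}

// Assumed signature; the real decodeText in the library may differ.
function decodeText (buf, charset, createFallback) {
  return getDecoder(charset, createFallback).decode(buf)
}
```

Because the cache is keyed by the normalized label, the try/catch cost of probing an unsupported encoding is paid only once per label.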

=============================== Coverage summary ===============================
Statements   : 93.24% ( 800/858 )
Branches     : 86.12% ( 447/519 )
Functions    : 95.38% ( 62/65 )
Lines        : 96.11% ( 718/747 )
================================================================================

uzlopak@uzlopak-Lenovo-Legion-5-17ARH05H:~/Workspace/fastify/busboy$ npm run bench:busboy

> @fastify/[email protected] bench:busboy
> node bench/fastify-busboy-bench.js

7142.86 mb/sec
9090.91 mb/sec
9090.91 mb/sec
9090.91 mb/sec
10000.00 mb/sec
10000.00 mb/sec
10000.00 mb/sec
9090.91 mb/sec
10000.00 mb/sec
10000.00 mb/sec
uzlopak@uzlopak-Lenovo-Legion-5-17ARH05H:~/Workspace/fastify/busboy$ npm run bench:dicer

> @fastify/[email protected] bench:dicer
> node bench/dicer/dicer-bench-multipart-parser.js

9090.91 mb/sec
12500.00 mb/sec
11111.11 mb/sec
11111.11 mb/sec
10000.00 mb/sec
11111.11 mb/sec
12500.00 mb/sec
12500.00 mb/sec
12500.00 mb/sec
11111.11 mb/sec

package.json

Uzlopak · 2021-12-01T11:52:06Z

@kibertoad is caching in an Object OK? Or is there a specific Fastify way?

kibertoad · 2021-12-01T11:52:34Z

Will work on benchmarks today, then we can measure :)

kibertoad · 2021-12-01T11:53:02Z

Will also benchmark different ways to cache, see if Object has same performance as Map here
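A comparison of that kind could look roughly like the sketch below. It is a hypothetical micro-benchmark, not the one used for this PR: `timeLookups`, the key list, and the iteration count are all assumptions, and results from such a tight loop vary with Node version and JIT behavior.

```javascript
'use strict'

// Hypothetical helper: times `iterations` lookups through the given function.
function timeLookups (lookup, keys, iterations) {
  const start = process.hrtime.bigint()
  let hits = 0
  for (let i = 0; i < iterations; i++) {
    if (lookup(keys[i % keys.length]) !== undefined) hits++
  }
  const elapsedNs = process.hrtime.bigint() - start
  return { hits, elapsedMs: Number(elapsedNs) / 1e6 }
}

const keys = ['utf-8', 'latin1', 'utf-16le']
const objectCache = Object.create(null) // no prototype, so no inherited keys
const mapCache = new Map()
for (const key of keys) {
  objectCache[key] = { label: key }
  mapCache.set(key, { label: key })
}

const viaObject = timeLookups((k) => objectCache[k], keys, 1e6)
const viaMap = timeLookups((k) => mapCache.get(k), keys, 1e6)
console.log('Object:', viaObject.elapsedMs.toFixed(2), 'ms')
console.log('Map:   ', viaMap.elapsedMs.toFixed(2), 'ms')
```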

Uzlopak · 2021-12-01T11:55:06Z

I think for benchmarking we actually need to create a new benchmark, where we send a form with a lot of fields (like a registration form).

The current benchmark misses the point for this case, as it is just about sending a big chunk of data as a file. So there is no real need to call decodeText, which is what this PR touches.
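A payload of that shape could be generated along these lines. This is a sketch under assumptions: `buildMultipartForm` is a hypothetical helper, not the one that ends up in this PR's bench directory, and the field names and values are placeholders.

```javascript
'use strict'

// Hypothetical helper: builds a multipart/form-data body made of many
// small text fields, each declaring its charset in a Content-Type header.
function buildMultipartForm (boundary, fieldCount, charset = 'utf-8') {
  const parts = []
  for (let i = 0; i < fieldCount; i++) {
    parts.push(
      `--${boundary}\r\n` +
      `Content-Disposition: form-data; name="field${i}"\r\n` +
      `Content-Type: text/plain; charset=${charset}\r\n` +
      '\r\n' + // blank line separating the part headers from the part body
      `value ${i}\r\n`
    )
  }
  parts.push(`--${boundary}--\r\n`) // closing delimiter
  return Buffer.from(parts.join(''))
}
```

The resulting buffer would be written to a parser created with a matching `content-type` header, for example `multipart/form-data; boundary=xyz` for `buildMultipartForm('xyz', 1000)`. The values here are plain ASCII, so the declared charset only exercises the decoder lookup, not actual transcoding.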

kibertoad · 2021-12-01T11:58:58Z

good point. Can you provide some samples of such forms? I could build a benchmark around that

Uzlopak · 2021-12-01T13:03:41Z

@kibertoad added the necessary benchmarks

:)

This PR:

node ./bench/fastify-busboy-form-bench.js 
62.25 mb/sec

node ./bench/fastify-busboy-form-bench-utf8-only.js 
72.89 mb/sec

Current Master:

node ./bench/fastify-busboy-form-bench.js 
22.88 mb/sec

node ./bench/fastify-busboy-form-bench-utf8-only.js 
23.20 mb/sec

Old Busboy:

node ./bench/busboy-form-bench.js 
24.23 mb/sec

node ./bench/busboy-form-bench-utf8-only.js 
25.10 mb/sec

Uzlopak · 2021-12-01T13:24:30Z

Ok, if the content of the part is also encoded specifically we get some interesting results :D

This PR:

node ./bench/fastify-busboy-form-bench-utf8.js 
64.95 mb/sec

node ./bench/fastify-busboy-form-bench-latin1.js 
58.19 mb/sec

master:

node ./bench/fastify-busboy-form-bench-latin1.js 

<--- Last few GCs --->

[27407:0x4deab50]    26983 ms: Mark-sweep (reduce) 4095.4 (4104.8) -> 4095.3 (4105.5) MB, 2090.1 / 0.0 ms  (+ 69.6 ms in 16 steps since start of marking, biggest step 17.3 ms, walltime since start of marking 2163 ms) (average mu = 0.088, current mu = 0.00[27407:0x4deab50]    29134 ms: Mark-sweep (reduce) 4096.3 (4102.5) -> 4096.1 (4104.0) MB, 2080.3 / 0.0 ms  (+ 67.0 ms in 15 steps since start of marking, biggest step 17.6 ms, walltime since start of marking 2151 ms) (average mu = 0.046, current mu = 0.00

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa25510 node::Abort() [node]
 2: 0x9664d3 node::FatalError(char const*, char const*) [node]
 3: 0xb9a8be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb9ac37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd56ca5  [node]
 6: 0xd5782f  [node]
 7: 0xd6566b v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0xd6922c v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xd3790b v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x107fbef v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x1426919  [node]
Aborted (core dumped)

node ./bench/fastify-busboy-form-bench-utf8.js 
23.43 mb/sec

Old Busboy:

node ./bench/busboy-form-bench-latin1.js 

<--- Last few GCs --->
00[26881:0x48ecb40]    29350 ms: Mark-sweep (reduce) 4098.4 (4107.0) -> 4098.2 (4107.8) MB, 2134.1 / 0.0 ms  (+ 74.7 ms in 16 steps since start of marking, biggest step 18.8 ms, walltime since start of marking 2212 ms) (average mu = 0.088, current mu = 0.00[26881:0x48ecb40]    31602 ms: Mark-sweep (reduce) 4099.2 (4104.8) -> 4099.1 (4106.5) MB, 2173.8 / 0.0 ms  (+ 74.5 ms in 16 steps since start of marking, biggest step 21.4 ms, walltime since start of marking 2252 ms) (average mu = 0.045, current mu = 0.00

<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
 1: 0xa25510 node::Abort() [node]
 2: 0x9664d3 node::FatalError(char const*, char const*) [node]
 3: 0xb9a8be v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb9ac37 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xd56ca5  [node]
 6: 0xd5782f  [node]
 7: 0xd6566b v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 8: 0xd6922c v8::internal::Heap::AllocateRawWithRetryOrFailSlowPath(int, v8::internal::AllocationType, v8::internal::AllocationOrigin, v8::internal::AllocationAlignment) [node]
 9: 0xd3790b v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationType, v8::internal::AllocationOrigin) [node]
10: 0x107fbef v8::internal::Runtime_AllocateInYoungGeneration(int, unsigned long*, v8::internal::Isolate*) [node]
11: 0x1426919  [node]
Aborted (core dumped)

node ./bench/busboy-form-bench-utf8.js 
24.43 mb/sec

Uzlopak · 2021-12-01T13:26:25Z

I tested with node 14.17.6

So this PR makes busboy even more resilient, yeah :)

kibertoad · 2021-12-01T13:40:06Z

Can we also add tests based on same payloads?

Uzlopak · 2021-12-01T13:51:41Z

But these are based on the same payload?

Uzlopak · 2021-12-01T14:15:29Z

@kibertoad
extracted the createMultipartBufferForFormStream

As long as the parameters are the same they generate the same Lorem Ipsum payload.

kibertoad · 2021-12-01T21:21:28Z

> But these are based on the same payload?

I don't mean using same payload for all benchmarks (although extracting it was a nice touch), I was just wondering if we have code coverage for same kind of payloads in our unit tests, because you produced them specifically for benchmarks, and I'm not sure if we had similar ones in unit tests.

Uzlopak · 2021-12-01T21:39:00Z

@kibertoad
It is actually a modification of parse-params.spec "Multiple extended parameters (RFC 5987) with mixed charsets"

Fair enough, we don't have a unit test that does the same. Actually the current test is kind of crazy, as I create about 100000 fields :D.

I will create a unit test modeled on that benchmark.

kibertoad · 2021-12-02T23:30:31Z

@Uzlopak I think it's correct now, can you check?

bench/createMultipartBufferForEncodingBench.js

kibertoad · 2021-12-03T09:24:09Z

@Uzlopak After (and if) this lands and I do #60, I'm thinking of starting to prep the 1.0.0 release. Anything else you want in prior to that?

Fdawgs · 2021-12-03T09:32:14Z

> @Uzlopak After (and if) this lands and I do #60, I'm thinking of starting to prep the 1.0.0 release. Anything else you want in prior to that?

Is it worth dropping support for node 10 and 12 now before v1.0.0 release, considering Fastify v4 will most likely do the same?

kibertoad · 2021-12-03T09:38:39Z

@Fdawgs What is the advantage of dropping version support early if it doesn't affect our code or dependencies? Can't we just release a semver major later if we ever need to?
multer is going to be another major user of this, and they are 12+, so I think we can drop 10 already if there is a good reason to.

Fdawgs · 2021-12-03T09:41:04Z

@kibertoad Cool beans, good point. Was just to avoid a future major release really.

Uzlopak · 2021-12-03T10:06:01Z

@kibertoad
Are these benchmarks ok?

I reduced the size of the payload, as Node.js could not handle the backpressure and the event loop got clogged, so an OOM was inevitable.

Uzlopak · 2021-12-03T10:07:58Z

@kibertoad
Strange, there is still a memory leak.

kibertoad · 2021-12-03T10:12:39Z

I'll take a look in the evening.

kibertoad · 2021-12-03T10:14:51Z

@LinusU Any preferences on Node 10 vs Node 12 baseline? I assume you don't intend to ever raise the Node version requirement for old api Multer, so it can't use fastify-busboy either way, and new api will be Node 12 from the get go, right?

Uzlopak · 2021-12-03T10:28:22Z

@kibertoad
I think the memory leak is a common issue and was introduced before. I never get a busboy finish event.

see
mscdex/busboy#229

Nothing to do with this PR, I think.

Uzlopak · 2021-12-03T11:06:53Z

@kibertoad
I guess this happens because backpressure is not handled correctly by busboy. This is a whole different can of worms. :/
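For context, the usual Node.js pattern for respecting backpressure on the producer side is to stop writing when `write()` returns `false` and resume on the `'drain'` event. The sketch below shows that general pattern only; `writeChunks` is a hypothetical helper, and this is not busboy's code or a fix for the leak discussed here.

```javascript
'use strict'

const { Writable } = require('stream')
const { once } = require('events')

// Hypothetical helper: writes chunks one by one and pauses whenever the
// writable signals that its internal buffer is full.
async function writeChunks (writable, chunks) {
  for (const chunk of chunks) {
    if (!writable.write(chunk)) {
      // write() returned false: wait until the buffer has drained.
      await once(writable, 'drain')
    }
  }
  writable.end()
  await once(writable, 'finish')
}
```

A benchmark that feeds a parser this way keeps at most roughly one `highWaterMark` of data queued instead of buffering the whole payload in memory.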

LinusU · 2021-12-03T11:54:02Z

> @LinusU Any preferences on Node 10 vs Node 12 baseline? I assume you don't intend to ever raise the Node version requirement for old api Multer, so it can't use fastify-busboy either way, and new api will be Node 12 from the get go, right?

Multer 2.x currently only supports ^12.20.0 || ^14.13.1 || >=16.0.0 since we have ESM only dependencies, so 12 sounds good 👍

Multer 1.x supports Node.js 0.10 so we won't be able to upgrade there 😅

Uzlopak · 2021-12-04T09:46:51Z

@kibertoad

Anything else you need from me to approve this PR? :)

kibertoad · 2021-12-04T09:50:59Z

just want to try it out hands-on a bit, will approve after :).
hit my head on corpo party yesterday, so wasn't able to be productive yesterday, and today is a date night, but I hope to wrap up everything on Sunday and hopefully release 1.0.0 already :)

kibertoad

LGTM, but we need to look into the memory leak problem eventually

replace encodings

0281fb4

use external TextDecoder as polyfill

284ef1c

Uzlopak changed the title ~~replace encodings~~ Use text-encoding npm package instead of integrated text-encoding dep. Dec 1, 2021

add information to changelog

454ba29

kibertoad reviewed Dec 1, 2021

package.json (outdated)

move dependencies closer to devDependencies

1510bbe

Uzlopak added 3 commits December 1, 2021 13:37

add benchmark

218801c

add missing \r\n\r\n to each part

0f6ca71

create utf8 and latin1 specific benchmarks

304acce

also encode the content of the parts

7b47fe8

extracted createMultipartBufferForFormBench

f6bc5c3

rename createMultipartBufferForEncodingBench to createMultipartBuffer…

46b2343

…ForEncodingBench fix typo in Changelog

Uzlopak added 3 commits December 1, 2021 23:38

add unit test

bc2bebe

Merge branch 'master' into replace-encoding

b2d6b1c

fix linting issues

636f977

kibertoad reviewed Dec 3, 2021

bench/createMultipartBufferForEncodingBench.js

Uzlopak added 3 commits December 3, 2021 09:55

Merge branch 'master' into replace-encoding

05cc9a6

Merge branch 'replace-encoding' of https://github.com/Uzlopak/busboy …

6b97219

…into replace-encoding

latin1 is the same as iso8859-1 but is like the unit test i wrote

9235d10

Uzlopak added 3 commits December 3, 2021 10:47

use Map instead of Object,bind streamsearch onInfo directly

7fb9bd4

fix linting

f36b7a3

fix benchmarks

607ec43

revert unnecessary change for this PR

97fadfe

Uzlopak mentioned this pull request Dec 4, 2021

Improve test coverage #25

Closed

2 tasks

Uzlopak and others added 3 commits December 4, 2021 11:34

Merge branch 'fastify:master' into replace-encoding

91756f4

Merge branch 'master' into replace-encoding

a0b1b77

Minor cleanup

5a5b1a4

kibertoad approved these changes Dec 4, 2021

Uzlopak merged commit 640f5ab into fastify:master Dec 4, 2021

KhafraDev mentioned this pull request Aug 13, 2023

remove text-decoding #120

Merged

Uzlopak deleted the replace-encoding branch August 13, 2023 19:57
