After some discussion with @internalfx (thanks a bunch), it turns out the Node upload code is using Node streams. Node streams info via @buskila:
Using .pipe() has other benefits too, like handling backpressure automatically so that
node won't buffer chunks into memory needlessly when the remote client
is on a really slow or high-latency connection.
Currently, @internalfx's upload code keeps 10 network requests in flight at any given time. In a scenario with infinite network latency, Node won't write anything more to the ReGrid API until at least one of those requests completes.
Cool. I think we could do the same in .NET: run 10 async tasks that lay down bytes over a connection pool, and only come back to read more bytes from the source as network requests complete.
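To make the idea above concrete, here is a minimal sketch of a bounded in-flight upload loop. It is written in Python with asyncio purely for illustration; it is not the ReGrid code in either Node or .NET. The names `upload`, `send_chunk`, and `fake_send` are hypothetical, and the only number taken from this thread is the limit of 10 requests in flight.

```python
import asyncio
import io

MAX_IN_FLIGHT = 10  # the 10 in-flight requests mentioned above


async def upload(source, chunk_size, send_chunk, max_in_flight=MAX_IN_FLIGHT):
    """Read chunks from `source` and send them, at most `max_in_flight` at once."""
    slots = asyncio.Semaphore(max_in_flight)
    tasks = []

    async def send_and_release(index, data):
        try:
            await send_chunk(index, data)
        finally:
            slots.release()  # free a slot so the reader can continue

    index = 0
    while True:
        # Wait for a free slot BEFORE reading, so no more than
        # `max_in_flight` chunks are ever held in memory.
        await slots.acquire()
        data = source.read(chunk_size)
        if not data:
            slots.release()
            break
        tasks.append(asyncio.create_task(send_and_release(index, data)))
        index += 1

    await asyncio.gather(*tasks)
    return index


async def demo():
    """Upload 100 small chunks to a fake sender and record peak concurrency."""
    in_flight = 0
    peak = 0
    received = {}

    async def fake_send(index, data):  # hypothetical stand-in for a network write
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)  # simulated network latency
        received[index] = data
        in_flight -= 1

    payload = bytes(range(256)) * 100
    count = await upload(io.BytesIO(payload), 256, fake_send)
    reassembled = b"".join(received[i] for i in range(count))
    return count, peak, reassembled == payload


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The reader blocks on the semaphore rather than on the source, so if every request stalls, reading stalls too. That is the same effect as the backpressure described in the quote above.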
Other Research Findings
RethinkDB Limitations
Query size (419554663) greater than maximum (134217727).
So the batch size can't be too big. The maximum query size is 134217727 bytes (2^27 - 1, about 134 MB), so each batch has to stay under that.
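As a rough way to reason about that limit, the sketch below estimates how many chunks fit in one batch. Only the maximum (134217727) and the rejected query size (419554663) come from the error message above. The per-chunk overhead, the option to assume base64 inflation of binary data on the wire, and the example chunk sizes are all assumptions for illustration, so measure the real query size before relying on any of these numbers.

```python
import math

MAX_QUERY_BYTES = 134_217_727  # maximum from the error message above (2**27 - 1)
REJECTED_QUERY_BYTES = 419_554_663  # query size from the same error message


def base64_len(n):
    """Length of n raw bytes after base64 encoding (with padding)."""
    return 4 * math.ceil(n / 3)


def max_chunks_per_batch(chunk_size, per_chunk_overhead=512,
                         assume_base64=True,
                         max_query_bytes=MAX_QUERY_BYTES):
    """Estimate how many chunks fit in one query.

    per_chunk_overhead and assume_base64 are guesses about how a chunk
    is serialized; they are not taken from RethinkDB documentation.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    payload = base64_len(chunk_size) if assume_base64 else chunk_size
    return max_query_bytes // (payload + per_chunk_overhead)


if __name__ == "__main__":
    example_chunk = 255 * 1024  # hypothetical chunk size, not confirmed here
    print(max_chunks_per_batch(example_chunk))
    print(REJECTED_QUERY_BYTES / MAX_QUERY_BYTES)  # how far over the limit
```

The estimate is deliberately conservative: it is safer to send a few more, smaller batches than to hit the query size error partway through an upload.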
Node ReGrid seems to get about 3x more writes than the .NET client, which yields a faster upload wall time. See image below (credits @buskila):
Test setup
Upload only:
File Size: 1 GB.
Server: RethinkDB / Linux / Ubuntu 14, 3 nodes
Client: .NET Core / Linux
Chunk Size: Default
Batch Size: Default 8 -> 32
They tried single connection and connection pooling. No difference.
Using Stream IO:
Suspicion
Too much chunk calculation in the stream upload code. Try to parallelize / simplify some of this, especially when given byte[].
Node's ReGrid upload code is here:
https://github.com/internalfx/regrid/blob/master/lib/upload.js
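For the byte[] case mentioned above, one possible simplification is to compute chunk boundaries up front and hand out views over the buffer instead of copying or re-reading. The sketch below shows the idea in Python using `memoryview`; it is not taken from either code base, and `chunk_views` and `chunk_count` are hypothetical names. In .NET the rough analogue would be slicing with something like `ArraySegment<byte>`.

```python
def chunk_count(total_bytes, chunk_size):
    """Number of chunks needed to cover total_bytes (ceiling division)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return -(-total_bytes // chunk_size)


def chunk_views(buffer, chunk_size):
    """Yield (index, view) pairs over `buffer` without copying the bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    view = memoryview(buffer)
    for index, start in enumerate(range(0, len(view), chunk_size)):
        # Slicing a memoryview creates a new view, not a copy of the data.
        yield index, view[start:start + chunk_size]


if __name__ == "__main__":
    data = bytes(1000)
    for index, view in chunk_views(data, 256):
        print(index, len(view))
```

Because each chunk is identified by its index and offset alone, the chunks are independent of each other and can be handed to parallel writers in any order.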
Other notes
This should come after #77 is done.
Stream handbook: https://github.com/substack/stream-handbook