Stream performance enhancements #87

dkocher · 2015-04-08T08:13:33Z

Writing to IRODSFileOutputStream is terribly slow. Even without optimization or multiple concurrent connections, I would expect performance at least similar to a plain HTTP connection.

deardooley · 2015-04-18T00:36:06Z

I can confirm that streaming performance across the board is significantly slower than with standard HTTP and a fraction of what we get via SFTP and GridFTP. This seems to hold everywhere, though the degree varies with the target resources. Even when running everything locally in Docker containers, there is a noticeable discrepancy between protocols.

michael-conway · 2015-04-18T16:17:22Z

Yes, we know, and we will address it as soon as we can resolve restarts. This is also a core server issue: streaming is a separate code path from get/put. We are currently stretched quite thin.

deardooley · 2015-04-18T18:25:43Z

That would be awesome. We’re bottlenecked by stream performance and will have to drop use of jargon in Agave if we can’t get around this issue. Really would rather not do that.

—
Rion

michael-conway · 2015-04-21T14:47:51Z

I will revisit the tuning and buffer sizes here to see what first-order improvements I can get client-side only. I may end up creating a secondary ticket linked to the iRODS core server, so that we can address this issue holistically with the C code.

deardooley · 2015-04-21T15:26:07Z

If that doesn't yield results, a chunked streaming interface and/or concat support would let us fake better streaming input. FWIW, I ran buffer tests with block sizes from 4 KB up to 10 MB and got the best results at 2 MB. That was after adjusting my local TCP window size.

Rion

michael-conway · 2015-04-21T15:32:27Z

Yes, I think I've extracted about as much as I can get on the client.

A big difference here is that each 'write()' operation to the output stream is a separate packing instruction, whereas in a 'put' the entire contents of the file are pushed down the pipe in one go.

This is one critical piece, and larger buffer sizes can amortize the cost of the successive calls. I suspect we need to instrument and follow this onto the server in order to get further gains. Streaming is a separate code path from get/put...
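
To illustrate the amortization point, here is a minimal sketch using only java.io. `CountingOutputStream` and `BufferedWriteDemo` are hypothetical names made up for this example; the counting stream stands in for any stream where each write() is a separate, costly call, and none of this is Jargon code or a description of how Jargon buffers internally.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical stand-in for a stream where every write() is a separate costly call.
class CountingOutputStream extends OutputStream {
    long calls = 0; // number of write calls that reached this stream
    long bytes = 0; // total bytes received

    @Override
    public void write(int b) {
        calls++;
        bytes++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        calls++;
        bytes += len;
    }
}

class BufferedWriteDemo {
    // Writes `total` bytes in `callerChunk`-sized pieces through a buffer of
    // `bufferSize` bytes and returns how many write calls reached the sink.
    static long underlyingCalls(int total, int callerChunk, int bufferSize) throws IOException {
        CountingOutputStream sink = new CountingOutputStream();
        try (OutputStream out = new BufferedOutputStream(sink, bufferSize)) {
            byte[] piece = new byte[callerChunk];
            for (int written = 0; written < total; written += callerChunk) {
                out.write(piece, 0, Math.min(callerChunk, total - written));
            }
        } // close() flushes whatever is still in the buffer
        return sink.calls;
    }
}
```

With 4 KB caller writes and a 2 MB buffer, 8 MB of data should reach the underlying stream in a handful of large writes rather than 2048 small ones. Whether that translates into the same gain against a real server is an assumption that would need measuring.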

I +1 the chunking, as it needs to be in the REST API. If we can get
consortium help on this, I'm willing to do the client side. Essentially we
need microservices to validate and assemble the chunks on completion,
handle cleanup, and probably to trigger postProcForPut after the file is
reassembled.
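
As a rough sketch of the validate-and-assemble step, assuming each chunk carries its index and a SHA-256 digest: the `Chunk` and `ChunkAssembler` names, the in-memory representation, and the choice of digest are all assumptions made for illustration. This is plain Java, not an iRODS microservice or an existing Jargon API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical chunk record: position in the file, payload, and payload digest.
class Chunk {
    final int index;
    final byte[] data;
    final byte[] digest;

    Chunk(int index, byte[] data, byte[] digest) {
        this.index = index;
        this.data = data;
        this.digest = digest;
    }
}

class ChunkAssembler {
    static byte[] sha256(byte[] data) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("SHA-256").digest(data);
    }

    // Client side: cut the input into chunks of at most chunkSize bytes.
    static List<Chunk> split(InputStream in, int chunkSize)
            throws IOException, NoSuchAlgorithmException {
        List<Chunk> chunks = new ArrayList<>();
        byte[] buf = new byte[chunkSize];
        int index = 0;
        int n;
        while ((n = in.readNBytes(buf, 0, chunkSize)) > 0) {
            byte[] data = Arrays.copyOf(buf, n);
            chunks.add(new Chunk(index++, data, sha256(data)));
        }
        return chunks;
    }

    // Completion side: check order and digests, then concatenate the payloads.
    static byte[] assemble(List<Chunk> chunks)
            throws IOException, NoSuchAlgorithmException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < chunks.size(); i++) {
            Chunk c = chunks.get(i);
            if (c.index != i) {
                throw new IOException("chunk " + i + " is missing or out of order");
            }
            if (!MessageDigest.isEqual(c.digest, sha256(c.data))) {
                throw new IOException("chunk " + i + " failed validation");
            }
            out.write(c.data);
        }
        return out.toByteArray();
    }
}
```

A real implementation would assemble on the server and would still need the cleanup and post-processing hooks mentioned above; the sketch only covers ordering and integrity checks.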

I suppose we'd need some default rules to quash postProc operations for
chunks while the transfer is in flight?

MC

deardooley · 2015-04-21T16:45:05Z

.. and cleanup, failure rules. And proper namespacing for concurrent writes. And logic for partial reads and writes. And first class support (long overdue) for denoting files in flight via write or replication. And configurable buffer and window size. And restart support per chunk and across chunks...

IMHO, this needs to go into core, NOT into microservices. I would also argue the REST API should roll in as a distinct server component rather than as part of jargon, so it can natively talk to the server and optimize behavior.

Rion

dkocher mentioned this issue Apr 8, 2015

Work with Cyberduck on transfer tuning #86

Closed

michael-conway modified the milestone: Maintenance release 4.0.2.2 includes iRODS 4.1 Apr 21, 2015

dkocher mentioned this issue May 7, 2015

Option to obtain MD5 from server after upload is complete #89

Closed

michael-conway modified the milestones: Maintenance release 4.0.2.2 with misc Cyberduck integration, consortium-tls 4.0.2.4, consortium-tls support, Performance enhancements for streams and put/get - 4.0.2.4 May 22, 2015

michael-conway added the A label Jun 1, 2015

michael-conway self-assigned this Jun 1, 2015

michael-conway changed the title ~~Low throughput with IRODSFileOutputStream~~ Stream performance enhancements Jun 1, 2015

michael-conway added a commit that referenced this issue Jun 2, 2015

#87 initial work

0924ace

michael-conway pushed a commit that referenced this issue Jun 2, 2015

#87 more on streams

392a69e

michael-conway added a commit that referenced this issue Jun 5, 2015

#87 testing packing output stream

e1aef39

michael-conway added a commit that referenced this issue Jun 5, 2015

#87 working on input stream buffering

5df2325

michael-conway added a commit that referenced this issue Jun 8, 2015

#87 input stream w/caching

954bc1f

michael-conway pushed a commit that referenced this issue Jun 9, 2015

#87 successful testing input stream

f6979f8

michael-conway pushed a commit that referenced this issue Jun 9, 2015

#87 update comment

f0a81ca

michael-conway closed this as completed Jun 9, 2015

michael-conway added a commit that referenced this issue Jun 10, 2015

#87 streaming for webdav

95918ba
