Stream performance enhancements #87

Closed
dkocher opened this issue Apr 8, 2015 · 7 comments

@dkocher
Contributor

dkocher commented Apr 8, 2015

Writing to IRODSFileOutputStream is terribly slow. Even without optimization or multiple concurrent connections, I would expect performance at least similar to a plain HTTP connection.

@deardooley

I can confirm that streaming performance across the board is significantly slower than with standard HTTP and a fraction of what we get via sftp and gridftp. This seems to hold across target resources, though the degree varies. Even when running everything locally in Docker containers there is a noticeable discrepancy between protocols.

@michael-conway
Collaborator

Yes, we know, and we will address this as soon as we can resolve restarts. This is
also a core server issue. Streaming is a separate code path from get/put.
We are currently stretched quite thin.

@deardooley

That would be awesome. We’re bottlenecked by stream performance and will have to drop use of jargon in Agave if we can’t get around this issue. Really would rather not do that.


Rion

@michael-conway michael-conway modified the milestone: Maintenance release 4.0.2.2 includes iRODS 4.1 Apr 21, 2015
@michael-conway
Collaborator

I will revisit the tuning and buffer sizes here to see what first-order improvements I can get client-side only. I may end up creating a secondary ticket linked to the iRODS core server so that we can address this issue holistically with the C code.

@deardooley

If that doesn't yield results, a chunked streaming interface and/or concat support would let us fake better streaming input. FWIW, I ran buffer tests with block sizes from 4 KB up to 10 MB and got the best results at 2 MB. That was after adjusting my local TCP window size.

Rion
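
A rough way to reproduce the shape of the buffer test described above, without an iRODS server, is to count how many writes reach the underlying stream for each buffer size. This is only a sketch under stated assumptions: `CountingSink` is a hypothetical stand-in for `IRODSFileOutputStream`, `BufferSweep` and its parameters are invented for illustration, and the code counts calls rather than measuring network throughput.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical sink that counts how often the underlying stream is written to. */
class CountingSink extends OutputStream {
    long calls = 0;
    long bytes = 0;

    @Override
    public void write(int b) {
        calls++;
        bytes++;
    }

    @Override
    public void write(byte[] buf, int off, int len) {
        calls++;
        bytes += len;
    }
}

class BufferSweep {
    /**
     * Sends payloadBytes in application-level writes of appWriteBytes through a
     * BufferedOutputStream of bufferBytes and returns the number of writes that
     * actually reached the sink.
     */
    static long underlyingWrites(int payloadBytes, int appWriteBytes, int bufferBytes) {
        CountingSink sink = new CountingSink();
        byte[] piece = new byte[appWriteBytes];
        try (OutputStream out = new BufferedOutputStream(sink, bufferBytes)) {
            for (int sent = 0; sent < payloadBytes; sent += appWriteBytes) {
                out.write(piece, 0, Math.min(appWriteBytes, payloadBytes - sent));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.calls;
    }

    public static void main(String[] args) {
        int payload = 10 * 1024 * 1024; // 10 MB test payload
        int[] bufferSizes = {4 * 1024, 64 * 1024, 2 * 1024 * 1024, 10 * 1024 * 1024};
        for (int size : bufferSizes) {
            long calls = underlyingWrites(payload, 1024, size);
            System.out.println(size + " byte buffer: " + calls + " underlying writes");
        }
    }
}
```

To turn this into a real measurement, the sink would be replaced with the actual stream and each run timed; the call counts alone only show how much per-call work a given buffer size removes.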

@michael-conway
Collaborator

Yes, I think I've extracted about as much as I can get on the client.

A big difference here is that each 'write()' operation to the output stream
is a separate packing instruction, whereas in a 'put' the entire contents of
the file are pushed down the pipe in one go.

This is one critical piece, and larger buffer sizes can amortize the cost of
the successive calls. I suspect we need to instrument this and follow it onto
the server in order to get further gains. The streaming is a separate code
path from get/put...

I +1 the chunking, as it needs to be in the REST API. If we can get
consortium help on this, I'm willing to do the client side. Essentially we
need microservices to validate and assemble the chunks on completion,
handle cleanup, and probably to trigger postProcForPut after the file is
reassembled.

I suppose we'd need some default rules to quash postProc operations for
chunks while the transfer is in flight?

MC
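
The amortization argument in the comment above can be made concrete with a simple cost model. This is an illustrative sketch, not a measurement: the symbols $B$ (file size), $b$ (buffer size), $c$ (fixed overhead per `write()` call), and $W$ (sustained bandwidth) are assumptions introduced here, and the 1 ms value for $c$ is hypothetical.

```latex
T(b) \approx \left\lceil \frac{B}{b} \right\rceil c + \frac{B}{W}

% Worked example with B = 1 GiB = 2^{30} bytes and an assumed c = 1 ms:
%   b = 4 KiB = 2^{12}:  2^{30} / 2^{12} = 262144 calls, about 262 s of per-call overhead
%   b = 2 MiB = 2^{21}:  2^{30} / 2^{21} = 512 calls,    about 0.5 s of per-call overhead
```

In this model only the first term shrinks as $b$ grows; the $B/W$ term and any server-side cost per call are untouched, which is why client-side buffer tuning can only go so far.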

@deardooley

... and cleanup and failure rules. And proper namespacing for concurrent writes. And logic for partial reads and writes. And first-class support (long overdue) for denoting files in flight via write or replication. And configurable buffer and window size. And restart support per chunk and across chunks...

IMHO, this needs to go into core, NOT into microservices. I would also argue the REST API should roll in as a distinct server component rather than as part of jargon so it can natively talk to the server and optimize behavior.

Rion
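
The chunked-upload idea discussed in this thread (validate each chunk, restart per chunk, assemble on completion) can be sketched in memory as follows. Everything here is hypothetical: `ChunkAssembler` and its methods are invented for illustration and are not part of Jargon, the REST API, or iRODS, and SHA-256 is simply an assumed choice of checksum.

```java
import java.io.ByteArrayOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical in-memory receiver for a chunked upload. */
class ChunkAssembler {
    private final int totalChunks;
    // Keyed by chunk index so assembly order does not depend on arrival order.
    private final TreeMap<Integer, byte[]> received = new TreeMap<>();

    ChunkAssembler(int totalChunks) {
        this.totalChunks = totalChunks;
    }

    static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Splits data into chunks of at most chunkSize bytes. */
    static byte[][] split(byte[] data, int chunkSize) {
        int n = (data.length + chunkSize - 1) / chunkSize;
        byte[][] chunks = new byte[n][];
        for (int i = 0; i < n; i++) {
            int end = Math.min(data.length, (i + 1) * chunkSize);
            chunks[i] = Arrays.copyOfRange(data, i * chunkSize, end);
        }
        return chunks;
    }

    /** Stores a chunk only if its digest matches; false tells the sender to retry it. */
    boolean accept(int index, byte[] chunk, byte[] expectedDigest) {
        if (index < 0 || index >= totalChunks) {
            return false;
        }
        if (!MessageDigest.isEqual(sha256(chunk), expectedDigest)) {
            return false;
        }
        received.put(index, chunk.clone());
        return true;
    }

    /** Index of the first chunk still missing, or -1 when all chunks have arrived. */
    int firstMissing() {
        for (int i = 0; i < totalChunks; i++) {
            if (!received.containsKey(i)) {
                return i;
            }
        }
        return -1;
    }

    /** Concatenates chunks in index order; refuses to run while any chunk is missing. */
    byte[] assemble() {
        int missing = firstMissing();
        if (missing != -1) {
            throw new IllegalStateException("chunk " + missing + " is missing");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] chunk : received.values()) {
            out.write(chunk, 0, chunk.length);
        }
        return out.toByteArray();
    }
}
```

A restart in this sketch amounts to asking `firstMissing()` where to resume and resending from that index. Namespacing for concurrent writers, cleanup of abandoned chunks, and suppressing post-processing while a transfer is in flight are left out and would need server-side support, as the comments above point out.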

@michael-conway michael-conway modified the milestones: Maintenance release 4.0.2.2 with misc Cyberduck integration, consortium-tls 4.0.2.4, consortium-tls support, Performance enhancements for streams and put/get - 4.0.2.4 May 22, 2015
@michael-conway michael-conway self-assigned this Jun 1, 2015
@michael-conway michael-conway changed the title Low throughput with IRODSFileOutputStream Stream performance enhancements Jun 1, 2015
michael-conway added a commit that referenced this issue Jun 2, 2015
michael-conway pushed a commit that referenced this issue Jun 2, 2015
michael-conway added a commit that referenced this issue Jun 5, 2015
michael-conway added a commit that referenced this issue Jun 8, 2015
michael-conway pushed a commit that referenced this issue Jun 9, 2015
michael-conway pushed a commit that referenced this issue Jun 9, 2015
michael-conway added a commit that referenced this issue Jun 10, 2015