Reuse a buffer in ACDSI.InputAdapter #561

JacekLach · 2021-09-02T14:15:51Z

Before this PR

The allocation in .read was 30% of all allocations in internal product
during a period where we were allocating and GCing ~20GB of byte[]s
per minute.

After this PR

==COMMIT_MSG==
Reduce byte[] allocations during reads via ApacheCtrDecryptingSeekableInput
==COMMIT_MSG==

Possible downsides?

changelog-app · 2021-09-02T14:15:54Z

Generate changelog in `changelog/@unreleased`

Type

Description

Reduce byte[] allocations during reads via ApacheCtrDecryptingSeekableInput

JacekLach · 2021-09-02T14:16:24Z

@ellisjoe I can't assign reviewers, would appreciate a look

JacekLach · 2021-09-02T14:17:19Z

jfr screenshot for this being a significant issue

The allocation in .read was 30% of all allocations in codex-foundry during a period where we were allocating and GCing ~20GB of byte[]s per minute.

JacekLach · 2021-09-03T10:40:08Z

I considered checking buffer.hasArray() and, if it returns true, writing directly to the backing array. However, on this code path the byte buffer is always allocated via ByteBuffer.allocateDirect(this.bufferSize + cipher.getBlockSize()):

https://github.com/apache/commons-crypto/blob/baa0d8fd73ee4756f1ae397afbdce8db0a9a2580/src/main/java/org/apache/commons/crypto/stream/CryptoInputStream.java#L203-L207

A direct buffer exposes no backing array, so there is never an array we can write to here, and we have to pay for this copy.
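To make the trade-off concrete, here is a minimal sketch of the hasArray() check that was considered. `ByteSource`, `readOnce`, and `scratch` are hypothetical names, not code from this PR; `ByteSource` stands in for the repository's SeekableInput and omits checked exceptions for brevity.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

class HasArraySketch {
    /** Hypothetical stand-in for SeekableInput; checked exceptions omitted for brevity. */
    interface ByteSource {
        int read(byte[] buf, int off, int len);
    }

    /** Reads once into dst, skipping the intermediate copy when dst exposes a backing array. */
    static int readOnce(ByteSource input, ByteBuffer dst, byte[] scratch) {
        if (dst.hasArray()) {
            // Heap buffer: read straight into the backing array, then advance the position.
            int n = input.read(dst.array(), dst.arrayOffset() + dst.position(), dst.remaining());
            if (n > 0) {
                dst.position(dst.position() + n);
            }
            return n;
        }
        // Direct buffer: no accessible array, so read into scratch and copy into dst.
        int n = input.read(scratch, 0, Math.min(scratch.length, dst.remaining()));
        if (n > 0) {
            dst.put(scratch, 0, n);
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        ByteBuffer heap = ByteBuffer.allocate(4);
        ByteBuffer direct = ByteBuffer.allocateDirect(4);
        System.out.println("heap.hasArray()   = " + heap.hasArray());
        System.out.println("direct.hasArray() = " + direct.hasArray());

        ByteArrayInputStream first = new ByteArrayInputStream(data);
        ByteArrayInputStream second = new ByteArrayInputStream(data);
        System.out.println("heap path read    = " + readOnce(first::read, heap, new byte[4]));
        System.out.println("direct path read  = " + readOnce(second::read, direct, new byte[4]));
    }
}
```

Since the buffer on this code path is always direct, only the second branch would ever run, which is why reusing the scratch array is where the saving comes from.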

carterkozak · 2021-09-08T14:11:52Z

crypto-core/src/main/java/com/palantir/crypto2/io/ApacheCtrDecryptingSeekableInput.java

+            readBuffer = new byte[newLength(size, required - size, size << 1)];
+        }
+
+        // copied from jdk.internal.util.ArraysSupport


Isn't that GPL?

bleh, yes it is. I'll rewrite & simplify a bit
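A growth policy of this kind does not need JDK code. The following is a minimal, independently written sketch of one possible policy (double the size, but never return less than what is required). It is not the code that was pushed to this PR; `grownLength` and `MAX_LENGTH` are hypothetical names, and the cap value is an assumption.

```java
class BufferGrowthSketch {
    // Assumed soft cap: some JVMs reject array lengths very close to Integer.MAX_VALUE.
    static final int MAX_LENGTH = Integer.MAX_VALUE - 8;

    /** Returns a length that is at least `required`, doubling `current` when that is enough. */
    static int grownLength(int current, int required) {
        if (required < 0 || required > MAX_LENGTH) {
            throw new IllegalArgumentException("unsupported buffer length: " + required);
        }
        if (required <= current) {
            return current; // already large enough, keep the existing buffer
        }
        long doubled = 2L * current; // long arithmetic avoids int overflow
        return (int) Math.min(Math.max(doubled, required), MAX_LENGTH);
    }

    public static void main(String[] args) {
        System.out.println(grownLength(8, 4));
        System.out.println(grownLength(8, 9));
        System.out.println(grownLength(8, 100));
    }
}
```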

carterkozak · 2021-09-08T14:14:06Z

crypto-core/src/main/java/com/palantir/crypto2/io/ApacheCtrDecryptingSeekableInput.java

-            byte[] bytes = new byte[dst.remaining()];
-            int read = input.read(bytes, 0, bytes.length);
+            if (readBuffer.length < dst.remaining()) {
+                resize(dst.remaining());


Why is resizing necessary? Can't we read from the source in a loop using the buffer we already have?

in practice the buffer should always fit, since we read into a byte buffer created with same size in

hadoop-crypto/crypto-core/src/main/java/com/palantir/crypto2/io/ApacheCtrDecryptingSeekableInput.java

Line 45 in e42e049

super(new InputAdapter(input), Utils.getCipherInstance(ALGORITHM, PROPS), BUFFER_SIZE,

So this covers the case where something changes in the calling code to use a larger buffer. Looping intuitively feels more expensive: more OS calls versus a single allocation that gets reused from then on (assuming the buffer size stays constant).

> looping intuitively feels more expensive

Beyond 8kB (or in some tests 16kB) larger buffers don't tend to reduce overhead for network or local disk operations in my experience. I'd bias toward the loop unless there's a benchmark that proves larger values can work better (and in that case I'd update the buffer size while keeping the loop as a fallback).

Sure, I guess. The code gets more complicated but can do that
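For reference, the loop-based alternative discussed in this thread can be sketched as below. This is an illustration under assumptions, not the merged code: `ByteSource` is a hypothetical stand-in for SeekableInput without checked exceptions, and `sourceCalls` exists only to show how many underlying reads a given buffer size costs.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

class ChunkedReadSketch {
    /** Hypothetical stand-in for SeekableInput; checked exceptions omitted for brevity. */
    interface ByteSource {
        int read(byte[] buf, int off, int len);
    }

    private final ByteSource input;
    private final byte[] readBuffer; // allocated once, reused by every read
    int sourceCalls = 0;             // demo-only counter of underlying reads

    ChunkedReadSketch(ByteSource input, int bufferSize) {
        this.input = input;
        this.readBuffer = new byte[bufferSize];
    }

    /** Fills dst in chunks of at most readBuffer.length bytes; returns -1 only if nothing was read. */
    int read(ByteBuffer dst) {
        int totalRead = 0;
        while (dst.hasRemaining()) {
            int chunk = Math.min(dst.remaining(), readBuffer.length);
            int read = input.read(readBuffer, 0, chunk);
            sourceCalls++;
            if (read == -1) {
                return totalRead == 0 ? -1 : totalRead;
            }
            // Assumes the source never returns 0 for a non-zero length request.
            dst.put(readBuffer, 0, read);
            totalRead += read;
        }
        return totalRead;
    }

    public static void main(String[] args) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[20]);
        ChunkedReadSketch adapter = new ChunkedReadSketch(in::read, 8);
        int n = adapter.read(ByteBuffer.allocateDirect(20));
        System.out.println("read " + n + " bytes in " + adapter.sourceCalls + " source calls");
    }
}
```

With the destination no larger than the reusable buffer, which is the expected case here, the loop body runs once, so the loop only costs extra calls if the caller's buffer grows.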

carterkozak

lgtm

carterkozak · 2021-09-08T16:18:11Z

crypto-core/src/main/java/com/palantir/crypto2/io/ApacheCtrDecryptingSeekableInput.java

        private SeekableInput input;
+        private byte[] readBuffer = new byte[BUFFER_SIZE];


final, no need to null this out after close imo -- assuming we're not holding a reference to the closed stream for a long time.

Suggested change

-        private byte[] readBuffer = new byte[BUFFER_SIZE];
+        private final byte[] readBuffer = new byte[BUFFER_SIZE];

yep, makes sense if we know the buffer is constant size

JacekLach · 2021-09-08T16:36:50Z

Can someone click the 'generate changelog' button, it's not available for me :P

JacekLach · 2021-09-08T17:30:04Z

Hm, it timed out on the second push too, but I really don't see how that could be caused by this change. I'll try another run tomorrow, I guess, hoping that whatever's wrong atm will pass

ellisjoe · 2021-09-08T21:12:22Z

crypto-core/src/main/java/com/palantir/crypto2/io/ApacheCtrDecryptingSeekableInput.java

+                int read = input.read(readBuffer, 0, chunk);
+
+                if (read == -1) {
+                    return totalRead;


I think this needs to be:

    if (read == -1) {
        if (totalRead == 0) {
            totalRead = -1;
        }
    }

there's a reference implementation here: https://github.com/apache/commons-crypto/blob/master/src/main/java/org/apache/commons/crypto/stream/input/StreamInput.java#L58-L75

I think this logic ends up being complex enough to warrant a few tests as well. When I was playing with this myself I pulled the InputAdapter up. I'm still trying to find some common util that we can use here but haven't found anything yet.
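The kind of test suggested here could look like the following sketch. It pins down three cases: a source that ends before the destination is full, an exhausted source, and an empty destination. The class, the `ByteSource` interface, and the `check` helper are hypothetical; the loop mirrors the shape of the linked StreamInput reference rather than the exact code in this PR.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;

class EofContractSketch {
    /** Hypothetical stand-in for SeekableInput; checked exceptions omitted for brevity. */
    interface ByteSource {
        int read(byte[] buf, int off, int len);
    }

    static int read(ByteSource input, byte[] readBuffer, ByteBuffer dst) {
        int totalRead = 0;
        while (dst.hasRemaining()) {
            int read = input.read(readBuffer, 0, Math.min(dst.remaining(), readBuffer.length));
            if (read == -1) {
                // Report end of stream only when this call produced no bytes at all.
                if (totalRead == 0) {
                    totalRead = -1;
                }
                break;
            }
            dst.put(readBuffer, 0, read);
            totalRead += read;
        }
        return totalRead;
    }

    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] scratch = new byte[4];

        // Source ends before dst is full: the bytes that were read are reported.
        check(read(in::read, scratch, ByteBuffer.allocateDirect(8)) == 5, "partial read");
        // Source already exhausted: -1 rather than 0, so callers can detect end of stream.
        check(read(in::read, scratch, ByteBuffer.allocateDirect(8)) == -1, "end of stream");
        // Nothing requested: 0, without touching the source.
        check(read(in::read, scratch, ByteBuffer.allocateDirect(0)) == 0, "empty destination");
        System.out.println("all checks passed");
    }
}
```

Returning `totalRead` unconditionally at end of stream would yield 0 in the second case, which a caller waiting for -1 cannot tell apart from "no bytes yet".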

I was under the impression that DecryptionTests would cover this - but in fact am struggling to have those run locally (tests in intellij die with 'Process finished with exit code 134 (interrupted by signal 6: SIGABRT)') - is there a trick to it?

JacekLach · 2021-09-09T10:48:18Z

it does look like the circleci hang was related to that bug - unclear how but clearly I don't understand the mechanics of these tests

Reuse a buffer in ACDSI.InputAdapter

64b9924

The allocation in .read was 30% of all allocations in codex-foundry during a period where we were allocating and GCing ~20GB of byte[]s per minute.

JacekLach force-pushed the jl/reuse-buffer branch from bce214b to 64b9924 on September 2, 2021 14:21

JacekLach added 2 commits September 2, 2021 15:55

Read the correct amount of data

94ddd9e

Release buffer on close

e42e049

carterkozak reviewed Sep 8, 2021

JacekLach added 2 commits September 8, 2021 15:27

Don't reuse ArraysSupport code

702b6dd

Loop instead of resizing

700d81a

carterkozak approved these changes Sep 8, 2021

carterkozak reviewed Sep 8, 2021

Make buffer and input final in InputAdapter

b51a8ba

ellisjoe mentioned this pull request Sep 8, 2021

Use more efficient StreamInput implementation #562

Closed

ellisjoe reviewed Sep 8, 2021

Correctly report empty reads in InputAdapter

d1e832e

ellisjoe approved these changes Sep 9, 2021

ellisjoe merged commit 8d3c320 into palantir:develop Sep 9, 2021

dtobin pushed a commit that referenced this pull request Oct 7, 2021

Reduce byte[] allocations during reads via ApacheCtrDecryptingSeekableInput (#561)

1407795

dtobin pushed a commit that referenced this pull request Oct 7, 2021

Reduce byte[] allocations during reads via ApacheCtrDecryptingSeekableInput (#561)

9b2b7ac

dtobin mentioned this pull request Oct 7, 2021

Backport buffer re-use (#561) to 2.x #576

Merged

ellisjoe pushed a commit that referenced this pull request Oct 20, 2021

Reduce byte[] allocations during reads via ApacheCtrDecryptingSeekableInput (#561) (#576)

3fbd456

Co-authored-by: Jacek Lach <[email protected]>
