
[SPARK-24801][CORE] Avoid memory waste by empty byte[] arrays in SaslEncryption$EncryptedMessage #21811

Closed
wants to merge 1 commit

Conversation

@countmdm commented Jul 18, 2018

What changes were proposed in this pull request?

Initialize SaslEncryption$EncryptedMessage.byteChannel lazily,
so that empty, not yet used instances of ByteArrayWritableChannel
referenced by this field don't use up memory.

I analyzed a heap dump from a Yarn Node Manager where this code is used, and found over 40,000 of the above objects in memory, each with a big empty byte[] array. The reason they are all there is that Netty queued up a large number of messages in memory before transferTo() was called. There is a small number of netty ChannelOutboundBuffer objects which, collectively, via linked lists starting from their flushedEntry data fields, end up referencing over 40K ChannelOutboundBuffer$Entry objects, which ultimately reference EncryptedMessage objects.
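
For illustration, here is a minimal, self-contained Java sketch of the lazy-initialization idea. Only `byteChannel`, `ByteArrayWritableChannel` and `transferTo()` come from the discussion; the surrounding structure and the `bufferSize` parameter are simplifications, not the exact Spark code.

```java
// Minimal sketch of the lazy-initialization approach; class structure simplified.
class ByteArrayWritableChannelSketch {
  final byte[] data;
  ByteArrayWritableChannelSketch(int size) { this.data = new byte[size]; }
}

class EncryptedMessageSketch {
  private final int bufferSize;
  // Before the fix: allocated eagerly in the constructor, so every message
  // queued in Netty's ChannelOutboundBuffer carried a large, still-empty byte[].
  private ByteArrayWritableChannelSketch byteChannel;

  EncryptedMessageSketch(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  // After the fix: the buffer exists only once transferTo() actually needs it.
  void transferTo() {
    if (byteChannel == null) {
      byteChannel = new ByteArrayWritableChannelSketch(bufferSize);
    }
    // ... encrypt into byteChannel.data and write it out ...
  }
}
```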

How was this patch tested?

Ran all the tests locally.

@countmdm (Author) commented Jul 18, 2018 via email

@squito (Contributor) commented Jul 19, 2018

Ok to test

@squito (Contributor) commented Jul 19, 2018

Can you add something to the PR description about how this is important because sometimes many of these messages queue up in netty's ChannelOutboundBuffer before transferTo() is called? It's discussed in the JIRA, but it's good to have here too.

@squito (Contributor) commented Jul 19, 2018

@zsxwing @jerryshao @Victsm you might be interested in this

@felixcheung (Member) left a comment

I like it, but this will still create the byte channel right? is there a way to reuse it?

@kiszk (Member) commented Jul 22, 2018

Does it make sense to release byteChannel at deallocate()?

if (byteChannel != null) {
  byteChannel = null;
}

@squito (Contributor) commented Jul 23, 2018

> I like it, but this will still create the byte channel right? is there a way to reuse it?

We could create a pool, though management becomes a bit more complex. Would you ever shrink the pool, or would it always stay the same size? I guess it shouldn't grow by more than one byte array per IO thread.

I'd rather get this simple fix in first before doing anything more complicated ...
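
(For reference, the kind of reuse being floated here might look roughly like a per-IO-thread buffer. The sketch below is purely hypothetical and not something proposed in this PR; the class name and buffer size are illustrative.)

```java
// Purely hypothetical sketch of per-IO-thread buffer reuse, as discussed above.
class BufferPoolSketch {
  private static final int BUFFER_SIZE = 32 * 1024;

  // One buffer per Netty IO thread, reused across the messages that thread writes.
  private static final ThreadLocal<byte[]> PER_THREAD_BUFFER =
      ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

  static byte[] acquire() {
    return PER_THREAD_BUFFER.get();
  }
}
```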

@squito (Contributor) commented Jul 23, 2018

> Does it make sense to release byteChannel at deallocate()?

You could, just to let GC kick in a bit earlier, but I don't think it's going to make a big difference. (Netty's ByteBufs must be released, since they may be owned by netty's internal pools and never gc'ed.)

@SparkQA commented Jul 23, 2018

Test build #4221 has finished for PR 21811 at commit 8b46534.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@kiszk (Member) commented Jul 23, 2018

I see. SGTM.
Would it be possible to attach heap profiling (allocated size for each type) before and after this PR to record the difference?

@countmdm (Author) commented

Thank you very much for your responses, @squito. I agree with all you said.

@kiszk the heap dump that prompted me to make this change was obtained from a customer, who probably ran into some unusual situation. So, not being a Spark expert, I am not sure how to reproduce this problem. Of course, normally I am all for measuring the app's memory before and after the change to make sure it worked.

@kiszk (Member) commented Jul 23, 2018

@countmdm I see. We may be interested in the ratio of byte[] to all allocated memory before and after, not in other objects (e.g. an object containing a customer's name).

@countmdm (Author) commented

@kiszk the situation "before" is well understood. In the respective SPARK-24801 ticket I present a fragment from the analysis of this heap dump by jxray (www.jxray.com). It shows that ~2.5GB of memory, or 64% of the used heap size, is wasted by the ~40.5 thousand empty byte[] arrays in question:

2,597,946K (64.1%): byte[]: 40583 / 100% of empty 2,597,946K (64.1%)
↖org.apache.spark.network.util.ByteArrayWritableChannel.data
↖org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel
↖io.netty.channel.ChannelOutboundBuffer$Entry.msg
...

However, we don't, and probably cannot, get the real "after" evidence. That's because, as I said, I don't know how to reproduce the situation in house, and I think it's very unlikely that the customer can easily reproduce it either, let alone accept our patched code and collect the necessary data before and after the fix. That said, I believe this fix is simple and obvious enough that we can be pretty sure that with it there would simply be no problematic byte[] arrays in the above situation, and memory consumption would be ~64% smaller.

@kiszk (Member) commented Jul 24, 2018

@countmdm Sorry for overlooking the JIRA description. I understand the situation now.
While a memory pool could be an option, it is too complex.

LGTM

@felixcheung (Member) commented Jul 24, 2018 via email

@squito (Contributor) commented Jul 24, 2018

lgtm

will wait a bit for any more comments before merging

@dongjoon-hyun (Member) commented

+1, LGTM, too.

@squito (Contributor) commented Jul 27, 2018

merged to master, thanks!

@asfgit closed this in 094aa59 Jul 27, 2018
dongjoon-hyun pushed a commit that referenced this pull request Jul 10, 2022
…edMessage by using a shared byteRawChannel

### What changes were proposed in this pull request?

This patch aims to reduce the memory overhead of `TransportCipher$EncryptedMessage`. In the current code, the `EncryptedMessage` constructor eagerly initializes a `ByteArrayWritableChannel byteRawChannel` (which consumes ~32kb of memory). If there are many `EncryptedMessage` instances on the heap (e.g. because there is a long queue of outgoing messages on a channel) then this overhead adds up and can cause OOMs or GC problems.

SPARK-24801 / #21811 fixed a similar issue in `SaslEncryption`. There, the fix was to lazily initialize the buffer: the buffer isn't actually accessed before `transferTo()` is called (and is only used there), so lazily initializing it there reduces memory requirements for queued outgoing messages.

In principle we could apply a similar lazy initialization fix here. In this PR, however, I have taken a different approach: I construct a single shared `ByteArrayWritableChannel byteRawChannel` in `TransportCipher$EncryptionHandler` and pass that shared instance to the `EncryptedMessage` constructor. I believe that this is safe because we are already doing this for the `byteEncChannel` channel buffer: that shared `byteEncChannel` gets `reset()` when `EncryptedMessage.deallocate()` is called. If we assume that the existing sharing is correct, then I think it's okay to apply similar sharing to the `byteRawChannel` buffer, because its scope of use and lifecycle are similar.
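
As a rough illustration of that sharing pattern, the handler owns one buffer and hands it to every message it creates. The sketch below uses hypothetical names and a heavily simplified structure; it is not the actual Spark code.

```java
// Hypothetical, simplified sketch of the shared-buffer pattern described above.
class SharedWritableChannel {
  private final byte[] data = new byte[32 * 1024];  // ~32kb buffer, allocated once
  private int written = 0;

  void write(byte[] src) {
    int n = Math.min(src.length, data.length - written);
    System.arraycopy(src, 0, data, written, n);
    written += n;
  }

  void reset() { written = 0; }
}

class EncryptionHandlerSketch {
  // One buffer per handler (i.e. per channel), instead of one per queued message.
  private final SharedWritableChannel byteRawChannel = new SharedWritableChannel();

  CipherMessageSketch newMessage(byte[] payload) {
    return new CipherMessageSketch(byteRawChannel, payload);
  }
}

class CipherMessageSketch {
  private final SharedWritableChannel byteRawChannel;
  private final byte[] payload;

  CipherMessageSketch(SharedWritableChannel shared, byte[] payload) {
    this.byteRawChannel = shared;  // no per-message buffer allocation
    this.payload = payload;
  }

  void transferTo() {
    // Messages on a channel are written out one at a time, so reusing the
    // handler's single buffer here is safe.
    byteRawChannel.write(payload);
  }

  void deallocate() {
    // Mirrors how the shared byteEncChannel is reset() when a message is deallocated.
    byteRawChannel.reset();
  }
}
```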

### Why are the changes needed?

Improve performance and reduce a source of OOMs when encryption is enabled.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

**Correctness**: Existing unit tests.

**Performance**: Observed memory usage and performance improvements by running an artificial workload that significantly stresses the shuffle sending path. On a two-host Spark Standalone cluster where each host had an external shuffle service (with a 1gb heap) and a 64-core executor, I ran the following code:

```scala
val numMapTasks = 25000
val numReduceTasks = 256
val random = new java.util.Random()
val inputData = spark.range(1, numMapTasks * numReduceTasks, 1, numMapTasks).map { x =>
  val bytes = new Array[Byte](10 * 1024)
  random.nextBytes(bytes)
  bytes
}
inputData.repartition(numReduceTasks).write.mode("overwrite").format("noop").save()
```

Prior to this patch, this job reliably failed because the Worker (where the shuffle service runs) would fill its heap and go into long GC pauses, eventually causing it to become disassociated from the Master. After this patch's changes, this job smoothly runs to completion.

Closes #37110 from JoshRosen/TransportCipher-memory-usage-improvements.

Authored-by: Josh Rosen <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>