Skip to content

Commit

Permalink
[SPARK-25692][CORE] Remove static initialization of worker eventLoop …
Browse files Browse the repository at this point in the history
…handling chunk fetch requests within TransportContext. This fixes ChunkFetchIntegrationSuite as well

## What changes were proposed in this pull request?

How to reproduce
./build/mvn test -Dtest=org.apache.spark.network.RequestTimeoutIntegrationSuite,org.apache.spark.network.ChunkFetchIntegrationSuite -DwildcardSuites=None test
furtherRequestsDelay Test within RequestTimeoutIntegrationSuite was holding onto buffer references within worker threads. The test does close the server context but since the threads are global and there is sleep of 60 secs to fetch a specific chunk within this test, it grabs on it and waits for the client to consume but however the test is testing for a request timeout and it times out after 10 secs, so the workers are just waiting there for the buffer to be consumed by client as per my understanding.

This tends to happen if you dont have enough IO threads available on the specific system and also the order of the tests being run determines its flakyness like if ChunkFetchIntegrationSuite runs first then there is no issue. For example on mac with 8 threads these tests run fine but on my vm with 4 threads it fails. It matches the number of fetch calls in RequestTimeoutIntegrationSuite.

So do we really need it to be static?

I dont think this requires a global declaration as these threads are only required on the shuffle server end and on the client TransportContext initialization i.e the Client don't initialize these threads. The Shuffle Server initializes one TransportContext object. So, I think this is fine to be an instance variable and I see no harm.

## How was this patch tested?
Integration tests, manual tests

Closes #23700 from redsanket/SPARK-25692.

Authored-by: schintap <[email protected]>
Signed-off-by: Sean Owen <[email protected]>
  • Loading branch information
schintap authored and srowen committed Feb 5, 2019
1 parent 1dd7419 commit 13c5634
Showing 1 changed file with 10 additions and 11 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -88,7 +88,7 @@ public class TransportContext {
// Separate thread pool for handling ChunkFetchRequest. This helps to enable throttling
// max number of TransportServer worker threads that are blocked on writing response
// of ChunkFetchRequest message back to the client via the underlying channel.
private static EventLoopGroup chunkFetchWorkers;
private final EventLoopGroup chunkFetchWorkers;

public TransportContext(TransportConf conf, RpcHandler rpcHandler) {
this(conf, rpcHandler, false, false);
Expand Down Expand Up @@ -122,16 +122,15 @@ public TransportContext(
this.closeIdleConnections = closeIdleConnections;
this.isClientOnly = isClientOnly;

synchronized(TransportContext.class) {
if (chunkFetchWorkers == null &&
conf.getModuleName() != null &&
conf.getModuleName().equalsIgnoreCase("shuffle") &&
!isClientOnly) {
chunkFetchWorkers = NettyUtils.createEventLoop(
IOMode.valueOf(conf.ioMode()),
conf.chunkFetchHandlerThreads(),
"shuffle-chunk-fetch-handler");
}
if (conf.getModuleName() != null &&
conf.getModuleName().equalsIgnoreCase("shuffle") &&
!isClientOnly) {
chunkFetchWorkers = NettyUtils.createEventLoop(
IOMode.valueOf(conf.ioMode()),
conf.chunkFetchHandlerThreads(),
"shuffle-chunk-fetch-handler");
} else {
chunkFetchWorkers = null;
}
}

Expand Down

0 comments on commit 13c5634

Please sign in to comment.