SPARK-1713. Use a thread pool for launching executors. #663
Conversation
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14710/
  10, Integer.MAX_VALUE,
  1, TimeUnit.MINUTES,
  new LinkedBlockingQueue[Runnable](),
  Executors.defaultThreadFactory())
nit: use guava's ThreadFactoryBuilder and name these threads?
Updated patch uses Guava ThreadFactoryBuilder
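For reference, a minimal sketch of that suggestion, using Guava's ThreadFactoryBuilder to name the launcher threads (the thread-name format and variable names here are illustrative, not necessarily what the patch uses):

```scala
import java.util.concurrent.{LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

import com.google.common.util.concurrent.ThreadFactoryBuilder

// Named daemon threads make the launcher pool easy to identify in thread dumps.
val launcherThreadFactory = new ThreadFactoryBuilder()
  .setNameFormat("ContainerLauncher #%d")  // illustrative name format
  .setDaemon(true)
  .build()

val launcherPool = new ThreadPoolExecutor(
  10, Integer.MAX_VALUE,
  1, TimeUnit.MINUTES,
  new LinkedBlockingQueue[Runnable](),
  launcherThreadFactory)
```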
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
This doesn't dynamically change the thread pool size. It's basically limited to 10 since it's using a LinkedBlockingQueue, and the value of maximumPoolSize doesn't have any effect. The MR AM has code to change the size and also has a config for the max limit.
See also: capping the number of threads helps those two factors (why wouldn't the max pool size have an effect?), but at some level it means submission may block when the executor is busy. Right now the no-arg LinkedBlockingQueue constructor creates an unbounded queue. How about setting the max pool size, and the queue capacity, to a value that is deemed quite large? Like, you never want more than 1000 threads and 100K tasks queued? At least you're putting up some sane defenses against these situations rather than making 100K threads.
Ah, misunderstood how ThreadPoolExecutor worked. To take a step back, we expect a burst (max low thousands) of executors to launch at the beginning of an application. After that, the only time we'll use the thread pool is when executors fail and we need to start new ones. So the behavior we want is to be able to scale the pool up to a large number of threads, and then shrink it after the initial burst is done. We'd probably like to cap the number of threads at some number, but keep queueing up containers to start. I think the right solution is probably to create a pool with a large core size (500?), and then, after the initial containers are launched, bring it down to something more manageable like 10.
Or, actually, if we use allowCoreThreadTimeOut then we don't need to change the core size.
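For context, a minimal sketch of that idea against the pool shape from the diff above; allowCoreThreadTimeOut is the standard java.util.concurrent.ThreadPoolExecutor switch, and the sizes are just the ones already under discussion:

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

val pool = new ThreadPoolExecutor(
  10, Integer.MAX_VALUE,
  1, TimeUnit.MINUTES,
  new LinkedBlockingQueue[Runnable](),
  Executors.defaultThreadFactory())

// With this enabled, even core threads exit after the 1-minute keep-alive,
// so the pool shrinks back down once the initial launch burst is over
// without anyone having to adjust the core size afterwards.
pool.allowCoreThreadTimeOut(true)
```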
That sounds like a reasonable approach - will know more when the PR is updated. The initial idea behind a separate thread for each was exactly that - we expected low hundreds as the upper limit (at that time), and we needed to send requests to the RM asap while not being sure how long the RPC could take. This is, clearly, suboptimal, leading to an unbounded increase in threads during the initial allocation burst.
Merged build triggered.
Merged build started.
We don't necessarily need to start with 500 threads. At the very least it should be configurable. In our tests of MR we found that 500 threads didn't actually help; tens worked just fine (20-30), even when launching thousands of tasks. If you actually get 500 threads it can cause memory issues. I guess if we have them time out quicker it might not be as much of an issue (I believe MR's timeout is high), but I would like to see it as a config just in case. Also, 1 second might be a bit too short if you actually want to reuse them: the AM heartbeats to the RM every 5 seconds, so if you launch one round and heartbeat back in, it will have shut down those threads even though it could have reused them.

@srowen See the javadoc on ThreadPoolExecutor about LinkedBlockingQueue:

Unbounded queues. Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy. Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.) This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server. While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.

Also, perhaps adding a comment there to explain it would be good, as it's easy to miss that the max isn't used.
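To make the quoted javadoc concrete, a small illustration of the two queueing modes (the sizes are arbitrary, chosen only to show the contrast):

```scala
import java.util.concurrent.{ArrayBlockingQueue, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Unbounded queue: once the 10 core threads are busy, extra tasks just queue,
// so the pool never grows and the Integer.MAX_VALUE maximum is never reached.
val unboundedQueuePool = new ThreadPoolExecutor(
  10, Integer.MAX_VALUE, 1, TimeUnit.MINUTES,
  new LinkedBlockingQueue[Runnable]())

// Bounded queue: when the 1000-slot queue fills, the pool grows past the core
// size up to the maximum (100 here), which is what maximumPoolSize controls.
val boundedQueuePool = new ThreadPoolExecutor(
  10, 100, 1, TimeUnit.MINUTES,
  new ArrayBlockingQueue[Runnable](1000))
```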
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14734/
By default, ThreadPoolExecutor doesn't start all the core threads at once. Though I just learned that when it gets tasks, it will start new threads even if others are idle, so I agree that 500 seems high. Uploading a patch that defaults to 25 and makes it configurable.
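A sketch of what that could look like, assuming a SparkConf in scope; the config key and default shown here are placeholders for whatever the patch actually introduces:

```scala
import java.util.concurrent.{Executors, LinkedBlockingQueue, ThreadPoolExecutor, TimeUnit}

import org.apache.spark.SparkConf

def makeLauncherPool(sparkConf: SparkConf): ThreadPoolExecutor = {
  // Hypothetical config key and default; the real patch defines its own.
  val maxThreads = sparkConf.getInt("spark.yarn.containerLauncherMaxThreads", 25)

  // With an unbounded queue the core size is the effective cap, so core == max.
  val pool = new ThreadPoolExecutor(
    maxThreads, maxThreads,
    1, TimeUnit.MINUTES,
    new LinkedBlockingQueue[Runnable](),
    Executors.defaultThreadFactory())

  // Let idle threads expire so the pool shrinks after the initial launch burst.
  pool.allowCoreThreadTimeOut(true)
  pool
}
```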
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
@sryza can you update this to the latest master and also update the yarn alpha side? Since we are adding a config I think it should apply to both of them.
Also, I'm curious whether you ran any benchmarks to test startup time?
test this please
@sryza When you have a chance can you upmerge this to master and resolve the conflicts?
@andrewor14 fyi - I want to finish reviewing and merge in #2020, which will probably affect many of the existing PRs. So you might want to hold off on upmerging.
@tgravescs What is the approximate timeline on the refactor? We are cutting a release candidate for 1.1 soon, and at this point we are basically considering only bug fixes and other minor changes. The changes in #2020 seem a little big.
The changes in #2020 are meant only for master, not 1.1. At this point I don't see this one going into 1.1 either. Or did you hit some particular issue for which this is needed?
Note I'm hoping to get that in in the next few days.
Ah, I see. Yeah, I agree that the refactor should go into master but not 1.1, and I suppose the same goes for this PR since we still need to test it. Sounds good.
@sryza sorry these have gone on so long. Would you mind upmerging?
cf0ff3f to 036550d
Upmerged
QA tests have started for PR 663 at commit
QA tests have finished for PR 663 at commit
+1, looks good. Thanks @sryza!
This patch copies the approach used in the MapReduce application master for launching containers.