[SPARK-21960][Streaming] Spark Streaming Dynamic Allocation should respect spark.executor.instances #19183
Conversation
This is actually a good design solution. Right now it is not very clear (not even in the docs, examples, or Google searches) how to set an initial number of executors for a streaming application that has streaming dynamic allocation enabled. I consider this important because for some streams, like Kinesis, a minimum number of executors is needed to match the shards. Any quick workaround for this?
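For context, the kind of setup I'd want to express looks roughly like this (a sketch: the app name and numbers are illustrative, and it assumes the check this PR removes is gone, so `spark.executor.instances` may be nonzero alongside streaming DA):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative scenario: a Kinesis stream with 4 shards needs at least
// 4 executors so every shard can get a receiver.
val conf = new SparkConf()
  .setAppName("KinesisStreamingApp") // hypothetical app name
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.streaming.dynamicAllocation.minExecutors", "4")
  .set("spark.streaming.dynamicAllocation.maxExecutors", "16")
  // What this PR allows: an explicit initial executor count.
  .set("spark.executor.instances", "4")

val ssc = new StreamingContext(conf, Seconds(10))
```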
Just commenting to be subscribed!
Jenkins, ok to test.
@karth295 for validating that spark.executor.instances is in a valid range, could we look at it in the config with ConfigBuilder's checkValue?
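Something along these lines, perhaps (a sketch against Spark's internal `ConfigBuilder` API, not the actual entry definition):

```scala
import org.apache.spark.internal.config.ConfigBuilder

// Sketch: reject negative values at config-parse time instead of with an
// ad-hoc check later. The error message here is illustrative.
private[spark] val EXECUTOR_INSTANCES = ConfigBuilder("spark.executor.instances")
  .intConf
  .checkValue(_ >= 0, "Number of executor instances must be nonnegative.")
  .createOptional
```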
Also, one point: the lack of tests leaves me a little concerned about this change. This is a bit out of my usual range, so I'm going to CC @koeninger to take a more detailed look.
Test build #90232 has finished for PR 19183 at commit
I don't have personal experience with streaming dynamic allocation, but this patch makes sense to me and I don't see anything obviously wrong. I agree with Holden regarding tests.
I can add a test in ExecutorAllocationManagerSuite.scala to assert that spark.executor.instances correctly allocates that number of executors initially when using streaming DA, while still respecting the streaming min and max executors. Can't think of another test, TBH. What do you think?
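Roughly, the property I'd want the test to pin down is this (sketch only; `initialExecutors` below is a hypothetical stand-in for the manager's internal logic, not a real method):

```scala
import org.scalatest.funsuite.AnyFunSuite

class InitialExecutorsSuite extends AnyFunSuite {

  // Stand-in for the manager's decision: start from spark.executor.instances
  // if set, otherwise from the streaming minimum, clamped into [min, max].
  def initialExecutors(instances: Option[Int], min: Int, max: Int): Int =
    math.min(math.max(instances.getOrElse(min), min), max)

  test("spark.executor.instances is respected within streaming DA bounds") {
    assert(initialExecutors(Some(4), min = 2, max = 10) == 4)   // honored
    assert(initialExecutors(Some(1), min = 2, max = 10) == 2)   // clamped up
    assert(initialExecutors(Some(50), min = 2, max = 10) == 10) // clamped down
    assert(initialExecutors(None, min = 2, max = 10) == 2)      // defaults to min
  }
}
```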
@sansagara sounds reasonable to me |
@sansagara go for it -- it'll be a few days until I'll have time to look at this again. I'll close my PR if/when you make a new one :) |
Can one of the admins verify this patch? |
I thought this check also existed in the non-streaming code; the theory was that if you have set a fixed number of executors but enabled dynamic allocation, then that's probably a configuration error. But given that many people run on clusters with dynamic allocation defaulting to 'on' globally, that could be confusing or a little inconvenient to work around. I don't think that check exists in the non-streaming code anymore though, and I see a test to that effect too. Therefore I think this is reasonable for consistency. CC @tdas |
@srowen ah, thanks for the background -- that does make sense. @skonto I agree that the message should be logged when the SparkContext gets initialized, but I don't like the idea of putting it in validateSettings. Spark Core's ExecutorAllocationManager does its own validation on construction, and it is constructed while the SparkContext is being initialized. I like that each module validates its own configs, rather than having validation separate from the code. Spark Streaming's ExecutorAllocationManager also validates on construction, but it gets created when a job starts, not when the StreamingContext is initialized. This is because it depends on two things that are only available when the job starts: the ReceiverTracker and the micro-batch interval. I see two options:
I'm inclined to implement option 1 to avoid parsing the properties in two places -- constructor and validateSettings. |
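For what it's worth, validating on construction would look something like this minimal sketch (the class name, defaults, and messages are illustrative, not Spark's actual code):

```scala
import org.apache.spark.SparkConf

// Sketch: the streaming allocation manager parses and validates its own
// configs exactly once, when it is constructed.
class StreamingAllocationManagerSketch(conf: SparkConf) {
  private val minExecutors =
    conf.getInt("spark.streaming.dynamicAllocation.minExecutors", 0)
  private val maxExecutors =
    conf.getInt("spark.streaming.dynamicAllocation.maxExecutors", Int.MaxValue)

  // Fail fast here rather than in a separate validateSettings pass.
  require(minExecutors >= 0, "minExecutors must be nonnegative")
  require(maxExecutors >= minExecutors, "maxExecutors must be >= minExecutors")
}
```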
I'm OK with the change as-is, but open to other solutions if you all think there's a better way.
What changes were proposed in this pull request?
Removes the check that spark.executor.instances is set to 0 when using Streaming DRA.

How was this patch tested?
Manual tests.
My only concern with this PR is that spark.executor.instances (or the actual initial number of executors that the cluster manager gives Spark) can be outside of spark.streaming.dynamicAllocation.minExecutors to spark.streaming.dynamicAllocation.maxExecutors. I don't see a good way around that, because this code only runs after the SparkContext has been created.
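Concretely, nothing rejects a configuration like this sketch (values illustrative), where the job would start at 20 executors and only drift into range once the allocation manager starts making decisions:

```scala
import org.apache.spark.SparkConf

val conflicting = new SparkConf()
  .set("spark.streaming.dynamicAllocation.enabled", "true")
  .set("spark.streaming.dynamicAllocation.minExecutors", "2")
  .set("spark.streaming.dynamicAllocation.maxExecutors", "10")
  .set("spark.executor.instances", "20") // outside [2, 10], yet accepted
```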