[Mesos] expand coarse-grained mode docs #14059

Closed · wants to merge 2 commits
docs/running-on-mesos.md: 77 changes (51 additions, 26 deletions)
@@ -180,30 +180,53 @@ Note that jars or python files that are passed to spark-submit should be URIs re

# Mesos Run Modes

Spark can run over Mesos in two modes: "coarse-grained" (default) and "fine-grained".

The "coarse-grained" mode will launch only *one* long-running Spark task on each Mesos
machine, and dynamically schedule its own "mini-tasks" within it. The benefit is much lower startup
overhead, but at the cost of reserving the Mesos resources for the complete duration of the
application.

Coarse-grained is the default mode. You can also set the `spark.mesos.coarse`
property to true to turn it on explicitly in
[SparkConf](configuration.html#spark-properties):

{% highlight scala %}
conf.set("spark.mesos.coarse", "true")
{% endhighlight %}

In addition, for coarse-grained mode, you can control the maximum amount of
resources Spark will acquire. By default, it will acquire *all* cores in the
cluster (that get offered by Mesos), which only makes sense if you run just
one application at a time. You can cap the maximum number of cores using
`conf.set("spark.cores.max", "10")` (for example).

In "fine-grained" mode, each Spark task runs as a separate Mesos task. This allows
multiple instances of Spark (and other frameworks) to share machines at a very fine granularity,
where each application gets more or fewer machines as it ramps up and down, but it comes with an
additional overhead in launching each task. This mode may be inappropriate for low-latency
requirements like interactive queries or serving web requests.
Spark can run over Mesos in two modes: "coarse-grained" (default) and
"fine-grained".


## Coarse-Grained

In "coarse-grained" mode, each Spark executor runs as a single Mesos
task. Spark executors are sized according to the following
configuration variables:

* Executor memory: `spark.executor.memory`
* Executor cores: `spark.executor.cores`
* Number of executors: `spark.cores.max`/`spark.executor.cores`

Please see the [Spark Configuration](configuration.html) page for
details and default values.

Executors are brought up eagerly when the application starts, until
`spark.cores.max` is reached. If you don't set `spark.cores.max`, the
Spark application will reserve all resources offered to it by Mesos,
so be sure to set this variable in any sort of multi-tenant cluster,
including one which runs multiple concurrent Spark applications.
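
For example, a minimal sketch of sizing executors with these
properties (the Mesos master URL, application name, and all values
here are illustrative only, not recommendations):

{% highlight scala %}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical values for illustration; tune them to your cluster.
val conf = new SparkConf()
  .setMaster("mesos://zk://master.mesos:2181/mesos") // assumed Mesos master URL
  .setAppName("CoarseGrainedExample")
  .set("spark.executor.memory", "4g") // memory per executor
  .set("spark.executor.cores", "2")   // cores per executor
  .set("spark.cores.max", "8")        // total core cap: 8 / 2 = 4 executors

val sc = new SparkContext(conf)
{% endhighlight %}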

The scheduler will start executors round-robin on the offers Mesos
gives it, but there are no spread guarantees, as Mesos does not
provide such guarantees on the offer stream.

The benefit of coarse-grained mode is much lower startup overhead, but
at the cost of reserving Mesos resources for the complete duration of
the application. To configure your job to dynamically adjust to its
resource requirements, look into
[Dynamic Allocation](#dynamic-resource-allocation-with-mesos).

## Fine-Grained


In "fine-grained" mode, each Spark task inside the Spark executor runs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a message saying this is deprecated as of spark 2.0?

Copy link
Contributor Author

@mgummelt mgummelt Jul 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made a separate PR for that: #14078

Does that work?

Also, will this be merged into Spark 2.0.0? I need to make the copy consistent with the version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup both should go into 2.0.

as a separate Mesos task. This allows multiple instances of Spark (and
other frameworks) to share cores at a very fine granularity, where
each application gets more or fewer cores as it ramps up and down, but
it comes with an additional overhead in launching each task. This mode
may be inappropriate for low-latency requirements like interactive
queries or serving web requests.

Note that while Spark tasks in fine-grained mode will relinquish cores
as they terminate, they will not relinquish memory, as the JVM does
not give memory back to the operating system. Neither will executors
terminate when they're idle.

To run in fine-grained mode, set the `spark.mesos.coarse` property to false in your
[SparkConf](configuration.html#spark-properties):
@@ -212,7 +235,9 @@ To run in fine-grained mode, set the `spark.mesos.coarse` property to false in y
conf.set("spark.mesos.coarse", "false")
{% endhighlight %}

You may also make use of `spark.mesos.constraints` to set attribute based constraints on mesos resource offers. By default, all resource offers will be accepted.
You may also make use of `spark.mesos.constraints` to set
attribute-based constraints on Mesos resource offers. By default, all
resource offers will be accepted.

{% highlight scala %}
conf.set("spark.mesos.constraints", "os:centos7;us-east-1:false")
@@ -246,7 +271,7 @@ In either case, HDFS runs separately from Hadoop MapReduce, without being schedu

# Dynamic Resource Allocation with Mesos

Mesos supports dynamic allocation only with coarse-grain mode, which can resize the number of
Mesos supports dynamic allocation only with coarse-grained mode, which can resize the number of
executors based on statistics of the application. For general information,
see [Dynamic Resource Allocation](job-scheduling.html#dynamic-resource-allocation).
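
A minimal sketch of the relevant configuration, assuming the Mesos
external shuffle service is already running on each node (the executor
bounds below are illustrative, not recommendations):

{% highlight scala %}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  // Dynamic allocation requires the external shuffle service.
  .set("spark.shuffle.service.enabled", "true")
  // Optional bounds on the executor count; values are illustrative.
  .set("spark.dynamicAllocation.minExecutors", "1")
  .set("spark.dynamicAllocation.maxExecutors", "10")
{% endhighlight %}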
