[SPARK-4864] Add documentation to Netty-based configs
Author: Aaron Davidson <[email protected]>

Closes #3713 from aarondav/netty-configs and squashes the following commits:

8a8b373 [Aaron Davidson] Address Patrick's comments
3b1f84e [Aaron Davidson] [SPARK-4864] Add documentation to Netty-based configs

(cherry picked from commit fbca6b6)
Signed-off-by: Patrick Wendell <[email protected]>
aarondav authored and pwendell committed Dec 22, 2014
1 parent c7396b5 commit 4b2bded
Showing 2 changed files with 36 additions and 1 deletion.
35 changes: 35 additions & 0 deletions docs/configuration.md
@@ -852,6 +852,41 @@ Apart from these, the following properties are also available, and may be useful
between nodes leading to flooding the network with those.
</td>
</tr>
<tr>
<td><code>spark.shuffle.io.preferDirectBufs</code></td>
<td>true</td>
<td>
(Netty only) Off-heap buffers are used to reduce garbage collection during shuffle and cache
block transfer. For environments where off-heap memory is tightly limited, users may wish to
turn this off to force all allocations from Netty to be on-heap.
</td>
</tr>
<tr>
<td><code>spark.shuffle.io.numConnectionsPerPeer</code></td>
<td>1</td>
<td>
(Netty only) Connections between hosts are reused in order to reduce connection buildup for
large clusters. For clusters with many hard disks and few hosts, this may result in insufficient
concurrency to saturate all disks, and so users may consider increasing this value.
</td>
</tr>
<tr>
<td><code>spark.shuffle.io.maxRetries</code></td>
<td>3</td>
<td>
(Netty only) Fetches that fail due to IO-related exceptions are automatically retried if this is
set to a non-zero value. This retry logic helps stabilize large shuffles in the face of long GC
pauses or transient network connectivity issues.
</td>
</tr>
<tr>
<td><code>spark.shuffle.io.retryWait</code></td>
<td>5</td>
<td>
(Netty only) Seconds to wait between retries of fetches. The maximum delay caused by retrying
is simply <code>maxRetries * retryWait</code>, by default 15 seconds.
</td>
</tr>
</table>

#### Scheduling
@@ -40,7 +40,7 @@ public int connectionTimeoutMs() {
return conf.getInt("spark.shuffle.io.connectionTimeout", 120) * 1000;
}

- /** Number of concurrent connections between two nodes for fetching data. **/
+ /** Number of concurrent connections between two nodes for fetching data. */
public int numConnectionsPerPeer() {
return conf.getInt("spark.shuffle.io.numConnectionsPerPeer", 1);
}
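
The documentation added above states that the maximum delay caused by retrying is `maxRetries * retryWait`, 15 seconds by default. The following is a minimal sketch of that arithmetic, not code from this commit: the class `RetryDelayEstimate` and its methods are hypothetical, and a plain `Map` stands in for Spark's configuration object. Only the two property names and their defaults (3 and 5) come from the documentation in this diff.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of Spark: estimates the worst-case fetch retry delay. */
public class RetryDelayEstimate {
  // Defaults as documented in this commit's docs/configuration.md changes.
  static final int DEFAULT_MAX_RETRIES = 3;
  static final int DEFAULT_RETRY_WAIT_SECONDS = 5;

  /** Reads an integer setting from a plain map, falling back to the given default. */
  static int getInt(Map<String, String> conf, String key, int defaultValue) {
    String value = conf.get(key);
    return value == null ? defaultValue : Integer.parseInt(value.trim());
  }

  /** Maximum delay in seconds caused by retrying: maxRetries * retryWait. */
  static int maxRetryDelaySeconds(Map<String, String> conf) {
    int maxRetries = getInt(conf, "spark.shuffle.io.maxRetries", DEFAULT_MAX_RETRIES);
    int retryWait = getInt(conf, "spark.shuffle.io.retryWait", DEFAULT_RETRY_WAIT_SECONDS);
    return maxRetries * retryWait;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(maxRetryDelaySeconds(conf)); // prints 15 (defaults: 3 * 5)

    // Setting maxRetries to zero disables retries, so no retry delay is added.
    conf.put("spark.shuffle.io.maxRetries", "0");
    System.out.println(maxRetryDelaySeconds(conf)); // prints 0
  }
}
```

This only bounds the time spent waiting between retries; it does not account for the time each failed fetch attempt itself takes before failing.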