forked from apache/spark
-
Notifications
You must be signed in to change notification settings - Fork 52
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[SPARK-25299] SPARK-25299 upstream updates (#605)
* Bring implementation into closer alignment with upstream. Step to ease merge conflict resolution and build failure problems when we pull in changes from upstream. * Cherry-pick BypassMergeSortShuffleWriter changes and shuffle writer API changes * [SPARK-28607][CORE][SHUFFLE] Don't store partition lengths twice The shuffle writer API introduced in SPARK-28209 has a flaw that leads to a memory usage regression - we ended up tracking the partition lengths in two places. Here, we modify the API slightly to avoid redundant tracking. The implementation of the shuffle writer plugin is now responsible for tracking the lengths of partitions, and propagating this back up to the higher shuffle writer as part of the commitAllPartitions API. Existing unit tests. Closes apache#25341 from mccheah/dont-redundantly-store-part-lengths. Authored-by: mcheah <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]> * [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter Use the shuffle writer APIs introduced in SPARK-28209 in the sort shuffle writer. Existing unit tests were changed to use the plugin instead, and they used the local disk version to ensure that there were no regressions. Closes apache#25342 from mccheah/shuffle-writer-refactor-sort-shuffle-writer. Lead-authored-by: mcheah <[email protected]> Co-authored-by: mccheah <[email protected]> Signed-off-by: Marcelo Vanzin <[email protected]> * [SPARK-28570][CORE][SHUFFLE] Make UnsafeShuffleWriter use the new API. * Resolve build issues and remaining semantic conflicts * More build fixes * More build fixes * Attempt to fix build * More build fixes * [SPARK-29072] Put back usage of TimeTrackingOutputStream for UnsafeShuffleWriter and ShufflePartitionPairsWriter. * Address comments * Import ordering * Fix stream reference
- Loading branch information
Showing
50 changed files
with
1,527 additions
and
1,221 deletions.
There are no files selected for viewing
34 changes: 0 additions & 34 deletions
34
core/src/main/java/org/apache/spark/api/shuffle/ShuffleDataIO.java
This file was deleted.
Oops, something went wrong.
37 changes: 0 additions & 37 deletions
37
core/src/main/java/org/apache/spark/api/shuffle/ShuffleExecutorComponents.java
This file was deleted.
Oops, something went wrong.
39 changes: 0 additions & 39 deletions
39
core/src/main/java/org/apache/spark/api/shuffle/ShuffleMapOutputWriter.java
This file was deleted.
Oops, something went wrong.
42 changes: 0 additions & 42 deletions
42
core/src/main/java/org/apache/spark/api/shuffle/ShuffleReadSupport.java
This file was deleted.
Oops, something went wrong.
53 changes: 0 additions & 53 deletions
53
core/src/main/java/org/apache/spark/api/shuffle/SupportsTransferTo.java
This file was deleted.
Oops, something went wrong.
54 changes: 0 additions & 54 deletions
54
core/src/main/java/org/apache/spark/api/shuffle/TransferrableWritableByteChannel.java
This file was deleted.
Oops, something went wrong.
53 changes: 53 additions & 0 deletions
53
core/src/main/java/org/apache/spark/shuffle/api/MapOutputWriterCommitMessage.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.shuffle.api; | ||
|
||
import java.util.Optional; | ||
|
||
import org.apache.spark.annotation.Private; | ||
import org.apache.spark.storage.BlockManagerId; | ||
|
||
@Private | ||
public final class MapOutputWriterCommitMessage { | ||
|
||
private final long[] partitionLengths; | ||
private final Optional<BlockManagerId> location; | ||
|
||
private MapOutputWriterCommitMessage( | ||
long[] partitionLengths, Optional<BlockManagerId> location) { | ||
this.partitionLengths = partitionLengths; | ||
this.location = location; | ||
} | ||
|
||
public static MapOutputWriterCommitMessage of(long[] partitionLengths) { | ||
return new MapOutputWriterCommitMessage(partitionLengths, Optional.empty()); | ||
} | ||
|
||
public static MapOutputWriterCommitMessage of( | ||
long[] partitionLengths, BlockManagerId location) { | ||
return new MapOutputWriterCommitMessage(partitionLengths, Optional.of(location)); | ||
} | ||
|
||
public long[] getPartitionLengths() { | ||
return partitionLengths; | ||
} | ||
|
||
public Optional<BlockManagerId> getLocation() { | ||
return location; | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
53 changes: 53 additions & 0 deletions
53
core/src/main/java/org/apache/spark/shuffle/api/ShuffleDataIO.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
/* | ||
* Licensed to the Apache Software Foundation (ASF) under one or more | ||
* contributor license agreements. See the NOTICE file distributed with | ||
* this work for additional information regarding copyright ownership. | ||
* The ASF licenses this file to You under the Apache License, Version 2.0 | ||
* (the "License"); you may not use this file except in compliance with | ||
* the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, software | ||
* distributed under the License is distributed on an "AS IS" BASIS, | ||
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
*/ | ||
|
||
package org.apache.spark.shuffle.api; | ||
|
||
import org.apache.spark.annotation.Private; | ||
|
||
/** | ||
* :: Private :: | ||
* An interface for plugging in modules for storing and reading temporary shuffle data. | ||
* <p> | ||
* This is the root of a plugin system for storing shuffle bytes to arbitrary storage | ||
* backends in the sort-based shuffle algorithm implemented by the | ||
* {@link org.apache.spark.shuffle.sort.SortShuffleManager}. If another shuffle algorithm is | ||
* needed instead of sort-based shuffle, one should implement | ||
* {@link org.apache.spark.shuffle.ShuffleManager} instead. | ||
* <p> | ||
* A single instance of this module is loaded per process in the Spark application. | ||
* The default implementation reads and writes shuffle data from the local disks of | ||
* the executor, and is the implementation of shuffle file storage that has remained | ||
* consistent throughout most of Spark's history. | ||
* <p> | ||
* Alternative implementations of shuffle data storage can be loaded via setting | ||
* <code>spark.shuffle.sort.io.plugin.class</code>. | ||
* @since 3.0.0 | ||
*/ | ||
@Private | ||
public interface ShuffleDataIO { | ||
|
||
String SHUFFLE_SPARK_CONF_PREFIX = "spark.shuffle.plugin."; | ||
|
||
ShuffleDriverComponents driver(); | ||
|
||
/** | ||
* Called once on executor processes to bootstrap the shuffle data storage modules that | ||
* are only invoked on the executors. | ||
*/ | ||
ShuffleExecutorComponents executor(); | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.