[SPARK-5644] [Core] Delete tmp dir when sc is stop
When we run the driver as a service and only call sc.stop() for each job, the tmp dirs created by HttpFileServer and SparkEnv are not deleted until the service process exits. So we need to delete these tmp dirs directly when sc is stopped.

Author: Sephiroth-Lin <[email protected]>

Closes apache#4412 from Sephiroth-Lin/bug-fix-master-01 and squashes the following commits:

fbbc785 [Sephiroth-Lin] using an interpolated string
b968e14 [Sephiroth-Lin] using an interpolated string
4edf394 [Sephiroth-Lin] rename the variable and update comment
1339c96 [Sephiroth-Lin] add a member to store the reference of tmp dir
b2018a5 [Sephiroth-Lin] check sparkFilesDir before delete
f48a3c6 [Sephiroth-Lin] don't check sparkFilesDir, check executorId
dd9686e [Sephiroth-Lin] format code
b38e0f0 [Sephiroth-Lin] add dir check before delete
d7ccc64 [Sephiroth-Lin] Change log level
1d70926 [Sephiroth-Lin] update comment
e2a2b1b [Sephiroth-Lin] update comment
aeac518 [Sephiroth-Lin] Delete tmp dir when sc is stop
c0d5b28 [Sephiroth-Lin] Delete tmp dir when sc is stop

Conflicts:
	core/src/main/scala/org/apache/spark/SparkEnv.scala
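
For context, a minimal sketch of the long-running-driver pattern the commit message describes (the JobService wrapper and runJob are hypothetical, not part of Spark):

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical long-lived driver service: each request creates a SparkContext,
    // runs a job, and calls sc.stop(). Before this fix, the tmp dirs created by
    // HttpFileServer and SparkEnv survived every stop() and were only removed
    // when the whole service JVM exited.
    object JobService {
      def runJob(jobId: Int): Long = {
        val sc = new SparkContext(
          new SparkConf().setAppName(s"job-$jobId").setMaster("local[*]"))
        try {
          sc.parallelize(1 to 100).map(_ * 2).count()
        } finally {
          sc.stop() // previously left a /tmp/spark-* dir behind on each call
        }
      }
    }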
Sephiroth-Lin authored and ybyudin committed Mar 16, 2015
1 parent 4483c23 commit 6af7b65
Showing 2 changed files with 37 additions and 1 deletion.
9 changes: 9 additions & 0 deletions core/src/main/scala/org/apache/spark/HttpFileServer.scala
@@ -46,6 +46,15 @@ private[spark] class HttpFileServer(securityManager: SecurityManager) extends Lo

  def stop() {
    httpServer.stop()

    // If only the SparkContext is stopped while the driver process keeps running as a service,
    // we have to delete the tmp dir here; otherwise every run leaves another tmp dir behind.
    try {
      Utils.deleteRecursively(baseDir)
    } catch {
      case e: Exception =>
        logWarning(s"Exception while deleting Spark temp dir: ${baseDir.getAbsolutePath}", e)
    }
  }

  def addFile(file: File) : String = {
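As a side note, a minimal sketch of what a recursive delete along the lines of Utils.deleteRecursively does (Spark's real implementation also handles details such as shutdown-hook bookkeeping; this shows only the core idea):

    import java.io.File

    // Depth-first delete: remove children before the directory itself.
    def deleteRecursively(file: File): Unit = {
      if (file.isDirectory) {
        Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
      }
      if (!file.delete() && file.exists()) {
        throw new java.io.IOException(s"Failed to delete: ${file.getAbsolutePath}")
      }
    }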
29 changes: 28 additions & 1 deletion core/src/main/scala/org/apache/spark/SparkEnv.scala
@@ -76,6 +76,8 @@ class SparkEnv (
  // (e.g., HadoopFileRDD uses this to cache JobConfs and InputFormats).
  private[spark] val hadoopJobMetadata = new MapMaker().softValues().makeMap[String, Any]()

  private var driverTmpDirToDelete: Option[String] = None

  private[spark] def stop() {
    pythonWorkers.foreach { case (key, worker) => worker.stop() }
    httpFileServer.stop()
@@ -90,6 +92,22 @@
    // down, but let's call it anyway in case it gets fixed in a later release
    // UPDATE: In Akka 2.1.x, this hangs if there are remote actors, so we can't call it.
    // actorSystem.awaitTermination()

    // If only the SparkContext is stopped while the driver process keeps running as a service,
    // we have to delete the tmp dir here; otherwise every run leaves another tmp dir behind.
    // We only need to delete the tmp dir created by the driver: on executors, sparkFilesDir
    // points to the current working directory, which must not be deleted.
    driverTmpDirToDelete match {
      case Some(path) => {
        try {
          Utils.deleteRecursively(new File(path))
        } catch {
          case e: Exception =>
            logWarning(s"Exception while deleting Spark temp dir: $path", e)
        }
      }
      case None => // Only the driver creates a tmp dir to delete, so do nothing on executors
    }
  }
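The Option here doubles as a driver/executor flag: only the driver ever sets driverTmpDirToDelete, so executors always take the None branch. An equivalent, more compact idiom (a style alternative, not what the commit uses) would be:

    driverTmpDirToDelete.foreach { path =>
      try {
        Utils.deleteRecursively(new File(path))
      } catch {
        case e: Exception =>
          logWarning(s"Exception while deleting Spark temp dir: $path", e)
      }
    }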

  private[spark]
@@ -248,7 +266,7 @@ object SparkEnv extends Logging {
"levels using the RDD.persist() method instead.")
}

-    new SparkEnv(
+    val envInstance = new SparkEnv(
      executorId,
      actorSystem,
      serializer,
@@ -264,6 +282,15 @@
      sparkFilesDir,
      metricsSystem,
      conf)

    // Keep a reference to the tmp dir created by the driver so that it can be deleted when
    // stop() is called; this is only needed on the driver. A driver may run as a long-lived
    // service, and if the tmp dir were not deleted each time sc is stopped, tmp dirs would
    // pile up over time.
    if (isDriver) {
      envInstance.driverTmpDirToDelete = Some(sparkFilesDir)
    }

    envInstance
  }
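To see the effect end to end, a hypothetical check (not part of the commit) that the driver's files dir is gone after stop():

    import java.io.File
    import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

    val sc = new SparkContext(
      new SparkConf().setAppName("tmp-dir-cleanup-check").setMaster("local[*]"))
    // On the driver, SparkFiles.getRootDirectory() points at sparkFilesDir.
    val filesDir = new File(SparkFiles.getRootDirectory())
    sc.stop()
    assert(!filesDir.exists(), "driver tmp dir should be deleted by sc.stop()")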

  /**
