
SPARK-3811 [CORE] More robust / standard Utils.deleteRecursively, Utils.createTempDir #2670

Closed
wants to merge 3 commits

Conversation

srowen
Member

srowen commented Oct 6, 2014

I noticed a few issues with how temp directories are created and deleted:

Minor

  • Guava's Files.createTempDir() plus File.deleteOnExit() is used in many tests to make a temp dir, but Utils.createTempDir() seems to be the standard Spark mechanism
  • The call to File.deleteOnExit() could be pushed into Utils.createTempDir() as well, as part of this replacement
  • I messed up the message in an exception in Utils in SPARK-3794; fixed here

Bit Less Minor

  • Utils.deleteRecursively() fails immediately if any IOException occurs, instead of trying to delete any remaining files and subdirectories. I've observed this leave temp dirs around. I suggest changing it to continue in the face of an exception and throw one of the possibly several exceptions that occur at the end.
  • Utils.createTempDir() adds a JVM shutdown hook every time it is called, even when the new dir is inside a dir already registered for deletion, since that check happens inside the hook itself. However, Utils already manages a set of all dirs to delete on shutdown, called shutdownDeletePaths. A single hook can be registered to delete all of these on exit. This is how Tachyon temp paths are cleaned up in TachyonBlockManager.
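The "keep going past exceptions" behavior proposed in the first bullet could be sketched roughly as below. This is only an illustration in Java (matching the example code later in this thread), not Spark's actual Scala implementation, and the class name is hypothetical:

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: delete a directory tree, continuing past failures
// and throwing one of the encountered exceptions only at the end.
public class DeleteRecursively {
    public static void deleteRecursively(File file) throws IOException {
        IOException saved = null;
        if (file.isDirectory()) {
            File[] children = file.listFiles();
            if (children != null) {
                for (File child : children) {
                    try {
                        deleteRecursively(child);
                    } catch (IOException e) {
                        saved = e;  // remember, but keep deleting siblings
                    }
                }
            }
        }
        // Delete the file/dir itself; record a failure instead of aborting.
        if (!file.delete() && file.exists()) {
            saved = new IOException("Failed to delete: " + file.getAbsolutePath());
        }
        if (saved != null) {
            throw saved;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "del-demo");
        new File(dir, "sub").mkdirs();
        new File(dir, "sub/a.txt").createNewFile();
        deleteRecursively(dir);
        System.out.println(dir.exists());  // false
    }
}
```

The point is that a failure on one child no longer leaves the remaining siblings and subdirectories behind.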

I noticed a few other things that might be changed but wanted to ask first:

  • Shouldn't the set of dirs to delete hold File objects, rather than just String paths?
  • Utils manages the set of TachyonFile that have been registered for deletion, but the shutdown hook is managed in TachyonBlockManager. Shouldn't this logic live together, outside Utils? It's specific to Tachyon, and looks slightly odd to import in such a generic place.

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have started for PR 2670 at commit da0146d.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have finished for PR 2670 at commit da0146d.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21321/

@vanzin
Contributor

vanzin commented Oct 6, 2014

So, I've never been a fan of File.deleteOnExit(), and just to make sure my dislike was not unfounded, I wrote the following code:

import java.io.*;

// Demo: File.deleteOnExit() cannot remove a non-empty directory at JVM exit.
class DeleteOnExitDemo {
  public static void main(String[] args) throws Exception {
    File d = new File("/tmp/foo");
    d.mkdir();
    d.deleteOnExit();

    // Create a child file, so the directory is non-empty when the JVM exits.
    File f = new File(d, "bar");
    Writer out = new FileWriter(f);
    out.close();
  }
}

And sure enough, after you run it, /tmp/foo and its child file are still there. I don't know if this applies to your patch (I'll be going through it next), but in general, whenever I see a deleteOnExit() call somewhere, it feels like laziness about cleaning things up properly.
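A single shutdown hook that recursively deletes registered directories, which is the direction this PR moves in, could look roughly like this. The class and method names here are hypothetical illustrations, not Spark's actual code:

```java
import java.io.File;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: one JVM shutdown hook that recursively deletes every
// registered directory, children first, unlike deleteOnExit(), which can
// only remove a directory that is already empty.
public class ShutdownCleanup {
    private static final Set<File> toDelete =
        Collections.synchronizedSet(new HashSet<File>());

    static {
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override public void run() {
                synchronized (toDelete) {
                    for (File f : toDelete) {
                        delete(f);
                    }
                }
            }
        });
    }

    // Register a directory once; no per-call hook is added.
    public static void register(File dir) {
        toDelete.add(dir);
    }

    // Depth-first delete: remove children before the parent.
    static void delete(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) {
                delete(c);
            }
        }
        f.delete();
    }
}
```

Because deletion happens inside one hook with explicit recursion, non-empty directories are actually removed, and registering many temp dirs adds no extra hooks.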

@@ -251,15 +265,8 @@ private[spark] object Utils extends Logging {
} catch { case e: IOException => ; }
}

dir.deleteOnExit()
Contributor


See my top-level comment on this not really working for directories.

Member Author


Yeah, it's almost just insurance. I believe it works if the dir is empty, and under some circumstances it will delete things in the right order. I have no problem removing it: it doesn't hurt and might help, but it's sort of meaningless because the code should already delete things correctly.

@vanzin
Contributor

vanzin commented Oct 6, 2014

LGTM, just a few minor things.

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have started for PR 2670 at commit 071ae60.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Oct 6, 2014

QA tests have finished for PR 2670 at commit 071ae60.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21338/

@AmplabJenkins

Can one of the admins verify this patch?

@marmbrus
Contributor

Thanks @srowen. Merging to master.

@asfgit asfgit closed this in 363baac Oct 10, 2014
@srowen srowen deleted the SPARK-3811 branch October 12, 2014 15:59

5 participants