Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5778] throw if nonexistent metrics config file provided #4571

Closed
wants to merge 7 commits into from

Conversation

ryan-williams
Copy link
Contributor

previous behavior was to log an error; this is fine in the general
case where no spark.metrics.conf parameter was specified, in which
case a default metrics.properties is looked for, and the execption
logged and suppressed if it doesn't exist.

if the user has purposefully specified a metrics.conf file, however,
it makes more sense to show them an error when said file doesn't
exist.

@SparkQA
Copy link

SparkQA commented Feb 12, 2015

Test build #27368 has started for PR 4571 at commit 105288e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 12, 2015

Test build #27368 has finished for PR 4571 at commit 105288e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27368/
Test FAILed.

is = configFile match {
case Some(f) => new FileInputStream(f)
case None => Utils.getSparkClassLoader.getResourceAsStream(METRICS_CONF)
(configFile match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been thinking about style quite a bit recently as I'm working on a style guide. In this case, you'd want to break the chain down with intermediate variables. Otherwise the cognitive complexity of the chaining/nesting makes it hard to understand.

I wrote 3 versions of them, with each version gradually becoming more Java-like. I think any of them would be fine here.

version 1

// FileInputStream constructor throws FileNotFoundException if file doesn't exist
val isOpt: Option[InputStream] = configFile.map(new FileInputStream(_)).orElse {
  try {
    Option(Utils.getSparkClassLoader.getResourceAsStream(METRICS_CONF))
  } catch {
    case e: Exception =>
      logError("Error loading default configuration file", e)
      None
  }
}

isOpt.foreach { is =>
  try {
    properties.load(is)
  } finally {
    is.close()
  }
}

version 2

// FileInputStream constructor throws FileNotFoundException if file doesn't exist
var isOpt: Option[InputStream] = configFile.map { f =>
  logInfo("Loading MetricsConfig file: $f")
  new FileInputStream(f)
}

if (!isOpt.isDefined) {
  try {
    is = Option(Utils.getSparkClassLoader.getResourceAsStream(METRICS_CONF))
  } catch {
    case e: Exception =>
      logError("Error loading default configuration file", e)
  }
}

isOpt.foreach { is =>
  try {
    properties.load(is)
  } finally {
    is.close()
  }
}

version 3

// FileInputStream constructor throws FileNotFoundException if file doesn't exist
var is: InputStream = configFile.map(new FileInputStream(_)).orNull

if (is == null) {
  try {
    is = Option(Utils.getSparkClassLoader.getResourceAsStream(METRICS_CONF))
  } catch {
    case e: Exception =>
      logError("Error loading default configuration file", e)
  }
}

if (is != null) {
  try {
    properties.load(is)
  } finally {
    is.close()
  }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks, went with the first one.

incidentally I had a Some() where I should have had an Option(), which your candidates all fix, that I think was breaking the tests

@SparkQA
Copy link

SparkQA commented Feb 12, 2015

Test build #27376 has started for PR 4571 at commit b98e325.

  • This patch merges cleanly.

@ryan-williams
Copy link
Contributor Author

I threw in a couple of other trivial things:

  • throw if a bad Sink class is specified
  • exclude conf/metrics.properties, if present, from RAT checks

lmk if you want me to make those separate JIRAs/PRs

@SparkQA
Copy link

SparkQA commented Feb 12, 2015

Test build #27378 has started for PR 4571 at commit 45f1d3e.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 12, 2015

Test build #27376 has finished for PR 4571 at commit b98e325.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27376/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Feb 12, 2015

Test build #27378 has finished for PR 4571 at commit 45f1d3e.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logError("Sink class " + classPath + " cannot be instantialized")

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27378/
Test FAILed.

@SparkQA
Copy link

SparkQA commented Feb 13, 2015

Test build #27399 has started for PR 4571 at commit 1b74f69.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 13, 2015

Test build #27399 has finished for PR 4571 at commit 1b74f69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logError("Sink class " + classPath + " cannot be instantialized")

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27399/
Test PASSed.

@rxin
Copy link
Contributor

rxin commented Feb 13, 2015

This looks good. @pwendell I'm not sure which branch you want this in, so I will let you do the merge.

previous behavior was to log an error; this is fine in the general
case where no `spark.metrics.conf` parameter was specified, in which
case a default `metrics.properties` is looked for, and the execption
logged and suppressed if it doesn't exist.

if the user has purposefully specified a metrics.conf file, however,
it makes more sense to show them an error when said file doesn't
exist.
it's common to have this file laying around as the default config file
for MetricsSystem, but RAT doesn't like its lack of an Apache header.
@ryan-williams
Copy link
Contributor Author

I just rebased and tacked a couple of trivial commits to MetricsConfig onto the end of this, jfyi

@rxin
Copy link
Contributor

rxin commented Feb 13, 2015

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Feb 13, 2015

Test build #27464 has started for PR 4571 at commit 5bccb14.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 14, 2015

Test build #27464 has finished for PR 4571 at commit 5bccb14.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logError("Sink class " + classPath + " cannot be instantialized")
    • case class Params(

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27464/
Test FAILed.

@ryan-williams
Copy link
Contributor Author

looks like similar sorts of probably-spurious to those that I saw on #4573; thoughts, @rxin? re-test?

@rxin
Copy link
Contributor

rxin commented Feb 14, 2015

Jenkins, retest this please

@SparkQA
Copy link

SparkQA commented Feb 14, 2015

Test build #27468 has started for PR 4571 at commit 5bccb14.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Feb 14, 2015

Test build #27468 has finished for PR 4571 at commit 5bccb14.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logError("Sink class " + classPath + " cannot be instantialized")

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27468/
Test FAILed.

@ryan-williams
Copy link
Contributor Author

seems like another spurious failure to me. That one passed in the previous build

@pwendell
Copy link
Contributor

Jenkins, retest this please. (sorry @ryan-williams we're trying to get tests in better shape).

@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27529 has started for PR 4571 at commit 5bccb14.

  • This patch merges cleanly.

@pwendell
Copy link
Contributor

Overall, looks good to me. I do like the idea of failing here, since otherwise it can be very confusing to people why their settings are not being picked up. To avoid surprise here I think it's best to avoid backporting this patch (once merged) into older versions and just enforce it in new versions.

One other issue, it's not super well documented in the existing code base what type of error propagation we expect when initializing a spark environment on the executors. It would be good to verify that this gets thrown up the chain in a way that gets presented nicely to the user. I wouldn't block the patch on that (it's not newly introduced here), but having nicer internal documentation about the expectations that callers should have if SparkEnv's fail to initialize would be nice.

@SparkQA
Copy link

SparkQA commented Feb 15, 2015

Test build #27529 has finished for PR 4571 at commit 5bccb14.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • logError("Sink class " + classPath + " cannot be instantialized")

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27529/
Test PASSed.

@hammer
Copy link
Contributor

hammer commented Feb 17, 2015

@pwendell @rxin looks ready to merge?

@pwendell
Copy link
Contributor

Pulling it in now, thanks ryan!

asfgit pushed a commit that referenced this pull request Feb 17, 2015
previous behavior was to log an error; this is fine in the general
case where no `spark.metrics.conf` parameter was specified, in which
case a default `metrics.properties` is looked for, and the execption
logged and suppressed if it doesn't exist.

if the user has purposefully specified a metrics.conf file, however,
it makes more sense to show them an error when said file doesn't
exist.

Author: Ryan Williams <[email protected]>

Closes #4571 from ryan-williams/metrics and squashes the following commits:

5bccb14 [Ryan Williams] private-ize some MetricsConfig members
08ff998 [Ryan Williams] rename METRICS_CONF: DEFAULT_METRICS_CONF_FILENAME
f4d7fab [Ryan Williams] fix tests
ad24b0e [Ryan Williams] add "metrics.properties" to .rat-excludes
94e810b [Ryan Williams] throw if nonexistent Sink class is specified
31d2c30 [Ryan Williams] metrics code review feedback
56287db [Ryan Williams] throw if nonexistent metrics config file provided

(cherry picked from commit d8f69cf)
Signed-off-by: Patrick Wendell <[email protected]>
@asfgit asfgit closed this in d8f69cf Feb 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants