
[SPARK-16829][SparkR] sparkR sc.setLogLevel doesn't work #14433

Closed
wants to merge 7 commits

Conversation

wangmiao1981
Contributor

What changes were proposed in this pull request?


./bin/sparkR
Launching java with spark-submit command /Users/mwang/spark_ws_0904/bin/spark-submit "sparkr-shell" /var/folders/s_/83b0sgvj2kl2kwq4stvft_pm0000gn/T//RtmpQxJGiZ/backend_porte9474603ed1e
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

sc.setLogLevel("INFO")
Error: could not find function "sc.setLogLevel"

sc.setLogLevel doesn't exist.

SparkR has a setLogLevel function.

I renamed the setLogLevel function to sc.setLogLevel.

How was this patch tested?

Changed the unit test and ran the unit tests.
Manually tested it in the sparkR shell.

@SparkQA

SparkQA commented Aug 1, 2016

Test build #63069 has finished for PR 14433 at commit 9d3d8e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Aug 1, 2016

I think a better change might be to change that message if we are launching SparkR? cc @felixcheung

@wangmiao1981
Contributor Author

@shivaram I found that spark-shell and pyspark use the same message:

Python 2.7.11 |Anaconda 2.4.0 (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

@felixcheung
Member

Correct, I don't think we should name the function sc.setLogLevel.
The message is coming from Scala. I changed the default logging level for Python and R recently, so I think I know where it is coming from; it just so happens that Python names the function the same way as JVM/Scala. It would be better if that code checked which shell is running and printed the right message instead.

@wangmiao1981
Contributor Author

@felixcheung Let me check the code that launches the sparkR shell. Can you point me to the code?

Thanks!

@wangmiao1981
Contributor Author

@felixcheung I will try to retrieve the terminal/shell type before printing out the message. I will update the PR if I can find a way of doing that. Thanks!

@SparkQA

SparkQA commented Aug 2, 2016

Test build #63132 has finished for PR 14433 at commit a78d354.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

The failure seems unrelated to the change.

@wangmiao1981
Contributor Author

Jenkins, re-test this please.

@SparkQA

SparkQA commented Aug 2, 2016

Test build #63137 has finished for PR 14433 at commit a78d354.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

@felixcheung I checked which object calls the logging code and printed the message accordingly:

./bin/sparkR

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

Launching java with spark-submit command /Users/mwang/spark_ws_0904/bin/spark-submit "sparkr-shell" /var/folders/s_/83b0sgvj2kl2kwq4stvft_pm0000gn/T//Rtmpi4dP8y/backend_portb9fa113abad2
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use setLogLevel(newLevel).

./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

./bin/pyspark
Python 2.7.11 |Anaconda 2.4.0 (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

@felixcheung
Member

awesome!

I think we might need to create a helper function for "am I running in the SparkR shell"?
#14258 (comment)

@wangmiao1981
Contributor Author

@felixcheung "I think we might have a need to create a helper for "am I running in the SparkR shell" function?" Do you mean for #14258 ? Not for this PR, right?

@felixcheung
Member

felixcheung commented Aug 3, 2016 via email

@wangmiao1981
Contributor Author

@felixcheung Checking whether it is running from a shell is not exactly the same as checking which shell is calling it. My approach depends on the fact that the Logging trait is used in three different gateway objects. Let me check if there is a better way to do that.

@felixcheung
Member

Sure - but one is a subset of the other (i.e., knowing it's the sparkR shell means it is running in a shell).
I don't feel strongly about it and we could come back to it later, but I thought less fragmentation would be nice.

@wangmiao1981
Contributor Author

I am investigating a solution now and will give an update today or tomorrow. Thanks!

@wangmiao1981
Contributor Author

@felixcheung Since all shell instances (i.e., spark-shell, pyspark, and sparkR) are initialized through the spark-submit script, I think a possible solution is to set a status flag in the SparkSubmit object when a shell is initialized. Then, we can create an API, public to [spark], for checking the flag.
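
A minimal sketch of the flag idea (ShellStatus and its members are hypothetical placeholders, not existing Spark code):

object ShellStatus {
  // Set once by SparkSubmit when it launches a shell; volatile because
  // other components may read it from a different thread.
  @volatile private var rShell: Boolean = false

  // Would be called from SparkSubmit when the sparkR shell is initialized.
  def markRShell(): Unit = { rShell = true }

  // Components such as the Logging trait could check this flag to decide
  // which log-level hint to print.
  def isRShell: Boolean = rShell
}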

What do you think?

Thanks!

@wangmiao1981
Contributor Author

@felixcheung I can't find a better way other than checking the string passed either as the resource or as the class name. Do you have any good ideas?

@wangmiao1981
Contributor Author

@felixcheung I have an idea of creating an enum object in Scala, like the example below:

object WeekDay {
  sealed trait EnumVal
  case object Mon extends EnumVal
  case object Tue extends EnumVal
  case object Wed extends EnumVal
  case object Thu extends EnumVal
  case object Fri extends EnumVal
  val daysOfWeek = Seq(Mon, Tue, Wed, Thu, Fri)
}

I would create a private API that sets the enum value in a private variable, and a public API that checks the shell type by returning Option[ShellType]. If the return value is None, it is not a shell; otherwise, it returns the correct shell type.
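
A rough sketch of that API, reusing the sealed-trait pattern above (ShellType, ShellInfo, and the member names are hypothetical placeholders):

object ShellType {
  sealed trait EnumVal
  case object SparkShell extends EnumVal
  case object PySparkShell extends EnumVal
  case object SparkRShell extends EnumVal
}

object ShellInfo {
  // Set at most once, when spark-submit initializes a shell; None means
  // we are not running in a shell at all.
  @volatile private var shellType: Option[ShellType.EnumVal] = None

  // The setter would be scoped private to Spark; visibility modifiers are
  // omitted so the sketch compiles standalone.
  def setShellType(t: ShellType.EnumVal): Unit = { shellType = Some(t) }

  // Public getter for components such as the Logging trait.
  def currentShellType: Option[ShellType.EnumVal] = shellType
}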

What do you think about this approach?

Thanks!

@felixcheung
Member

Can we have this be like SparkSubmitAction, which extends Enumeration?
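
For reference, a sketch of that Enumeration-based alternative, modeled on the existing SparkSubmitAction object (the ShellType object itself is hypothetical):

object ShellType extends Enumeration {
  type ShellType = Value
  val SPARK_SHELL, PYSPARK_SHELL, SPARKR_SHELL = Value
}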

@wangmiao1981
Contributor Author

@felixcheung I think the SparkSubmitAction style is good for this purpose. We might have to make the scope [spark] instead of [deploy] so it can be used in other components if needed. I will create the enum object accordingly. Thanks!

@SparkQA

SparkQA commented Aug 15, 2016

Test build #63802 has finished for PR 14433 at commit ad88977.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 16, 2016

Test build #63805 has finished for PR 14433 at commit 80914e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.slf4j.{Logger, LoggerFactory}
import org.slf4j.impl.StaticLoggerBinder

import org.apache.spark.util.Utils
Member

as per this, import org.apache.spark.deploy should go here

Contributor Author

@felixcheung It seems that you saw my old commit for some reason. I think I changed the import order in the latest commit. Thanks!

@wangmiao1981
Contributor Author

@felixcheung, any comments on the new change?

@felixcheung
Member

@srowen @vanzin what do you think?

@srowen
Member

srowen commented Aug 25, 2016

It feels like overkill unless there are going to be more uses for changing logic based on whether it's running in a shell. It seems not so bad to define setRootLevel in Scala as an alias when in the shell, or define something in SparkR, or just change the log message to note the two possibilities. Is there more need for this logic?

@felixcheung
Member

That's a good point actually - how about we use args.primaryResource or args.isR, which already exist in SparkSubmit?

@wangmiao1981
Contributor Author

args.primaryResource is good for this purpose. I can make a change similar to my initial commit, but checking against args.primaryResource.
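
A minimal sketch of that check (the helper and its parameter are hypothetical; the "sparkr-shell" marker matches the spark-submit launch line quoted earlier in this thread):

object ShellCheck {
  // Detect the SparkR shell from spark-submit's primary resource, which is
  // the "sparkr-shell" marker when ./bin/sparkR launches the backend.
  def isSparkRShell(primaryResource: String): Boolean =
    primaryResource == "sparkr-shell"
}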

@wangmiao1981
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 26, 2016

Test build #64491 has finished for PR 14433 at commit 6e66b5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2016

Test build #64493 has finished for PR 14433 at commit fb16375.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

Any further comments? Thanks!

@@ -135,7 +136,12 @@ private[spark] trait Logging {
       val replLevel = Option(replLogger.getLevel()).getOrElse(Level.WARN)
       if (replLevel != rootLogger.getEffectiveLevel()) {
         System.err.printf("Setting default log level to \"%s\".\n", replLevel)
-        System.err.println("To adjust logging level use sc.setLogLevel(newLevel).")
+        if (SparkSubmit.isRShell) {
Member

I personally think it's a bit ugly to pipe this info through with extra methods and so on. It seems simpler just to amend the message to also state the right command for SparkR.

Contributor

+1 -- I personally find it odd that we would import deploy.SparkSubmit into Logging, which is a base class used throughout the code base. We could also change the log message to be more vague. Something like To adjust logging level please call setLogLevel(newLevel) using SparkContext might work for all languages?

Contributor Author

How about I change the log message to: To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel)? Thanks!

Contributor

Sounds good to me

@wangmiao1981
Contributor Author

@shivaram I simplified the solution by only changing the message as we discussed. Thanks!

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64839 has finished for PR 14433 at commit ef8887d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Sep 2, 2016

LGTM. Thanks @wangmiao1981 -- I'll keep this open for a bit to see if there are any more comments.

@@ -135,7 +135,8 @@ private[spark] trait Logging {
       val replLevel = Option(replLogger.getLevel()).getOrElse(Level.WARN)
       if (replLevel != rootLogger.getEffectiveLevel()) {
         System.err.printf("Setting default log level to \"%s\".\n", replLevel)
-        System.err.println("To adjust logging level use sc.setLogLevel(newLevel).")
+        System.err.println("To adjust logging level use sc.setLogLevel(newLevel). " +
+          "For SparkR, use setLogLevel(newLevel).")
Member

While I'd indent this continuation line, I don't think it's worth worrying about. LGTM

Contributor

I fixed this up during the merge

@shivaram
Contributor

shivaram commented Sep 3, 2016

Merged into master

asfgit closed this in e9b58e9 on Sep 3, 2016