
[SPARK-16829][SparkR] sparkR sc.setLogLevel doesn't work #14433

Closed
wants to merge 7 commits

Conversation

wangmiao1981
Contributor

What changes were proposed in this pull request?


./bin/sparkR
Launching java with spark-submit command /Users/mwang/spark_ws_0904/bin/spark-submit "sparkr-shell" /var/folders/s_/83b0sgvj2kl2kwq4stvft_pm0000gn/T//RtmpQxJGiZ/backend_porte9474603ed1e
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

sc.setLogLevel("INFO")
Error: could not find function "sc.setLogLevel"

sc.setLogLevel doesn't exist.

SparkR has a setLogLevel function.

I renamed the setLogLevel function to sc.setLogLevel.

How was this patch tested?

Changed the unit test and ran the unit tests.
Manually tested it in the sparkR shell.

@SparkQA

SparkQA commented Aug 1, 2016

Test build #63069 has finished for PR 14433 at commit 9d3d8e5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Aug 1, 2016

I think a better change might be to change that message if we are launching SparkR? cc @felixcheung

@wangmiao1981
Contributor Author

@shivaram I found that spark-shell and pyspark use the same message:

Python 2.7.11 |Anaconda 2.4.0 (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

@felixcheung
Member

Correct, I don't think we should name the function sc.setLogLevel.
The message is coming from Scala. I changed the default logging level for Python and R recently, so I think I know where it is coming from; it just so happens that Python names the function the same way as JVM/Scala. It would be better if that code checked which shell is running and printed the right message instead.

@wangmiao1981
Contributor Author

@felixcheung Let me check the code that launches the sparkR shell. Can you point me to the code?

Thanks!

@wangmiao1981
Contributor Author

@felixcheung I will try to retrieve the terminal/shell type before printing out the message. I will update the PR if I can find a way of doing that. Thanks!

@SparkQA

SparkQA commented Aug 2, 2016

Test build #63132 has finished for PR 14433 at commit a78d354.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

The failure seems unrelated to the change.

@wangmiao1981
Contributor Author

Jenkins, re-test this please.

@SparkQA

SparkQA commented Aug 2, 2016

Test build #63137 has finished for PR 14433 at commit a78d354.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

@felixcheung I checked which object calls the logging code and printed the message accordingly:

./bin/sparkR

R version 3.3.0 (2016-05-03) -- "Supposedly Educational"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin13.4.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

[Previously saved workspace restored]

Launching java with spark-submit command /Users/mwang/spark_ws_0904/bin/spark-submit "sparkr-shell" /var/folders/s_/83b0sgvj2kl2kwq4stvft_pm0000gn/T//Rtmpi4dP8y/backend_portb9fa113abad2
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use setLogLevel(newLevel).

./bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

./bin/pyspark
Python 2.7.11 |Anaconda 2.4.0 (x86_64)| (default, Dec 6 2015, 18:57:58)
[GCC 4.2.1 (Apple Inc. build 5577)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).

@felixcheung
Member

awesome!

I think we might need to create a helper function for "am I running in the SparkR shell"?
#14258 (comment)

@wangmiao1981
Contributor Author

@felixcheung "I think we might have a need to create a helper for "am I running in the SparkR shell" function?" Do you mean for #14258 ? Not for this PR, right?

@felixcheung
Member

felixcheung commented Aug 3, 2016 via email

@wangmiao1981
Contributor Author

@felixcheung Checking whether it is running from a shell is not exactly the same as checking which shell is calling it. My approach depends on the fact that the Logging trait is used in three different gateway objects. Let me check if there is a better way to do that.

@felixcheung
Member

Sure - but one is a subset of the other (i.e., knowing it's the sparkR shell means it is running in a shell).
I don't feel strongly about it and we could come back to it later, but I thought less fragmentation would be nice.

@wangmiao1981
Contributor Author

I am investigating a solution now and will give an update today or tomorrow. Thanks!

@wangmiao1981
Contributor Author

@felixcheung Since all shell instances (i.e., spark-shell, pyspark, and sparkR) are initialized through the spark-submit script, I think a possible solution is to set a status flag in the SparkSubmit object when a shell is initialized. Then, we can create an API, public to [spark], for checking the flag.
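
A minimal sketch of the flag idea (ShellStatus and its members are hypothetical placeholders, not existing Spark code):

object ShellStatus {
  // Set once by SparkSubmit when it launches a shell; volatile because
  // other components may read it from a different thread.
  @volatile private var rShell: Boolean = false

  // Would be called from SparkSubmit when the sparkR shell is initialized.
  def markRShell(): Unit = { rShell = true }

  // Components such as the Logging trait could check this flag to decide
  // which log-level hint to print.
  def isRShell: Boolean = rShell
}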

What do you think?

Thanks!

@wangmiao1981
Contributor Author

@felixcheung I can't find a better way other than checking the string passed either as the resource or as the class name. Do you have any good ideas?

@wangmiao1981
Contributor Author

@felixcheung I have an idea of creating an enum object in Scala, like the example below:

object WeekDay {
  sealed trait EnumVal
  case object Mon extends EnumVal
  case object Tue extends EnumVal
  case object Wed extends EnumVal
  case object Thu extends EnumVal
  case object Fri extends EnumVal
  val daysOfWeek = Seq(Mon, Tue, Wed, Thu, Fri)
}

I would create a private API that sets the enum value in a private variable, and a public API that checks the shell type by returning Option[ShellType]. If the return value is None, it is not a shell; otherwise, it returns the correct shell type.
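
A rough sketch of that API, reusing the sealed-trait pattern above (ShellType, ShellInfo, and the member names are hypothetical placeholders):

object ShellType {
  sealed trait EnumVal
  case object SparkShell extends EnumVal
  case object PySparkShell extends EnumVal
  case object SparkRShell extends EnumVal
}

object ShellInfo {
  // Set at most once, when spark-submit initializes a shell; None means
  // we are not running in a shell at all.
  @volatile private var shellType: Option[ShellType.EnumVal] = None

  // The setter would be scoped private to Spark; visibility modifiers are
  // omitted so the sketch compiles standalone.
  def setShellType(t: ShellType.EnumVal): Unit = { shellType = Some(t) }

  // Public getter for components such as the Logging trait.
  def currentShellType: Option[ShellType.EnumVal] = shellType
}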

What do you think about this approach?

Thanks!

@felixcheung
Member

Can we have this be like SparkSubmitAction, which extends Enumeration?
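
For reference, a sketch of that Enumeration-based alternative, modeled on the existing SparkSubmitAction object (the ShellType object itself is hypothetical):

object ShellType extends Enumeration {
  type ShellType = Value
  val SPARK_SHELL, PYSPARK_SHELL, SPARKR_SHELL = Value
}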

@wangmiao1981
Contributor Author

@felixcheung I think the SparkSubmitAction style is good for this purpose. We might have to make the scope [spark] instead of [deploy] so it can be used in other components if needed. I will create the enum object accordingly. Thanks!

@SparkQA

SparkQA commented Aug 15, 2016

Test build #63802 has finished for PR 14433 at commit ad88977.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 16, 2016

Test build #63805 has finished for PR 14433 at commit 80914e2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

import org.slf4j.{Logger, LoggerFactory}
import org.slf4j.impl.StaticLoggerBinder

import org.apache.spark.util.Utils
Member

as per this, import org.apache.spark.deploy should go here

Contributor Author

@felixcheung It seems that you saw my old commit for some reason. I think I changed the import order in the latest commit. Thanks!

@wangmiao1981
Contributor Author

@felixcheung, any comments on the new change?

@felixcheung
Member

@srowen @vanzin what do you think?

@srowen
Member

srowen commented Aug 25, 2016

It feels like overkill unless there are going to be more uses for changing logic based on whether it's running in a shell. It seems not so bad to define setRootLevel in Scala as an alias when in the shell, or define something in SparkR, or just change the log message to note the two possibilities. Is there more need for this logic?

@felixcheung
Member

That's a good point actually - how about we use args.primaryResource or args.isR, which already exist in SparkSubmit?

@wangmiao1981
Contributor Author

args.primaryResource is good for this purpose. I can make a change similar to my initial commit, but checking against args.primaryResource.
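
A minimal sketch of that check (the helper and its parameter are hypothetical; the "sparkr-shell" marker matches the spark-submit launch line quoted earlier in this thread):

object ShellCheck {
  // Detect the SparkR shell from spark-submit's primary resource, which is
  // the "sparkr-shell" marker when ./bin/sparkR launches the backend.
  def isSparkRShell(primaryResource: String): Boolean =
    primaryResource == "sparkr-shell"
}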

@wangmiao1981
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Aug 26, 2016

Test build #64491 has finished for PR 14433 at commit 6e66b5d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Aug 26, 2016

Test build #64493 has finished for PR 14433 at commit fb16375.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@wangmiao1981
Contributor Author

Any further comments? Thanks!

@@ -135,7 +136,12 @@ private[spark] trait Logging {
       val replLevel = Option(replLogger.getLevel()).getOrElse(Level.WARN)
       if (replLevel != rootLogger.getEffectiveLevel()) {
         System.err.printf("Setting default log level to \"%s\".\n", replLevel)
-        System.err.println("To adjust logging level use sc.setLogLevel(newLevel).")
+        if (SparkSubmit.isRShell) {
Member

I personally think it's a bit ugly to pipe this info through with extra methods and so on. It seems simpler just to amend the message to also state the right command for SparkR.

Contributor

+1 -- I personally find it odd that we would import deploy.SparkSubmit into Logging, which is a base class used throughout the code base. We could also change the log message to be more vague. Something like To adjust logging level please call setLogLevel(newLevel) using SparkContext might work for all languages?

Contributor Author

How about I change the log message to: To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel)? Thanks!

Contributor

Sounds good to me

@wangmiao1981
Contributor Author

@shivaram I simplified the solution by only changing the message as we discussed. Thanks!

@SparkQA

SparkQA commented Sep 2, 2016

Test build #64839 has finished for PR 14433 at commit ef8887d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@shivaram
Contributor

shivaram commented Sep 2, 2016

LGTM. Thanks @wangmiao1981 -- I'll keep this open for a bit to see if there are any more comments.

@@ -135,7 +135,8 @@ private[spark] trait Logging {
       val replLevel = Option(replLogger.getLevel()).getOrElse(Level.WARN)
       if (replLevel != rootLogger.getEffectiveLevel()) {
         System.err.printf("Setting default log level to \"%s\".\n", replLevel)
-        System.err.println("To adjust logging level use sc.setLogLevel(newLevel).")
+        System.err.println("To adjust logging level use sc.setLogLevel(newLevel). " +
+          "For SparkR, use setLogLevel(newLevel).")
Member

While I'd indent this continuation line, I don't think it's worth worrying about. LGTM

Contributor

I fixed this up during the merge

@shivaram
Contributor

shivaram commented Sep 3, 2016

Merged into master

asfgit closed this in e9b58e9 on Sep 3, 2016