Quiet Spark Logging #5
There is a huge amount of logging from Spark by default, which clutters up the terminal and confuses new users. Findspark should cut down on this logging. @freeman-lab recommended changing the logging level at runtime (see the first sketch below). This could be implemented in Findspark by monkey-patching the SparkContext (second sketch below). That, however, feels like a fragile solution to me. We could instead modify the logger properties file at $SPARK_HOME/conf/log4j.properties (third sketch below), but this changes the logging for all uses of Spark and may be too heavyweight a solution.
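The exact snippets from the original report were not preserved in this copy of the issue, so the following are sketches. First, a common way to lower the logging level at runtime, going through pyspark's py4j gateway (sc._jvm) to reach the log4j root logger:

```python
# A sketch of quieting Spark at runtime; the exact snippet recommended
# in the issue was not preserved here. This reaches the JVM's log4j
# root logger through pyspark's py4j gateway and drops it to ERROR.
from pyspark import SparkContext

sc = SparkContext(appName="quiet-example")
log4j = sc._jvm.org.apache.log4j
log4j.LogManager.getRootLogger().setLevel(log4j.Level.ERROR)
```

Second, an illustrative version of the monkey-patching idea (not findspark's actual code), wrapping SparkContext.__init__ so that every new context quiets itself right after construction, using the setLogLevel method noted in the comments below:

```python
# Illustrative monkey-patch sketch, not findspark's actual code.
from pyspark import SparkContext

_original_init = SparkContext.__init__

def _quiet_init(self, *args, **kwargs):
    _original_init(self, *args, **kwargs)  # run the real constructor
    self.setLogLevel("ERROR")              # then show only ERROR and above

SparkContext.__init__ = _quiet_init
```

Third, the log4j.properties alternative: copy Spark's shipped conf/log4j.properties.template into place and lower the root category, which affects every use of that Spark installation:

```properties
# $SPARK_HOME/conf/log4j.properties, copied from the shipped
# log4j.properties.template (whose default root category is INFO).
# Note this quiets every program run against this Spark installation.
log4j.rootCategory=ERROR, console
```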
Comments

I wouldn't monkeypatch SparkContext. We should be talking to the pyspark folks about how to expose options like the log level. Presumably you should be able to do SparkContext(..., log_level=ERROR). I don't think we need to deviate too much from PySpark's default behavior here, but if the defaults don't make sense, we should bring it up with PySpark rather than forcibly overriding them.
That makes sense. Looking into it further, there is a setLogLevel method on the SparkContext, but I don't see any way to set it in the constructor. That would be a nice-to-have we can bring up with the pyspark developers. It also appears that Spark checks whether it is in a Scala REPL and changes the logging accordingly if it is. The ideal fix would be for Spark to likewise recognize that it is in a Python or IPython shell. I'll look into what that would entail.
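For illustration, the nice-to-have next to what works today (the log_level keyword below is hypothetical; pyspark's SparkContext takes no such argument):

```python
from pyspark import SparkContext

# Hypothetical nice-to-have discussed above -- no such keyword exists:
# sc = SparkContext(appName="example", log_level="ERROR")

# What exists today: construct the context, then call setLogLevel.
sc = SparkContext(appName="example")
sc.setLogLevel("ERROR")
```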
Agree with @minrk that the monkey-patching, while clever, isn't really optimal. I like the idea of having it as an option in the constructor. It looks like it's not in the Scala version either (see here), and they generally aim for parity, so the nicest patch might be adding it as an argument to both versions. I'd definitely suggest opening a JIRA ticket over on https://issues.apache.org/jira/browse/spark/ about adding it (and explaining the use case) to see what the other Spark devs think. If they're on board, we can put a patch together!
I've created two issues on Spark's JIRA. One is to add an option to the SparkContext constructor to change the logging level. The other is to use a different logging properties file when Spark detects that it is in the Python REPL. This already happens for the Scala REPL, so it would just bring the two to parity.