[SPARK-24418][Build] Upgrade Scala to 2.11.12 and 2.12.6 #21495

Closed
wants to merge 8 commits

Conversation

dbtsai
Member

@dbtsai dbtsai commented Jun 5, 2018

What changes were proposed in this pull request?

Scala is upgraded to 2.11.12 and 2.12.6.

We used loadFiles() in ILoop as a hook to initialize Spark before the REPL saw any files in Scala 2.11.8. However, it was a hack that was never intended to be a public API, and it was removed in Scala 2.11.12.

Following the discussion with the Scala community in scala/bug#10913, we can use initializeSynchronous to initialize Spark instead. This PR implements the Spark initialization there.
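
Concretely, the new hook boils down to the following override (a condensed sketch assembled from the diff discussed in the review below; imports added here for completeness):

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{IMain, JPrintWriter}

// initializeSynchronous is the earliest post-initialization callback the
// 2.11.12 REPL exposes, so Spark bootstraps itself there.
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)
  extends IMain(settings, out) { self =>

  override def initializeSynchronous(): Unit = {
    super.initializeSynchronous()
    initializeSpark()
  }
}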

However, in Scala 2.11.12's ILoop.scala, the first thing def startup() calls is printWelcome(). As a result, Scala calls printWelcome() and starts the splash screen before calling initializeSynchronous.

Thus, the Spark shell lets users type commands first and only then shows the Spark UI URL. It works, but it changes the Spark shell interface as follows.

  apache-spark git:(scala-2.11.12)  ./bin/spark-shell 
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/
         
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Spark context Web UI available at http://192.168.1.169:4040
Spark context available as 'sc' (master = local[*], app id = local-1528180279528).
Spark session available as 'spark'.


scala> 

It seems there is no easy way to inject the Spark initialization code in the proper place, as Scala doesn't provide a hook. Maybe @som-snytt can comment on this.

The following command was used to update the dependency manifest files.

./dev/test-dependencies.sh --replace-manifest

How was this patch tested?

Existing tests

override def initializeSynchronous(): Unit = {
  super.initializeSynchronous()
  initializeSpark()
}
Member Author

@dbtsai dbtsai Jun 5, 2018

@som-snytt It's working, but I'm wondering if I'm doing it correctly.

In this case, I'll use $intp without checking

if (intp.reporter.hasErrors) {
  echo("Interpreter encountered errors during initialization!")
  null
} 

in ILoop.scala. Of course, I can add the check in our own code, but it doesn't look right to me.

And intp.quietBind(NamedParam[IMain]("$intp", intp)(tagOfIMain, classTag[IMain])) will not be executed before our custom Spark initialization code.

Member

@dongjoon-hyun dongjoon-hyun Jun 7, 2018

In my environment (built with build/sbt), I'm hitting a NoSuchMethodError like the following. Did you see something like this?

~/PR-21495:PR-21495$ bin/spark-shell
18/06/07 23:39:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/

Using Scala version 2.11.12 (OpenJDK 64-Bit Server VM, Java 1.8.0_171)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1528414746558).
Spark session available as 'spark'.
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V
        at scala.tools.nsc.interpreter.jline.JLineConsoleReader.initCompletion(JLineReader.scala:139)
        at scala.tools.nsc.interpreter.jline.InteractiveReader.postInit(JLineReader.scala:54)

Member

It looks like a jline version mismatch:

  • Spark 2.4.0-SNAPSHOT uses jline 2.12.1
  • Scala 2.11.12 uses jline 2.14.3
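
If we bump it, the change could be as small as pinning the override in the build (a hypothetical build.sbt fragment; the exact coordinates are an assumption, not this PR's diff):

// Pin jline to the version Scala 2.11.12's REPL expects.
dependencyOverrides += "jline" % "jline" % "2.14.3"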

Member Author

It works for me; I have scala-compiler-2.11.12.jar in my classpath. Can you do a clean build?

Member

@dongjoon-hyun dongjoon-hyun Jun 8, 2018

Yep. Of course, it's a clean clone and build of this PR. According to the error message and the following, we need jline-2.14.3.jar because Scala calls the setPrintSpaceAfterFullCompletion API from a newer version of jline. Could you confirm this, @som-snytt?

$ java -version
openjdk version "1.8.0_171"
OpenJDK Runtime Environment (build 1.8.0_171-8u171-b11-0ubuntu0.18.04.1-b11)
OpenJDK 64-Bit Server VM (build 25.171-b11, mixed mode)

$ javap -cp jline-2.12.1.jar jline.console.completer.CandidateListCompletionHandler
Compiled from "CandidateListCompletionHandler.java"
public class jline.console.completer.CandidateListCompletionHandler implements jline.console.completer.CompletionHandler {
  public jline.console.completer.CandidateListCompletionHandler();
  public boolean complete(jline.console.ConsoleReader, java.util.List<java.lang.CharSequence>, int) throws java.io.IOException;
  public static void setBuffer(jline.console.ConsoleReader, java.lang.CharSequence, int) throws java.io.IOException;
  public static void printCandidates(jline.console.ConsoleReader, java.util.Collection<java.lang.CharSequence>) throws java.io.IOException;
}

$ javap -cp jline-2.14.3.jar  jline.console.completer.CandidateListCompletionHandler
Compiled from "CandidateListCompletionHandler.java"
public class jline.console.completer.CandidateListCompletionHandler implements jline.console.completer.CompletionHandler {
  public jline.console.completer.CandidateListCompletionHandler();
  public boolean getPrintSpaceAfterFullCompletion();
  public void setPrintSpaceAfterFullCompletion(boolean);
  public boolean isStripAnsi();
  public void setStripAnsi(boolean);
  public boolean complete(jline.console.ConsoleReader, java.util.List<java.lang.CharSequence>, int) throws java.io.IOException;
  public static void setBuffer(jline.console.ConsoleReader, java.lang.CharSequence, int) throws java.io.IOException;
  public static void printCandidates(jline.console.ConsoleReader, java.util.Collection<java.lang.CharSequence>) throws java.io.IOException;
}
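
A quick way to verify which jline actually got loaded is to probe for the missing setter at runtime (a small standalone sketch; assumes jline is on the classpath):

import jline.console.completer.CandidateListCompletionHandler

// Probe for the setter that Scala 2.11.12's JLineConsoleReader calls.
object JLineCheck extends App {
  val hasSetter = classOf[CandidateListCompletionHandler].getMethods
    .exists(_.getName == "setPrintSpaceAfterFullCompletion")
  println(
    if (hasSetter) "jline is new enough (e.g. 2.14.3)"
    else "jline is too old (e.g. 2.12.1): expect NoSuchMethodError")
}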

Member

Can we upgrade to the corresponding jline version together in this PR, @dbtsai ?


Completion was upgraded since the old days; also, other bugs required updating jline. There is interest in upgrading to jline 3.

Member

Thank you for confirming, @som-snytt .

Member Author

@dbtsai dbtsai Jun 8, 2018

Oh, I know why it works for me.

I just ran the following without building a distribution, so jline was automatically resolved to the newer version.

./build/sbt clean package
./bin/spark-shell

@SparkQA

SparkQA commented Jun 5, 2018

Test build #91474 has finished for PR 21495 at commit de790fd.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)

@viirya
Member

viirya commented Jun 5, 2018

retest this please.

@SparkQA

SparkQA commented Jun 5, 2018

Test build #91476 has finished for PR 21495 at commit de790fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)

@som-snytt

som-snytt commented Jun 5, 2018

Your best bet for synchronous startup is to do everything in printWelcome: createInterpreter, plus your init commands that produce output. IMain has a compiler that is initialized lazily, so I don't think you have to explicitly call intp.initializeSynchronous. createInterpreter will be called again; you'll want to detect that and make it a no-op.
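
A sketch of that approach (hypothetical names, not this PR's code): run all initialization from printWelcome and turn the repeat createInterpreter call into a no-op.

import scala.tools.nsc.interpreter.ILoop

class EagerInitILoop extends ILoop {
  private var interpreterCreated = false

  // Detect the second call from the stock startup path and skip it.
  override def createInterpreter(): Unit =
    if (!interpreterCreated) {
      super.createInterpreter()
      interpreterCreated = true
    }

  override def printWelcome(): Unit = {
    createInterpreter()  // build intp before the banner
    intp.interpret("""println("init command that produces output")""")
    super.printWelcome()
  }
}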

It looks like Scala REPL is moving toward a different architecture in 2.13, with loosely coupled front and back ends.

@dbtsai
Member Author

dbtsai commented Jun 5, 2018

retest this please

@SparkQA

SparkQA commented Jun 5, 2018

Test build #91491 has finished for PR 21495 at commit de790fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)

@dongjoon-hyun
Member

Retest this please.

@dbtsai
Member Author

dbtsai commented Jun 8, 2018

@som-snytt Initializing it in printWelcome will not work since, in older versions of Scala, printWelcome is the last thing to be executed.

@som-snytt

som-snytt commented Jun 8, 2018

@dbtsai 2.11 looks similar to 2.12. Do you mean you want the same technique on 2.10? I would not expect to find a single hook for all versions.

Edit: sorry, I see it was a mid-2.11 change.

@SparkQA

SparkQA commented Jun 8, 2018

Test build #91538 has finished for PR 21495 at commit de790fd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)

@SparkQA

SparkQA commented Jun 8, 2018

Test build #91541 has finished for PR 21495 at commit f91d75a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

-class SparkILoopInterpreter(settings: Settings, out: JPrintWriter) extends IMain(settings, out) { self =>
+class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, initializeSpark: () => Unit)
+    extends IMain(settings, out) { self =>
Member

nit: two spaces

Member Author

I thought for extends, it's four spaces?

Contributor

IIRC, four spaces is OK.


It's definitely two spaces after a period. I've been wanting to make that joke, but held off.

Member

@HyukjinKwon HyukjinKwon Jun 20, 2018

Guys, parameters are indented 4 spaces and the keywords that follow are lined up with 2 spaces, as written in https://github.com/databricks/scala-style-guide#spacing-and-indentation and as sketched below.

The two-line case isn't explicitly written down, but wouldn't we be better off sticking to the example as much as we can? "Use 2-space indentation in general."
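
For reference, a sketch of the convention being cited (illustrative only, not this PR's final diff):

import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{IMain, JPrintWriter}

class SparkILoopInterpreter(
    settings: Settings,  // constructor parameters: 4-space indent
    out: JPrintWriter)
  extends IMain(settings, out) {  // keywords after the parameter list: 2-space indent
  // ...
}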

Contributor

Well, there's a bunch of code in Spark where both 4-space and 2-space indentation exist. In my view it is not worth sticking on such a tiny point.

If you want to fix it, create a separate PR to fix all of them.

Member

We can fix them in place. There are many other style nits too; it's not worth sweeping through them all and making backports harder.

I am fine with ignoring such nits, and they don't block this PR, but that's not the same as saying the opposite style is okay.

Contributor

Really not worth blocking on this (if it is the only issue).

@SparkQA

SparkQA commented Jun 8, 2018

Test build #91561 has finished for PR 21495 at commit 631ef48.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2018

Test build #91559 has finished for PR 21495 at commit c1ffd0b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 8, 2018

Test build #91579 has finished for PR 21495 at commit 96e87c2.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 9, 2018

Test build #91580 has finished for PR 21495 at commit 4c852fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jerryshao
Contributor

jerryshao commented Jun 11, 2018

Having issues testing with the latest patch:

NVM, I was just using the old patch...

@dbtsai
Member Author

dbtsai commented Jun 11, 2018

I decided to remove the hack I put in to keep the Spark UI output consistent, because that hack would bring in more problems.

@som-snytt Is it possible to move the printWelcome and splash.start() to some place after initializeSynchronous() is called?

@SparkQA

SparkQA commented Jun 11, 2018

Test build #91650 has finished for PR 21495 at commit 82ca5f6.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 25, 2018

Test build #92292 has finished for PR 21495 at commit 82ca5f6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Contributor

@jerryshao jerryshao left a comment

LGTM.

Member

@felixcheung felixcheung left a comment

LGTM

@felixcheung
Member

db might be out for a bit. We should merge it if it looks good and tests pass?

@jerryshao
Contributor

OK, I'm going to merge it. We can fix any follow-up issues if they exist.

@HyukjinKwon
Member

LGTM too.

@asfgit asfgit closed this in c7967c6 Jun 26, 2018
@dbtsai
Member Author

dbtsai commented Jun 27, 2018

I was on family leave for a couple of weeks. Thank you all for helping out and merging it.

The only change with this PR is that the welcome message is printed first, and the Spark URL is shown later. It's a minor difference.

I had an offline discussion with @adriaanm from the Scala community. To overcome this issue, he suggested we override the entire def process(settings: Settings): Boolean to place our initialization code. I have a working implementation (roughly sketched below), but it copies a lot of code from Scala just to get the printing order right. If we decide we want a consistent printing order, I can submit a separate PR for it.
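
The shape of that override would be something like this (a hypothetical sketch, not the actual follow-up PR; the stock process does considerably more):

import java.io.BufferedReader
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{ILoop, JPrintWriter}

class OrderedSparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  private def initializeSpark(): Unit = ()  // placeholder for the real bootstrap

  override def process(settings: Settings): Boolean = {
    this.settings = settings   // mirrors what the stock ILoop does first
    createInterpreter()
    initializeSpark()          // Spark UI URL and sc/spark banners print here
    printWelcome()             // the Scala banner comes after Spark is up
    try { loop(); true }
    finally closeInterpreter()
  }
}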

He is open to working with us to add a proper hook in Scala so we don't need to use hacks to initialize our code.

Thanks.

@dbtsai dbtsai deleted the scala-2.11.12 branch June 27, 2018 15:21
@felixcheung
Member

felixcheung commented Jun 27, 2018 via email

@jerryshao
Contributor

The only change with this PR is that the welcome message is printed first, and the Spark URL is shown later. It's a minor difference.

I think we should create a JIRA to track this change and also evaluate the necessity of fixing this issue.

If this behavior change affects some users, maybe we should move the target version to 3.0.0? @dbtsai @felixcheung

@felixcheung
Member

felixcheung commented Jun 28, 2018 via email

@dongjoon-hyun
Member

In addition to that, it would be great if we fix sbt soon. After this PR, mvn works correctly, but sbt is still hitting NoSuchMethodError on the master branch.

$ ./build/sbt -Pyarn -Phadoop-2.7 -Phadoop-cloud -Phive -Phive-thriftserver -Psparkr test:package
$ bin/spark-shell
scala> Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1530385877441).
Spark session available as 'spark'.
Exception in thread "main" java.lang.NoSuchMethodError: jline.console.completer.CandidateListCompletionHandler.setPrintSpaceAfterFullCompletion(Z)V

@jerryshao
Contributor

@dongjoon-hyun Can you please create a JIRA to track this issue?

@dongjoon-hyun
Member

Sure. https://issues.apache.org/jira/browse/SPARK-24715 is created now.

@dbtsai
Member Author

dbtsai commented Jul 11, 2018

A JIRA and a PR have been created to make sure the messages are printed in the right order.

https://issues.apache.org/jira/browse/SPARK-24785

#21749

curtishoward pushed a commit to twosigma/spark that referenced this pull request Jul 26, 2018
* Upgrade Scala to 2.11.12

- Modifies slightly the Spark REPL code to reflect internal changes in Scala tooling
- This code was ported from apache#21495

(cherry picked from commit 3e52a9160875ec5c145c4e9fa0106ff7d1f380b2)
asfgit pushed a commit that referenced this pull request Aug 22, 2018
…elcome message

## What changes were proposed in this pull request?

After #21495 the welcome message is printed first, and then Scala prompt will be shown before the Spark UI info is printed.

Although it's a minor issue, but visually, it doesn't look as nice as the existing behavior. This PR intends to fix it by duplicating the Scala `process` code to arrange the printing order. However, one variable is private, so reflection has to be used which is not desirable.

We can use this PR to brainstorm how to handle it properly and how Scala can change their APIs to fit our need.

## How was this patch tested?

Existing test

Closes #21749 from dbtsai/repl-followup.

Authored-by: DB Tsai <[email protected]>
Signed-off-by: DB Tsai <[email protected]>
curtishoward pushed a commit to twosigma/spark that referenced this pull request Aug 27, 2018
* Upgrade Scala to 2.11.12

- Modifies slightly the Spark REPL code to reflect internal changes in Scala tooling
- This code was ported from apache#21495

(cherry picked from commit 3e52a9160875ec5c145c4e9fa0106ff7d1f380b2)
(cherry picked from commit f5a3901)
curtishoward pushed a commit to twosigma/spark that referenced this pull request Aug 28, 2018
* Upgrade Scala to 2.11.12

- Modifies slightly the Spark REPL code to reflect internal changes in Scala tooling
- This code was ported from apache#21495

(cherry picked from commit 3e52a9160875ec5c145c4e9fa0106ff7d1f380b2)
(cherry picked from commit f5a3901)
(cherry picked from commit a72488d)
Willymontaz pushed a commit to Willymontaz/spark that referenced this pull request Sep 26, 2018
curtishoward pushed a commit to curtishoward/spark that referenced this pull request Oct 8, 2018
* Upgrade Scala to 2.11.12

- Modifies slightly the Spark REPL code to reflect internal changes in Scala tooling
- This code was ported from apache#21495

(cherry picked from commit 3e52a9160875ec5c145c4e9fa0106ff7d1f380b2)
(cherry picked from commit f5a3901)
(cherry picked from commit a72488d)
Willymontaz pushed a commit to Willymontaz/spark that referenced this pull request Jun 6, 2019
Willymontaz added a commit to criteo-forks/spark that referenced this pull request Jun 12, 2019