
loadFiles() in ILoop was removed in Scala 2.11.12 which breaks Apache Spark #10913

Closed
dbtsai opened this issue May 29, 2018 · 11 comments

Comments

@dbtsai

dbtsai commented May 29, 2018

We're migrating to Scala 2.11.12 for JDK 9+ support, and we found that loadFiles() in ILoop was removed in Scala 2.11.12.

We use it as a hack to initialize Spark before the REPL sees any files.

https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L109

Is it possible to provide a hook for this?

Thanks.

@som-snytt

Unfortunately, loadInitFiles is a local function. But you can do what sbt console does and run your init during your override of createInterpreter. The downside for sbt users is that $intp is not available; if that is deemed useful, you can add that too. ILoop checks isSbt to decide how to handle errors at that point. It's safe to call intp.initializeSynchronous there.
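
A minimal sketch of that first approach, assuming the Scala 2.11 REPL internals; the class name and the initializationCommands value are placeholders, not Spark's actual code:

```scala
import java.io.BufferedReader
import scala.tools.nsc.interpreter.{ILoop, JPrintWriter}

// Illustrative loop that runs its init commands while the interpreter is being
// created, before the user ever sees a prompt.
class InitAtCreateILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  // Placeholder commands; Spark would put its bootstrap lines here.
  def initializationCommands: Seq[String] = Seq("""val initialized = true""")

  override def createInterpreter(): Unit = {
    super.createInterpreter()     // sets up intp from settings and out
    intp.initializeSynchronous()  // per the note above, this is safe here
    intp.beQuietDuring {
      initializationCommands.foreach(cmd => intp.interpret(cmd))
    }
  }
}
```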

Alternatively, you can externalize the init commands and set the system property scala.repl.autoruncode, which is read in before those -i user files. I don't know that anyone uses the system property. You could check in createInterpreter whether it is set, and then merge it with your init in a tmp file. That might be more worthy of being called a hack among hackers.
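
A sketch of that alternative; the file handling is illustrative, and only the scala.repl.autoruncode property name comes from the description above:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

object AutoruncodeSetup {
  // Write the externalized init commands to a tmp file and point the REPL at it.
  // If the user already set scala.repl.autoruncode, merge their script with ours.
  def install(initCommands: Seq[String]): Unit = {
    val existing = sys.props.get("scala.repl.autoruncode").map { path =>
      new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)
    }.toSeq
    val script = Files.createTempFile("repl-init", ".scala")
    Files.write(script, (existing ++ initCommands).mkString("\n").getBytes(StandardCharsets.UTF_8))
    sys.props("scala.repl.autoruncode") = script.toString
  }
}
```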

@dbtsai
Author

dbtsai commented May 31, 2018

Thanks for the reply.

I tried two of the ideas you pointed out without success.

  1. Overriding createInterpreter, but it raises a NullPointerException because $intp is not available yet.

  2. Initializing Spark in initializeSynchronous with the following:

```scala
class SparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {
  ...
  override def createInterpreter(): Unit = {
    intp = new SparkILoopInterpreter(settings, out, initializeSpark)
  }

  def initializeSpark() {
    intp.beQuietDuring {
      savingReplayStack { // remove the commands from session history
        initializationCommands.foreach(processLine)
      }
    }
  }
  ...
}
```

and

```scala
class SparkILoopInterpreter(settings: Settings, out: JPrintWriter, sparkInitFun: () => Unit)
  extends IMain(settings, out) {
  self =>

  override def initializeSynchronous(): Unit = {
    super.initializeSynchronous()
    sparkInitFun()
  }
  ...
}
```

This hangs forever.

Can you elaborate more on the alternative option?

Thanks.

@som-snytt

If you need $intp in the init files, and who doesn't, have you tried: https://github.com/scala/scala/blob/2.12.x/src/repl/scala/tools/nsc/interpreter/ILoop.scala#L996

I'll try to get involved between now and the weekend; it's been on my bucket list.

@dbtsai
Author

dbtsai commented May 31, 2018

FYI, the original Spark REPL runs initializeSpark() after loopPostInit() and before loadFiles(). It seems that in Scala 2.11.12 this is essentially impossible unless scala.repl.autoruncode can do the job. Thanks.
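
For context, the pre-2.11.12 hook looked roughly like this; a sketch against the old ILoop, where loadFiles could still be overridden, with initializeSpark standing in for the real bootstrap:

```scala
import java.io.BufferedReader
import scala.tools.nsc.Settings
import scala.tools.nsc.interpreter.{ILoop, JPrintWriter}

// Sketch of the old hack: in Scala 2.11.8, loadFiles ran after loopPostInit,
// so overriding it let Spark bootstrap before any user files were loaded.
class OldStyleSparkILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  def initializeSpark(): Unit = ()  // placeholder for the real Spark bootstrap

  override def loadFiles(settings: Settings): Unit = {
    initializeSpark()
    super.loadFiles(settings)
  }
}
```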

@dbtsai
Author

dbtsai commented May 31, 2018

@som-snytt are you saying to call intp.quietBind(NamedParam[IMain]("$intp", intp)(tagOfIMain, classTag[IMain])) in def createInterpreter(): Unit to get $intp? Thanks.
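
If so, a minimal sketch of that early binding might look like the following; the StdReplTags import and the placement inside createInterpreter are assumptions on my part:

```scala
import java.io.BufferedReader
import scala.reflect.classTag
import scala.tools.nsc.interpreter.{ILoop, IMain, JPrintWriter, NamedParam}
import scala.tools.nsc.interpreter.StdReplTags.tagOfIMain  // assumed location of tagOfIMain

class EarlyBindILoop(in0: Option[BufferedReader], out: JPrintWriter)
    extends ILoop(in0, out) {

  override def createInterpreter(): Unit = {
    super.createInterpreter()
    // Bind $intp right away so init code run later can refer to the interpreter.
    intp.quietBind(NamedParam[IMain]("$intp", intp)(tagOfIMain, classTag[IMain]))
  }
}
```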

@som-snytt

@dbtsai Apologies, I ran out of time this weekend because of IKEA. I think I did mean to bind $intp early if you need it, but I haven't tried it yet. Sorry I don't have something better to say. I'll try again when the opportunity presents itself.

@dbtsai
Author

dbtsai commented Jun 5, 2018

@som-snytt I got it working in apache/spark#21495

I asked a couple of questions there and would like to get your feedback. Thanks.

@som-snytt

@dbtsai That's fantabulous! I commented there. The upshot is that, after deprecating -Yrepl-sync because it wasn't used, I have to revive it. Long live -Yrepl-sync!

@srowen

srowen commented Jun 7, 2018

@dbtsai et al: see https://issues.apache.org/jira/browse/SPARK-20395 and apache/spark#17982. The internal change in question happened around 2.11.10, if it matters. Re-enabling the hack is great, I suppose, but Spark was always relying on internal methods that were never exposed.

HyukjinKwon pushed a commit to HyukjinKwon/spark that referenced this issue Jun 26, 2018
## What changes were proposed in this pull request?

Scala is upgraded to `2.11.12` and `2.12.6`.

We used `loadFiles()` in `ILoop` as a hook to initialize Spark before the REPL sees any files in Scala `2.11.8`. However, it was a hack and was never intended to be a public API, so it was removed in Scala `2.11.12`.

From the discussion in the Scala community (scala/bug#10913), we can use `initializeSynchronous` to initialize Spark instead. This PR implements the Spark initialization there.

However, in Scala `2.11.12`'s `ILoop.scala`, the first thing `def startup()` calls is `printWelcome()`. As a result, Scala calls `printWelcome()` and `splash` before `initializeSynchronous`.

Thus, the Spark shell will let users type commands first and only then show the Spark UI URL. It works, but it changes the Spark shell interface as follows.

```scala
➜  apache-spark git:(scala-2.11.12) ✗ ./bin/spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0-SNAPSHOT
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_161)
Type in expressions to have them evaluated.
Type :help for more information.

scala> Spark context Web UI available at http://192.168.1.169:4040
Spark context available as 'sc' (master = local[*], app id = local-1528180279528).
Spark session available as 'spark'.

scala>
```

It seems there is no easy way to inject the Spark initialization code in the proper place as Scala doesn't provide a hook. Maybe som-snytt can comment on this.

The following command is used to update the dep files.
```bash
./dev/test-dependencies.sh --replace-manifest
```
## How was this patch tested?

Existing tests

Author: DB Tsai <[email protected]>

Closes apache#21495 from dbtsai/scala-2.11.12.
@SethTisue
Member

should this ticket be closed now that apache/spark#21495 has gone through?

@dbtsai
Author

dbtsai commented Jun 27, 2018

Thank you all for helping on this!

@dbtsai dbtsai closed this as completed Jun 27, 2018