SparkPi Example: java.nio.channels.UnresolvedAddressException #523
Comments
In the driver pod, the driver attempts to connect to …
Also, can we try this with Scala's …
Just to add one more data point: I also saw this same error occurring during an integration test off #514. I ran it manually inside IntelliJ and saw it happening. And the minikube had the …
Can we double-check that the headless service is actually being created in the cases where this error occurs?
Sure, let me try to reproduce this and double-check if the headless service exists.
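For anyone who wants to check this programmatically rather than with kubectl, here is a minimal sketch using the fabric8 client; the service name and namespace are made-up placeholders, since the actual driver service name is derived per submission:

```scala
import io.fabric8.kubernetes.client.DefaultKubernetesClient

object CheckDriverService {
  def main(args: Array[String]): Unit = {
    // Placeholder names; substitute the real driver service name and namespace.
    val namespace = "default"
    val serviceName = "spark-pi-driver-svc"
    val client = new DefaultKubernetesClient()
    try {
      val svc = client.services().inNamespace(namespace).withName(serviceName).get()
      val eps = client.endpoints().inNamespace(namespace).withName(serviceName).get()
      println(s"service exists: ${svc != null}")
      // A headless service with a selector should get endpoints once the driver
      // pod is running; empty subsets suggest the pod is not matched yet.
      println(s"endpoints populated: ${eps != null && !eps.getSubsets.isEmpty}")
    } finally {
      client.close()
    }
  }
}
```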
@mccheah In my case, I do not have kube-dns running, and I don't know how to enable it because we use Magnum for deploying the cluster. I did run the SparkPi example using Scala as well, but no luck either.
Can we see the stack trace for the Scala job?
Actually, @kimoonkim has given us a stack trace already; I would like to see if it's the same as what @apurva3000 is seeing.
In DriverAddressConfigurationStep.scala, the driver hostname is: … But I think this should be passed via config.
I just tried to reproduce this in the same way as before, but this time it succeeded. Here's …
I wonder if we have a race condition here. If kube-dns is slow to add an entry for the headless service, then the driver fails while attempting to listen on the address. Maybe the driver should just use the underlying pod IP address instead of the service address for listening? (I don't know if this suggestion makes sense.)
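To make that concrete, here is a minimal sketch of what listening on the pod IP could look like, assuming the submission client injected the pod IP into the driver container as an environment variable (the name SPARK_DRIVER_POD_IP is hypothetical, not something the current code sets); the driver would bind to the pod IP while executors still connect through the service name:

```scala
import org.apache.spark.SparkConf

// Hypothetical: SPARK_DRIVER_POD_IP would be injected via the downward API.
// spark.driver.bindAddress controls the listen address, while
// spark.driver.host (the advertised address) could remain the service name.
val conf = new SparkConf()
  .set("spark.driver.bindAddress",
    sys.env.getOrElse("SPARK_DRIVER_POD_IP", "0.0.0.0"))
```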
re: @kimoonkim [@foxish, doesn't the service address resolve to the appropriate pod IP address? And could it be the case that there is such a race condition? I also don't understand why it would sometimes fail and other times not.]
@mccheah I am seeing the same stack trace: …
Hope this helps.
Regarding the race condition suggested by @kimoonkim: the submission client creates the driver pod before creating the headless service, along with the other Kubernetes resources the driver depends on. @apurva3000, as @mccheah suggested, can you run …
@liyinan926 Sure, that ordering exists. But does it mean the headless service DNS entry is actually created in …
@kimoonkim AFAIK, the endpoints controller watches for services and creates corresponding endpoints when applicable. In this case, the headless service has a selector, so the endpoints controller will create the endpoints once it sees the service.
But the endpoint backing the service in this case is the driver pod itself. So is it possible that whoever watches the endpoints, or kube-dns itself, is slower than the driver JVM and creates the DNS entry too late? It's not the first time that slow DNS has affected us. I won't be surprised.
@kimoonkim I think it's possible that the driver pod starts and tries to bind to the host name derived from the service name before the endpoints controller creates the endpoint and the DNS entry gets added, if the latter is slow.
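If this race is real, one conceivable mitigation (a sketch only, not something in the codebase) would be for the driver to retry DNS resolution of the service name for a bounded period before binding; the attempt count and delay below are arbitrary illustrative values:

```scala
import java.net.{InetAddress, UnknownHostException}

// Retry DNS resolution of the driver service name, giving kube-dns time to
// publish the headless service entry before the driver tries to bind.
def resolveWithRetry(host: String, attempts: Int = 30, delayMs: Long = 1000L): InetAddress = {
  var last: Throwable = new UnknownHostException(host)
  var i = 0
  while (i < attempts) {
    try {
      return InetAddress.getByName(host)
    } catch {
      case e: UnknownHostException =>
        last = e
        Thread.sleep(delayMs)
    }
    i += 1
  }
  throw last
}
```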
@liyinan926 Okay, I tried describing the service as well, and this is what I get: …
So clearly the endpoints are blank in my output. What exactly am I missing here? The fact that I do not have kube-dns running? (And also the fact that I do not know how to enable that in OpenStack Magnum.)
Oh, if kube-dns is not running, then I think this won't work at all. I'm curious why your cluster does not have kube-dns; I thought kube-dns was a required component of k8s these days.
Yes, without the kube-dns addon, this won't work. The driver won't be able to resolve the fully-qualified driver service name (…).
@kimoonkim @mccheah @ifilonenko I think @kimoonkim's point above about using the underlying driver pod's IP address makes sense, since in any case the derived driver host name (…
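As a sketch of what that could look like on the submission side, the pod IP can be exposed to the driver container via the Kubernetes downward API using the fabric8 model builders; the env var name is the same hypothetical one as in the earlier sketch:

```scala
import io.fabric8.kubernetes.api.model.{EnvVar, EnvVarBuilder}

// Downward-API env var that resolves to the pod's own IP at runtime; the
// driver could then bind to this value instead of the service-derived name.
val podIpEnv: EnvVar = new EnvVarBuilder()
  .withName("SPARK_DRIVER_POD_IP")
  .withNewValueFrom()
    .withNewFieldRef("v1", "status.podIP")
  .endValueFrom()
  .build()
```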
I am trying to follow the official documentation for running the SparkPi example and am encountering the following problem.
We create the kubernetes cluster using Magnum (https://wiki.openstack.org/wiki/Magnum)
And we run the example with the following command: …
Has anyone run into this before? Which address does it refer to, considering I have already given it the master IP?
```
2017-10-11 16:35:52 INFO SparkContext:54 - Running Spark version 2.2.0-k8s-0.4.0
2017-10-11 16:35:52 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-10-11 16:35:52 INFO SparkContext:54 - Submitted application: PythonPi
2017-10-11 16:35:52 INFO SecurityManager:54 - Changing view acls to: root
2017-10-11 16:35:52 INFO SecurityManager:54 - Changing modify acls to: root
2017-10-11 16:35:52 INFO SecurityManager:54 - Changing view acls groups to:
2017-10-11 16:35:52 INFO SecurityManager:54 - Changing modify acls groups to:
2017-10-11 16:35:52 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2017-10-11 16:35:53 ERROR SparkContext:91 - Error initializing SparkContext.
java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:101)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:496)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:481)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:748)
2017-10-11 16:35:53 INFO SparkContext:54 - Successfully stopped SparkContext
Traceback (most recent call last):
  File "/opt/spark/examples/src/main/python/pi.py", line 32, in <module>
    .appName("PythonPi")\
  File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 169, in getOrCreate
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 334, in getOrCreate
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 118, in __init__
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 180, in _do_init
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 273, in _initialize_context
  File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1401, in __call__
  File "/opt/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:101)
	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:218)
	at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:127)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:501)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1218)
	at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:496)
	at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:481)
	at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:965)
	at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:210)
	at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:353)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:399)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
	at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" org.apache.spark.SparkUserAppException: User application exited with 1
	at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:97)
	at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
```
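The first frames of the trace (sun.nio.ch.Net.checkAddress) indicate a server-socket bind on a host name that never resolved. A tiny standalone sketch, with a deliberately bogus host name, reproduces the same exception:

```scala
import java.net.InetSocketAddress
import java.nio.channels.ServerSocketChannel

// An InetSocketAddress resolves its host at construction time; when the DNS
// lookup fails it is left "unresolved", and binding to it then throws
// java.nio.channels.UnresolvedAddressException, matching the trace above.
val addr = new InetSocketAddress("no-such-driver-svc.default.svc.cluster.local", 7077)
println(addr.isUnresolved) // true when the name cannot be resolved
ServerSocketChannel.open().bind(addr) // throws UnresolvedAddressException
```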