
Avoid Spark driver IP binding failure due to VPN setting #866

Merged: 3 commits merged into apache:main on Jan 28, 2025


@flyrain (Contributor) commented Jan 24, 2025

Fixes a Spark local-run failure like the following:

24/09/01 23:00:54 ERROR Inbox: Ignoring error
java.lang.NullPointerException: Cannot invoke "org.apache.spark.storage.BlockManagerId.executorId()" because "idWithoutTopologyInfo" is null
        at org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:677)
        at org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:133)
        at org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
        at org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)


./setup.sh

if [ -z "${SPARK_HOME}"]; then
export SPARK_HOME=$(realpath ~/${SPARK_DISTRIBUTION})
fi

SPARK_BEARER_TOKEN="${REGTEST_ROOT_BEARER_TOKEN:-principal:root;realm:default-realm}"
SPARK_BEARER_TOKEN="${REGTEST_ROOT_BEARER_TOKEN:-principal:root;realm:polaris}"
Contributor Author (review comment):

This change looks optional: the test passed locally for me without it.

@flyrain (Contributor Author) commented Jan 24, 2025

The test failed because it uses the test profile in application.properties. There are two ways to fix this:

  1. Use a different profile, or even a separate application.properties file, for the tests.
  2. Create a new profile for local runs.

@jbonofre (Member):

When possible, we should use the same application.properties for both the app and the tests.
IMHO, it's fine to create a dedicated profile for test/local runs.
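In Quarkus, a property prefixed with `%<profile>.` (such as the `%test.` entries in the application.properties hunk of this PR) applies only when that profile is active, and it overrides the unprefixed property with the same name. The sketch below models that lookup for a single file under simplifying assumptions: `resolve_quarkus_prop` is a hypothetical helper, it expects plain `key=value` lines, and it ignores the other config sources (system properties, environment variables, and so on) that real Quarkus also consults.

```bash
#!/usr/bin/env bash
# Hypothetical helper: simplified model of Quarkus profile-aware lookup.
# Prints the value of KEY for PROFILE from one properties FILE:
# "%PROFILE.KEY" wins over "KEY"; returns 1 if neither is present.
resolve_quarkus_prop() {
  local file="$1" profile="$2" key="$3"
  local line k v base="" override="" found_base=0 found_override=0
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines and comment lines ('#' or '!')
    case "$line" in ''|'#'*|'!'*) continue ;; esac
    k="${line%%=*}"   # text before the first '='
    v="${line#*=}"    # text after the first '='
    if [ "$k" = "%${profile}.${key}" ]; then
      override="$v"; found_override=1
    elif [ "$k" = "$key" ]; then
      base="$v"; found_base=1
    fi
  done < "$file"
  if [ "$found_override" -eq 1 ]; then
    printf '%s\n' "$override"
  elif [ "$found_base" -eq 1 ]; then
    printf '%s\n' "$base"
  else
    return 1
  fi
}

# Example: the unprefixed value below is hypothetical, for illustration only
props="$(mktemp)"
cat > "$props" <<'EOF'
polaris.authentication.authenticator.type=default
%test.polaris.authentication.authenticator.type=test
EOF
resolve_quarkus_prop "$props" test polaris.authentication.authenticator.type   # prints: test
resolve_quarkus_prop "$props" prod polaris.authentication.authenticator.type   # prints: default
rm -f "$props"
```

A dedicated local-run profile would work the same way: its `%<name>.`-prefixed entries only take effect when that profile is selected, so the shared unprefixed defaults stay in one file.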

@@ -130,6 +131,8 @@ polaris.authentication.token-broker.max-token-generation=PT1H
%test.quarkus.log.category."org.apache.polaris.service.storage.PolarisStorageIntegrationProviderImpl".level=ERROR
%test.quarkus.http.limits.max-body-size=1000000
%test.quarkus.otel.sdk.disabled=true
%test.polaris.authentication.authenticator.type=test
Contributor (review comment):

My idea was to do exactly the opposite: make all tests (automated and regression) use the default authenticator. Cf. #804.

Contributor Author (reply):

That makes sense to me.

@flyrain changed the title from "Fix the Polaris test credentials" to "Avoid" on Jan 24, 2025
@flyrain changed the title from "Avoid" to "Avoid Spark driver IP binding failure due to VPN setting" on Jan 24, 2025
@flyrain (Contributor Author) commented Jan 24, 2025

Repurposing this PR, since #804 will resolve the credential issue.

regtests/run_spark_sql.sh (outdated)
@@ -56,7 +56,7 @@ if [ -z "${SPARK_HOME}"]; then
export SPARK_HOME=$(realpath ~/${SPARK_DISTRIBUTION})
fi

SPARK_BEARER_TOKEN="${REGTEST_ROOT_BEARER_TOKEN:-principal:root;realm:polaris}"
SPARK_BEARER_TOKEN="${REGTEST_ROOT_BEARER_TOKEN:-principal:root;realm:default-realm}"
Contributor (review comment):

This change may not be needed anymore since I removed the default values for these environment variables in #804.

Contributor Author (reply):

Tested again on top of #804. This change is still needed for my local run. I think this is due to a recent change in my local VPN settings, which I can't modify myself. Since this script is mainly used for local testing, I think this change is fine.

Here is the error message without this change:

Exception in thread "main" java.io.IOException: Failed to send RPC /jars/software.amazon.awssdk_http-client-spi-2.23.19.jar to mac.lan/192.168.86.27:55366: io.netty.channel.StacklessClosedChannelException
	at org.apache.spark.network.client.TransportClient$2.handleFailure(TransportClient.java:164)
	at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:372)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:590)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:557)
	at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:492)
	at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:636)
	at io.netty.util.concurrent.DefaultPromise.setFailure0(DefaultPromise.java:629)
	at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:118)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:999)
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:860)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1367)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:877)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856)
	at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:113)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:881)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:863)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:968)
	at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:856)
	at io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:879)
	at io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:940)
	at io.netty.channel.AbstractChannelHandlerContext$WriteTask.run(AbstractChannelHandlerContext.java:1247)
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:566)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: io.netty.channel.StacklessClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(Object, ChannelPromise)(Unknown Source)
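The `SPARK_BEARER_TOKEN` lines in this thread use `"${REGTEST_ROOT_BEARER_TOKEN:-principal:root;realm:...}"`, which silently falls back to a built-in value whenever the variable is unset or empty. Once such defaults are removed, a script can instead fail fast with `${VAR:?message}`. The sketch below contrasts the two bash expansions; `token_with_default` and `token_required` are hypothetical names, and this is not necessarily how #804 handles the variables.

```bash
#!/usr/bin/env bash
# Hypothetical helpers contrasting two bash parameter expansions.
# The fallback value is copied from the run_spark_sql.sh hunk in this thread.

token_with_default() {
  # ${VAR:-word}: expand to word when VAR is unset or empty
  printf '%s\n' "${REGTEST_ROOT_BEARER_TOKEN:-principal:root;realm:default-realm}"
}

token_required() {
  # ${VAR:?message}: write message to stderr and abort when VAR is unset or empty.
  # The subshell confines the abort to this call instead of the whole script.
  (printf '%s\n' "${REGTEST_ROOT_BEARER_TOKEN:?REGTEST_ROOT_BEARER_TOKEN must be set}")
}

# Example: the first call prints the fallback, the second fails with a clear message
unset REGTEST_ROOT_BEARER_TOKEN
token_with_default
token_required || echo "token_required failed as expected" >&2
```

The fail-fast form makes a missing or misconfigured token visible immediately, rather than surfacing later as an authentication error against the wrong realm.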

regtests/run.sh (outdated)
@@ -26,6 +26,7 @@ if [ -z "${SPARK_HOME}" ]; then
export SPARK_HOME=$(realpath ~/${SPARK_DISTRIBUTION})
fi
export PYTHONPATH="${SPARK_HOME}/python/:${SPARK_HOME}/python/lib/py4j-0.10.9.7-src.zip:$PYTHONPATH"
export SPARK_LOCAL_HOSTNAME=localhost #avoid VPN messing up driver local IP address binding
Contributor (review comment):

nit: space after #

Contributor Author (reply):

fixed

@@ -48,6 +48,7 @@ cd ${REGTEST_HOME}

export SPARK_VERSION=spark-3.5.3
export SPARK_DISTRIBUTION=${SPARK_VERSION}-bin-hadoop3
export SPARK_LOCAL_HOSTNAME=localhost #avoid VPN messing up driver local IP address binding
Contributor (review comment):

nit: space after #

Contributor Author (reply):

fixed
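The merged lines export `SPARK_LOCAL_HOSTNAME=localhost`, so Spark uses `localhost` as the driver hostname instead of an address it picks from the machine's network configuration (in the failing run, the RPC target was `mac.lan/192.168.86.27`). The sketch below shows a possible variant, offered as an assumption rather than what was merged: keep `localhost` as the default but preserve a value the developer has already exported. `set_spark_local_hostname` is a hypothetical helper name, and its comments follow the "space after #" nit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: default the Spark driver hostname to localhost so that a
# VPN interface does not change the address the driver advertises, while keeping
# any value the caller has already set.
set_spark_local_hostname() {
  export SPARK_LOCAL_HOSTNAME="${SPARK_LOCAL_HOSTNAME:-localhost}"
}

set_spark_local_hostname
echo "SPARK_LOCAL_HOSTNAME=${SPARK_LOCAL_HOSTNAME}"
```

With this variant, a value set beforehand (for example `127.0.0.1`) is kept; with the merged `export SPARK_LOCAL_HOSTNAME=localhost`, it is always replaced by `localhost`.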

@adutra (Contributor) commented Jan 28, 2025

I also have a VPN enabled but wasn't seeing this issue. I checked that this change doesn't introduce any regressions on my end, so LGTM. :shipit:

@flyrain merged commit 2f664a3 into apache:main on Jan 28, 2025 (5 checks passed)
@flyrain (Contributor Author) commented Jan 28, 2025

Thanks @eric-maynard and @adutra for the review

4 participants