[SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 #37990

Closed
attilapiros wants to merge 8 commits into apache:master from attilapiros:SPARK-40458

Conversation

attilapiros
Contributor

attilapiros commented Sep 25, 2022

### What changes were proposed in this pull request?

Bump the kubernetes-client version from 5.12.3 to 6.1.1 and clean up all the deprecations.

### Why are the changes needed?

To keep up with kubernetes-client [changes](fabric8io/kubernetes-client@v5.12.3...v6.1.1).
As this is a major-version upgrade, I have cleaned up all the deprecations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

#### Unit tests

#### Manual tests for submit and application management

Started an application in a non-default namespace (`bla`):

```
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit \
    --master k8s://http://127.0.0.1:8001 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.namespace=bla \
    --conf spark.kubernetes.container.image=docker.io/kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar 200000
```

Check that we cannot find it from the default namespace, even with a glob, when no namespace is given:

```
➜  spark git:(SPARK-40458) ✗ minikube kubectl -- config set-context --current --namespace=default
Context "minikube" modified.
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
No applications found.
```

Then check that we can find it by specifying the namespace:

```
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "bla:spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission bla:spark-pi-* in k8s://http://127.0.0.1:8001.
Application status (driver):
         pod name: spark-pi-4c4e70837c86ae1a-driver
         namespace: bla
         labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
         pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
         creation time: 2022-09-27T01:19:06Z
         service account name: default
         volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
         node name: minikube
         start time: 2022-09-27T01:19:06Z
         phase: Running
         container status:
                 container name: spark-kubernetes-driver
                 container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                 container state: running
                 container started at: 2022-09-27T01:19:07Z
```

Changing the namespace to `bla` with `kubectl`:

```
➜  spark git:(SPARK-40458) ✗  minikube kubectl -- config set-context --current --namespace=bla
Context "minikube" modified.
```

Checking that we can now find it with a glob and without specifying the namespace:

```
➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
Application status (driver):
         pod name: spark-pi-4c4e70837c86ae1a-driver
         namespace: bla
         labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
         pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
         creation time: 2022-09-27T01:19:06Z
         service account name: default
         volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
         node name: minikube
         start time: 2022-09-27T01:19:06Z
         phase: Running
         container status:
                 container name: spark-kubernetes-driver
                 container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                 container state: running
                 container started at: 2022-09-27T01:19:07Z
```

Killing the app:

```
➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --kill "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request to kill submission spark-pi-* in k8s://http://127.0.0.1:8001. Grace period in secs: not set.
Deleting driver pod: spark-pi-4c4e70837c86ae1a-driver.
```

@attilapiros
Contributor Author

The `inNamespace` calls are added because of the namespace handling changes and to make the namespace usage more explicit.
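
For illustration, a minimal sketch of the more explicit style with the fabric8 6.x DSL; the namespace and pod name are just the values from the manual test above, not code from this PR:

```
import io.fabric8.kubernetes.client.{KubernetesClient, KubernetesClientBuilder}

val client: KubernetesClient = new KubernetesClientBuilder().build()

// Name the namespace explicitly instead of relying on the client's default:
val driverPod = client.pods()
  .inNamespace("bla")
  .withName("spark-pi-4c4e70837c86ae1a-driver")
  .get()
```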

attilapiros changed the title from [WIP][SPARK-40458] Bump Kubernetes Client Version to 6.1.1 to [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 on Sep 25, 2022
```
@@ -115,7 +115,10 @@ private[spark] object SparkKubernetesClientFactory extends Logging {
     }
     logDebug("Kubernetes client config: " +
       new ObjectMapper().writerWithDefaultPrettyPrinter().writeValueAsString(config))
-    new DefaultKubernetesClient(factoryWithCustomDispatcher.createHttpClient(config), config)
+    new KubernetesClientBuilder()
```
Contributor Author

Because of the apiimpl-split:

> When you rely solely on a compile dependency to the respective -api dependencies you will not be able to use DefaultKubernetesClient nor DefaultOpenShiftClient directly to create your client instances. You should instead use KubernetesClientBuilder.
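
A rough sketch of the builder-based construction under those constraints; the master URL is a placeholder, and the explicit OkHttp factory line is optional, standing in for the custom-dispatcher factory the surrounding Spark code uses:

```
import io.fabric8.kubernetes.client.{Config, ConfigBuilder, KubernetesClientBuilder}
import io.fabric8.kubernetes.client.okhttp.OkHttpClientFactory

val config: Config = new ConfigBuilder()
  .withMasterUrl("https://127.0.0.1:8001") // placeholder master URL
  .build()

// 6.x construction: clients are built, not instantiated via DefaultKubernetesClient.
val client = new KubernetesClientBuilder()
  .withConfig(config)
  .withHttpClientFactory(new OkHttpClientFactory()) // optional; OkHttp is the default
  .build()
```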

```
@@ -144,14 +134,13 @@ private[spark] class K8SSparkSubmitOperation extends SparkSubmitOperation
         kubernetesClient
           .pods
       }
+    ops.withLabel(SPARK_ROLE_LABEL, SPARK_POD_DRIVER_ROLE)
```
Contributor Author

Do the label-based filtering as early as possible.
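
For illustration, a sketch of what early, server-side filtering looks like, assuming the label key/value pair seen in the test output above:

```
import io.fabric8.kubernetes.client.KubernetesClientBuilder

val client = new KubernetesClientBuilder().build()

// The label selector goes into the request itself, so the API server
// returns only driver pods instead of everything in the namespace:
val driverPods = client.pods()
  .inNamespace("bla")
  .withLabel("spark-role", "driver") // SPARK_ROLE_LABEL -> SPARK_POD_DRIVER_ROLE
  .list()
  .getItems
```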

```
           .withName(updatedPod.getMetadata.getName)
-          .patch(inactivatedPod)
+          .edit(executorInactivationFn)
```
Contributor Author

As I see it, `edit` is preferred over `patch`; see kubernetes-client-dsl-usage.
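
A hedged sketch of the difference: `patch` needs a fully materialized modified pod, while `edit` takes a transform and lets the client compute and send the change. The label below is a placeholder, not the one Spark actually sets:

```
import java.util.function.UnaryOperator
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
import io.fabric8.kubernetes.client.KubernetesClientBuilder

val client = new KubernetesClientBuilder().build()

// edit() receives the current Pod and returns the desired Pod; the client
// computes and sends the change, so no hand-built "inactivated" copy is needed.
val inactivate: UnaryOperator[Pod] = (p: Pod) => new PodBuilder(p)
  .editMetadata()
    .addToLabels("inactive", "true") // placeholder label, not Spark's actual one
  .endMetadata()
  .build()

client.pods()
  .inNamespace("bla")
  .withName("spark-pi-4c4e70837c86ae1a-driver")
  .edit(inactivate)
```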

```
@@ -101,18 +114,19 @@ class K8sSubmitOpSuite extends SparkFunSuite with BeforeAndAfter {
     implicit val kubeClient: KubernetesClient = kubernetesClient
     val killApp = new KillApplication
     val conf = new SparkConf().set(KUBERNETES_SUBMIT_GRACE_PERIOD, 1L)
-    when(driverPodOperations1.withGracePeriod(1L)).thenReturn(driverPodOperations1)
+    doReturn(deletable).when(driverPodOperations1).withGracePeriod(1L)
```
Contributor Author

This is a bit tricky: because of some generic args the `when()...thenReturn()` form does not work, but this `doReturn()...when()` form does:

https://stackoverflow.com/questions/15942880/mocking-a-method-that-return-generics-with-wildcard-using-mockito
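
For illustration, the two stubbing forms side by side; `driverPodOperations1` and `deletable` are the suite's mocks, and the failing form is shown only in a comment:

```
import org.mockito.Mockito.doReturn

// Type-checked form: thenReturn(...) must satisfy withGracePeriod's wildcard
// generic return type, which the Scala compiler rejects:
//   when(driverPodOperations1.withGracePeriod(1L)).thenReturn(deletable)
// Untyped form: doReturn is not generic over the return type, so it compiles:
doReturn(deletable).when(driverPodOperations1).withGracePeriod(1L)
```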

```
@@ -75,6 +75,11 @@
       <scope>test</scope>
     </dependency>
+    <dependency>
```
Contributor Author

attilapiros Sep 25, 2022

https://github.com/fabric8io/kubernetes-client/blob/master/doc/MIGRATION-v6.md#okhttp-httpclient:

> The -client dependencies still default to the OkHttp client. If you are doing any customization to OkHttp clients directly, you'll need to include the kubernetes-httpclient-okhttp dependency in the compile scope - instead of the default runtime scope.
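
A sketch of the dependency the quoted note calls for; the version property name is a placeholder, since the real pom may manage versions differently:

```
<!-- Placeholder version property; adjust to how the pom manages versions. -->
<dependency>
  <groupId>io.fabric8</groupId>
  <artifactId>kubernetes-httpclient-okhttp</artifactId>
  <version>${kubernetes-client.version}</version>
  <scope>compile</scope>
</dependency>
```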

Member

dongjoon-hyun left a comment

Thank you, @attilapiros!

@dongjoon-hyun
Member

Although 6.1.1 is intrusive, this patch looks solid. Is there any other reason why this is still WIP, @attilapiros?

@attilapiros
Contributor Author

attilapiros commented Sep 27, 2022

> Although 6.1.1 is intrusive, this patch looks solid. Is there any other reason why this is still WIP, @attilapiros?

@dongjoon-hyun I just executed some manual tests regarding spark-submit application management (and actually found and corrected a bug). The test steps are in the description.

attilapiros changed the title from [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 to [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 on Sep 27, 2022
@attilapiros
Contributor Author

I would like to go through it one more time to find all the places where we can specify the namespace.

attilapiros changed the title from [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 to [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 on Sep 27, 2022
@dongjoon-hyun
Member

dongjoon-hyun commented Sep 27, 2022

Thanks. Please ping me when it's ready, @attilapiros.

```
             conf.get(KUBERNETES_EXECUTOR_DECOMMISSION_LABEL_VALUE).getOrElse(""))
           .endMetadata()
           .build()})
+      .resources()
```
Contributor Author

Adding the KUBERNETES_EXECUTOR_DECOMMISSION_LABEL_VALUE label was earlier done in two phases:

- using `list().getItems().asScala`, which resulted in a Scala `List` of `Pod`s;
- iterating over that list and fetching the pods again one by one by name (using `withName`) and namespace (using `inNamespace`).

Now, with `resources()` and `forEach`, those two steps are merged into a single DSL expression, as sketched below.
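
Roughly the shape of the merged expression, as a sketch with placeholder label names rather than the exact Spark code:

```
import java.util.function.UnaryOperator
import io.fabric8.kubernetes.api.model.{Pod, PodBuilder}
import io.fabric8.kubernetes.client.KubernetesClientBuilder

val client = new KubernetesClientBuilder().build()

val markDecommissioned: UnaryOperator[Pod] = (p: Pod) => new PodBuilder(p)
  .editMetadata()
    .addToLabels("decommissioned", "true") // placeholder for the decommission label/value
  .endMetadata()
  .build()

// One DSL expression: resources() yields a java.util.stream.Stream of pod
// handles, so each matching pod is edited in place instead of being listed
// and then re-fetched one by one with withName/inNamespace:
client.pods()
  .inNamespace("bla")
  .withLabel("spark-role", "executor") // placeholder selector
  .resources()
  .forEach(r => r.edit(markDecommissioned))
```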

attilapiros changed the title from [WIP][SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 to [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1 on Sep 29, 2022
@attilapiros
Contributor Author

@dongjoon-hyun it is ready!

Member

dongjoon-hyun left a comment

+1, LGTM. Thank you, @attilapiros.
Merged to master for Apache Spark 3.4.0.

@Yikun
Member

Yikun commented Oct 1, 2022

Late LGTM. I also tested it with volcano:

```
[info] VolcanoSuite:
[info] - Run SparkPi with volcano scheduler (8 seconds, 946 milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minCPU (21 seconds, 480 milliseconds)
[info] - SPARK-38187: Run SparkPi Jobs with minMemory (22 seconds, 384 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (only 1 enabled) (12 seconds, 462 milliseconds)
[info] - SPARK-38188: Run SparkPi jobs with 2 queues (all enabled) (20 seconds, 431 milliseconds)
[info] - SPARK-38423: Run driver job to validate priority order (15 seconds, 238 milliseconds)
[info] Run completed in 1 minute, 46 seconds.
[info] Total number of tests run: 6
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 6, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
[success] Total time: 132 s (02:12), completed 2022-10-1 9:37:41
```

senthh pushed a commit to acceldata-io/spark3 that referenced this pull request Sep 12, 2024
shubhluck pushed a commit to acceldata-io/spark3 that referenced this pull request Sep 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 12, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
* ODP-2189 Upgrade snakeyaml version to 2.0

* [SPARK-35579][SQL] Bump janino to 3.1.7

### What changes were proposed in this pull request?

upgrade janino to 3.1.7 from 3.0.16

### Why are the changes needed?

- The proposed version contains bug fix in janino by maropu.
   - janino-compiler/janino#148
- contains `getBytecodes` method which can be used to simplify the way to get bytecodes from ClassBodyEvaluator in CodeGenerator#updateAndGetCompilationStats method. (by LuciferYang)
   - apache#32536

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs

Closes apache#37202 from singhpk234/upgrade/bump-janino.

Authored-by: Prashant Singh <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

(cherry picked from commit 29ed337)

* [SPARK-40633][BUILD] Upgrade janino to 3.1.9

### What changes were proposed in this pull request?
This pr aims upgrade janino from 3.1.7 to 3.1.9

### Why are the changes needed?
This version bring some improvement and bug fix, and janino 3.1.9 will no longer test Java 12, 15, 16 because these STS versions have been EOL:

- janino-compiler/janino@v3.1.7...v3.1.9

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual test this pr with Scala 2.13, all test passed

Closes apache#38075 from LuciferYang/SPARK-40633.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

(cherry picked from commit 49e102b)

* ODP-2167 Upgrade janino version from 3.1.9 to 3.1.10

* ODP-2190 Upgrade guava version to 32.1.3-jre

* ODP-2193 Upgrade jettison version to 1.5.4

* ODP-2194 Upgrade wildfly-openssl version to 1.1.3

* ODP-2198 Upgrade gson version to 2.11.0

* ODP-2199 Upgrade kryo-shaded version to 4.0.3

* ODP-2200 Upgrade datanucleus-core and datanucleus-rdbms versions to 5.2.3

* ODP-2203 Upgrade Snappy and common-compress to 1.1.10.4 and 1.26.0 respectively

* ODP-2198 Excluded gson from tink library

* ODP-2205 Upgrade jdom2 to 2.0.6.1

* ODP-2198 Excluded gson from hive-exec

* ODP-2175|SPARK-47018 Upgrade libthrift version and hive version

* [SPARK-39688][K8S] `getReusablePVCs` should handle accounts with no PVC permission

### What changes were proposed in this pull request?

This PR aims to handle `KubernetesClientException` in `getReusablePVCs` method to handle gracefully the cases where accounts has no PVC permission including `listing`.

### Why are the changes needed?

To prevent a regression in Apache Spark 3.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the newly added test case.

Closes apache#37095 from dongjoon-hyun/SPARK-39688.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 79f133b)

* [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

### What changes were proposed in this pull request?

Bump kubernetes-client version from 5.12.3 to 6.1.1 and clean up all the deprecations.

### Why are the changes needed?

To keep up with kubernetes-client [changes](fabric8io/kubernetes-client@v5.12.3...v6.1.1).
As this is an upgrade where the main version changed I have cleaned up all the deprecations.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

#### Unit tests

#### Manual tests for submit and application management

Started an application in a non-default namespace (`bla`):

```
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit \
    --master k8s://http://127.0.0.1:8001 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.namespace=bla \
    --conf spark.kubernetes.container.image=docker.io/kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar 200000
```

Check that we cannot find it in the default namespace even with glob without the namespace definition:

```
➜  spark git:(SPARK-40458) ✗ minikube kubectl -- config set-context --current --namespace=default
Context "minikube" modified.
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
No applications found.
```

Then check we can find it by specifying the namespace:
```
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "bla:spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission bla:spark-pi-* in k8s://http://127.0.0.1:8001.
Application status (driver):
         pod name: spark-pi-4c4e70837c86ae1a-driver
         namespace: bla
         labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
         pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
         creation time: 2022-09-27T01:19:06Z
         service account name: default
         volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
         node name: minikube
         start time: 2022-09-27T01:19:06Z
         phase: Running
         container status:
                 container name: spark-kubernetes-driver
                 container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                 container state: running
                 container started at: 2022-09-27T01:19:07Z
```

Changing the namespace to `bla` with `kubectl`:

```
➜  spark git:(SPARK-40458) ✗  minikube kubectl -- config set-context --current --namespace=bla
Context "minikube" modified.
```

Checking we can find it without specifying the namespace (and glob):
```
➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
Application status (driver):
         pod name: spark-pi-4c4e70837c86ae1a-driver
         namespace: bla
         labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
         pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
         creation time: 2022-09-27T01:19:06Z
         service account name: default
         volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
         node name: minikube
         start time: 2022-09-27T01:19:06Z
         phase: Running
         container status:
                 container name: spark-kubernetes-driver
                 container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                 container state: running
                 container started at: 2022-09-27T01:19:07Z
```

Killing the app:
```
➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --kill "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request to kill submission spark-pi-* in k8s://http://127.0.0.1:8001. Grace period in secs: not set.
Deleting driver pod: spark-pi-4c4e70837c86ae1a-driver.
```

Closes apache#37990 from attilapiros/SPARK-40458.

Authored-by: attilapiros <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit fa88651)

* [SPARK-36462][K8S] Add the ability to selectively disable watching or polling

### What changes were proposed in this pull request?

Add the ability to selectively disable watching or polling

Updated version of apache#34264

### Why are the changes needed?

Watching or polling for pod status on Kubernetes can place additional load on etcd, with a large number of executors and large number of jobs this can have negative impacts and executors register themselves with the driver under normal operations anyways.

### Does this PR introduce _any_ user-facing change?

Two new config flags.

### How was this patch tested?

New unit tests + manually tested a forked version of this on an internal cluster with both watching and polling disabled.

Closes apache#36433 from holdenk/SPARK-36462-allow-spark-on-kube-to-operate-without-watchers.

Lead-authored-by: Holden Karau <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 5bffb98)

* ODP-2201|SPARK-48867 Upgrade okhttp to 4.12.0, okio to 3.9.0 and esdk-obs-java to 3.24.3

* [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

Backporting fix for SPARK-41958 to 3.3 branch from apache#39474
Below description from original PR.

--------------------------

### What changes were proposed in this pull request?

This PR proposes to disallow arbitrary custom classpath with proxy user in cluster mode by default.

### Why are the changes needed?

To avoid arbitrary classpath in spark cluster.

### Does this PR introduce _any_ user-facing change?

Yes. User should reenable this feature by `spark.submit.proxyUser.allowCustomClasspathInClusterMode`.

### How was this patch tested?

Manually tested.

Closes apache#39474 from Ngone51/dev.

Lead-authored-by: Peter Toth <peter.toth@gmail.com>
Co-authored-by: Yi Wu <yi.wu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

(cherry picked from commit 909da96)

Closes apache#41428 from degant/spark-41958-3.3.

Lead-authored-by: Degant Puri <[email protected]>
Co-authored-by: Peter Toth <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

* ODP-2049 Changing Spark3 version from 3.3.3.3.2.3.2-2 to 3.3.3.3.2.3.2-201

* ODP-2049 Changing libthrift version to 0.16 in deps files

* ODP-2049 Changing derby version to 10.14.3.0

---------

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Prashant Singh <[email protected]>
Co-authored-by: yangjie01 <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: attilapiros <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Co-authored-by: Degant Puri <[email protected]>
Co-authored-by: Peter Toth <[email protected]>
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Nov 13, 2024
senthh added a commit to acceldata-io/spark3 that referenced this pull request Feb 20, 2025
* ODP-2189 Upgrade snakeyaml version to 2.0

* [SPARK-35579][SQL] Bump janino to 3.1.7

### What changes were proposed in this pull request?

upgrade janino to 3.1.7 from 3.0.16

### Why are the changes needed?

- The proposed version contains bug fix in janino by maropu.
   - janino-compiler/janino#148
- contains `getBytecodes` method which can be used to simplify the way to get bytecodes from ClassBodyEvaluator in CodeGenerator#updateAndGetCompilationStats method. (by LuciferYang)
   - apache#32536

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing UTs

Closes apache#37202 from singhpk234/upgrade/bump-janino.

Authored-by: Prashant Singh <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

(cherry picked from commit 29ed337)

* [SPARK-40633][BUILD] Upgrade janino to 3.1.9

### What changes were proposed in this pull request?
This pr aims upgrade janino from 3.1.7 to 3.1.9

### Why are the changes needed?
This version bring some improvement and bug fix, and janino 3.1.9 will no longer test Java 12, 15, 16 because these STS versions have been EOL:

- janino-compiler/janino@v3.1.7...v3.1.9

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Manual test this pr with Scala 2.13, all test passed

Closes apache#38075 from LuciferYang/SPARK-40633.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Sean Owen <[email protected]>

(cherry picked from commit 49e102b)

* ODP-2167 Upgrade janino version from 3.1.9 to 3.1.10

* ODP-2190 Upgrade guava version to 32.1.3-jre

* ODP-2193 Upgrade jettison version to 1.5.4

* ODP-2194 Upgrade wildfly-openssl version to 1.1.3

* ODP-2198 Upgrade gson version to 2.11.0

* ODP-2199 Upgrade kryo-shaded version to 4.0.3

* ODP-2200 Upgrade datanucleus-core and datanucleus-rdbms versions to 5.2.3

* ODP-2203 Upgrade Snappy and common-compress to 1.1.10.4 and 1.26.0 respectively

* ODP-2198 Excluded gson from tink library

* ODP-2205 Upgrade jdom2 to 2.0.6.1

* ODP-2198 Excluded gson from hive-exec

* ODP-2175|SPARK-47018 Upgrade libthrift version and hive version

* [SPARK-39688][K8S] `getReusablePVCs` should handle accounts with no PVC permission

### What changes were proposed in this pull request?

This PR aims to handle `KubernetesClientException` in `getReusablePVCs` method to handle gracefully the cases where accounts has no PVC permission including `listing`.

### Why are the changes needed?

To prevent a regression in Apache Spark 3.4.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs with the newly added test case.

Closes apache#37095 from dongjoon-hyun/SPARK-39688.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 79f133b)

* [SPARK-40458][K8S] Bump Kubernetes Client Version to 6.1.1

### What changes were proposed in this pull request?

Bump kubernetes-client version from 5.12.3 to 6.1.1 and clean up all the deprecations.

### Why are the changes needed?

To keep up with kubernetes-client [changes](fabric8io/kubernetes-client@v5.12.3...v6.1.1).
As this is a major-version upgrade, I have cleaned up all the deprecations.
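
One representative cleanup, as a hedged sketch (illustrative of the 6.x API direction, not the exact Spark diff): fabric8 6.x deprecates `DefaultKubernetesClient` in favor of `KubernetesClientBuilder`.

```scala
import io.fabric8.kubernetes.client.{Config, KubernetesClient, KubernetesClientBuilder}

// Auto-configure from the current kubeconfig context (null = default context).
val config: Config = Config.autoConfigure(null)
// 5.x style, deprecated in 6.x: new DefaultKubernetesClient(config)
val client: KubernetesClient = new KubernetesClientBuilder().withConfig(config).build()
```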

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

#### Unit tests

#### Manual tests for submit and application management

Started an application in a non-default namespace (`bla`):

```
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit \
    --master k8s://http://127.0.0.1:8001 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.namespace=bla \
    --conf spark.kubernetes.container.image=docker.io/kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D \
    local:///opt/spark/examples/jars/spark-examples_2.12-3.4.0-SNAPSHOT.jar 200000
```

Check that we cannot find it from the default namespace, even with a glob, when the namespace is not specified:

```
➜  spark git:(SPARK-40458) ✗ minikube kubectl -- config set-context --current --namespace=default
Context "minikube" modified.
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
No applications found.
```

Then check we can find it by specifying the namespace:
```
➜  spark git:(SPARK-40458) ✗ ./bin/spark-submit --status "bla:spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission bla:spark-pi-* in k8s://http://127.0.0.1:8001.
Application status (driver):
         pod name: spark-pi-4c4e70837c86ae1a-driver
         namespace: bla
         labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
         pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
         creation time: 2022-09-27T01:19:06Z
         service account name: default
         volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
         node name: minikube
         start time: 2022-09-27T01:19:06Z
         phase: Running
         container status:
                 container name: spark-kubernetes-driver
                 container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                 container state: running
                 container started at: 2022-09-27T01:19:07Z
```

Changing the namespace to `bla` with `kubectl`:

```
➜  spark git:(SPARK-40458) ✗  minikube kubectl -- config set-context --current --namespace=bla
Context "minikube" modified.
```

Checking that we can now find it without specifying the namespace (glob only):
```
➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --status "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request for the status of submission spark-pi-* in k8s://http://127.0.0.1:8001.
Application status (driver):
         pod name: spark-pi-4c4e70837c86ae1a-driver
         namespace: bla
         labels: spark-app-name -> spark-pi, spark-app-selector -> spark-c95a9a0888214c01a286eb7ba23980a0, spark-role -> driver, spark-version -> 3.4.0-SNAPSHOT
         pod uid: 0be8952e-3e00-47a3-9082-9cb45278ed6d
         creation time: 2022-09-27T01:19:06Z
         service account name: default
         volumes: spark-local-dir-1, spark-conf-volume-driver, kube-api-access-wxnqw
         node name: minikube
         start time: 2022-09-27T01:19:06Z
         phase: Running
         container status:
                 container name: spark-kubernetes-driver
                 container image: kubespark/spark:3.4.0-SNAPSHOT_064A99CC-57AF-46D5-B743-5B12692C260D
                 container state: running
                 container started at: 2022-09-27T01:19:07Z
```

Killing the app:
```
➜  spark git:(SPARK-40458) ✗  ./bin/spark-submit --kill "spark-pi-*" --master k8s://http://127.0.0.1:8001
Submitting a request to kill submission spark-pi-* in k8s://http://127.0.0.1:8001. Grace period in secs: not set.
Deleting driver pod: spark-pi-4c4e70837c86ae1a-driver.
```

Closes apache#37990 from attilapiros/SPARK-40458.

Authored-by: attilapiros <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit fa88651)

* [SPARK-36462][K8S] Add the ability to selectively disable watching or polling

### What changes were proposed in this pull request?

Add the ability to selectively disable watching or polling

Updated version of apache#34264

### Why are the changes needed?

Watching or polling for pod status on Kubernetes can place additional load on etcd. With a large number of executors and a large number of jobs this can have negative impacts, and executors register themselves with the driver under normal operation anyway.

### Does this PR introduce _any_ user-facing change?

Two new config flags.
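
A sketch of opting out of both mechanisms at submit time (the flag names are taken from the linked PR and should be treated as assumptions):

```scala
import org.apache.spark.SparkConf

// Disable both executor-status mechanisms; executors still register with the
// driver on startup, which is sufficient under normal operation.
val conf = new SparkConf()
  .set("spark.kubernetes.executor.enableApiWatcher", "false") // no pod watch
  .set("spark.kubernetes.executor.enableApiPolling", "false") // no pod polling
```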

### How was this patch tested?

New unit tests + manually tested a forked version of this on an internal cluster with both watching and polling disabled.

Closes apache#36433 from holdenk/SPARK-36462-allow-spark-on-kube-to-operate-without-watchers.

Lead-authored-by: Holden Karau <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

(cherry picked from commit 5bffb98)

* ODP-2201|SPARK-48867 Upgrade okhttp to 4.12.0, okio to 3.9.0 and esdk-obs-java to 3.24.3

* [SPARK-41958][CORE][3.3] Disallow arbitrary custom classpath with proxy user in cluster mode

Backports the fix for SPARK-41958 to the 3.3 branch from apache#39474.
The description below is from the original PR.

--------------------------

### What changes were proposed in this pull request?

This PR proposes to disallow arbitrary custom classpath with proxy user in cluster mode by default.

### Why are the changes needed?

To avoid arbitrary classpaths in the Spark cluster.

### Does this PR introduce _any_ user-facing change?

Yes. Users must re-enable this feature via `spark.submit.proxyUser.allowCustomClasspathInClusterMode`.
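
A sketch of explicitly re-enabling the old behavior (the conf name comes from the PR text; only do this when the proxy user's classpath is trusted):

```scala
import org.apache.spark.SparkConf

// Opt back in to custom classpaths with a proxy user in cluster mode
// (disabled by default after this change).
val conf = new SparkConf()
  .set("spark.submit.proxyUser.allowCustomClasspathInClusterMode", "true")
```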

### How was this patch tested?

Manually tested.

Closes apache#39474 from Ngone51/dev.

Lead-authored-by: Peter Toth <peter.toth@gmail.com>
Co-authored-by: Yi Wu <yi.wu@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

(cherry picked from commit 909da96)

Closes apache#41428 from degant/spark-41958-3.3.

Lead-authored-by: Degant Puri <[email protected]>
Co-authored-by: Peter Toth <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>

* ODP-2049 Changing Spark3 version from 3.3.3.3.2.3.2-2 to 3.3.3.3.2.3.2-201

* ODP-2049 Changing libthrift version to 0.16 in deps files

* ODP-2049 Changing derby version to 10.14.3.0

---------

Signed-off-by: Dongjoon Hyun <[email protected]>
Co-authored-by: Prashant Singh <[email protected]>
Co-authored-by: yangjie01 <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Co-authored-by: attilapiros <[email protected]>
Co-authored-by: Holden Karau <[email protected]>
Co-authored-by: Degant Puri <[email protected]>
Co-authored-by: Peter Toth <[email protected]>