Update README and documentation #2047

Merged 3 commits on Jun 27, 2024
70 changes: 36 additions & 34 deletions docs/who-is-using.md → ADOPTERS.md
@@ -1,48 +1,50 @@
# Adopters of Kubeflow Spark Operator

Below are the adopters of the Spark Operator project. If you are using the Spark Operator, please add yourself to this list via a pull request, and keep the list in alphabetical order.

| Organization | Contact (GitHub User Name) | Environment | Description of Use |
| ------------- | ------------- | ------------- | ------------- |
| [Beeline](https://beeline.ru) | @spestua | Evaluation | ML & Data Infrastructure |
| Bringg | @EladDolev | Production | ML & Analytics Data Platform |
| [C2FO](https://www.c2fo.com/) | @vanhoale | Production | Data Platform / Data Infrastructure |
| [Caicloud](https://intl.caicloud.io/) | @gaocegege | Production | Cloud-Native AI Platform |
| Carrefour | @AliGouta | Production | Data Platform |
| CERN | @mrow4a | Evaluation | Data Mining & Analytics |
| [CloudPhysics](https://www.cloudphysics.com) | @jkleckner | Production | ML/AI & Analytics |
| CloudZone | @iftachsc | Evaluation | Big Data Analytics Consultancy |
| Cyren | @avnerl | Evaluation | Data pipelines |
| [Data Mechanics](https://www.datamechanics.co) | @jrj-d | Production | Managed Spark Platform |
| [DeepCure](https://www.deepcure.ai) | @mschroering | Production | Spark / ML |
| [DiDi](https://www.didiglobal.com) | @Run-Lin | Evaluation | Data Infrastructure |
| Exacaster | @minutis | Evaluation | Data pipelines |
| Fossil | @duyet | Production | Data Platform |
| [Gojek](https://www.gojek.io/) | @pradithya | Production | Machine Learning Platform |
| HashmapInc | @prem0132 | Evaluation | Analytics Data Platform |
| [incrmntal](https://incrmntal.com/) | @scravy | Production | ML & Data Infrastructure |
| [Inter&Co](https://inter.co/) | @ignitz | Production | Data pipelines |
| [Kognita](https://kognita.com.br/) | @andreclaudino | Production | MLOps, Data Platform / Data Infrastructure, ML/AI |
| Lightbend | @yuchaoran2011 | Production | Data Infrastructure & Operations |
| Lyft | @kumare3 | Evaluation | ML & Data Infrastructure |
| MapR Technologies | @sarjeet2013 | Evaluation | ML/AI & Analytics Data Platform |
| [MavenCode](https://www.mavencode.com) | @charlesa101 | Production | MLOps & Data Infrastructure |
| Microsoft (MileIQ) | @dharmeshkakadia | Production | AI & Analytics |
| [Molex](https://www.molex.com/) | @AshishPushpSingh | Evaluation/Production | Data Platform |
| [MongoDB](https://www.mongodb.com) | @chickenpopcorn | Production | Data Infrastructure |
| Nielsen Identity Engine | @roitvt | Evaluation | Data pipelines |
| [PUBG](https://careers.pubg.com/#/en/) | @jacobhjkim | Production | ML & Data Infrastructure |
| [Qualytics](https://www.qualytics.co/) | @josecsotomorales | Production | Data Quality Platform |
| Riskified | @henbh | Evaluation | Analytics Data Platform |
| [Roblox](https://www.roblox.com/) | @matschaffer-roblox | Evaluation | Data Infrastructure |
| [Rokt](https://www.rokt.com) | @jacobsalway | Production | Data Infrastructure |
| Salesforce | @khogeland | Production | Data transformation |
| Scaling Smart | @tarek-izemrane | Evaluation | Data Platform |
| Shell (Agile Hub) | @TomLous | Production | Data pipelines |
| [Siigo](https://www.siigo.com) | @Juandavi1 | Production | Data Migrations & Analytics Data Platform |
| StackTome | @emiliauskas-fuzzy | Production | Data pipelines |
| [Stitch Fix](https://multithreaded.stitchfix.com/) | @nssalian | Evaluation | Data pipelines |
| Tencent | @runzhliu | Evaluation | ML Analytics Platform |
| [Timo](https://timo.vn) | @vanducng | Production | Data Platform |
| [Tongdun](https://www.tongdun.net/) | @lomoJG | Production | AI/ML & Analytics |
| [Totvs Labs](https://www.totvslabs.com) | @luizm | Production | Data Platform |
| [Typeform](https://typeform.com/) | @afranzi | Production | Data & ML pipelines |
| Uber | @chenqin | Evaluation | Spark / ML |
81 changes: 35 additions & 46 deletions README.md
@@ -1,10 +1,15 @@
# Kubeflow Spark Operator

[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)

## What is Spark Operator?

The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses [Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.

## Overview

For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [Architecture](https://www.kubeflow.org/docs/components/spark-operator/overview/#architecture). It requires Spark 2.3 or above, which supports Kubernetes as a native scheduler backend.
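To make the custom-resource model concrete, here is a minimal `SparkApplication` sketch. The image tag, jar path, resource sizes, and service account name are illustrative assumptions, not values specified anywhere in this PR; consult the API Definition for the authoritative field list.

```yaml
# Hypothetical example -- field values are illustrative assumptions.
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: "spark:3.5.0"                 # assumed image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar"
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark-operator-spark   # assumed service account
  executor:
    instances: 2
    cores: 1
    memory: "512m"
```

Once applied with `kubectl apply -f`, the operator runs the application and reports progress in the resource's `status` field.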

The Kubernetes Operator for Apache Spark currently supports the following list of features:

@@ -28,69 +33,53 @@

**If you are currently using the `v1beta1` version of the APIs in your manifests, please update them to use the `v1beta2` version by changing `apiVersion: "sparkoperator.k8s.io/<version>"` to `apiVersion: "sparkoperator.k8s.io/v1beta2"`. You will also need to delete the `previous` version of the CustomResourceDefinitions named `sparkapplications.sparkoperator.k8s.io` and `scheduledsparkapplications.sparkoperator.k8s.io`, and replace them with the `v1beta2` version either by installing the latest version of the operator or by running `kubectl create -f manifest/crds`.**
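A sketch of that migration, assuming cluster-admin access and a local checkout of the repository (so that `manifest/crds` exists):

```bash
# Back up your SparkApplication manifests first: deleting a CRD also
# deletes all existing custom resources of that kind.

# Delete the previous v1beta1 CRDs named in the note above.
kubectl delete crd sparkapplications.sparkoperator.k8s.io
kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io

# Recreate them at v1beta2 from the repository's manifests.
kubectl create -f manifest/crds
```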

Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity, is implemented using a Kubernetes [Mutating Admission Webhook](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/), which became beta in Kubernetes 1.9. The mutating admission webhook is disabled by default if you install the operator using the Helm [chart](charts/spark-operator-chart). Check out the [Quick Start Guide](docs/quick-start-guide.md#using-the-mutating-admission-webhook) on how to enable the webhook.
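For instance, assuming the chart exposes a `webhook.enable` value (verify the exact flag against the chart's README for your chart version), the webhook can be turned on at install time:

```bash
# webhook.enable is an assumed chart value -- check the chart README.
helm install my-release spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set webhook.enable=true
```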

## Prerequisites

* Version >= 1.13 of Kubernetes to use the [`subresource` support for CustomResourceDefinitions](https://kubernetes.io/docs/tasks/access-kubernetes-api/custom-resources/custom-resource-definitions/#subresources), which became beta in 1.13 and is enabled by default in 1.13 and higher.

* Version >= 1.16 of Kubernetes to use the `MutatingWebhook` and `ValidatingWebhook` of `apiVersion: admissionregistration.k8s.io/v1`.

## Getting Started

For getting started with the Spark operator, please refer to [Getting Started](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).

The easiest way to install the operator is to use the Helm [chart](charts/spark-operator-chart/):

```bash
$ helm repo add spark-operator https://kubeflow.github.io/spark-operator

$ helm install my-release spark-operator/spark-operator --namespace spark-operator --create-namespace
```

This installs the operator into the namespace `spark-operator`. By default, the operator watches and handles `SparkApplication`s in every namespace. To limit it to a single namespace, e.g. `default`, add the following option to the `helm install` command:

```
--set "sparkJobNamespaces={default}"
```

For configuration options available in the Helm chart, please refer to the chart's [README](charts/spark-operator-chart/README.md).

## User Guide

For a detailed user guide and API documentation, please refer to the [User Guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/) and the [API Specification](docs/api-docs.md).

If you are running the Spark operator on Google Kubernetes Engine (GKE) and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/gcp/).

## Version Matrix

The following table lists the most recent few versions of the operator.

| Operator Version | API Version | Kubernetes Version | Base Spark Version |
| ------------- | ------------- | ------------- | ------------- |
| `v1beta2-1.6.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.5.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.4.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.3.x-3.1.1` | `v1beta2` | 1.16+ | `3.1.1` |
| `v1beta2-1.2.3-3.1.1` | `v1beta2` | 1.13+ | `3.1.1` |
| `v1beta2-1.2.2-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.1-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.0-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.1.x-2.4.5` | `v1beta2` | 1.13+ | `2.4.5` |
| `v1beta2-1.0.x-2.4.4` | `v1beta2` | 1.13+ | `2.4.4` |

When installing using the Helm chart, you can choose a specific image tag instead of the default one with the following option:

```
--set image.tag=<operator image tag>
```
## Developer Guide

For developing with the Spark Operator, please refer to the [Developer Guide](https://www.kubeflow.org/docs/components/spark-operator/developer-guide/).

## Contributor Guide

For contributing to the Spark Operator, please refer to the [Contributor Guide](CONTRIBUTING.md).

## Community
* Join the [CNCF Slack Channel](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) and then join the `#kubeflow-spark-operator` channel.
* Check out our blog post [Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community](https://blog.kubeflow.org/operators/2024/04/15/kubeflow-spark-operator.html).
* Join our monthly community meeting [Kubeflow Spark Operator Meeting Notes](https://bit.ly/3VGzP4n).

## Adopters

Check out the [adopters of Spark Operator](ADOPTERS.md).
1 change: 0 additions & 1 deletion docs/_config.yml

This file was deleted.

Binary file removed docs/architecture-diagram.png
Binary file not shown.