Commit 67bd783: Update docs

Signed-off-by: Yi Chen <[email protected]>
ChenYi015 committed Jun 25, 2024 (1 parent 012b52a)
Showing 16 changed files with 474 additions and 539 deletions.

README.md (29 additions, 44 deletions)
# Kubeflow Spark Operator

[![Go Report Card](https://goreportcard.com/badge/github.com/kubeflow/spark-operator)](https://goreportcard.com/report/github.com/kubeflow/spark-operator)

## What is Kubeflow Spark Operator?

The Kubernetes Operator for Apache Spark aims to make specifying and running [Spark](https://github.com/apache/spark) applications as easy and idiomatic as running other workloads on Kubernetes. It uses [Kubernetes custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) for specifying, running, and surfacing status of Spark applications.

## Overview

For a complete reference of the custom resource definitions, please refer to the [API Definition](docs/api-docs.md). For details on its design, please refer to the [design doc](docs/design.md). The operator requires Spark 2.3 or above, which supports Kubernetes as a native scheduler backend.

The Kubernetes Operator for Apache Spark currently supports the following list of features:

* Customization of Spark pods, e.g., mounting arbitrary volumes and setting pod affinity

* Version >= 1.16 of Kubernetes to use the `MutatingWebhook` and `ValidatingWebhook` of `apiVersion: admissionregistration.k8s.io/v1`.

## Getting Started

For getting started with Spark operator, please refer to [Getting Started](https://www.kubeflow.org/docs/components/spark-operator/getting-started/).
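The hosted guide walks through installation in detail; as a minimal sketch, the operator can be installed from the project's Helm chart (the release name `my-release` and the `spark-operator` namespace here are arbitrary choices):

```bash
# Add the Kubeflow Spark operator chart repository
helm repo add spark-operator https://kubeflow.github.io/spark-operator

# Install the operator into its own namespace
helm install my-release spark-operator/spark-operator \
  --namespace spark-operator \
  --create-namespace
```

By default the operator watches and handles `SparkApplication`s in every namespace; chart options such as `sparkJobNamespaces` can restrict it to specific namespaces.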
## User Guide

For a detailed user guide, please refer to the [User Guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/).

For API documentation, please refer to the [API Specification](docs/api-docs.md).

If you are running the Spark operator on Google Kubernetes Engine (GKE) and want to use Google Cloud Storage (GCS) and/or BigQuery for reading/writing data, also refer to the [GCP guide](https://www.kubeflow.org/docs/components/spark-operator/user-guide/gcp/).
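As an illustration of the custom resource model described above, a minimal `SparkApplication` manifest might look like the following sketch; the image, jar path, and service account are placeholder values, and the API Specification is the authoritative schema:

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: default
spec:
  type: Scala
  mode: cluster
  image: spark:3.5.0  # placeholder: any compatible Spark image
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar  # placeholder jar path
  sparkVersion: 3.5.0
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark  # placeholder service account
  executor:
    instances: 1
    cores: 1
    memory: 512m
```

Applying the manifest with `kubectl apply -f` hands the application to the operator, which creates the driver pod and surfaces status back on the `SparkApplication` object.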

## Version Matrix

The following table lists the most recent few versions of the operator.

| Operator Version | API Version | Kubernetes Version | Base Spark Version |
| --------------------- | ----------- | ------------------ | ------------------ |
| `v1beta2-1.6.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.5.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.4.x-3.5.0` | `v1beta2` | 1.16+ | `3.5.0` |
| `v1beta2-1.3.x-3.1.1` | `v1beta2` | 1.16+ | `3.1.1` |
| `v1beta2-1.2.3-3.1.1` | `v1beta2` | 1.13+ | `3.1.1` |
| `v1beta2-1.2.2-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.1-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.2.0-3.0.0` | `v1beta2` | 1.13+ | `3.0.0` |
| `v1beta2-1.1.x-2.4.5` | `v1beta2` | 1.13+ | `2.4.5` |
| `v1beta2-1.0.x-2.4.4` | `v1beta2` | 1.13+ | `2.4.4` |
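When installing with the Helm chart, a specific operator image tag from the matrix above can be pinned instead of the chart default; the exact tag below is illustrative only:

```bash
# Pin the operator image to a specific release from the version matrix
helm install my-release spark-operator/spark-operator \
  --namespace spark-operator --create-namespace \
  --set image.tag=v1beta2-1.6.x-3.5.0  # substitute a real tag from the matrix
```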

## Contributing

For contributing, please refer to [CONTRIBUTING.md](CONTRIBUTING.md) and the [Developer Guide](https://www.kubeflow.org/docs/components/spark-operator/developer-guide/).

## Community

* Join the [CNCF Slack Channel](https://www.kubeflow.org/docs/about/community/#kubeflow-slack-channels) and then join the `#kubeflow-spark-operator` channel.
* Check out our blog post [Announcing the Kubeflow Spark Operator: Building a Stronger Spark on Kubernetes Community](https://blog.kubeflow.org/operators/2024/04/15/kubeflow-spark-operator.html).
* Check out [who is using the Spark Operator](docs/adopters.md).
docs/who-is-using.md → docs/adopters.md (36 additions, 34 deletions)
# Adopters of Kubeflow Spark Operator

Below are the adopters of the Spark Operator project. If you are using the Spark Operator, please add yourself to the list via a pull request. Please keep the list in alphabetical order.

| Organization | Contact (GitHub User Name) | Environment | Description of Use |
| ------------- | ------------- | ------------- | ------------- |
| [Beeline](https://beeline.ru) | @spestua | Evaluation | ML & Data Infrastructure |
| Bringg | @EladDolev | Production | ML & Analytics Data Platform |
| [C2FO](https://www.c2fo.com/) | @vanhoale | Production | Data Platform / Data Infrastructure |
| [Caicloud](https://intl.caicloud.io/) | @gaocegege | Production | Cloud-Native AI Platform |
| Carrefour | @AliGouta | Production | Data Platform |
| CERN | @mrow4a | Evaluation | Data Mining & Analytics |
| [CloudPhysics](https://www.cloudphysics.com) | @jkleckner | Production | ML/AI & Analytics |
| CloudZone | @iftachsc | Evaluation | Big Data Analytics Consultancy |
| Cyren | @avnerl | Evaluation | Data pipelines |
| [Data Mechanics](https://www.datamechanics.co) | @jrj-d | Production | Managed Spark Platform |
| [DeepCure](https://www.deepcure.ai) | @mschroering | Production | Spark / ML |
| [DiDi](https://www.didiglobal.com) | @Run-Lin | Evaluation | Data Infrastructure |
| Exacaster | @minutis | Evaluation | Data pipelines |
| Fossil | @duyet | Production | Data Platform |
| [Gojek](https://www.gojek.io/) | @pradithya | Production | Machine Learning Platform |
| HashmapInc | @prem0132 | Evaluation | Analytics Data Platform |
| [incrmntal](https://incrmntal.com/) | @scravy | Production | ML & Data Infrastructure |
| [Inter&Co](https://inter.co/) | @ignitz | Production | Data pipelines |
| [Kognita](https://kognita.com.br/) | @andreclaudino | Production | MLOps, Data Platform / Data Infrastructure, ML/AI |
| Lightbend | @yuchaoran2011 | Production | Data Infrastructure & Operations |
| Lyft | @kumare3 | Evaluation | ML & Data Infrastructure |
| MapR Technologies | @sarjeet2013 | Evaluation | ML/AI & Analytics Data Platform |
| [MavenCode](https://www.mavencode.com) | @charlesa101 | Production | MLOps & Data Infrastructure |
| Microsoft (MileIQ) | @dharmeshkakadia | Production | AI & Analytics |
| [Molex](https://www.molex.com/) | @AshishPushpSingh | Evaluation/Production | Data Platform |
| [MongoDB](https://www.mongodb.com) | @chickenpopcorn | Production | Data Infrastructure |
| Nielsen Identity Engine | @roitvt | Evaluation | Data pipelines |
| [PUBG](https://careers.pubg.com/#/en/) | @jacobhjkim | Production | ML & Data Infrastructure |
| [Qualytics](https://www.qualytics.co/) | @josecsotomorales | Production | Data Quality Platform |
| Riskified | @henbh | Evaluation | Analytics Data Platform |
| [Roblox](https://www.roblox.com/) | @matschaffer-roblox | Evaluation | Data Infrastructure |
| [Rokt](https://www.rokt.com) | @jacobsalway | Production | Data Infrastructure |
| Salesforce | @khogeland | Production | Data transformation |
| Scaling Smart | @tarek-izemrane | Evaluation | Data Platform |
| Shell (Agile Hub) | @TomLous | Production | Data pipelines |
| [Siigo](https://www.siigo.com) | @Juandavi1 | Production | Data Migrations & Analytics Data Platform |
| StackTome | @emiliauskas-fuzzy | Production | Data pipelines |
| [Stitch Fix](https://multithreaded.stitchfix.com/) | @nssalian | Evaluation | Data pipelines |
| Tencent | @runzhliu | Evaluation | ML Analytics Platform |
| [Timo](https://timo.vn) | @vanducng | Production | Data Platform |
| [Tongdun](https://www.tongdun.net/) | @lomoJG | Production | AI/ML & Analytics |
| [Totvs Labs](https://www.totvslabs.com) | @luizm | Production | Data Platform |
| [Typeform](https://typeform.com/) | @afranzi | Production | Data & ML pipelines |
| Uber | @chenqin | Evaluation | Spark / ML |
docs/developer-guide.md (1 addition, 3 deletions)

If you want to build the operator from the source code, e.g., to test a fix or a feature you have written, you can do so by following the instructions below.

The easiest way to build the operator without worrying about its dependencies is to build an image using the [Dockerfile](https://github.com/kubeflow/spark-operator/Dockerfile).

```bash
docker build -t <image-tag> .
```

The operator image is built upon a base Spark image that defaults to `spark:3.5.0`. To use your own Spark image instead:

```bash
docker build --build-arg SPARK_IMAGE=<your Spark image> -t <image-tag> .
```

If you want to use the operator on OpenShift clusters, first make sure you have Docker version 18.09.3 or above, then build your operator image using the [OpenShift-specific Dockerfile](../Dockerfile.rh).

```bash
export DOCKER_BUILDKIT=1
docker build -t <image-tag> -f Dockerfile.rh .