
Adoption of Spark-on-k8s-operator #648

Closed
mwielgus opened this issue Sep 5, 2023 · 38 comments

@mwielgus

mwielgus commented Sep 5, 2023

We are looking for a new home for Spark-on-k8s-operator. The project was quite active for years, delivering a convenient way of running Spark in the Kubernetes environment. Unfortunately, due to some org changes, the previous maintainers are unable to provide the time and love that the project and its users deserve. So, GoogleCloudPlatform would like to transfer ownership of the code (already under the Apache license) to an organisation that would help bring more life to the project and continue to help users run Spark on K8s. Given that you support a wide variety of ML/batch frameworks (MPI, TF, PyTorch, etc.), we think that Kubeflow would be a good place for the Spark operator.

@mwielgus
Author

mwielgus commented Sep 5, 2023

cc: @terrytangyuan

@terrytangyuan
Member

terrytangyuan commented Sep 5, 2023

+1 happy to sponsor this. This would be a great addition to the Kubeflow community. cc @james-jwu @theadactyl

cc @kubeflow/wg-training-leads

@andreyvelich
Member

Thank you for proposing this @mwielgus!

I agree that the Spark operator might be useful for Kubeflow users who want to do data preparation, feature extraction, data validation, etc. before building and training their ML models. Currently, Kubeflow doesn't offer such functionality.

It would be nice if you could join our upcoming AutoML and Training WG Community call today (September 6th) at 6pm UTC (10am PST) to discuss the details and potential use-cases.

cc @kubeflow/wg-training-leads @tenzen-y @kuizhiqing

@johnugeorge
Member

Is this proposal to have the Spark operator be an independent operator in Kubeflow?

@mwielgus
Author

mwielgus commented Sep 6, 2023

Thanks, I will join the meeting today :).

@tenzen-y
Member

tenzen-y commented Sep 6, 2023

Basically, SGTM. However, I have the same question that @johnugeorge raised.

@jbottum
Contributor

jbottum commented Sep 6, 2023

FYI, the Kubeflow user survey(s) have consistently shown that users would like a Spark / Kubeflow integration.

@jbottum
Contributor

jbottum commented Sep 6, 2023

We will discuss if and how Kubeflow will support a Spark K8s operator in our Community Meeting on Tuesday; please find the bridge in these meeting notes. I suspect there may be several operators or implementations, and we need to decide whether we are going to pick one, how it will be supported, whether it becomes part of a (new) Kubeflow Working Group, how it is installed, etc. @kimwnasptd @mwielgus Kubeflow community meeting notes: https://docs.google.com/document/d/1Wdxt1xedAj7qF_Rjmxy1R0NRdfv7UWs-r2PItewxHpE/edit.

@thesuperzapper

@juliusvonkohout
Member

@mwielgus we had a Spark operator before. Are they using the modern Spark Connect? https://spark.apache.org/docs/latest/spark-connect-overview.html

You can already use the Kubernetes API server as the Spark master, so I am wondering whether that plus Spark Connect is already enough. Anyway, I am open to contributions in manifests/contrib.
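
For reference, here is a minimal sketch of what both options look like from PySpark, assuming Spark 3.4+; the endpoint addresses and container image below are placeholders, not an existing deployment:

from pyspark.sql import SparkSession

# Option 1: Spark Connect (Spark 3.4+, `pip install "pyspark[connect]"`).
# The client only needs the Connect endpoint; the server itself can run on Kubernetes.
spark = SparkSession.builder.remote("sc://spark-connect.example.svc:15002").getOrCreate()

# Option 2: point the driver at the Kubernetes API server directly (client mode).
# Extra driver networking config (e.g. spark.driver.host) is usually needed and is omitted here.
# spark = (
#     SparkSession.builder
#     .master("k8s://https://kubernetes.default.svc:443")
#     .config("spark.kubernetes.namespace", "spark")
#     .config("spark.kubernetes.container.image", "apache/spark:3.5.0")
#     .getOrCreate()
# )

spark.range(10).count()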

@andreyvelich
Member

Here is the recording for our initial discussion on Sep 9th around Spark Operator in Kubeflow: https://youtu.be/3D2h5OUNCQo.

@mwielgus Please can you attend the Kubeflow Community call today at 8:00am PST, so we can have a follow-up discussion around the Spark Operator: https://docs.google.com/document/d/1Wdxt1xedAj7qF_Rjmxy1R0NRdfv7UWs-r2PItewxHpE/edit#heading=h.xtqde2br5mh4.

cc @kubeflow/wg-training-leads

@mwielgus
Author

@andreyvelich I will be there.

@andreyvelich
Member

Thank you, Marcin!

@jbottum
Contributor

jbottum commented Sep 12, 2023

As a follow-up to our recent Apache Spark discussions in the Kubeflow Community meetings, we are requesting some user input... If you are a Spark user or contributor, the Kubeflow Community would like to know if you need active support for a Spark Kubernetes operator. If so, please comment or +1 on this GitHub issue. We need at least 10 users and would appreciate any ideas on use cases, e.g. integration with notebooks or Kubeflow Pipelines. Thanks! Josh

@droctothorpe

IMO, the fundamental gap is the lack of an SDK. Data scientists would rather write Python than YAML (for good reason). There needs to be (a) some clarification (and documentation) about the benefits of the Spark operator over PySpark, and (b) development of an SDK (perhaps an extension to the training operator SDK).

@charlesa101

@droctothorpe, we currently use the Spark operator in a few of our projects. It makes it easy for us to deploy Spark jobs "natively" on K8s, much like how the training operators currently work, so I am not sure what you mean by the lack of an SDK here.

We use it with the Kubeflow Pipelines DSL:

from string import Template
import json

from kfp import dsl

# jar_location comes from earlier pipeline code that is elided here.
spark_json_template = Template("""
{
    "apiVersion": "sparkoperator.k8s.io/v1beta2",
    "kind": "SparkApplication",
    "metadata": {
      "name": "hello-pipeline",
      "namespace": "kubeflow"},
    "spec": {
      "type": "Scala",
      "mode": "cluster",
      "mainApplicationFile": "$jar_location"
    }
}""")
spark_json = spark_json_template.substitute({'jar_location': jar_location})
spark_job = json.loads(spark_json)
# Create the SparkApplication custom resource from the pipeline and wait for it to succeed.
spark_resource = dsl.ResourceOp(
    name='spark-job',
    k8s_resource=spark_job,
    success_condition='status.state == Succeeded')
...

+1 on this issue. It will be great for the Spark operator to find a new home here.

@droctothorpe

droctothorpe commented Sep 13, 2023

@charlesa101 that's JSON with no customization, and the configuration options are abundant. It's nice to be able to just use ResourceOp, though. Thanks for sharing.

Our platform provides both PySpark and Spark operator support, and the overwhelming majority of users prefer PySpark. That's just one data point, though. IMO, a proper Python interface à la the training operator SDK (or PySpark) would promote adoption.

@charlesa101

@droctothorpe This is based on the CRD for the Spark operator; it will work the same way for PySpark. I'm curious to know more about how your PySpark operator implementation works. The configurations are abundant, but I am not sure there is a use case where you would have to load up all the configs.

I agree with you that it would be great to eventually align the behavior of this operator with the training operators to make it easy to use, but I am not sure what you still mean by SDK in this context! Once you have the YAML and CRDs well defined, you can easily use them in your KFP as a component.

@terrytangyuan
Member

terrytangyuan commented Sep 13, 2023

Here's the Python SDK for the training operator. Basically, instead of writing YAML and using it in your KFP component, you can use Python to define and submit jobs directly.

https://github.com/kubeflow/training-operator/tree/master/sdk/python
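
For illustration only, here is a rough sketch of what an SDK-style call for Spark could look like, using the standard Kubernetes Python client to submit the same SparkApplication custom resource shown above. The SparkClient class and its submit method are hypothetical, not an existing Kubeflow or Spark operator API:

from kubernetes import client, config

# Hypothetical wrapper: a real SDK would hide the manifest handling entirely.
class SparkClient:
    def __init__(self, namespace="kubeflow"):
        config.load_kube_config()  # use config.load_incluster_config() when running in a pod
        self.api = client.CustomObjectsApi()
        self.namespace = namespace

    def submit(self, name, main_application_file, app_type="Scala", mode="cluster"):
        body = {
            "apiVersion": "sparkoperator.k8s.io/v1beta2",
            "kind": "SparkApplication",
            "metadata": {"name": name, "namespace": self.namespace},
            "spec": {
                "type": app_type,
                "mode": mode,
                "mainApplicationFile": main_application_file,
            },
        }
        # Create the SparkApplication custom resource; the Spark operator takes it from there.
        return self.api.create_namespaced_custom_object(
            group="sparkoperator.k8s.io",
            version="v1beta2",
            namespace=self.namespace,
            plural="sparkapplications",
            body=body,
        )

# Example (hypothetical jar path):
# SparkClient("kubeflow").submit("hello-pipeline", "local:///opt/spark/examples/jars/app.jar")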

@charlesa101

Oh I see what you mean, thanks @terrytangyuan 👍

@mwielgus
Author

What should be the next steps? Do you have enough data points about Spark in Kubeflow?

@andreyvelich
Member

Hi @mwielgus, please can you join the Kubeflow Community call next Tuesday, October 31st, at 8:00am PST?
We can discuss the next steps and possibilities to move this forward.

Also @thesuperzapper can share some details around using Spark with Kubeflow Notebooks 2.0 (e.g. Kubeflow Workspaces).

@mwielgus
Author

@andreyvelich Yes, I will be there.

@andreyvelich
Member

We had a great discussion around the adoption of the Spark Operator during KubeCon with @mwielgus and @vara-bonthu.
We might be able to find folks who can maintain this project moving forward.
Let's have a chat tomorrow during the Kubeflow Community Call (November 14th at 8:00am PST).

@jbottum We will provide more updates during the call and discuss the next steps.

@andreyvelich
Member

andreyvelich commented Nov 21, 2023

Hi everyone, as we discussed on the latest Kubeflow community call, we have started this doc to donate the Spark Operator to Kubeflow:
https://docs.google.com/document/d/1rCPEBQZPKnk0m7kcA5aHPf0fISl0MTAzsa4Wg3dfs5M/edit#heading=h.z7wqs2ebrwra
Please take a look and provide your comments.
It would be great if we could quickly discuss it during today's Kubeflow Community Call at 8am PST (cc @mwielgus @vara-bonthu).

cc @kubeflow/project-steering-group @kubeflow/wg-pipeline-leads @kubeflow/wg-training-leads @kubeflow/wg-notebooks-leads

@vara-bonthu
Contributor

I am looking forward to the adoption of Google's Spark K8s operator, which will contribute to building a larger community and could potentially become the official Spark operator for Apache Spark.

As part of this effort, it is crucial to establish support for a single official Spark Kubernetes operator within the Apache Spark community. Collaboration with Apache Spark and gaining their endorsement is of utmost importance in this context.

This collaboration will help prevent the Apache Spark community from introducing an entirely new Spark operator, as happened with Apache Flink, which offers an official Flink operator for Kubernetes. This approach helps avoid potential confusion within the community and ensures that users gravitate toward the approved Apache Spark operator tool.

@andreyvelich
Member

cc @yuchaoran2011

@lfrancke

If you want the operator to become even semi-"official" it should be donated to the ASF instead.
The ASF - in general - does not give any product the recognition of being the "official X for Y" or the "approved".
(I say this as a member of the ASF but not with any special knowledge or any special powers, just from my knowledge of the policies - especially around trademarks). https://www.apache.org/foundation/marks/

While we're at it: The current name "Google's Spark K8s Operator" might be a violation of the trademark policy already.
I suggest clarifying with the ASF before adopting the name.
The usual "approved" naming scheme is "XYZ for Apache Foo". In this case: "Google's Kubernetes operator for Apache Spark" (or similar).
It needs to be made clear, in naming, documentation, and communication, that this is in no way officially affiliated with the ASF.

With my other hat - as a co-founder of Stackable I'd like to point to another operator for Apache Spark which already exists (built by us): https://github.com/stackabletech/spark-k8s-operator/ and which we recently compared to the Google one.

Happy to help with any ASF related communication.

@wilfred-s

I agree with @lfrancke on the point of donating to the ASF if you want to make it even semi-official.
In the Apache YuniKorn community we see a number of groups using the operator. Most of them have made changes to the operator to fix issues or integrate with newer versions of Apache Spark.

@terrytangyuan
Member

If you want the operator to become even semi-"official" it should be donated to the ASF instead.

IMO, "official" should only be earned by merit and community adoption. Although donating to ASF helps the legal side, CNCF provides a good community around K8s and cloud-native technologies.

With my other hat - as a co-founder of Stackable I'd like to point to another operator for Apache Spark which already exists (built by us): https://github.com/stackabletech/spark-k8s-operator/ and which we recently compared to the Google one.

Out of curiosity, why not join the effort of maintaining the existing Spark Operator that's already widely adopted?

@thesuperzapper
Member

I don't think this discussion is about trying to present the Google Spark operator as an "official" option (from a Spark or even a Kubeflow perspective); it's simply about giving a new home to the existing users and contributors of GoogleCloudPlatform/spark-on-k8s-operator under the Kubeflow org, so they can continue working on it in a neutral place rather than continue struggling under their current home.

It's up to the maintainers of GoogleCloudPlatform/spark-on-k8s-operator to decide where they want to live, and in this specific case, it seems like they need a short-term solution to prevent those contributors/users from being stuck and unable to continue development.

Longer term, there is a strategic question about whether all three operators can be merged (including the Stackable one and the one that Apple was proposing to donate to the ASF), but I don't think that needs to block this donation, if all parties are willing.

@Jeffwan
Member

Jeffwan commented Nov 23, 2023

Hi folks, long time no see due to busy internal work. I happened to see this thread. A few things to note:

  1. The spark-operator collaboration has been around for a long time. If they need sponsorship, Kubeflow would be a perfect umbrella, and it would gradually extend the scope to Data + AI.

  2. About "official": I am on the Spark dev list and noticed a recent proposal there, SPIP: Spark Kubernetes Operator. Honestly, I think the GCP version is pretty good and widely used by numerous users and orgs. If the Kubeflow community can drive its evolution, that would help a lot of Spark users and may avoid reinventing wheels.

@vara-bonthu
Contributor

vara-bonthu commented Nov 23, 2023

Matthew (@thesuperzapper) makes a good point - we are looking for a new home for Google's Spark Operator, and CNCF projects like Kubeflow seem like a good fit because they have a bigger community. But our main goal is to prevent Apache Spark from making another Java Spark Operator. Instead, we think it's important for everyone to work together on one Spark Operator.

@Jeffwan, you're right. We found a proposal that already has votes from Spark maintainers. But we added our comments to the proposal, saying that Google's Spark Operator is widely adopted by hundreds of organizations in production today. Salesforce and a few others also added a "+" and said they think Google's Spark Operator is a good idea.

To make sure the Apache Spark community knows what we're thinking, we started a new proposal (SPIP) inside Apache Spark. You can find it here SPARK-46054.

Please share your thoughts and vote on the proposal. We want to work together on one Spark Operator, no matter if it ends up under Apache or Kubeflow. This will make the community bigger and stronger.

@wilfred-s @terrytangyuan with support from your folks, we can work on endorsing one tool to build a bigger community.

@lfrancke

With my other hat - as a co-founder of Stackable I'd like to point to another operator for Apache Spark which already exists (built by us): stackabletech/spark-k8s-operator and which we recently compared to the Google one.

Out of curiosity, why not join the effort of maintaining the existing Spark Operator that's already widely adopted?

I don't want to derail this issue, so I'll try to keep it short.
Our use case is different: we are building a platform that includes 10+ tools and operators (I recently gave a talk on our experience building a lot of operators). And for us it's important that all operators support the same features, consolidated documentation, CRD docs, vulnerability management, supply-chain security, Cyber Resilience Act compliance, etc.
For that reason we decided to build our own operators, to make sure they are all... similar. I hope that makes sense?

@vara-bonthu As mentioned before: It would be against the ASF rules for a project to "endorse" a project. So that is never going to happen if the project is not part of the ASF itself and even then the term "endorse" would almost certainly not be used.

@vara-bonthu
Contributor

@vara-bonthu As mentioned before: It would be against the ASF rules for a project to "endorse" a project. So that is never going to happen if the project is not part of the ASF itself and even then the term "endorse" would almost certainly not be used.

@lfrancke Thank you for the clarification regarding the term "endorse."

To clarify our intent, we have a straightforward goal here. We are interested in investigating the potential donation of the Spark Operator to either the Apache or Kubeflow projects. Once such a donation is agreed upon, we are committed to aligning with and adhering to the governance policies and guidelines of the chosen organization.

We also aim to prevent the unnecessary duplication of efforts in building multiple Spark Operators, which can potentially lead to confusion among users and organizations.

@vikas-saxena02

I support this proposal, as I have worked on a variety of use cases that require SparkML due to the sheer volume of data, including near-real-time scenarios using Spark Streaming. @andreyvelich @jbottum @akgraner I am more than happy to be part of this initiative, as I have the right skills for it.

@andreyvelich
Member

It's great to hear, @vikas-saxena02.
If you are available, please attend one of the upcoming Kubeflow Community Calls on Tuesday at 8am PST, so we can discuss the Spark Operator adoption updates.

@terrytangyuan
Member

See https://github.com/kubeflow/spark-operator

/close


@terrytangyuan: Closing this issue.

In response to this:

See https://github.com/kubeflow/spark-operator

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
