
feat: support archives param for spark-submit #2256

Merged

Conversation

@kaka-zb (Contributor) commented Oct 15, 2024

Purpose of this PR

As described in #2245, we can consider adding an `archives` field to the `deps` field of the SparkApplication CRD to support the `--archives` option of spark-submit.

Resolves #2245.
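A minimal sketch of how the new field would be used in a SparkApplication manifest. The application name, image details, resource values, and archive URL below are illustrative placeholders, not taken from this PR; only the `deps.archives` field itself is what this change adds.

```yaml
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi            # hypothetical example application
  namespace: default
spec:
  type: Scala
  mode: cluster
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples.jar
  sparkVersion: "3.5.0"
  deps:
    # New field from this PR: entries are passed to spark-submit as --archives
    archives:
      - https://example.com/data/extra-data.tar.gz   # placeholder URL
  driver:
    cores: 1
    memory: 512m
  executor:
    instances: 1
    cores: 1
    memory: 512m
```

Each listed archive is forwarded to the generated spark-submit command via the `--archives` flag, alongside the existing `jars`, `files`, and similar `deps` fields.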

Change Category

Indicate the type of change by marking the applicable boxes:

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Checklist

Before submitting your PR, please review the following:

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ChenYi015 (Contributor) commented:

@kaka-zb Thanks for contributing this feature. LGTM; will wait for another approval from @jacobsalway or @ImpSy.

@jacobsalway (Member) commented:

The CR field and spark-submit arg construction work as expected in my local testing, but I'd like to confirm the expected behaviour so the CRD docs are accurate. When testing locally with a k9s shell, I found the archive file present on the driver but not in the executor working directory. Is this expected @kaka-zb?

@kaka-zb (Contributor, Author) commented Oct 16, 2024

@jacobsalway It's not expected; the archive file should be present in both the driver and executor working directories.

The executor pod should start up with a log line in the following format, which you can check:
[screenshot of executor log showing the archive being fetched]

@jacobsalway (Member) commented:

Apologies @kaka-zb, you're right. I was using a local://... dependency, which isn't copied to the file server since it should already be available on all worker nodes. Using an http:// archive/file worked as you described. Interestingly, the file is also copied into the driver's working directory, though the spark-submit help message mentions only executors, so the CRD doc should be fine.
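The scheme behaviour discussed above can be summarized in a `deps` fragment. This is a hedged sketch; the paths and URL are placeholders, and the comments reflect the behaviour observed in this thread rather than a spec guarantee.

```yaml
deps:
  archives:
    # local:// means the file already exists on every node's filesystem, so
    # spark-submit does not upload it to the file server. This is why a
    # local:// archive did not appear in the executor working directory
    # during the testing described above.
    - local:///opt/spark/deps/preinstalled.tar.gz   # placeholder path
    # Remote schemes such as http(s):// are fetched and extracted into the
    # executors' working directories (and, as observed here, the driver's
    # working directory as well).
    - https://example.com/deps/extra.tar.gz          # placeholder URL
```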

@jacobsalway (Member) commented:

/lgtm

@google-oss-prow google-oss-prow bot added the lgtm label Oct 16, 2024
@google-oss-prow google-oss-prow bot merged commit a1de26d into kubeflow:master Oct 16, 2024
8 checks passed
Development

Successfully merging this pull request may close these issues.

[FEATURE] Support --archives option in SparkApplication CRD and spark-submit
3 participants