-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-5388] Provide a stable application submission gateway for standalone cluster mode #4216
Conversation
This commit introduces type-safe schemas for all messages exchanged in the REST protocol. Each message is expected to contain an ACTION field that has only one possible value for each message type. Before the message is sent, we validate that all required fields are in fact present, and that the value of the action field is the correct type. The next step is to actually integrate this in standalone mode.
This commit embeds the REST server in the standalone Master and forces Spark submit to submit applications through the REST client. This is the first working end-to-end implementation of a stable submission interface in standalone cluster mode.
Previously the REST protocol was very explicitly tied to the standalone mode. This commit frees the protocol from this restriction.
Previously APP_ARGs, SPARK_PROPERTYs and ENVIRONMENT_VARIABLEs will appear in the JSON at random places. Now they are grouped together at the end of the JSON blob.
This is applicable to application arguments, Spark properties, and environment variables, all of which were previously handled through parameterized fields, which were cumbersome to parse. Since JSON naturally supports nesting, we should take advantage of it too. This commit refactors the code that converts the messages to and from JSON in a way that subclasses can easily override the conversion behavior without duplicating code.
This involves refactoring SparkSubmit a little to put the code that launches the REST client in the right place. This commit also adds port retry logic in the REST server, which was previously missing.
This commit makes the StandaloneRestServer actually handle status requests. The existing polling behavior from o.a.s.deploy.Client is also implemented in the StandaloneRestClient and amended. Additionally, the validation behavior was confusing before this commit. Previously the error message would seem to indicate that the user constructed a malformed message even if the message was constructed on the server side. This commit ensures that the error message is different for these two cases.
Otherwise it's a little ambiguous what we mean by SPARK_VERSION.
This commit does the following major things: (1) Refactor SparkSubmit such that SparkSubmitSuite now passes (2) Refactor the REST messages such that it's easier to test them (3) Add type-safety validation logic for REST fields (4) Move REST fields to its own file (5) Maintain ordering of fields added to REST messages (6) Add an option to disable the REST server, as we do in tests
Test build #26138 has started for PR 4216 at commit
|
Test build #26138 has finished for PR 4216 at commit
|
Test FAILed. |
The motivation is to fix failing tests SparkSubmitSuite and DriverSuite.
Test build #26186 has started for PR 4216 at commit
|
Hi @andrewor14 , I see the bug is targeted at 1.3. This seems like a pretty considerable change to come so close to the branching point. Should we target it at 1.4 instead? |
Test build #26186 has finished for PR 4216 at commit
|
Test PASSed. |
This is actually non-trivial because we must run standalone mode instead of relying on the existing local-cluster mode. This means we must manually start our own Master and Workers, and provide a real jar when submitting the test application, which involves manually packaging our own jar. Further, since the driver output is difficult to obtain programmatically in cluster mode, we need to write the results to a special file and verify them later.
Test build #26217 has started for PR 4216 at commit
|
@vanzin I understand your concern. Although this patch adds many lines, its scope is actually limited only to standalone cluster mode, and the default submit behavior is actually unchanged. The idea is to provide only an alternative to the existing submission gateway, not a replacement (at least for now). Also, it should be fairly straightforward to test this as we only need to test one mode. I will be sure to reiterate on review comments promptly so as not to potentially delay the release. |
Test build #26218 has started for PR 4216 at commit
|
Test build #26892 has started for PR 4216 at commit
|
Test build #26892 has finished for PR 4216 at commit
|
Test FAILed. |
@andrewor14 okay I think this time you are causing the test failure :) |
val sparkProperties: HashMap[String, String] = new HashMap[String, String]() | ||
|
||
// Standalone cluster mode only | ||
var useRest: Boolean = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not actually settable by the user, so for this one it might be good to indicate it's used only internally
I did a quick pass, this is looking good, but there are some comments on the JIRA worth addressing. |
Test build #26902 has started for PR 4216 at commit
|
Test build #26903 has started for PR 4216 at commit
|
Test build #26902 has finished for PR 4216 at commit
|
Test PASSed. |
Test build #26903 has finished for PR 4216 at commit
|
Test PASSed. |
This allows us to use the raw values for these types of fields in the JSON instead of a string version of them.
Conflicts: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
Alright, as of the latest commit I believe I have addressed all comments both on this PR and on the JIRA. This PR is ready to go from my side. |
Test build #26967 has started for PR 4216 at commit
|
Okay, this LGTM pending tests. I believe all of the feedback has been addressed. Thanks @vanzin for thorough and very helpful feedback on this patch. This protocol looks much better than it did at the beginning. |
Test build #26967 has finished for PR 4216 at commit
|
Test PASSed. |
…dalone cluster mode The goal is to provide a stable, REST-based application submission gateway that is not inherently based on Akka, which is unstable across versions. This PR targets standalone cluster mode, but is implemented in a general enough manner that can be potentially extended to other modes in the future. Client mode is currently not included in the changes here because there are many more Akka messages exchanged there. As of the changes here, the Master will advertise two ports, 7077 and 6066. We need to keep around the old one (7077) for client mode and older versions of Spark submit. However, all new versions of Spark submit will use the REST gateway (6066). By the way this includes ~700 lines of tests and ~200 lines of license. Author: Andrew Or <[email protected]> Closes #4216 from andrewor14/rest and squashes the following commits: 8d7ce07 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 6f0c597 [Andrew Or] Use nullable fields for integer and boolean values dfe4bd7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest b9e2a08 [Andrew Or] Minor comments 02b5cea [Andrew Or] Fix tests d2b1ef8 [Andrew Or] Comment changes + minor code refactoring across the board 9c82a36 [Andrew Or] Minor comment and wording updates b4695e7 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest c9a8ad7 [Andrew Or] Do not include appResource and mainClass as properties 6fc7670 [Andrew Or] Report REST server response back to the user 40e6095 [Andrew Or] Pass submit parameters through system properties cbd670b [Andrew Or] Include unknown fields, if any, in server response 9fee16f [Andrew Or] Include server protocol version on mismatch 09f873a [Andrew Or] Fix style 8188e61 [Andrew Or] Upgrade Jackson from 2.3.0 to 2.4.4 37538e0 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 9165ae8 [Andrew Or] Fall back to Akka if endpoint was not REST 252d53c [Andrew Or] Clean up server error handling behavior further c643f64 [Andrew Or] Fix style bbbd329 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 792e112 [Andrew Or] Use specific HTTP response codes on error f98660b [Andrew Or] Version the protocol and include it in REST URL 721819f [Andrew Or] Provide more REST-like interface for submit/kill/status 581f7bf [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 9e0d1af [Andrew Or] Move some classes around to reduce number of files (minor) 42e5de4 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 1f1c03f [Andrew Or] Use Jackson's DefaultScalaModule to simplify messages 9229433 [Andrew Or] Reduce duplicate naming in REST field ade28fd [Andrew Or] Clean up REST response output in Spark submit b2fef8b [Andrew Or] Abstract the success field to the general response 6c57b4b [Andrew Or] Increase timeout in end-to-end tests bf696ff [Andrew Or] Add checks for enabling REST when using kill/status 7ee6737 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest e2f7f5f [Andrew Or] Provide more safeguard against missing fields 9581df7 [Andrew Or] Clean up uses of exceptions 914fdff [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest e2104e6 [Andrew Or] stable -> rest 3db7379 [Andrew Or] Fix comments and name fields for better error messages 8d43486 [Andrew Or] Replace SubmitRestProtocolAction with class name df90e8b [Andrew Or] Use Jackson for JSON de/serialization d7a1f9f [Andrew Or] Fix local cluster tests efa5e18 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest e42c131 [Andrew Or] Add end-to-end tests for standalone REST protocol 837475b [Andrew Or] Show the REST port on the Master UI d8d3717 [Andrew Or] Use a daemon thread pool for REST server 6568ca5 [Andrew Or] Merge branch 'master' of github.com:apache/spark into rest 77774ba [Andrew Or] Minor fixes 206cae4 [Andrew Or] Refactor and add tests for the REST protocol 63c05b3 [Andrew Or] Remove MASTER as a field (minor) 9e21b72 [Andrew Or] Action -> SparkSubmitAction (minor) 51c5ca6 [Andrew Or] Distinguish client and server side Spark versions b44e103 [Andrew Or] Implement status requests + fix validation behavior 120ab9d [Andrew Or] Support kill and request driver status through SparkSubmit 544de1d [Andrew Or] Major clean ups in code and comments e958cae [Andrew Or] Supported nested values in messages 484bd21 [Andrew Or] Specify an ordering for fields in SubmitDriverRequestMessage 6ff088d [Andrew Or] Rename classes to generalize REST protocol af9d9cb [Andrew Or] Integrate REST protocol in standalone mode 53e7c0e [Andrew Or] Initial client, server, and all the messages (cherry picked from commit 1390e56) Signed-off-by: Patrick Wendell <[email protected]>
### What changes were proposed in this pull request? This pr aims to upgrade jackson from 2.16.0 to 2.16.1 ### Why are the changes needed? The new version bring some fix: - [#4200](FasterXML/jackson-databind#4200): JsonSetter(contentNulls = FAIL) is ignored in delegating JsonCreator argument - [#4216](FasterXML/jackson-databind#4216): Primitive array deserializer not being captured by DeserializerModifier - [#4219](FasterXML/jackson-databind#4219): JsonNode.findValues() and findParents() missing expected values in 2.16.0 The full release notes as follows: - https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.16.1 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Pass Github Actions ### Was this patch authored or co-authored using generative AI tooling? No Closes #44494 from LuciferYang/SPARK-46508. Authored-by: yangjie01 <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
This pr aims to upgrade jackson from 2.16.0 to 2.16.1 The new version bring some fix: - [apache#4200](FasterXML/jackson-databind#4200): JsonSetter(contentNulls = FAIL) is ignored in delegating JsonCreator argument - [apache#4216](FasterXML/jackson-databind#4216): Primitive array deserializer not being captured by DeserializerModifier - [apache#4219](FasterXML/jackson-databind#4219): JsonNode.findValues() and findParents() missing expected values in 2.16.0 The full release notes as follows: - https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.16.1 No Pass Github Actions No Closes apache#44494 from LuciferYang/SPARK-46508. Authored-by: yangjie01 <[email protected]> Signed-off-by: Dongjoon Hyun <[email protected]>
The goal is to provide a stable, REST-based application submission gateway that is not inherently based on Akka, which is unstable across versions. This PR targets standalone cluster mode, but is implemented in a general enough manner that can be potentially extended to other modes in the future. Client mode is currently not included in the changes here because there are many more Akka messages exchanged there.
As of the changes here, the Master will advertise two ports, 7077 and 6066. We need to keep around the old one (7077) for client mode and older versions of Spark submit. However, all new versions of Spark submit will use the REST gateway (6066).
By the way this includes ~700 lines of tests and ~200 lines of license.