[SPARK-48089][SS][CONNECT][FOLLOWUP][3.5] Disable Server Listener failed 3.5 <> 4.0 test #47468

WweiL · 2024-07-24T04:46:50Z

What changes were proposed in this pull request?

Disable the listener test. This test would fail after #46921, which is now reverted. With #46921, the server starts a server-side Python process that serializes the StreamingQueryProgress object using the new StreamingQueryProgress definition. The client, however, tries to deserialize it using the old StreamingQueryProgress, which lacks the change, and this causes a serde error.

However, since the change is going into Spark 4.0 and is considered a good improvement that does more good than harm, we would like to disable this test in order to bring back #46921.
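The exact error is not shown in this thread. As a generic illustration only of how a definition mismatch between the two sides can break deserialization, here is a minimal sketch. It assumes a JSON payload and a strict constructor, which may not match the actual Spark Connect mechanism; `OldClientProgress`, `decode_strict`, and `extraField` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class OldClientProgress:
    # Hypothetical "old" definition known to the client (not PySpark's class).
    id: str
    batchId: int


def decode_strict(payload: str) -> Optional[OldClientProgress]:
    """Decode a JSON payload; return None if it does not fit the old definition."""
    try:
        return OldClientProgress(**json.loads(payload))
    except TypeError:
        # Raised when the payload carries a field the old class does not accept.
        return None


# Payload shaped like the old definition: decodes fine.
old_payload = json.dumps({"id": "abc", "batchId": 3})
# Payload produced by a newer "server" with one extra (hypothetical) field.
new_payload = json.dumps({"id": "abc", "batchId": 3, "extraField": 1})

print(decode_strict(old_payload))
print(decode_strict(new_payload))
```

A decoder that tolerates unknown fields would avoid this particular failure; whether that applies to the test disabled here is not established by this thread.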

Why are the changes needed?

Unblock bringing back #46921

Does this PR introduce any user-facing change?

No

How was this patch tested?

No need

Was this patch authored or co-authored using generative AI tooling?

No

… the actual StreamingQueryProgress

This reverts commit d067fc6, which reverted 042804a; in other words, it brings 042804a back. 042804a failed the 3.5 client <> 4.0 server test, but it was decided in #47468 to turn that test off for cross-version testing.

### What changes were proposed in this pull request?

This PR was created after the discussion in this closed one: #46886. I was trying to fix a bug (in Connect, `query.lastProgress` doesn't have `numInputRows`, `inputRowsPerSecond`, and `processedRowsPerSecond`), and we reached the conclusion that what is proposed in this PR should be the ultimate fix.

In Python, for both classic Spark and Spark Connect, the return type of `lastProgress` is `Dict` (and `recentProgress` is `List[Dict]`), but in Scala it's the actual `StreamingQueryProgress` object: https://github.com/apache/spark/blob/1a5d22aa2ffe769435be4aa6102ef961c55b9593/sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQuery.scala#L94-L101

This API discrepancy causes some confusion: in Scala, users can do `query.lastProgress.batchId`, while in Python they have to do `query.lastProgress["batchId"]`.

This PR makes `StreamingQuery.lastProgress` return the actual `StreamingQueryProgress` (and `StreamingQuery.recentProgress` return `List[StreamingQueryProgress]`). To avoid a breaking change, we make `StreamingQueryProgress` a subclass of `dict`, so existing code that uses dictionary access (e.g. `query.lastProgress["id"]`) still works.

### Why are the changes needed?

API parity

### Does this PR introduce _any_ user-facing change?

Yes, now `StreamingQuery.lastProgress` returns the actual `StreamingQueryProgress` (and `StreamingQuery.recentProgress` returns `List[StreamingQueryProgress]`).

### How was this patch tested?

Added unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47470 from WweiL/bring-back-lastProgress.

Authored-by: Wei Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
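The backward-compatibility idea in the commit message above (an object that is also a `dict`) can be sketched roughly as follows. This is not PySpark's implementation: `ProgressSketch` is a hypothetical stand-in, and the real `StreamingQueryProgress` may expose its fields differently.

```python
from typing import Any, Dict


class ProgressSketch(dict):
    """Hypothetical stand-in for a progress object; not PySpark's class."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        super().__init__(fields)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() and .items() are unaffected.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


progress = ProgressSketch({"id": "abc", "batchId": 3, "numInputRows": 10})

# Existing dictionary-style access keeps working.
batch_by_key = progress["batchId"]
# Attribute-style access, similar in spirit to the Scala API.
batch_by_attr = progress.batchId

print(batch_by_key, batch_by_attr, isinstance(progress, dict))
```

Subclassing `dict` means code that treats the value as a plain dictionary, including `isinstance(..., dict)` checks, continues to behave as before.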

dongjoon-hyun

Since this is a follow-up of the 3.5-only PR, +1, LGTM, too.

[SPARK-48089][SS][CONNECT] Fix 3.5 <> 4.0 StreamingQueryListener compatibility test #46513

…led 3.5 <> 4.0 test

### What changes were proposed in this pull request?

Disable the listener test. This test would fail after #46921, which is now reverted. With #46921, the server starts a server-side Python process that serializes the `StreamingQueryProgress` object using the new `StreamingQueryProgress` definition. The client, however, tries to deserialize it using the old `StreamingQueryProgress`, which lacks the change, and this causes a serde error.

However, since the change is going into Spark 4.0 and is considered a good improvement that does more good than harm, we would like to disable this test in order to bring back #46921.

### Why are the changes needed?

Unblock bringing back #46921

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No need

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #47468 from WweiL/3.5-disable-server-listener-test-cross-version.

Authored-by: Wei Liu <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>

HyukjinKwon · 2024-07-25T00:38:47Z

Merged to branch-3.5

WweiL added 2 commits July 23, 2024 21:40

skip

eef77b8

retrigger

2a1ebe2

github-actions bot added SQL STRUCTURED STREAMING CORE PYTHON CONNECT labels Jul 24, 2024

HyukjinKwon approved these changes Jul 24, 2024

WweiL mentioned this pull request Jul 24, 2024

[SPARK-48567][SS][FOLLOWUP] StreamingQuery.lastProgress should return the actual StreamingQueryProgress #47470

Closed

dongjoon-hyun changed the title ~~[SPARK-48089][SS][CONNECT][FOLLOWUP] Disable Server Listener failed 3.5 <> 4.0 test~~ [SPARK-48089][SS][CONNECT][FOLLOWUP][3.5] Disable Server Listener failed 3.5 <> 4.0 test Jul 24, 2024

dongjoon-hyun approved these changes Jul 24, 2024

HyukjinKwon closed this Jul 25, 2024
