
[SPARK-36533][SS][FOLLOWUP] Support Trigger.AvailableNow in PySpark #34592

Closed

Conversation

@HeartSaVioR (Contributor) commented Nov 14, 2021

What changes were proposed in this pull request?

This PR proposes to add Trigger.AvailableNow to PySpark, on top of #33763.
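
For context, the shape of the change is small: DataStreamWriter.trigger (presumably in python/pyspark/sql/streaming.py) gains an availableNow keyword argument that is validated against the other trigger options and mapped to the JVM-side Trigger.AvailableNow(). The sketch below is illustrative only, not the exact patch; internal attribute names such as _spark, _sc, and _jwrite are assumptions based on how the existing trigger options are wired.

def trigger(self, processingTime=None, once=None, continuous=None, availableNow=None):
    # Exactly one trigger option may be set (illustrative validation).
    params = [processingTime, once, continuous, availableNow]
    if params.count(None) == 4:
        raise ValueError("No trigger provided")
    if params.count(None) < 3:
        raise ValueError("Multiple triggers not allowed.")

    jvm_trigger = self._spark._sc._jvm.org.apache.spark.sql.streaming.Trigger
    if processingTime is not None:
        jtrigger = jvm_trigger.ProcessingTime(processingTime)
    elif once is not None:
        if once is not True:
            raise ValueError("Value for once must be True. Got: %s" % once)
        jtrigger = jvm_trigger.Once()
    elif continuous is not None:
        jtrigger = jvm_trigger.Continuous(continuous)
    else:
        if availableNow is not True:
            raise ValueError("Value for availableNow must be True. Got: %s" % availableNow)
        # New in this PR: process all data available at query start
        # (possibly across multiple batches), then stop the query.
        jtrigger = jvm_trigger.AvailableNow()

    self._jwrite = self._jwrite.trigger(jtrigger)
    return self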

Why are the changes needed?

We missed adding Trigger.AvailableNow to PySpark in #33763.

Does this PR introduce any user-facing change?

Yes, Trigger.AvailableNow will be available in PySpark as well.

How was this patch tested?

Added a simple validation to the PySpark docs. Manually tested as below:

>>> spark.readStream.format("text").load("/WorkArea/ScalaProjects/spark-apache/dist/inputs").writeStream.format("console").trigger(once=True).start()
<pyspark.sql.streaming.StreamingQuery object at 0x118dff6d0>
-------------------------------------------
Batch: 0
-------------------------------------------
+-----+
|value|
+-----+
|    a|
|    b|
|    c|
|    d|
|    e|
+-----+


>>> spark.readStream.format("text").load("/WorkArea/ScalaProjects/spark-apache/dist/inputs").writeStream.format("console").trigger(availableNow=True).start()
<pyspark.sql.streaming.StreamingQuery object at 0x118dffe50>
>>> -------------------------------------------
Batch: 0
-------------------------------------------
+-----+
|value|
+-----+
|    a|
|    b|
|    c|
|    d|
|    e|
+-----+


>>> spark.readStream.format("text").option("maxfilespertrigger", "2").load("/WorkArea/ScalaProjects/spark-apache/dist/inputs").writeStream.format("console").trigger(availableNow=True).start()
<pyspark.sql.streaming.StreamingQuery object at 0x118dff820>
>>> -------------------------------------------
Batch: 0
-------------------------------------------
+-----+
|value|
+-----+
|    a|
|    b|
+-----+

-------------------------------------------
Batch: 1
-------------------------------------------
+-----+
|value|
+-----+
|    c|
|    d|
+-----+

-------------------------------------------
Batch: 2
-------------------------------------------
+-----+
|value|
+-----+
|    e|
+-----+

>>>

@HeartSaVioR (Contributor, Author) commented:

cc. @srowen @brkyvz @viirya @xuanyuanking @bozhang2820
Also cc. @HyukjinKwon to double-check that I didn't miss any rules specific to the PySpark side.

SparkQA commented Nov 14, 2021

Test build #145212 has finished for PR 34592 at commit 1f795f4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HeartSaVioR (Contributor, Author) commented:

Thanks for the quick review! Merging to master.

SparkQA commented Nov 15, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49681/

SparkQA commented Nov 15, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49681/

@HeartSaVioR (Contributor, Author) commented:

I forgot about the SS guide doc; there are code examples on triggers in PySpark there. I'll need to update the doc as well. Crafting another follow-up PR soon...
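
For reference, the updated Python trigger examples in the guide would presumably show the available-now option alongside the existing modes. A sketch (assuming df is a streaming DataFrame; the guide's exact wording and surrounding text may differ):

# ProcessingTime trigger with a two-second micro-batch interval
df.writeStream \
    .format("console") \
    .trigger(processingTime="2 seconds") \
    .start()

# One-time trigger: process the available data in a single batch, then stop
df.writeStream \
    .format("console") \
    .trigger(once=True) \
    .start()

# Available-now trigger: process all available data, possibly in multiple
# batches (respecting source options such as maxFilesPerTrigger), then stop
df.writeStream \
    .format("console") \
    .trigger(availableNow=True) \
    .start()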

@HeartSaVioR (Contributor, Author) commented:

Follow-up: #34597
