[SPARK-16114] [SQL] structured streaming network word count examples #13816

Closed
jjthomas wants to merge 11 commits

jjthomas
Contributor

What changes were proposed in this pull request?

Network word count example for structured streaming

How was this patch tested?

Run locally
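
To make concrete what a network word count computes, here is a minimal plain-Python sketch with no Spark dependency. It models the stream as a sequence of batches of text lines and keeps a running count per word. The function name `running_word_counts`, the batch representation, and the whitespace splitting are illustrative assumptions, not code from this patch.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List


def running_word_counts(batches: Iterable[List[str]]) -> Iterator[Dict[str, int]]:
    """Yield the cumulative word counts after each batch of input lines."""
    totals: Counter = Counter()
    for lines in batches:
        for line in lines:
            # Each line may hold several words; split on whitespace.
            totals.update(line.split())
        # Emit a copy so later batches do not mutate earlier snapshots.
        yield dict(totals)


if __name__ == "__main__":
    for snapshot in running_word_counts([["apache spark", "spark"], ["apache"]]):
        print(snapshot)
```

Each yielded dictionary is the full set of counts seen so far, which is roughly what a streaming aggregation would report after every new batch of socket input.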

.appName("JavaStructuredNetworkWordCount")
.getOrCreate();

Dataset<String> df = spark.readStream().format("socket").option("host", args[0])
Contributor

Make this easier to read by putting each call on its own line:

spark
   .readStream()
   .format("socket")
   .option(...)
   ....

@tdas
Contributor

tdas commented Jun 21, 2016

ok to test.


import spark.implicits._

val df = spark.readStream
Contributor

Rename df to lines.

@SparkQA

SparkQA commented Jun 21, 2016

Test build #60970 has finished for PR 13816 at commit 38b5497.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class JavaStructuredNetworkWordCount

@jjthomas
Contributor Author

Responded to comments

@SparkQA

SparkQA commented Jun 21, 2016

Test build #60981 has finished for PR 13816 at commit 18c83b1.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class JavaStructuredNetworkWordCount

.getOrCreate();

// input lines (may be multiple words on each line)
Dataset<String> lines = spark
Contributor

@tdas tdas Jun 22, 2016

You don't need to convert to Dataset[String] using `as`, since you are not using the typed groupByKey. Keeping it as Dataset[Row] is fine, as you have done in the Scala and Python versions.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61051 has finished for PR 13816 at commit 46ac930.

  • This patch fails Python style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61054 has finished for PR 13816 at commit 80fee20.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Contributor

tdas commented Jun 22, 2016

test this again

@SparkQA

SparkQA commented Jun 22, 2016

Test build #3127 has finished for PR 13816 at commit 80fee20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #3128 has finished for PR 13816 at commit 80fee20.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 22, 2016

Test build #61071 has finished for PR 13816 at commit f7aec9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 27, 2016

Test build #61310 has finished for PR 13816 at commit c3b16a2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* `$ bin/run-example org.apache.spark.examples.sql.streaming.EventTimeWindowExample
* localhost 9999 <checkpoint dir>`
*/
object NetworkEventTimeWindow {
Contributor

Just rename to EventTimeWindow.

* To run this on your local machine, you need to first run a Netcat server
* `$ nc -lk 9999`
* and then run the example
* `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount
Contributor

I think you can just do $ bin/run-example sql.streaming.JavaStructuredNetworkWordCount. Verify that, and if it works, please change it.
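
The shorter form would work if the launcher prepends the examples package whenever the class name does not already carry it. The sketch below models that name resolution in plain Python for illustration only. `resolve_example_class` and `EXAMPLE_PREFIX` are hypothetical names, and the behaviour it assumes should still be confirmed against the real `bin/run-example`, as requested above.

```python
# Package prefix taken from the fully qualified name in the doc comment above.
EXAMPLE_PREFIX = "org.apache.spark.examples."


def resolve_example_class(name: str) -> str:
    """Return a fully qualified class name, adding the prefix only if it is missing."""
    if name.startswith(EXAMPLE_PREFIX):
        return name
    return EXAMPLE_PREFIX + name


if __name__ == "__main__":
    print(resolve_example_class("sql.streaming.JavaStructuredNetworkWordCount"))
```

Under that assumption, both the short and the fully qualified spelling resolve to the same class, so the doc comment can use the shorter one.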

@SparkQA

SparkQA commented Jun 28, 2016

Test build #61389 has finished for PR 13816 at commit fb491c6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jun 28, 2016

Test build #61413 has finished for PR 13816 at commit 6ab4453.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr)
Contributor

@tdas tdas Jun 28, 2016

The usage message has the wrong script name.
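
One way to keep a usage message from drifting out of sync with the file name is to derive the name from `sys.argv[0]` instead of hard-coding it. This is a minimal sketch of that idea, not the fix adopted in the patch. `usage_message` and `parse_args` are hypothetical helpers, and the exit code is an arbitrary choice.

```python
import os
import sys


def usage_message(argv0):
    """Build the usage line from the script path that was actually invoked."""
    return "Usage: %s <hostname> <port>" % os.path.basename(argv0)


def parse_args(argv):
    """Return (host, port), or exit with the usage line on a wrong argument count."""
    if len(argv) != 3:
        print(usage_message(argv[0]), file=sys.stderr)
        sys.exit(1)
    return argv[1], int(argv[2])
```

A script would call `parse_args(sys.argv)` at the top of its main block; renaming the file then updates the usage text automatically.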

@tdas
Contributor

tdas commented Jun 28, 2016

LGTM. Merging this to master and 2.0. Thanks, @jjthomas.

asfgit pushed a commit that referenced this pull request Jun 28, 2016
## What changes were proposed in this pull request?

Network word count example for structured streaming

## How was this patch tested?

Run locally

Author: James Thomas <[email protected]>
Author: James Thomas <[email protected]>

Closes #13816 from jjthomas/master.

(cherry picked from commit 3554713)
Signed-off-by: Tathagata Das <[email protected]>
@asfgit asfgit closed this in 3554713 Jun 28, 2016
@SparkQA

SparkQA commented Jun 29, 2016

Test build #61421 has finished for PR 13816 at commit a8c3fec.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
