-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-16114] [SQL] structured streaming network word count examples #13816
Conversation
.appName("JavaStructuredNetworkWordCount") | ||
.getOrCreate(); | ||
|
||
Dataset<String> df = spark.readStream().format("socket").option("host", args[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this
spark
.readStream()
.format("socket")
.option(...)
....
easier to read.
ok to test. |
|
||
import spark.implicits._ | ||
|
||
val df = spark.readStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
df --> lines
Test build #60970 has finished for PR 13816 at commit
|
Responded to comments |
Test build #60981 has finished for PR 13816 at commit
|
.getOrCreate(); | ||
|
||
// input lines (may be multiple words on each line) | ||
Dataset<String> lines = spark |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you dont need to convert to Dataset[String] using as
, since you are not using the typed groupByKey. keeping as Dataset[Row] is fine, as you done with the scala and python version.
Test build #61051 has finished for PR 13816 at commit
|
Test build #61054 has finished for PR 13816 at commit
|
test this again |
Test build #3127 has finished for PR 13816 at commit
|
Test build #3128 has finished for PR 13816 at commit
|
Test build #61071 has finished for PR 13816 at commit
|
Test build #61310 has finished for PR 13816 at commit
|
* `$ bin/run-example org.apache.spark.examples.sql.streaming.EventTimeWindowExample | ||
* localhost 9999 <checkpoint dir>` | ||
*/ | ||
object NetworkEventTimeWindow { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just rename to EventTimeWindow.
* To run this on your local machine, you need to first run a Netcat server | ||
* `$ nc -lk 9999` | ||
* and then run the example | ||
* `$ bin/run-example org.apache.spark.examples.sql.streaming.JavaStructuredNetworkWordCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just do $ bin/run-example sql.streaming.JavaStructuredNetworkWordCount
. Verify that, and if it works, please change it.
Test build #61389 has finished for PR 13816 at commit
|
Test build #61413 has finished for PR 13816 at commit
|
|
||
if __name__ == "__main__": | ||
if len(sys.argv) != 3: | ||
print("Usage: network_wordcount.py <hostname> <port>", file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usage has wrong name.
LGTM. Merging this to master and 2.0. Thank @jjthomas |
## What changes were proposed in this pull request? Network word count example for structured streaming ## How was this patch tested? Run locally Author: James Thomas <[email protected]> Author: James Thomas <[email protected]> Closes #13816 from jjthomas/master. (cherry picked from commit 3554713) Signed-off-by: Tathagata Das <[email protected]>
Test build #61421 has finished for PR 13816 at commit
|
What changes were proposed in this pull request?
Network word count example for structured streaming
How was this patch tested?
Run locally