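Are RDD joins and Spark SQL joins the same?

I have read some material saying that "Spark SQL currently supports three join algorithms: Shuffle Hash Join, Broadcast Hash Join, and Sort Merge Join", for example this article: https://segmentfault.com/a/1190000021033287. The joins described there differ considerably from the RDD join described in the book. Are the RDD join and the Spark SQL join different implementations? And if they are the same implementation, is the article wrong?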
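@hangjianglaoweng This book mainly covers joins built on the RDD API, and the join it describes is similar to the Shuffle Hash Join in Spark SQL. Spark SQL targets high-performance SQL query analysis, so it applies many optimizations to how SQL statements are executed. Broadcast Hash Join and Sort Merge Join are two of those optimizations, and they run more efficiently in certain join scenarios. For an analysis of Spark SQL internals, you can read the book 《SparkSQL内核剖析》, written by a junior colleague from my lab.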
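To make the difference between the three strategies concrete, here is a minimal sketch in plain Python. It is not Spark code and not how Spark implements them internally. The function names, the `num_partitions` parameter, and the sample data are all assumptions made for illustration. Each function takes lists of `(key, value)` pairs and returns inner-join results as `(key, (left_value, right_value))`.

```python
from collections import defaultdict


def shuffle_hash_join(left, right, num_partitions=4):
    """Hash-partition both sides by key (the "shuffle"), then build a
    hash table from one side in each partition and probe it with the other."""
    left_parts = defaultdict(list)
    right_parts = defaultdict(list)
    for k, v in left:
        left_parts[hash(k) % num_partitions].append((k, v))
    for k, w in right:
        right_parts[hash(k) % num_partitions].append((k, w))

    out = []
    for p in range(num_partitions):
        # Build side: hash table for this partition only.
        table = defaultdict(list)
        for k, v in left_parts[p]:
            table[k].append(v)
        # Probe side: matching keys are guaranteed to be in the same partition.
        for k, w in right_parts[p]:
            for v in table.get(k, []):
                out.append((k, (v, w)))
    return out


def broadcast_hash_join(large, small):
    """No shuffle: the small side becomes one hash table that every task
    scanning the large side can probe (modelled here as a single dict)."""
    table = defaultdict(list)
    for k, w in small:
        table[k].append(w)
    return [(k, (v, w)) for k, v in large for w in table.get(k, [])]


def sort_merge_join(left, right):
    """Sort both sides by key, then merge them with two cursors."""
    left = sorted(left, key=lambda kv: kv[0])
    right = sorted(right, key=lambda kv: kv[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for jj in range(j, j_end):
                    out.append((lk, (left[i][1], right[jj][1])))
                i += 1
            j = j_end
    return out


# Hypothetical sample data: (key, value) pairs.
orders = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
users = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]

results = [
    sorted(shuffle_hash_join(orders, users)),
    sorted(broadcast_hash_join(orders, users)),
    sorted(sort_merge_join(orders, users)),
]
```

All three produce the same rows and differ only in how matching keys are brought together: hashing after a shuffle, hashing without a shuffle when one side is small, or sorting and merging.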
@JerryLead OK, thanks for the explanation.