-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23096][SS] Migrate rate source to V2 #20688
Conversation
Test build #87740 has finished for PR 20688 at commit
|
Test build #87751 has finished for PR 20688 at commit
|
Retest this please. |
Test build #87799 has finished for PR 20688 at commit
|
@tdas @jose-torres can you please take a review when you have time, thanks! |
Test build #87951 has finished for PR 20688 at commit
|
override def shortName(): String = "rate" | ||
} | ||
|
||
class RateStreamMicroBatchReader(options: DataSourceOptions, checkpointLocation: String) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
split this into a separate file.
import org.apache.spark.sql.streaming.StreamTest | ||
import org.apache.spark.util.ManualClock | ||
|
||
class RateSourceSuite extends StreamTest { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you not move this file using "git mv" and then change? Then we would have been able to diff it properly.
This was a pain in the text socket v2 PR as well :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tdas , I think I used "git mv", the thing is that when the diff is larger then x%, it will treat as "git rm" and "git add" (https://makandracards.com/makandra/30957-git-how-to-get-a-useful-diff-when-renaming-files).
Sorry about the inconvenience, but I'm not sure if there's other approaches.
import org.apache.spark.sql.execution.streaming.{RateSourceProvider, RateStreamOffset, ValueRunTimeMsPair} | ||
import org.apache.spark.sql.execution.streaming.sources.RateStreamSourceV2 | ||
import org.apache.spark.sql.execution.streaming.{RateStreamOffset, ValueRunTimeMsPair} | ||
import org.apache.spark.sql.execution.streaming.sources.RateSourceProvider | ||
import org.apache.spark.sql.sources.v2.DataSourceOptions | ||
import org.apache.spark.sql.sources.v2.reader._ | ||
import org.apache.spark.sql.sources.v2.reader.streaming.{ContinuousDataReader, ContinuousReader, Offset, PartitionOffset} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you make the class names and file names of the different readers consistent with each other? Similar to Kafka?
RateStreamProvider
RateStreamMicroBatchReader, RateStreamMicroBatchDataReaderFactory ....
RateStreamContinuousReader, ....
@tdas I tried different ways to keep history of rename, but seems git always treat as "rm and add" for big changes (https://coderwall.com/p/_csouq/renaming-and-changing-files-in-git-without-losing-history). Sorry to bring inconvenience. Do you have any better solutions? |
Jenkins, retest this please. |
Test build #88085 has finished for PR 20688 at commit
|
Test build #88086 has finished for PR 20688 at commit
|
Sorry it took me so long to get to this. LGTM |
Thanks @tdas and @jose-torres . |
Sorry, I need to revert this PR. It breaks our build. Could you resubmit the PR after fixing the tests? |
Next time, we should re-trigger the tests before merging the code. The most recent tests ran 20 days ago |
## What changes were proposed in this pull request? This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test. ## How was this patch tested? UTs. Author: jerryshao <[email protected]> Closes apache#20688 from jerryshao/SPARK-23096.
## What changes were proposed in this pull request? Roll forward c68ec4e (#20688). There are two minor test changes required: * An error which used to be TreeNodeException[ArithmeticException] is no longer wrapped and is now just ArithmeticException. * The test framework simply does not set the active Spark session. (Or rather, it doesn't do so early enough - I think it only happens when a query is analyzed.) I've added the required logic to SQLTestUtils. ## How was this patch tested? existing tests Author: Jose Torres <[email protected]> Author: jerryshao <[email protected]> Closes #20922 from jose-torres/ratefix.
## What changes were proposed in this pull request? This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test. ## How was this patch tested? UTs. Author: jerryshao <[email protected]> Closes apache#20688 from jerryshao/SPARK-23096.
## What changes were proposed in this pull request? Roll forward c68ec4e (apache#20688). There are two minor test changes required: * An error which used to be TreeNodeException[ArithmeticException] is no longer wrapped and is now just ArithmeticException. * The test framework simply does not set the active Spark session. (Or rather, it doesn't do so early enough - I think it only happens when a query is analyzed.) I've added the required logic to SQLTestUtils. ## How was this patch tested? existing tests Author: Jose Torres <[email protected]> Author: jerryshao <[email protected]> Closes apache#20922 from jose-torres/ratefix.
Roll forward c68ec4e (apache#20688). There are two minor test changes required: * An error which used to be TreeNodeException[ArithmeticException] is no longer wrapped and is now just ArithmeticException. * The test framework simply does not set the active Spark session. (Or rather, it doesn't do so early enough - I think it only happens when a query is analyzed.) I've added the required logic to SQLTestUtils. existing tests Author: Jose Torres <[email protected]> Author: jerryshao <[email protected]> Closes apache#20922 from jose-torres/ratefix. Ref: LIHADOOP-48531
What changes were proposed in this pull request?
This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test.
How was this patch tested?
UTs.