[SPARK-24613][SQL] Cache with UDF could not be matched with subsequent dependent caches #21602

Closed
maryannxue wants to merge 2 commits

maryannxue
Contributor

What changes were proposed in this pull request?

Wrap the logical plan with an AnalysisBarrier when CacheManager compiles the execution plan, so that the plan is not analyzed again.
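
To illustrate why analyzing an already-analyzed plan can break cache matching, here is a toy model in plain Scala. It is only a sketch under stated assumptions: `Plan`, `Scan`, `ApplyUdf`, `NullGuard`, `Barrier`, `ToyAnalyzer`, and `ToyCacheDemo` are hypothetical names invented for this example and are not Spark classes, and the non-idempotent rule is an assumed stand-in for whatever the real analyzer does to a plan containing a UDF. The cache is modeled as a set of analyzed plans matched by structural equality.

```scala
// Toy model only: every type here is hypothetical and unrelated to Spark's real classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class ApplyUdf(name: String, child: Plan) extends Plan
case class NullGuard(child: Plan) extends Plan // wrapper the toy analyzer adds around a UDF
case class Barrier(child: Plan) extends Plan   // marks a subtree as already analyzed

object ToyAnalyzer {
  // Assumed, deliberately non-idempotent rule: each pass wraps every UDF in one more guard.
  def analyze(plan: Plan): Plan = plan match {
    case Barrier(_)           => plan // do not descend into an already-analyzed subtree
    case ApplyUdf(name, child) => NullGuard(ApplyUdf(name, analyze(child)))
    case NullGuard(child)     => NullGuard(analyze(child))
    case scan: Scan           => scan
  }
}

object ToyCacheDemo {
  // Remove the barrier marker so the plan can be compared with cached plans.
  def unwrap(plan: Plan): Plan = plan match {
    case Barrier(child) => child
    case other          => other
  }

  def main(args: Array[String]): Unit = {
    // First analysis pass: this is the plan that gets cached.
    val analyzed = ToyAnalyzer.analyze(ApplyUdf("expensive", Scan("t")))
    val cache = Set[Plan](analyzed)

    // Without a barrier, a second pass changes the plan, so the lookup misses.
    val reanalyzed = ToyAnalyzer.analyze(analyzed)
    println(cache.contains(reanalyzed)) // prints "false"

    // With a barrier, the second pass leaves the plan untouched, so the lookup hits.
    val guarded = unwrap(ToyAnalyzer.analyze(Barrier(analyzed)))
    println(cache.contains(guarded)) // prints "true"
  }
}
```

In this model the second analysis pass adds another guard around the UDF, so the re-analyzed plan no longer equals the cached one. The barrier stops the analyzer from descending into the subtree, which is the effect the change above aims for.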

How was this patch tested?

Added one test in DatasetCacheSuite.


SparkQA commented Jun 21, 2018

Test build #92149 has finished for PR 21602 at commit 4a5c388.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@gatorsmile gatorsmile left a comment

LGTM

@@ -132,4 +132,19 @@ class DatasetCacheSuite extends QueryTest with SharedSQLContext with TimeLimits
    df.unpersist()
    assert(df.storageLevel == StorageLevel.NONE)
  }

  test("SPARK-24613 Cache with UDF could not be matched with subsequent dependent caches") {
    val expensiveUDF = udf({x: Int => Thread.sleep(10000); x})
Contributor

@cloud-fan cloud-fan Jun 21, 2018

Can we use an accumulator and make sure this UDF only runs 10 times? Sleeping for 10 seconds is not good in a unit test.

Contributor Author

Accumulators probably wouldn't work. I'll verify the plan instead, though.


SparkQA commented Jun 21, 2018

Test build #92158 has finished for PR 21602 at commit 377f213.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

retest this please


SparkQA commented Jun 21, 2018

Test build #92174 has finished for PR 21602 at commit 377f213.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@gatorsmile
Member

gatorsmile commented Jun 21, 2018

Thanks! Merged to master.

@asfgit asfgit closed this in b9a6f74 Jun 21, 2018
@gatorsmile
Member

This is also a regression. Backported to 2.3 branch too.

asfgit pushed a commit that referenced this pull request Jun 27, 2018
…t dependent caches

Wrap the logical plan with an `AnalysisBarrier` for execution plan compilation in CacheManager, in order to avoid the plan being analyzed again.

Add one test in `DatasetCacheSuite`

Author: Maryann Xue <[email protected]>

Closes #21602 from maryannxue/cache-mismatch.