[SPARK-2817] [SQL] add "show create table" support #1760

Closed
wants to merge 13 commits into from

Conversation

tianyi
Contributor

@tianyi tianyi commented Aug 4, 2014

In the Spark SQL component, the "show create table" syntax had been disabled.
We think it is a useful function for describing a Hive table.

@AmplabJenkins

Can one of the admins verify this patch?

@chenghao-intel
Contributor

Don't forget to commit the golden answer files.
You can get that by running "sbt/sbt -Phive=true hive/test".

@tianyi
Contributor Author

tianyi commented Aug 4, 2014

@chenghao-intel are these files all right?

@chenghao-intel
Contributor

Jenkins, test this please

@tianyi
Contributor Author

tianyi commented Aug 4, 2014

what's wrong with jenkins?

@JoshRosen
Contributor

Jenkins, test this please.

(Jenkins only listens to a pre-approved list of GitHub accounts).

@liancheng
Contributor

LGTM :)

@marmbrus
Contributor

marmbrus commented Aug 4, 2014

test this please

@marmbrus
Contributor

marmbrus commented Aug 4, 2014

Can you please add [SQL] to the title of any PRs that affect Spark SQL?

@SparkQA

SparkQA commented Aug 4, 2014

QA tests have started for PR 1760. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17876/consoleFull

@SparkQA

SparkQA commented Aug 4, 2014

QA results for PR 1760:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17876/consoleFull

@tianyi tianyi changed the title [SPARK-2817] add "show create table" support [SPARK-2817] [SQL] add "show create table" support Aug 4, 2014
@tianyi
Contributor Author

tianyi commented Aug 5, 2014

There are two problems:
1. The result of "show create table" contains some time properties, which cannot match the time at which the test case runs. Do we have a Hadoop and Hive environment on the Jenkins machine? Should I remove some golden files?

2. I could not find how to get the ${system:test.tmp.dir} variable on my laptop. I thought it could be an environment-related problem, but it failed on Jenkins too. I think I can fix this by modifying the pom.xml in "sql/hive/".

@marmbrus
Contributor

marmbrus commented Aug 5, 2014

We can add time / user specific properties to nonDeterministicLineIndicators

Regarding the test that requires a system property, we don't support that yet. I'd remove that test from the whitelist.

Another note: there is no hive/hadoop on jenkins so only tests with golden files will pass there.
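
To illustrate the idea behind nonDeterministicLineIndicators, here is a minimal Scala sketch, not the actual HiveComparisonTest code. It masks every output line that contains a known marker before comparing the actual output with the golden answer. The object name GoldenCompare, the helper names, and the placeholder string are assumptions made for illustration. "last_modified_by" is the marker added later in this PR; "transient_lastDdlTime" is only an example of a time-dependent table property.

```scala
object GoldenCompare {
  // Illustrative marker list; the real list lives in HiveComparisonTest.
  val nonDeterministicLineIndicators: Seq[String] = Seq(
    "transient_lastDdlTime",
    "last_modified_by"
  )

  // Replace any line that contains a marker with a fixed placeholder,
  // so that time- or user-specific values do not affect the comparison.
  def normalize(output: Seq[String]): Seq[String] =
    output.map { line =>
      if (nonDeterministicLineIndicators.exists(marker => line.contains(marker))) {
        "<<NON-DETERMINISTIC>>"
      } else {
        line
      }
    }

  // Two outputs count as equal if they match after masking.
  def sameAnswer(actual: Seq[String], golden: Seq[String]): Boolean =
    normalize(actual) == normalize(golden)
}
```

With this approach a golden file recorded at one time can still match output produced later, as long as the only differences are on masked lines.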

@tianyi
Contributor Author

tianyi commented Aug 5, 2014

I modified the pom.xml in the spark-hive module to support ${system:test.tmp.dir}.

@marmbrus
Contributor

marmbrus commented Aug 5, 2014

I don't think that's a Maven thing, is it? I think that's a Hive substitution that we haven't implemented.

@tianyi
Contributor Author

tianyi commented Aug 5, 2014

Hi, Michael. I ran the test on my laptop with this configuration, and it passes the test that failed before. I also checked that the apache-hive project sets the same configuration in its pom.xml.

@marmbrus
Contributor

marmbrus commented Aug 5, 2014

Oh, I see, it's a system property that is set by Maven and then threaded through to Hive. Thank you for explaining.

I'd still just skip this test, as that's not going to work for sbt (which is what Jenkins uses), and I don't want to have to change all the build files for this small detail.

@marmbrus
Contributor

marmbrus commented Aug 5, 2014

Another option would be to set this system property somewhere in the HiveComparisonTest instead of in the build file.

@tianyi
Contributor Author

tianyi commented Aug 6, 2014

How about adding another rule to the rewritePaths function in TestHive.scala?
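
A rough sketch of what such a rewrite rule could look like, assuming a plain literal substitution. The object and method names here are hypothetical and are not taken from TestHive.scala. `String.replace` is used because it does no regex interpretation, so the `$` and braces in the placeholder need no escaping.

```scala
object PathRewrite {
  // The placeholder as it appears in the Hive test queries.
  private val placeholder = "${system:test.tmp.dir}"

  // Hypothetical helper: substitute the placeholder with a concrete directory.
  // String.replace works on literal character sequences, not regexes.
  def rewriteTmpDir(cmd: String, tmpDir: String): String =
    cmd.replace(placeholder, tmpDir)
}
```

For example, `PathRewrite.rewriteTmpDir("LOCATION '${system:test.tmp.dir}/t1'", "/tmp/x")` yields `LOCATION '/tmp/x/t1'`. Each further variable would need its own rule.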

@marmbrus
Contributor

marmbrus commented Aug 6, 2014

Yeah, but then we have to handle escaping and such ourselves, and if there are other properties the function could get unwieldy. Is there a problem calling System.setProperty(...) in the constructor of TestHive?
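
A minimal sketch of the System.setProperty approach, assuming the property only needs to be visible to code running in the same JVM. The class name TmpDirFixture and the directory prefix are made up for illustration; this is not the code that was merged.

```scala
import java.nio.file.Files

// Hypothetical stand-in for a test context such as TestHive.
class TmpDirFixture {
  // Create the directory once, at construction time, in the system temp folder.
  val testTmpDir: String =
    Files.createTempDirectory("testTmp").toFile.getCanonicalPath

  // Publish it under the property name that the test queries reference.
  System.setProperty("test.tmp.dir", testTmpDir)
}
```

Because the property is set in code rather than in pom.xml, the same value is available whether the tests are launched from Maven or from sbt.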

@tianyi
Contributor Author

tianyi commented Aug 7, 2014

Two questions:
Q1: There are a few variables in the Hive test SQL, like:
system:build.ivy.lib.dir
system:test.dfs.mkdir
system:test.src.data.dir
system:test.tmp.dir
system:xxx
Should I add them all in TestHive?
Q2: In the Hive project, "test.tmp.dir" is set to "${project.build.directory}/tmp". I checked all the system properties, and "user.dir" is the closest one to ${project.build.directory}. Should I find the best value for the other variables?

add "${system:test.tmp.dir}" support
add "last_modified_by" to nonDeterministicLineIndicators in HiveComparisonTest
@tianyi
Contributor Author

tianyi commented Aug 7, 2014

I have only added "${system:test.tmp.dir}" support for now.

@tianyi
Contributor Author

tianyi commented Aug 8, 2014

Hi Michael, could you review this code again?

@@ -60,6 +60,8 @@ class TestHiveContext(sc: SparkContext) extends HiveContext(sc) {
// without restarting the JVM.
System.clearProperty("spark.hostPort")



Contributor


Can you remove these spurious new line additions? Below as well.

…n the source tree, and also clean some empty line
@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Aug 12, 2014

QA tests have started for PR 1760. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18355/consoleFull

@SparkQA

SparkQA commented Aug 12, 2014

QA results for PR 1760:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18355/consoleFull

@SparkQA

SparkQA commented Aug 12, 2014

QA tests have started for PR 1760. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18359/consoleFull

@tianyi
Contributor Author

tianyi commented Aug 12, 2014

I'm sorry for forgetting to run the tests yesterday.
This time, all the tests passed on my laptop except the udf_unix_timestamp function; I guess it is an environment problem.

@liancheng
Contributor

Be aware that the udf_unix_timestamp case is timezone sensitive. That's why we reset timezone to "America/Los_Angeles" in beforeAll. This may be related to your test failure.
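
As an illustration of why pinning the timezone matters, here is a small Scala sketch that sets the JVM default timezone for the duration of a block and restores it afterwards. The helper name withTimeZone is an assumption for this example; the test suite mentioned above does the equivalent reset in beforeAll rather than through a helper like this.

```scala
import java.util.TimeZone

object TimeZoneFixture {
  // Hypothetical helper: run `body` with the JVM default timezone set to `id`,
  // then restore whatever default was in place before.
  def withTimeZone[T](id: String)(body: => T): T = {
    val original = TimeZone.getDefault
    TimeZone.setDefault(TimeZone.getTimeZone(id))
    try body
    finally TimeZone.setDefault(original)
  }
}
```

Timestamp conversions that rely on the default timezone, run inside `TimeZoneFixture.withTimeZone("America/Los_Angeles") { ... }`, then give the same result regardless of the timezone configured on the machine.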

@SparkQA

SparkQA commented Aug 12, 2014

QA results for PR 1760:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18359/consoleFull

@tianyi
Contributor Author

tianyi commented Aug 13, 2014

@marmbrus I think the test failed because of some environment problem on the Jenkins host. Could you run it again?

@marmbrus
Contributor

test this please

@SparkQA

SparkQA commented Aug 13, 2014

QA tests have started for PR 1760. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18399/consoleFull

@SparkQA

SparkQA commented Aug 13, 2014

QA results for PR 1760:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18399/consoleFull

@tianyi
Contributor Author

tianyi commented Aug 13, 2014

@marmbrus Could you ask someone to help me fix this error?

Failed example:
srdd.collect()
Exception raised:
Traceback (most recent call last):
File "/usr/lib64/python2.6/doctest.py", line 1253, in run
compileflags, 1) in test.globs
File "<doctest pyspark.sql.SQLContext.inferSchema[6]>", line 1, in <module>
srdd.collect()
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/sql.py", line 1613, in collect
rows = RDD.collect(self)
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/pyspark/rdd.py", line 724, in collect
bytesInJava = self._jrdd.collect().iterator()
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
self.target_id, self.name)
File "/home/jenkins/workspace/SparkPullRequestBuilder/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
format(target_id, '.', name), value)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 35.0 failed 1 times, most recent failure: Lost task 1.0 in stage 35.0 (TID 72, localhost): java.lang.ClassCastException: java.lang.String cannot be cast to java.util.ArrayList
net.razorvine.pickle.objects.ArrayConstructor.construct(ArrayConstructor.java:33)
net.razorvine.pickle.Unpickler.load_reduce(Unpickler.java:617)
net.razorvine.pickle.Unpickler.dispatch(Unpickler.java:170)
net.razorvine.pickle.Unpickler.load(Unpickler.java:84)
net.razorvine.pickle.Unpickler.loads(Unpickler.java:97)
......

@marmbrus
Contributor

Jenkins, test this please.

@SparkQA

SparkQA commented Aug 13, 2014

QA tests have started for PR 1760. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18487/consoleFull

@SparkQA

SparkQA commented Aug 13, 2014

QA results for PR 1760:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18487/consoleFull

@marmbrus
Contributor

This only failed streaming tests. I'm going to merge into master and 1.1.

Thanks!

@asfgit asfgit closed this in 13f54e2 Aug 13, 2014
asfgit pushed a commit that referenced this pull request Aug 13, 2014
In the Spark SQL component, the "show create table" syntax had been disabled.
We think it is a useful function for describing a Hive table.

Author: tianyi <[email protected]>
Author: tianyi <[email protected]>
Author: tianyi <[email protected]>

Closes #1760 from tianyi/spark-2817 and squashes the following commits:

7d28b15 [tianyi] [SPARK-2817] fix too short prefix problem
cbffe8b [tianyi] [SPARK-2817] fix the case problem
565ec14 [tianyi] [SPARK-2817] fix the case problem
60d48a9 [tianyi] [SPARK-2817] use system temporary folder instead of temporary files in the source tree, and also clean some empty line
dbe1031 [tianyi] [SPARK-2817] move some code out of function rewritePaths, as it may be called multiple times
9b2ba11 [tianyi] [SPARK-2817] fix the line length problem
9f97586 [tianyi] [SPARK-2817] remove test.tmp.dir from pom.xml
bfc2999 [tianyi] [SPARK-2817] add "File.separator" support, create a "testTmpDir" outside the rewritePaths
bde800a [tianyi] [SPARK-2817] add "${system:test.tmp.dir}" support add "last_modified_by" to nonDeterministicLineIndicators in HiveComparisonTest
bb82726 [tianyi] [SPARK-2817] remove test which requires a system from the whitelist.
bbf6b42 [tianyi] [SPARK-2817] add a systemProperties named "test.tmp.dir" to pass the test which contains "${system:test.tmp.dir}"
a337bd6 [tianyi] [SPARK-2817] add "show create table" support
a03db77 [tianyi] [SPARK-2817] add "show create table" support

(cherry picked from commit 13f54e2)
Signed-off-by: Michael Armbrust <[email protected]>
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014