Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12971] Fix Hive tests which fail in Hadoop-2.3 SBT build #10884

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -19,20 +19,34 @@ package org.apache.spark.sql.hive

import scala.util.Try

import org.scalatest.BeforeAndAfter
import org.scalatest.BeforeAndAfterEach

import org.apache.spark.sql.{AnalysisException, QueryTest}
import org.apache.spark.sql.catalyst.parser.ParseDriver
import org.apache.spark.sql.catalyst.util.quietly
import org.apache.spark.sql.hive.test.TestHiveSingleton

class ErrorPositionSuite extends QueryTest with TestHiveSingleton with BeforeAndAfter {
class ErrorPositionSuite extends QueryTest with TestHiveSingleton with BeforeAndAfterEach {
import hiveContext.implicits._

before {
override protected def beforeEach(): Unit = {
super.beforeEach()
if (sqlContext.tableNames().contains("src")) {
sqlContext.dropTempTable("src")
}
Seq((1, "")).toDF("key", "value").registerTempTable("src")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When this suite was first added, I believe that src referred to a temporary table loaded in TestHive, which is why I used its schema here.

Interestingly, all of the tests in this suite pass even if src doesn't exist, which suggests to me that the tests aren't strong enough to detect certain classes of bugs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are testing query positionering errors here. The query is probably deemed invalid before we check that the src table exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind. The src attributes are also resolved in some tests...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the following tests:

  positionTest("unresolved attribute 1",
    "SELECT x FROM src", "x")

and

  positionTest("unresolved attribute 3",
    "SELECT key, x FROM src", "x")

and

  positionTest("unresolved attribute 4",
    """SELECT key,
      |x FROM src
    """.stripMargin, "x")

In the second and third tests, it looks like the assert is trying to check that "x" is used when positioning the error message and not "key". Therefore, I think it matters that the table has a column named "key" in order for this test to be meaningful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an experiment, I just tried adding a new test which drops the src table, then runs

positionTest("unresolved attribute new test",
    """SELECT x,
      |key FROM src
    """.stripMargin, "key")

and it looks like this actually does fail with

[info] - unresolved attribute 4 *** FAILED *** (47 milliseconds)
[info]   1 did not equal 2 wrong line (ErrorPositionSuite.scala:153)

I guess that the attribute resolution order doesn't relate to the order in which the attributes appear in the query: I think x is always resolved before key, which is why this test always passes irrespective of whether key exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, this limitation in the existing test is out-of-scope for the purposes of this patch, which is just to fix Jenkins so my test failure notifications stop blowing up with notifications about this suite (I get a chat message for every test failure).

Seq((1, 1, 1)).toDF("a", "a", "b").registerTempTable("dupAttributes")
}

override protected def afterEach(): Unit = {
try {
sqlContext.dropTempTable("src")
sqlContext.dropTempTable("dupAttributes")
} finally {
super.afterEach()
}
}

positionTest("ambiguous attribute reference 1",
"SELECT a from dupAttributes", "a")

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,11 @@ abstract class HiveComparisonTest
""".stripMargin
})

super.afterAll()
try {
TestHive.reset()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't hurt to throw in an extra reset() at the end of suite teardown here and might happen to fix some other isolation issues.

} finally {
super.afterAll()
}
}

protected def prepareAnswer(
Expand Down