
[SPARK-13923] [SQL] Implement SessionCatalog #11750

Closed · wants to merge 20 commits from andrewor14/temp-catalog

Conversation
Conversation

andrewor14 (Contributor)

What changes were proposed in this pull request?

As part of the effort to merge SQLContext and HiveContext, this patch implements an internal catalog called SessionCatalog that handles temporary functions and tables and delegates metastore operations to ExternalCatalog. Currently, this is still dead code, but in the future it will be part of SessionState and will replace o.a.s.sql.catalyst.analysis.Catalog.

A recent patch, #11573, parses Hive commands in Spark itself but still passes the entire query text to Hive. In a future patch, we will use SessionCatalog to implement the parsed commands.
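
A minimal sketch of the layering described above, using invented stub types and method names (the real `CatalogTable`, `ExternalCatalog`, and `SessionCatalog` in `o.a.s.sql.catalyst.catalog` are much richer than this):

```scala
import java.util.concurrent.ConcurrentHashMap

// Stub of the table metadata; the real CatalogTable also carries schema,
// storage format, partitioning, and so on.
case class CatalogTable(name: String)

// The metastore-facing interface; method names here are illustrative.
trait ExternalCatalog {
  def createTable(db: String, table: CatalogTable): Unit
  def getTable(db: String, name: String): CatalogTable
}

// Session-local layer: temporary tables live here, never touch the
// metastore, and shadow metastore tables of the same name; everything
// else is delegated to the ExternalCatalog.
class SessionCatalog(external: ExternalCatalog) {
  private val tempTables = new ConcurrentHashMap[String, CatalogTable]

  def createTempTable(name: String, table: CatalogTable): Unit = {
    tempTables.put(name, table)
  }

  def lookupTable(db: String, name: String): CatalogTable = {
    Option(tempTables.get(name)).getOrElse(external.getTable(db, name))
  }
}
```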

How was this patch tested?

800+ lines of tests in SessionCatalogSuite.

@andrewor14 andrewor14 changed the title [SPARK-13923] Implement SessionCatalog [SPARK-13923] [SQL] Implement SessionCatalog Mar 16, 2016
andrewor14 (Contributor, Author)

@yhuai @rxin

   * If no such database is specified, create it in the current database.
   */
  def createTable(
      currentDb: String,
Contributor:
it is somewhat strange to pass in currentDb and then rely on some table definition's database. Have you thought about just figuring out that part in the caller outside the session catalog? i.e. the catalog itself doesn't need to handle currentDb.

Contributor:
As discussed offline, let's move the tracking of current db into SessionCatalog.

Contributor (Author):
as discussed offline, this is because we need to deal with temporary tables. An alternative that I will implement here is to just keep track of the current database in this class so we don't need to pass it in everywhere.

Contributor:
make sure we document that in the session catalog code and explain why we are tracking current db in here.
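
As a rough illustration of the alternative described in this thread (tracking the current database inside the catalog instead of threading a `currentDb` parameter through every call), with invented names and the kind of explanatory comment requested above:

```scala
// Sketch only; the actual SessionCatalog API and defaults differ.
class CurrentDbSketch {
  // We track the current database here rather than passing it into every
  // method because name resolution must first consult session-local
  // temporary tables and only then fall back to the current database in
  // the metastore, so the catalog itself needs to know which db is current.
  private var currentDb: String = "default"

  def setCurrentDatabase(db: String): Unit = synchronized { currentDb = db }
  def getCurrentDatabase: String = synchronized { currentDb }
}
```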

SparkQA commented Mar 16, 2016

Test build #53263 has finished for PR 11750 at commit 3b2e48a.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -167,7 +170,7 @@ abstract class ExternalCatalog {
   * @param name name of the function
   * @param className fully qualified class name, e.g. "org.apache.spark.util.MyFunc"
   */
-case class CatalogFunction(name: String, className: String)
+case class CatalogFunction(name: FunctionIdentifier, className: String)
Contributor:
catalogFunc.name.funcName is kind of weird (we do not need to change it right now).
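
For context, a hedged sketch of what the diff above implies (the real `FunctionIdentifier` in catalyst may differ in fields and defaults):

```scala
// Hypothetical stand-ins for the catalyst classes in the diff above.
case class FunctionIdentifier(funcName: String, database: Option[String] = None)
case class CatalogFunction(name: FunctionIdentifier, className: String)

object FuncIdDemo extends App {
  val f = CatalogFunction(FunctionIdentifier("myFunc"), "org.apache.spark.util.MyFunc")
  // The double hop below is the awkwardness noted above: a field called
  // `name` whose own name field is `funcName`.
  println(f.name.funcName)  // prints "myFunc"
}
```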

rxin (Contributor) commented Mar 16, 2016

If there are no major problems, let's merge this as soon as tests pass. We can address minor comments as followups. This will unblock a bunch of other stuff.

yhuai (Contributor) commented Mar 16, 2016

Yea, let's merge this once it passes tests.

if (tempTables.containsKey(name) && !ignoreIfExists) {
  throw new AnalysisException(s"Temporary table '$name' already exists.")
}
tempTables.put(name, tableDefinition)
Contributor:
this isn't really ignoreIfExists but updateIfExists?

Contributor (Author):
ignoreIfExists is like "ignoring the exception if the table already exists", but I guess it doesn't convey whether the table itself is overwritten. I could rename this.

Contributor:
Let's make this name more explicit. I think in other places, ignoreIfExists means that if a table exists, we do nothing.
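
To make the distinction concrete, a small sketch of the two semantics discussed in this thread (names invented; the table definition is reduced to a `String` for brevity):

```scala
import java.util.concurrent.ConcurrentHashMap

object TempTableSemantics {
  private val tempTables = new ConcurrentHashMap[String, String]

  // What ignoreIfExists means elsewhere in the codebase: if the table
  // already exists, leave it untouched and make the call a no-op.
  def createIgnoreIfExists(name: String, definition: String): Unit = {
    tempTables.putIfAbsent(name, definition)
  }

  // What the snippet above actually does when the flag is set: silently
  // replace the existing definition, i.e. closer to "overrideIfExists".
  def createOverrideIfExists(name: String, definition: String): Unit = {
    tempTables.put(name, definition)
  }
}
```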

SparkQA commented Mar 17, 2016

Test build #53358 has finished for PR 11750 at commit ad43a5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 17, 2016

Test build #2647 has finished for PR 11750 at commit ad43a5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 17, 2016

Test build #2648 has finished for PR 11750 at commit ad43a5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Mar 17, 2016

Test build #2646 has finished for PR 11750 at commit ad43a5f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

yhuai (Contributor) commented Mar 17, 2016

OK. Let me merge this. Let's address comments in the follow-up PR.

@asfgit asfgit closed this in ca9ef86 Mar 17, 2016
roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
asfgit pushed a commit that referenced this pull request Mar 23, 2016
## What changes were proposed in this pull request?

`SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`.

As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:
- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
- SPARK-?????: Merge SQL/HiveContext

## How was this patch tested?

This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.

Author: Andrew Or <[email protected]>
Author: Yin Huai <[email protected]>

Closes #11836 from andrewor14/use-session-catalog.
@andrewor14 andrewor14 deleted the temp-catalog branch March 23, 2016 22:14
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Mar 24, 2016
ghost pushed a commit to dbtsai/spark that referenced this pull request Mar 28, 2016
## What changes were proposed in this pull request?

This patch addresses the remaining comments left in apache#11750 and apache#11918 after they are merged. For a full list of changes in this patch, just trace the commits.

## How was this patch tested?

`SessionCatalogSuite` and `CatalogTestCases`

Author: Andrew Or <[email protected]>

Closes apache#12006 from andrewor14/session-catalog-followup.
asfgit pushed a commit that referenced this pull request Mar 28, 2016
## What changes were proposed in this pull request?

Session catalog was added in #11750. However, it doesn't really support temporary functions properly; right now we only store the metadata in the form of `CatalogFunction`, but this doesn't make sense for temporary functions because there is no class name.

This patch moves the `FunctionRegistry` into the `SessionCatalog`. With this, the user can call `catalog.createTempFunction` and `catalog.lookupFunction` to use the function they registered previously. This is currently still dead code, however.

## How was this patch tested?

`SessionCatalogSuite`.

Author: Andrew Or <[email protected]>

Closes #11972 from andrewor14/temp-functions.
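
A self-contained sketch of the flow this commit message describes; the real `FunctionRegistry` and `Expression` types in catalyst have richer signatures than these stand-ins, so treat the names below as illustrative:

```scala
import scala.collection.mutable

// Minimal stand-ins for catalyst's expression types.
trait Expression
case class Literal(value: Any) extends Expression

// A registry keyed by function name. Builders turn argument expressions
// into a concrete expression, which is why temporary functions need no
// class name in the metastore.
class SimpleFunctionRegistry {
  private val builders = mutable.Map.empty[String, Seq[Expression] => Expression]

  def register(name: String, builder: Seq[Expression] => Expression): Unit = {
    builders(name) = builder
  }

  def lookup(name: String, args: Seq[Expression]): Expression = {
    builders(name)(args)
  }
}

class SessionCatalogSketch {
  private val functionRegistry = new SimpleFunctionRegistry

  def createTempFunction(name: String, builder: Seq[Expression] => Expression): Unit =
    functionRegistry.register(name, builder)

  def lookupFunction(name: String, args: Seq[Expression]): Expression =
    functionRegistry.lookup(name, args)
}
```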