[SPARK-13923] [SQL] Implement SessionCatalog #11750
Conversation
```scala
   * If no such database is specified, create it in the current database.
   */
  def createTable(
      currentDb: String,
```
It is somewhat strange to pass in `currentDb` and then rely on some table definition's database. Have you thought about just figuring that part out in the caller, outside the session catalog? That is, the catalog itself doesn't need to handle `currentDb`.
As discussed offline, let's move the tracking of the current db into `SessionCatalog`.
As discussed offline, this is because we need to deal with temporary tables. An alternative, which I will implement here, is to keep track of the current database in this class so we don't need to pass it in everywhere.
Make sure we document that in the session catalog code and explain why we are tracking the current db in here.
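Sketched below is the shape this could take once the catalog owns the current database. This is a minimal illustration, not Spark's actual implementation: `SimpleSessionCatalog`, `setCurrentDatabase`, and the string-valued table definitions are all hypothetical.

```scala
import scala.collection.mutable

// Hypothetical sketch: the catalog tracks the current database itself,
// so callers no longer pass currentDb into every method.
class SimpleSessionCatalog {
  private var currentDb: String = "default"
  // (database, table) -> table definition; a plain String stands in
  // for the real metadata class.
  private val tables = mutable.HashMap.empty[(String, String), String]

  def setCurrentDatabase(db: String): Unit = { currentDb = db }

  // If the caller specifies no database, fall back to the tracked one.
  def createTable(table: String, definition: String, db: Option[String] = None): Unit = {
    val resolvedDb = db.getOrElse(currentDb)
    tables((resolvedDb, table)) = definition
  }
}
```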
Test build #53263 has finished for PR 11750 at commit
This is a standalone commit so that in the future we can split it out into a separate patch if preferable.
This allows us to avoid passing it into every single method, as we had to before this commit.
Force-pushed from 3b2e48a to ad43a5f (compare).
```diff
@@ -167,7 +170,7 @@ abstract class ExternalCatalog {
  * @param name name of the function
  * @param className fully qualified class name, e.g. "org.apache.spark.util.MyFunc"
  */
-case class CatalogFunction(name: String, className: String)
+case class CatalogFunction(name: FunctionIdentifier, className: String)
```
`catalogFunc.name.funcName` is kind of weird (we do not need to change it right now).
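For context, here is a stripped-down sketch of the shapes involved. The two-field `FunctionIdentifier` below is a simplification for illustration; the real Spark definitions carry more detail.

```scala
// Simplified sketch of identifier-based function naming.
case class FunctionIdentifier(funcName: String, database: Option[String] = None)

case class CatalogFunction(name: FunctionIdentifier, className: String)

val catalogFunc = CatalogFunction(
  FunctionIdentifier("myFunc", Some("db1")),
  "org.apache.spark.util.MyFunc")

// This double hop is the awkwardness noted above: the "name" of the
// function is itself a structure with its own name field.
val bareName = catalogFunc.name.funcName  // "myFunc"
```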
If there are no major problems, let's merge this as soon as tests pass. We can address minor comments as follow-ups. This will unblock a bunch of other stuff.
Yea, let's merge this once it passes tests.
```scala
if (tempTables.containsKey(name) && !ignoreIfExists) {
  throw new AnalysisException(s"Temporary table '$name' already exists.")
}
tempTables.put(name, tableDefinition)
```
This isn't really `ignoreIfExists` but `updateIfExists`?
`ignoreIfExists` is like "ignore the exception if the table already exists", but I guess it doesn't convey whether the table itself is overridden. I could rename this.
Let's make this name more explicit. I think in other places, `ignoreIfExists` means that if a table exists, we do nothing.
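To make the distinction concrete, here is a small sketch contrasting the two semantics. The class and parameter names (`TempTables`, `overrideIfExists`) are illustrative rather than Spark's actual API, and a generic exception stands in for `AnalysisException`.

```scala
import java.util.concurrent.ConcurrentHashMap

class TempTables[T] {
  private val tempTables = new ConcurrentHashMap[String, T]()

  // Semantics of the snippet above: when the flag is false, an existing
  // table is an error; when true, put() silently replaces the old entry.
  // The table is overwritten, not merely "ignored", hence the rename.
  def createTempTable(name: String, tableDefinition: T, overrideIfExists: Boolean): Unit = {
    if (tempTables.containsKey(name) && !overrideIfExists) {
      throw new IllegalStateException(s"Temporary table '$name' already exists.")
    }
    tempTables.put(name, tableDefinition)
  }

  // True ignore-if-exists semantics: do nothing when the table exists.
  def createIfNotExists(name: String, tableDefinition: T): Unit = {
    tempTables.putIfAbsent(name, tableDefinition)
  }
}
```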
Test build #53358 has finished for PR 11750 at commit
Test build #2647 has finished for PR 11750 at commit
Test build #2648 has finished for PR 11750 at commit
Test build #2646 has finished for PR 11750 at commit
OK. Let me merge this. Let's address comments in the follow-up PR.
## What changes were proposed in this pull request?

As part of the effort to merge `SQLContext` and `HiveContext`, this patch implements an internal catalog called `SessionCatalog` that handles temporary functions and tables and delegates metastore operations to `ExternalCatalog`. Currently, this is still dead code, but in the future it will be part of `SessionState` and will replace `o.a.s.sql.catalyst.analysis.Catalog`.

A recent patch, apache#11573, parses Hive commands ourselves in Spark, but still passes the entire query text to Hive. In a future patch, we will use `SessionCatalog` to implement the parsed commands.

## How was this patch tested?

800+ lines of tests in `SessionCatalogSuite`.

Author: Andrew Or <[email protected]>

Closes apache#11750 from andrewor14/temp-catalog.
## What changes were proposed in this pull request?

`SessionCatalog`, introduced in #11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`. As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:

- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through `SessionCatalog`.
- SPARK-?????: Merge SQL/HiveContext

## How was this patch tested?

This is largely a refactoring task so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.

Author: Andrew Or <[email protected]>
Author: Yin Huai <[email protected]>

Closes #11836 from andrewor14/use-session-catalog.
## What changes were proposed in this pull request?

This patch addresses the remaining comments left in apache#11750 and apache#11918 after they are merged. For a full list of changes in this patch, just trace the commits.

## How was this patch tested?

`SessionCatalogSuite` and `CatalogTestCases`

Author: Andrew Or <[email protected]>

Closes apache#12006 from andrewor14/session-catalog-followup.
## What changes were proposed in this pull request?

Session catalog was added in #11750. However, it doesn't really support temporary functions properly; right now we only store the metadata in the form of `CatalogFunction`, but this doesn't make sense for temporary functions because there is no class name. This patch moves the `FunctionRegistry` into the `SessionCatalog`. With this, the user can call `catalog.createTempFunction` and `catalog.lookupFunction` to use the function they registered previously. This is currently still dead code, however.

## How was this patch tested?

`SessionCatalogSuite`.

Author: Andrew Or <[email protected]>

Closes #11972 from andrewor14/temp-functions.
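A rough sketch of the registry-owned-by-the-catalog shape that commit message describes. The `Expression` stub and the method signatures below are guesses for illustration, not the actual Spark API.

```scala
import scala.collection.mutable

// Minimal stand-in so the sketch is self-contained.
trait Expression
case class Literal(value: Any) extends Expression

// Hypothetical sketch: temporary functions live in an in-memory registry
// owned by the session catalog, mapping a name to a builder that turns
// argument expressions into a result expression.
class SketchSessionCatalog {
  private val functionRegistry =
    mutable.HashMap.empty[String, Seq[Expression] => Expression]

  def createTempFunction(name: String, builder: Seq[Expression] => Expression): Unit = {
    functionRegistry(name) = builder
  }

  def lookupFunction(name: String, children: Seq[Expression]): Expression = {
    functionRegistry.getOrElse(
      name,
      throw new NoSuchElementException(s"Undefined function: '$name'")
    ).apply(children)
  }
}
```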
What changes were proposed in this pull request?

As part of the effort to merge `SQLContext` and `HiveContext`, this patch implements an internal catalog called `SessionCatalog` that handles temporary functions and tables and delegates metastore operations to `ExternalCatalog`. Currently, this is still dead code, but in the future it will be part of `SessionState` and will replace `o.a.s.sql.catalyst.analysis.Catalog`.

A recent patch, #11573, parses Hive commands ourselves in Spark, but still passes the entire query text to Hive. In a future patch, we will use `SessionCatalog` to implement the parsed commands.

How was this patch tested?

800+ lines of tests in `SessionCatalogSuite`.
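As a condensed illustration of the layering this description outlines, the sketch below shows session-scoped temporary tables shadowing a metastore-backed catalog. All names are illustrative stand-ins, not the real Spark classes.

```scala
import scala.collection.mutable

// Stand-in for the metastore-backed catalog interface.
trait ExternalCatalogLike {
  def createTable(db: String, table: String, definition: String): Unit
  def getTable(db: String, table: String): Option[String]
}

// Session-scoped layer: temporary tables are kept in memory and take
// precedence over metastore tables of the same name on lookup.
class SessionCatalogSketch(external: ExternalCatalogLike) {
  private var currentDb: String = "default"
  private val tempTables = mutable.HashMap.empty[String, String]

  def setCurrentDatabase(db: String): Unit = { currentDb = db }

  def createTempTable(name: String, definition: String): Unit =
    tempTables(name) = definition

  // Metastore operations are delegated to the external catalog.
  def createTable(name: String, definition: String): Unit =
    external.createTable(currentDb, name, definition)

  def lookupTable(name: String): Option[String] =
    tempTables.get(name).orElse(external.getTable(currentDb, name))
}
```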