
[SPARK-39503][SQL] Add session catalog name for v1 database table and function #36936

Closed
wants to merge 8 commits

Conversation

@ulysses-you (Contributor) commented Jun 21, 2022

What changes were proposed in this pull request?

  • Add the session catalog name to identifiers, so that all identifiers become 3-part names.
  • Make CatalogImpl#refreshTable support 3-part names. This change is required since many commands use this method; see the sketch below.
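
For illustration, a minimal sketch of the intended refreshTable usage (the table name is hypothetical):

```scala
// Sketch: with this change, Catalog.refreshTable accepts a fully qualified
// 3-part name in addition to the existing 1- and 2-part forms.
spark.catalog.refreshTable("spark_catalog.default.t")
```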

Why are the changes needed?

To make it clearer which catalog a table or function comes from. It affects:

  • the scan table of the query plan
  • the target table of the data writing
  • desc database
  • desc table
  • desc function

Note that we do not support a view catalog, so we do not add the catalog name for views in this PR.
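
To illustrate the user-facing effect, a hedged spark-shell sketch (the table is hypothetical and the exact plan text may differ):

```scala
spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("SELECT * FROM t").explain()
// before: ... FileScan parquet default.t[id#0] ...
// after:  ... FileScan parquet spark_catalog.default.t[id#0] ...
```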

Does this PR introduce any user-facing change?

Maybe yes, so a new config spark.sql.legacy.identifierOutputCatalogName is added to restore the old behavior.
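
A sketch of opting back into the old output, assuming the semantics this PR defines for the config (true prints the catalog name, false restores the legacy 2-part form; a reviewer below suggests flipping this polarity):

```scala
// Sketch: restore the legacy 2-part identifier output under this PR's config.
spark.conf.set("spark.sql.legacy.identifierOutputCatalogName", "false")
```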

How was this patch tested?

pass CI

@github-actions github-actions bot added the SQL label Jun 21, 2022
@ulysses-you ulysses-you force-pushed the output-catalog branch 3 times, most recently from 0eada9f to 0db8c67 Compare June 22, 2022 07:49
@ulysses-you ulysses-you marked this pull request as draft June 22, 2022 10:21
@ulysses-you ulysses-you force-pushed the output-catalog branch 2 times, most recently from db01710 to 47dfa76 Compare June 23, 2022 09:20
@tomvanbussel (Contributor) left a comment:

This is great! This will make it a lot clearer whether the session catalog was used or not. To that end, it is a bit odd to me that we do not include the session catalog name for views. That could give the wrong impression that views and tables live in different catalogs, which is not true: AFAIK Spark allows referring to a view as spark_catalog.default.view_name.

Displaying the catalog for views may also allow us to remove some of the additional code that had to be added (e.g. ...WithoutCatalog). I understand we currently do not support views with the v2 plugins, but that just seems like a technicality to me, and something that we may want to change in the future, not something that we would like to "leak" to users like this.

@ulysses-you (Contributor, Author) commented:

@tomvanbussel yeah, I'm fine with also adding the session catalog for views. But I want to clarify some differences and trade-offs for views (a sketch follows the list):

  • For a temp view: it does not belong to any catalog, so it should stay as is.
  • For a global temp view: ideally we should treat it as a special catalog, but for now we cannot, because we do not support a v2 view catalog; therefore we also cannot add the session catalog for it.
  • What we can add support for is the permanent view.
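
A hedged sketch of the three cases (view names are illustrative):

```scala
spark.sql("CREATE TEMP VIEW tv AS SELECT 1")         // no database, no catalog: stays `tv`
spark.sql("CREATE GLOBAL TEMP VIEW gtv AS SELECT 1") // lives in the reserved global_temp database
spark.sql("CREATE VIEW pv AS SELECT 1")              // permanent view: could print as spark_catalog.default.pv
```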

@tomvanbussel (Contributor) commented Jun 23, 2022:

@ulysses-you Fully agreed that we should leave local temporary views as they are. They also do not belong to a database AFAIK, so it would indeed be weird to include the session catalog in the identifier. For global temporary views, I believe we should display the catalog, as they belong to the global_temp database in the session catalog.
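
For reference, a small sketch of how global temp views resolve through the global_temp database (the view name is illustrative):

```scala
spark.sql("CREATE GLOBAL TEMP VIEW gv AS SELECT 42 AS answer")
spark.sql("SELECT * FROM global_temp.gv").show()  // resolved via the reserved global_temp database
```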

I'm not sure if I'm the right person to discuss how views should be integrated with the v2 catalog, but it seems like a neat idea. Might be worth exploring in a follow-up.

@ulysses-you ulysses-you force-pushed the output-catalog branch 2 times, most recently from 499dcec to 2e95e2b Compare June 27, 2022 01:58
@ulysses-you ulysses-you marked this pull request as ready for review June 27, 2022 01:58
@@ -34,13 +38,35 @@ sealed trait IdentifierWithDatabase {
private def quoteIdentifier(name: String): String = name.replace("`", "``")

def quotedString: String = {
if (SQLConf.get.getConf(LEGACY_IDENTIFIER_OUTPUT_CATALOG_NAME) && database.isDefined) {
val replacedId = quoteIdentifier(identifier)
val replacedDb = database.map(quoteIdentifier(_))
A reviewer (Contributor) commented:
Suggested change
val replacedDb = database.map(quoteIdentifier(_))
val replacedDb = quoteIdentifier(database.get)

buildConf("spark.sql.legacy.identifierOutputCatalogName")
.internal()
.doc("When set to true, the identifier will output catalog name if database is defined. " +
"When set to false, it restores the legacy behavior that does not output catalog name.")
A reviewer (Contributor) commented:

nit: usually true means legacy behavior; we probably need to rename the config a bit.
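
The follow-up #37021 indeed renamed the config with flipped polarity; a sketch of how such a legacy flag is typically declared in SQLConf (the doc text and version are illustrative, not the exact merged code):

```scala
val LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME =
  buildConf("spark.sql.legacy.nonIdentifierOutputCatalogName")
    .internal()
    .doc("When true, restores the legacy behavior of not printing the " +
      "session catalog name even when the database is defined.")
    .version("3.4.0") // illustrative
    .booleanConf
    .createWithDefault(false)
```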

@cloud-fan (Contributor) commented:

Looks good in general. I'm taking a closer look at where we use unquotedStringWithoutCatalog.

@ulysses-you (Contributor, Author) commented Jun 28, 2022:

The change list, to help reviewers:

```
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala                  |   3 ++-
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala               |  16 +++++++++++-----
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala                    |  12 ++++++++++--
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala                          |  40 +++++++++++++++++++++++++++++++---------
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala                 |  47 ++++++++++++++++++++++++++++++-----------------
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala                              |   9 +++++++++
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala                                   |   3 ++-
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala                      |   3 ++-
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzePartitionCommand.scala         |   5 +++--
sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala          |   9 ++++++---
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala                             |  21 ++++++++++++---------
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala                       |   9 ++++++---
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala                          |  36 +++++++++++++++++++++++-------------
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/LogicalRelation.scala             |   4 +++-
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/rules.scala                       |  19 +++++++++++--------
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala    |   1 +
sql/core/src/main/scala/org/apache/spark/sql/internal/BaseSessionStateBuilder.scala                  |   8 ++++----
sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala                              |  57 ++++++++++++++++++++++++++++++++++----------------------
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala                          |  19 ++++++++++---------
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala                         |   9 ++++++---
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala                               |   4 +++-
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala                |   4 +++-

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala          |  41 +++++++++++++++++++++++------------------
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/ExpressionParserSuite.scala         |   9 ++++++---
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala                            |  10 +++++++---
sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala                     |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala                        |  24 +++++++++++++++---------
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeNamespaceSuite.scala       |  12 +++++++-----
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala           |   2 ++
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala       |   1 +
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala                  |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala                    |  24 ++++++++++++++----------
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala                              |   7 ++++---
sql/hive/src/test/scala/org/apache/spark/sql/hive/UDFSuite.scala                                     |   9 +++++----
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala                       |  22 ++++++++++++++--------
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala                   |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala                 |  13 +++++++------
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala                       |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala                      |  11 +++++++----
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/DescribeTableSuite.scala         |   2 ++

sql/core/src/test/resources/sql-tests/results/charvarchar.sql.out                                    |  16 +++++++++++++++-
sql/core/src/test/resources/sql-tests/results/describe-part-after-analyze.sql.out                    |   7 +++++++
sql/core/src/test/resources/sql-tests/results/describe-table-after-alter-table.sql.out               |   5 +++++
sql/core/src/test/resources/sql-tests/results/describe.sql.out                                       |  23 +++++++++++++++--------
sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out                                    | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------
sql/core/src/test/resources/sql-tests/results/explain-cbo.sql.out                                    |   8 ++++----
sql/core/src/test/resources/sql-tests/results/explain.sql.out                                        |  96 +++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out                             |   2 +-
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out                                    |  10 ++++++----
sql/core/src/test/resources/sql-tests/results/udaf.sql.out                                           |   2 +-
sql/core/src/test/resources/sql-tests/results/udf/udf-udaf.sql.out                                   |   2 +-
```

@@ -895,7 +905,7 @@ case class ShowTablesCommand(
val normalizedSpec = PartitioningUtils.normalizePartitionSpec(
partitionSpec.get,
table.partitionSchema,
tableIdent.quotedString,
tableIdent.quotedString(SESSION_CATALOG_NAME),
sparkSession.sessionState.conf.resolver)
val partition = catalog.getPartition(tableIdent, normalizedSpec)
val database = tableIdent.database.getOrElse("")
@tomvanbussel (Contributor) commented:

Do we also want to add the catalog name to the output of this function?

@ulysses-you (Contributor, Author) replied:

Thank you @tomvanbussel for pointing this out; actually it is out of scope for this PR. We can improve the output attributes of the related commands in the future.


if (replacedDb.isDefined) s"`${replacedDb.get}`.`$replacedId`" else s"`$replacedId`"
def quotedString(catalog: Option[String]): String = {
@tomvanbussel (Contributor) commented Jun 28, 2022:

I don't think we should add these new methods. These methods make it easier to omit the catalog than to include it, and I'm worried that this will lead to most new code (and some existing code) not including the catalog. AFAIK it's always safe to include the catalog name if the database name is set (as in that case we know that it won't be a local temporary view), so the following should work:

```scala
def quotedString: String = {
  val replacedId = quoteIdentifier(identifier)
  val replacedDb = database.map(quoteIdentifier)
  if (database.isDefined) {
    if (SQLConf.get.getConf(LEGACY_NON_IDENTIFIER_OUTPUT_CATALOG_NAME) || isTempView) {
      s"`${replacedDb.get}`.`$replacedId`"
    } else {
      s"`$SESSION_CATALOG_NAME`.`${replacedDb.get}`.`$replacedId`"
    }
  } else {
    s"`$replacedId`"
  }
}
```

@ulysses-you (Contributor, Author) replied:

It is hacky to access the conf inside the identifier, since identifiers may appear anywhere. I made a new PR that pulls the config out; see #37021.

@@ -234,7 +235,7 @@ case class UnresolvedGenerator(name: FunctionIdentifier, children: Seq[Expressio
override def nullable: Boolean = throw new UnresolvedException("nullable")
override lazy val resolved = false

override def prettyName: String = name.unquotedString
override def prettyName: String = name.unquotedString(SESSION_CATALOG_NAME)
override def toString: String = s"'$name(${children.mkString(", ")})"
A reviewer (Contributor) commented:

This will include the unquoted identifier without the catalog, as the toString method of FunctionIdentifier is implemented as unquotedString. Is this intentional?

@ulysses-you (Contributor, Author) commented:

I created a new PR, #37021, for this issue with a new approach (a sketch follows the list):

  • add a catalog field to the identifier, so the identifier just prints the catalog if defined
  • inject the catalog at the beginning of the identifier's life
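
A minimal sketch of that approach with illustrative names (not the exact #37021 code):

```scala
// Carry an optional catalog on the identifier itself; it is injected when
// the identifier is created, so printing needs no SQLConf lookup.
case class TableIdentifier(
    table: String,
    database: Option[String] = None,
    catalog: Option[String] = None) {
  private def quote(part: String): String = s"`${part.replace("`", "``")}`"
  def quotedString: String =
    (catalog.toSeq ++ database.toSeq ++ Seq(table)).map(quote).mkString(".")
}
```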

cloud-fan pushed a commit that referenced this pull request Jul 1, 2022
… function

### What changes were proposed in this pull request?

- Add the session catalog name to identifiers, so that all identifiers become 3-part names

### Why are the changes needed?

To make it clearer which catalog a table or function comes from. It affects:

- the scan table/permanent view of the query plan
- the target table of the data writing
- desc database
- desc table
- desc function

Note that we do not add the catalog for temporary views, since they do not belong to any database or catalog.

This is a new approach to #36936 that:
- add catalog field in identifier, so identifier just print catalog if defined
- inject catalog at the beginning of identifier life

### Does this PR introduce _any_ user-facing change?

Maybe yes, so a new config `spark.sql.legacy.nonIdentifierOutputCatalogName` is added to restore the old behavior.

### How was this patch tested?

change list:
```scala
docs/sql-migration-guide.md                                                                          |   1 +

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/InMemoryCatalog.scala              |  10 +++++++---
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala               |  28 ++++++++++++++++++---------
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala                    |   1 +
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/identifiers.scala                          |  56 ++++++++++++++++++++++++++++++++++++++++++++---------
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala                       |   4 ++--
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Implicits.scala          |  13 +++++++++----
sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/LookupCatalog.scala               |   8 ++++++--
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala                              |   9 +++++++++
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala                                   |   7 ++++---
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala                             |  14 ++++++++------
sql/core/src/main/scala/org/apache/spark/sql/execution/command/functions.scala                       |   6 ++++--
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DescribeNamespaceExec.scala    |   1 +
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala                        |   5 +++--
sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala                              |   5 +++--

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogSuite.scala         |  5 +++--
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalogSuite.scala          |  58 ++++++++++++++++++++++++++++++-------------------------
sql/catalyst/src/test/scala/org/apache/spark/sql/connector/catalog/LookupCatalogSuite.scala          |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/DataFrameJoinSuite.scala                                |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWriterV2Suite.scala                            |  13 +++++++++----
sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala                                     |   7 +++++--
sql/core/src/test/scala/org/apache/spark/sql/execution/QueryExecutionSuite.scala                     |   4 +++-
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala                            |  39 ++++++++++++++++++++++++------------
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewTestSuite.scala                        |   7 +++++--
sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala                        |  36 +++++++++++++++++++++--------------
sql/core/src/test/scala/org/apache/spark/sql/execution/command/PlanResolutionSuite.scala             |  51 +++++++++++++++++++++++++++++----------------------
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeNamespaceSuite.scala       |  12 +++++++-----
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/DescribeTableSuite.scala           |   2 ++
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v1/ShowTblPropertiesSuite.scala       |   3 ++-
sql/core/src/test/scala/org/apache/spark/sql/execution/command/v2/DescribeNamespaceSuite.scala       |   1 +
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala                  |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala                    |  25 +++++++++++++-----------
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala                              |   7 ++++---
sql/hive/src/test/scala/org/apache/spark/sql/hive/UDFSuite.scala                                     |   9 +++++----
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala                       |  27 ++++++++++++++++----------
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveExplainSuite.scala                   |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveSQLViewSuite.scala                   |   4 +++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveTableScanSuite.scala                 |  13 +++++++------
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala                       |   3 ++-
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala                      |   8 +++++---
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/DescribeTableSuite.scala         |   2 ++
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/command/ShowFunctionsSuite

sql/core/src/test/resources/sql-tests/results/charvarchar.sql.out                                    |  20 ++++++++++++++++++-
sql/core/src/test/resources/sql-tests/results/describe-table-after-alter-table.sql.out               |   5 +++++
sql/core/src/test/resources/sql-tests/results/describe.sql.out                                       |  22 +++++++++++++--------
sql/core/src/test/resources/sql-tests/results/explain-aqe.sql.out                                    | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------------------------------
sql/core/src/test/resources/sql-tests/results/explain-cbo.sql.out                                    |   8 ++++----
sql/core/src/test/resources/sql-tests/results/explain.sql.out                                        | 100 +++++++++++++++++++++++++++++++++++++++++++++++------------------------------------------------
sql/core/src/test/resources/sql-tests/results/join-lateral.sql.out                                   |   2 +-
sql/core/src/test/resources/sql-tests/results/postgreSQL/create_view.sql.out                         |  40 +++++++++++++++++++++++++-------------
sql/core/src/test/resources/sql-tests/results/postgreSQL/numeric.sql.out                             |   2 +-
sql/core/src/test/resources/sql-tests/results/show-tables.sql.out                                    |  10 ++++++----
sql/core/src/test/resources/sql-tests/results/show-tblproperties.sql.out                             |   2 +-
sql/core/src/test/resources/sql-tests/results/udaf.sql.out                                           |   4 ++--
sql/core/src/test/resources/sql-tests/results/udf/udf-udaf.sql.out                                   |   4 ++--
```

Closes #37021 from ulysses-you/output-catalog-2.

Authored-by: ulysses-you <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@ulysses-you (Contributor, Author) commented:

Close this PR in favor of #37021.

@ulysses-you ulysses-you closed this Jul 4, 2022
@ulysses-you ulysses-you deleted the output-catalog branch July 4, 2022 01:36