-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48413][SQL] ALTER COLUMN with collation #46734
Conversation
@@ -445,7 +446,8 @@ case class AlterTableChangeColumnCommand( | |||
// name(by resolver) and dataType. | |||
private def columnEqual( | |||
field: StructField, other: StructField, resolver: Resolver): Boolean = { | |||
resolver(field.name, other.name) && field.dataType == other.dataType | |||
resolver(field.name, other.name) && | |||
DataType.equalsIgnoreCompatibleCollationAndNullability(field.dataType, other.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment of the command in line 359 is not outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, please check.
* Check if `from` is equal to `to` type except for collations and nullability, which are | ||
* both checked to be compatible so that data of type `from` can be interpreted as of type `to`. | ||
*/ | ||
private[sql] def equalsIgnoreCompatibleCollationAndNullability( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's already give this a name that reflects that is returns true of the it is allowed to evolve the type. This will be true for more cases than collations in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check #46734 (comment). Part of the effort was to change the name of this method to equalsIgnoreCompatibleCollation
.
sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/DataType.scala
Outdated
Show resolved
Hide resolved
equalsIgnoreCompatibleCollationAndNullability(fromValue, toValue) | ||
|
||
case (StructType(fromFields), StructType(toFields)) => | ||
fromFields.length == toFields.length && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or do we use a map to not depend on the order of fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check #46734 (comment).
|
||
case (StructType(fromFields), StructType(toFields)) => | ||
fromFields.length == toFields.length && | ||
fromFields.zip(toFields).forall { case (fromField, toField) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do a forAll on fromFields and look up each field by name in toFields. That way we do not depend on the order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check #46734 (comment).
case (_: StringType, _: StringType) => true | ||
|
||
case (ArrayType(fromElement, fn), ArrayType(toElement, tn)) => | ||
(tn || !fn) && equalsIgnoreCompatibleCollationAndNullability(fromElement, toElement) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to allow setting the type nullable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the approach, we now only check for possible collation difference; nullability must remain the same. In this way by checking only for possible collation difference we scope down the changes whereby previously we were checking for complete data type equality (only comment changes were allowed in alter column command).
@@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( | |||
val newDataSchema = table.dataSchema.fields.map { field => | |||
if (field.name == originColumn.name) { | |||
// Create a new column from the origin column with the new comment. | |||
val newField = field.copy(dataType = newColumn.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make it more explicit that we effectively allow the data type to change now but we only allow type changes that differ by collation?
E.g. by splitting columnEqual
into checking that names match on one side and that the type change is supported on the other + add a withNewType
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified in comment, split the columnEqual
to two functions and added withNewType
in StructField
, please check.
@@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( | |||
val newDataSchema = table.dataSchema.fields.map { field => | |||
if (field.name == originColumn.name) { | |||
// Create a new column from the origin column with the new comment. | |||
val newField = field.copy(dataType = newColumn.dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lines 359-367: the comment needs to be updated to be clear we now allow changing collation.
Side note: it gives the hive-style syntax for ALTER COLUMN but that's only one of the two syntaxes, see:
https://github.com/apache/spark/blob/master/sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4#L130
The other one is the one (partially) documented and the more capable/preferred one: https://spark.apache.org/docs/3.5.1/sql-ref-syntax-ddl-alter-table.html#parameters-4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment, please check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but I would limit the scope of the withNextType method to make sure it is not used by accident
sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala
Outdated
Show resolved
Hide resolved
We can address this in a followup but we also shouldn't allow alter from string -> collated string if table is partitioned or bucketed by a that column |
This PR is currently blocked by #46758, need to fix to pass the tests. |
I actually think that this is possible: Collations do not affect how we store files (we keep partitioning by UTF8_BINARY), and hence it would be possible to allow this. Am I missing something? |
@@ -432,6 +445,10 @@ case class AlterTableChangeColumnCommand( | |||
}.getOrElse(throw QueryCompilationErrors.cannotFindColumnError(name, schema.fieldNames)) | |||
} | |||
|
|||
// Change the dataType of the column. | |||
private def withNewType(column: StructField, dataType: DataType): StructField = | |||
column.copy(dataType = dataType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This looks more concise than calling withNewType
, do we really need to create this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was suggested by @johanl-db, please check #46734 (comment).
thanks, merging to master! |
### What changes were proposed in this pull request? Add support for changing collation of a column with `ALTER COLUMN` command. Use existing support for `ALTER COLUMN` with type to enable changing collations of column. Syntax example: ``` ALTER TABLE t1 ALTER COLUMN col TYPE STRING COLLATE UTF8_BINARY_LCASE ``` ### Why are the changes needed? Enable changing collation on column. ### Does this PR introduce _any_ user-facing change? Yes, it adds support for changing collation of column. ### How was this patch tested? Added tests to `DDLSuite` and `DataTypeSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#46734 from nikolamand-db/SPARK-48413. Authored-by: Nikola Mandic <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request? Fixing the analysis check error introduced in [this](#46734) PR, where the `ALTER COLUMN` command would fail on tables that contain catalog in their names, ensuring consistent behavior across all table naming conventions. ### Why are the changes needed? Change is needed to enable altering column with all datasources and identifier variations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests to `CollationSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #48582 from jovanm-db/alterColumn. Authored-by: Jovan Markovic <[email protected]> Signed-off-by: Max Gekk <[email protected]>
### What changes were proposed in this pull request? Fixing the analysis check error introduced in [this](apache#46734) PR, where the `ALTER COLUMN` command would fail on tables that contain catalog in their names, ensuring consistent behavior across all table naming conventions. ### Why are the changes needed? Change is needed to enable altering column with all datasources and identifier variations. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Added tests to `CollationSuite`. ### Was this patch authored or co-authored using generative AI tooling? No. Closes apache#48582 from jovanm-db/alterColumn. Authored-by: Jovan Markovic <[email protected]> Signed-off-by: Max Gekk <[email protected]>
What changes were proposed in this pull request?
Add support for changing collation of a column with
ALTER COLUMN
command. Use existing support forALTER COLUMN
with type to enable changing collations of column. Syntax example:Why are the changes needed?
Enable changing collation on column.
Does this PR introduce any user-facing change?
Yes, it adds support for changing collation of column.
How was this patch tested?
Added tests to
DDLSuite
andDataTypeSuite
.Was this patch authored or co-authored using generative AI tooling?
No.