-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48413][SQL] ALTER COLUMN with collation #46734
Changes from 1 commit
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -408,6 +408,37 @@ object DataType { | |
} | ||
} | ||
|
||
/** | ||
* Check if `from` is equal to `to` type except for collations and nullability, which are | ||
* both checked to be compatible so that data of type `from` can be interpreted as of type `to`. | ||
*/ | ||
private[sql] def equalsIgnoreCompatibleCollationAndNullability( | ||
from: DataType, | ||
to: DataType): Boolean = { | ||
(from, to) match { | ||
case (_: StringType, _: StringType) => true | ||
|
||
case (ArrayType(fromElement, fn), ArrayType(toElement, tn)) => | ||
nikolamand-db marked this conversation as resolved.
Show resolved
Hide resolved
|
||
(tn || !fn) && equalsIgnoreCompatibleCollationAndNullability(fromElement, toElement) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do we want to allow setting the type nullable? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Changed the approach, we now only check for possible collation difference; nullability must remain the same. In this way by checking only for possible collation difference we scope down the changes whereby previously we were checking for complete data type equality (only comment changes were allowed in alter column command). |
||
|
||
case (MapType(fromKey, fromValue, fn), MapType(toKey, toValue, tn)) => | ||
nikolamand-db marked this conversation as resolved.
Show resolved
Hide resolved
|
||
(tn || !fn) && | ||
// Map keys cannot change collation. | ||
equalsIgnoreCompatibleNullability(fromKey, toKey) && | ||
equalsIgnoreCompatibleCollationAndNullability(fromValue, toValue) | ||
|
||
case (StructType(fromFields), StructType(toFields)) => | ||
fromFields.length == toFields.length && | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. or do we use a map to not depend on the order of fields? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please check #46734 (comment). |
||
fromFields.zip(toFields).forall { case (fromField, toField) => | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would rather do a forAll on fromFields and look up each field by name in toFields. That way we do not depend on the order There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please check #46734 (comment). |
||
fromField.name == toField.name && | ||
(toField.nullable || !fromField.nullable) && | ||
equalsIgnoreCompatibleCollationAndNullability(fromField.dataType, toField.dataType) | ||
} | ||
|
||
case (fromDataType, toDataType) => fromDataType == toDataType | ||
} | ||
} | ||
|
||
/** | ||
* Returns true if the two data types share the same "shape", i.e. the types | ||
* are the same, but the field names don't need to be the same. | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -396,8 +396,9 @@ case class AlterTableChangeColumnCommand( | |
val newDataSchema = table.dataSchema.fields.map { field => | ||
if (field.name == originColumn.name) { | ||
// Create a new column from the origin column with the new comment. | ||
val newField = field.copy(dataType = newColumn.dataType) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you make it more explicit that we effectively allow the data type to change now but we only allow type changes that differ by collation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Clarified in comment, split the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lines 359-367: the comment needs to be updated to be clear we now allow changing collation. Side note: it gives the hive-style syntax for ALTER COLUMN but that's only one of the two syntaxes, see: The other one is the one (partially) documented and the more capable/preferred one: https://spark.apache.org/docs/3.5.1/sql-ref-syntax-ddl-alter-table.html#parameters-4 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated the comment, please check. |
||
val withNewComment: StructField = | ||
addComment(field, newColumn.getComment()) | ||
addComment(newField, newColumn.getComment()) | ||
// Create a new column from the origin column with the new current default value. | ||
if (newColumn.getCurrentDefaultValue().isDefined) { | ||
if (newColumn.getCurrentDefaultValue().get.nonEmpty) { | ||
|
@@ -445,7 +446,8 @@ case class AlterTableChangeColumnCommand( | |
// name(by resolver) and dataType. | ||
private def columnEqual( | ||
field: StructField, other: StructField, resolver: Resolver): Boolean = { | ||
resolver(field.name, other.name) && field.dataType == other.dataType | ||
resolver(field.name, other.name) && | ||
DataType.equalsIgnoreCompatibleCollationAndNullability(field.dataType, other.dataType) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The comment of the command in line 359 is not outdated There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fixed, please check. |
||
} | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's already give this a name that reflects that is returns true of the it is allowed to evolve the type. This will be true for more cases than collations in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check #46734 (comment). Part of the effort was to change the name of this method to
equalsIgnoreCompatibleCollation
.