[SPARK-22002][SQL] Read JDBC table use custom schema support specify partial fields. #19231
Conversation
docs/sql-programming-guide.md
@@ -1333,7 +1333,7 @@ the following case-insensitive options:
 <tr>
 <td><code>customSchema</code></td>
 <td>
-  The custom schema to use for reading data from JDBC connectors. For example, "id DECIMAL(38, 0), name STRING"). The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading.
+  The custom schema to use for reading data from JDBC connectors. For example, <code>"id DECIMAL(38, 0), name STRING"</code>. You can also specify partial fields, others use default values. For example, <code>"id DECIMAL(38, 0)"</code>. The column names should be identical to the corresponding column names of JDBC table. Users can specify the corresponding data types of Spark SQL instead of using the defaults. This option applies only to reading.
`others` -> `and the others use the default type mapping`
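As a concrete illustration of the option being documented here, a minimal read sketch; the connection URL and table name are hypothetical placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("customSchema example").getOrCreate()

// Override only the type of "id"; every other column keeps the
// default JDBC-to-Catalyst type mapping.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb") // hypothetical URL
  .option("dbtable", "people")                              // hypothetical table
  .option("customSchema", "id DECIMAL(38, 0)")
  .load()

df.printSchema()
```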
@@ -993,7 +996,10 @@ class JDBCSuite extends SparkFunSuite
 Seq(StructField("NAME", StringType, true), StructField("THEID", IntegerType, true)))
 val df = sql("select * from people_view")
 assert(df.schema.size === 2)
 assert(df.schema === schema)
Revert it back. Change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309 to

fields(i) = StructField(columnName, columnType, nullable)

You also need to update some test cases due to the above change, I think.
We shouldn't change it, because `scale` is used to infer column types: https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L32
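A simplified sketch of the mechanism being defended here (illustrative only, not the exact Spark source): the JDBC schema reader attaches the column's decimal scale as field metadata, and a dialect such as OracleDialect reads it back when inferring a Catalyst type, so dropping the metadata unconditionally would break that inference:

```scala
import org.apache.spark.sql.types._

// The JDBC reader stores the column's decimal scale in the field metadata.
val md = new MetadataBuilder().putLong("scale", 2L).build()
val field = StructField("price", DecimalType(10, 2), nullable = true, md)

// A dialect can later consult that scale when choosing a Catalyst type
// for an otherwise ambiguous NUMERIC column.
val scale = if (field.metadata.contains("scale")) field.metadata.getLong("scale") else 0L
assert(scale == 2L)
```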
Test build #81758 has finished for PR 19231 at commit
// This is resolved by names, use the custom field dataType to replace the default dataType.
val newSchema = tableSchema.map { col =>
  userSchema.find(f => nameEquality(f.name, col.name)) match {
    case Some(c) => col.copy(dataType = c.dataType, metadata = Metadata.empty)
Reset metadata to empty, otherwise it is not equal to the schema generated by CatalystSqlParser.parseTableSchema. Anyway, the type is fixed, so we don't need the metadata to infer column types.
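A small demonstration of that equality point, under the assumption that parseTableSchema yields fields with empty metadata:

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types._

// Fields produced by parseTableSchema carry empty metadata...
val parsed = CatalystSqlParser.parseTableSchema("id DECIMAL(38, 0)")
assert(parsed("id").metadata == Metadata.empty)

// ...so a JDBC field that still carries dialect metadata (e.g. "scale")
// only compares equal to the parsed field once its metadata is reset.
val jdbcField = StructField("id", DecimalType(38, 0), nullable = true,
  new MetadataBuilder().putLong("scale", 0L).build())
assert(jdbcField.copy(metadata = Metadata.empty) == parsed("id"))
```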
Why not change the following line https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala#L309 to

fields(i) = StructField(columnName, columnType, nullable)
Because `scale` is used to infer column types, we shouldn't remove it: https://github.com/apache/spark/blob/v2.2.0/sql/core/src/main/scala/org/apache/spark/sql/jdbc/OracleDialect.scala#L32
Test build #81790 has finished for PR 19231 at commit

Test build #81798 has finished for PR 19231 at commit

Test build #81800 has finished for PR 19231 at commit
// This is resolved by names, use the custom field dataType to replace the default dataType.
val newSchema = tableSchema.map { col =>
  userSchema.find(f => nameEquality(f.name, col.name)) match {
    case Some(c) => col.copy(dataType = c.dataType)
Yes, we should keep the original nullability.
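Putting the thread together, a self-contained approximation of the final merge-by-name logic (the function name and the plain (String, String) => Boolean standing in for Spark's Resolver are assumptions for illustration):

```scala
import org.apache.spark.sql.catalyst.parser.CatalystSqlParser
import org.apache.spark.sql.types.StructType

def mergeCustomSchema(
    tableSchema: StructType,
    customSchema: String,
    nameEquality: (String, String) => Boolean): StructType = {
  val userSchema = CatalystSqlParser.parseTableSchema(customSchema)
  val newSchema = tableSchema.map { col =>
    userSchema.find(f => nameEquality(f.name, col.name)) match {
      // Take the user-specified type; copy keeps the table's original
      // nullability because only dataType is overridden.
      case Some(c) => col.copy(dataType = c.dataType)
      case None    => col // no override: keep the default mapping entirely
    }
  }
  StructType(newSchema)
}
```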
LGTM
Thanks! Merged to master.
…l schema doesn't have metadata.

## What changes were proposed in this pull request?

This is a follow-up pr of #19231 which modified the behavior to remove metadata from JDBC table schema. This pr adds a test to check if the schema doesn't have metadata.

## How was this patch tested?

Added a test and existing tests.

Author: Takuya UESHIN <[email protected]>

Closes #20585 from ueshin/issues/SPARK-22002/fup1.

(cherry picked from commit 0c66fe4)
Signed-off-by: gatorsmile <[email protected]>
What changes were proposed in this pull request?
#18266 added a new feature that supports reading a JDBC table with a custom schema, but it requires specifying all of the fields. For convenience, this PR adds support for specifying only some of the fields; the others use the default type mapping.
How was this patch tested?
Unit tests.
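As a sketch of what such a unit test might look like (the NAME and THEID columns follow the JDBCSuite snippet above; the url and dbtable option values are assumed test fixtures, and === assumes a ScalaTest-style suite):

```scala
import org.apache.spark.sql.types.{IntegerType, StringType}

// Override only NAME; THEID should keep its default type mapping.
val df = spark.read
  .format("jdbc")
  .option("url", urlWithUserAndPass) // assumed test fixture
  .option("dbtable", "TEST.PEOPLE")  // assumed test table
  .option("customSchema", "NAME STRING")
  .load()

assert(df.schema("NAME").dataType === StringType)   // overridden type
assert(df.schema("THEID").dataType === IntegerType) // default mapping kept
```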