
[SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax #10380

Closed
wants to merge 7 commits

Conversation

CK50
Contributor

CK50 commented Dec 18, 2015

In the past, Spark JDBC writes only worked with technologies that support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):

INSERT INTO $table VALUES ( ?, ?, ..., ? )

But some technologies require a list of column names:

INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )

This was blocking the use of e.g. the Progress JDBC Driver for Cassandra.

Another limitation is that the first syntax relies on the dataframe field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc().

If the target table contains more columns (i.e., it was not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types.

This PR switches to the recommended second INSERT syntax. Column names are taken from the dataframe field names.
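
For illustration, a minimal sketch of the failure mode and the fix (the table, columns, and connection URL below are made up for this example, not taken from the PR):

import java.util.Properties

// Assume the target table was created outside writer.jdbc(), with a
// different column order and an extra column:
//   CREATE TABLE people (id INT, note VARCHAR(64), name VARCHAR(32))
val df = sqlContext.createDataFrame(Seq(("alice", 1), ("bob", 2))).toDF("name", "id")

// Old generated statement: INSERT INTO people VALUES (?, ?)
//   binds by position, so "alice" lands in the id column and the insert fails.
// New generated statement: INSERT INTO people (name, id) VALUES (?, ?)
//   binds by name; the extra nullable note column is simply left NULL.
df.write.mode("append").jdbc("jdbc:oracle:thin:@//host:1521/svc", "people", new Properties())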

@srowen
Member

srowen commented Dec 18, 2015

Jenkins, add to whitelist

@SparkQA

SparkQA commented Dec 18, 2015

Test build #48007 has finished for PR 10380 at commit 0772db3.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 18, 2015

Test build #2232 has finished for PR 10380 at commit 0772db3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

@CK50 does the change in syntax work on all dialects currently supported by Spark SQL?

fieldsLeft = fieldsLeft - 1
}
conn.prepareStatement(sql.toString())
def insertStatement(conn: Connection,
Contributor

Nit: this method is only used by savePartition; we could integrate it there, since most of the code was moved to JdbcDialects.getInsertStatement.

Contributor Author

@hvanhovell
It works fine on Oracle and on Cassandra (Progress JDBC driver for Cassandra).
As for other RDBMSs: I was surprised that the column-name-free syntax is supported by so many databases. In all my work across different RDBMSs, the syntax with column names has been much more the standard, but I have not tested on anything other than Oracle and Cassandra.

Contributor Author

@hvanhovell
re integrating JdbcDialects.getInsertStatement: I can certainly do so, if desired.

Contributor

@CK50 I don't have a strong opinion on this. I'd personally remove it, that's all.

@hvanhovell
Contributor

@CK50 I did some quick googling; all of the dialects support this syntax (it might have been a dumb question on my part). LGTM.

@SparkQA

SparkQA commented Dec 18, 2015

Test build #48017 has finished for PR 10380 at commit a59c1aa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -60,20 +60,6 @@ object JdbcUtils extends Logging {
}

/**
* Returns a PreparedStatement that inserts a row into table via conn.
Member

Hm, the only problem here is that this is a public method, and while it feels like it was intended to be a Spark-only utility method, I'm not sure it's marked as such.

It's not a big deal to retain it and implement it in terms of the new method. However, it's now a function of a dialect, which is not an argument here. I suppose any dialect will do, since they all behave the same now. This method could then be deprecated.

However: yeah, the behavior is actually the same for all dialects now. Really, this has come full circle and can just be a modification to this method, which was already the same for all dialects. Is there reason to believe the insert statement might vary later? Then I could see keeping the current structure here and just deprecating this method.
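
For illustration, retaining the old entry point could look roughly like this sketch (the deprecation note and the dialect lookup are illustrative assumptions, not what was merged):

// Keep the old public signature and delegate to the new dialect-aware one.
// Any dialect will do here, since the generated INSERT is the same for all of them.
@deprecated("Use insertStatement(conn, table, rddSchema, dialect)", "1.6.0")
def insertStatement(conn: Connection, table: String,
    rddSchema: StructType): PreparedStatement =
  insertStatement(conn, table, rddSchema, JdbcDialects.get(""))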

Contributor

Yeah, you are right about that. We'll break a public API. Is that a problem, since this is probably going into 2.0?

Member

That's a fair point, though I think the intent was to back-port this to 1.6.x at least, as it's a moderately important fix. Conceptually, I don't think the API has changed; the insert statement is still not dialect-specific. Hence it seems like the current API is even desirable to maintain for now.

Contributor

Ok, we have to keep the method if we are back-porting. I have yet to encounter a database that doesn't support this insert syntax (I did a bit more googling), so it seems safe to put it in the generic method.

def insertStatement(conn: Connection,
    table: String,
    rddSchema: StructType,
    dialect: JdbcDialect): PreparedStatement = {
Member

Here you've still changed the API though. I think the point is that it's not actually dialect-specific even after your change, right?

@SparkQA

SparkQA commented Dec 22, 2015

Test build #48168 has finished for PR 10380 at commit c130e31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

moved generation of SQL insert statement back into JdbcUtils.scala
@SparkQA

SparkQA commented Dec 22, 2015

Test build #48189 has finished for PR 10380 at commit 79d3e0d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

fieldsLeft = fieldsLeft - 1
}
conn.prepareStatement(sql.toString())
val sql = rddSchema.fields.map(field => field.name)
Member

Yes, very close now. I think that can be tightened up in a few minor ways, like map(_.name) and map(_ => "?"). There's an extra space inside the paren in line 67, and the VALUES clause inserts extra spaces before, between, and after the ?s. Finally, the wrapping is a little funny. Maybe break this down into a val for the column names clause and a val for the ? placeholders clause, and then return their interpolation into the final single format string, INSERT INTO $table $columns VALUES $placeholders, or something.
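
Something along those lines (a sketch of the suggested shape; the exact merged code may differ):

// Build the column list and placeholder list separately, then interpolate.
val columns = rddSchema.fields.map(_.name).mkString(",")
val placeholders = rddSchema.fields.map(_ => "?").mkString(",")
val sql = s"INSERT INTO $table ($columns) VALUES ($placeholders)"
conn.prepareStatement(sql)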

@SparkQA

SparkQA commented Dec 22, 2015

Test build #48194 has finished for PR 10380 at commit cae0f58.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 22, 2015

Test build #48199 has finished for PR 10380 at commit 5a7f262.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 22, 2015

Yes, nice and neat and addresses the issue directly. This LGTM

@asfgit closed this in 502476e Dec 24, 2015
@srowen
Member

srowen commented Dec 24, 2015

Merged to master/1.6

asfgit pushed a commit that referenced this pull request Dec 24, 2015
[SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax

Author: CK50 <[email protected]>

Closes #10380 from CK50/master-SPARK-12010-2.

(cherry picked from commit 502476e)
Signed-off-by: Sean Owen <[email protected]>