
java.lang.IllegalArgumentException: Unsupported JDBC protocol: 'postgresql' #126

Closed
hafizmujadid opened this issue Nov 18, 2015 · 3 comments

@hafizmujadid

How can I use the PostgreSQL JDBC driver with spark-redshift?
The following code gives me an exception:
java.lang.IllegalArgumentException: Unsupported JDBC protocol: 'postgresql'

val df1: DataFrame = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:postgresql://host:5439/db?user=test&password=test")
  .option("dbtable", "wdata")
  .option("tempdir", "s3n://accessKEy:SecretKEy@redshift/dir/")
  .load()
df1.show()

The full stack trace is as follows:

Exception in thread "main" java.lang.IllegalArgumentException: Unsupported JDBC protocol: 'postgresql'
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$getDriverClass$2.apply(RedshiftJDBCWrapper.scala:68)
    at com.databricks.spark.redshift.JDBCWrapper$$anonfun$getDriverClass$2.apply(RedshiftJDBCWrapper.scala:52)
    at scala.Option.getOrElse(Option.scala:120)
    at com.databricks.spark.redshift.JDBCWrapper.getDriverClass(RedshiftJDBCWrapper.scala:51)
    at com.databricks.spark.redshift.JDBCWrapper.getConnector(RedshiftJDBCWrapper.scala:138)
    at com.databricks.spark.redshift.RedshiftRelation$$anonfun$schema$1.apply(RedshiftRelation.scala:59)
    at com.databricks.spark.redshift.RedshiftRelation$$anonfun$schema$1.apply(RedshiftRelation.scala:56)
    at scala.Option.getOrElse(Option.scala:120)
    at com.databricks.spark.redshift.RedshiftRelation.schema$lzycompute(RedshiftRelation.scala:56)
    at com.databricks.spark.redshift.RedshiftRelation.schema(RedshiftRelation.scala:55)
    at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:31)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:120)
@JoshRosen JoshRosen added the bug label Nov 18, 2015
@JoshRosen JoshRosen self-assigned this Nov 18, 2015
@JoshRosen
Contributor

This is both a bug and a documentation issue introduced by https://github.com/databricks/spark-redshift/pull/90/files#diff-69806564231efb590460b162532ba683R44, which expects the JDBC subprotocol to be postgres instead of the more standard postgresql.

As a user, there are two ways that you can work around this:

  1. Change the subprotocol in your URI, e.g. use "jdbc:postgres://host:5439/db?user=test&password=test"
  2. Leave the URI unchanged and use the jdbcdriver configuration option to explicitly specify that the PostgreSQL driver should be used: .option("jdbcdriver", "org.postgresql.Driver").
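As a sketch of workaround 2 applied to the code from the original report (host, port, database, credentials, and the tempdir path are placeholders, and this is a configuration fragment that needs a live Spark context and the spark-redshift package to actually run):

```scala
// Workaround 2: keep the standard "jdbc:postgresql" URL and tell
// spark-redshift explicitly which JDBC driver class to load.
val df1 = sqlContext.read
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:postgresql://host:5439/db?user=test&password=test")
  .option("jdbcdriver", "org.postgresql.Driver") // bypasses subprotocol guessing
  .option("dbtable", "wdata")
  .option("tempdir", "s3n://bucket/dir/")
  .load()
```

This avoids rewriting the URL into the non-standard jdbc:postgres form, so the same connection string keeps working with other PostgreSQL tooling.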

Since this behavior is confusing, I should probably fix it and add regression tests, even though Postgres driver support is a lower-priority feature. I'll look into writing a patch for this when I have some time. When I do patch this, I'll make sure that the non-standard postgres URIs continue to work so as not to break backwards compatibility.

@JoshRosen JoshRosen added this to the 0.5.3 milestone Nov 18, 2015
@hafizmujadid
Author

Thank you so much, Josh :)


@JoshRosen
Contributor

Digging into this in a little bit more detail, it looks like there's no need to have a backwards-compatibility branch for the incorrect postgres subprotocol since the official Postgres driver won't understand that prefix anyways. I'm about to submit a patch to fix this and clean up some slight code comment and documentation inaccuracies related to JDBC driver precedence.
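For illustration only, here is a minimal sketch of the kind of subprotocol-to-driver lookup described above. The object name, method name, and mapping are assumptions for this sketch, not the actual spark-redshift code; in the real library a user-supplied jdbcdriver option takes precedence over any guess:

```scala
// Sketch: infer a JDBC driver class from a URL's subprotocol.
// Accepts the standard "postgresql" subprotocol directly, so no
// backwards-compatibility branch for "postgres" is needed once the
// standard form is supported.
object DriverGuess {
  // Captures the subprotocol token between "jdbc:" and the next ":".
  private val Subprotocol = "jdbc:([a-zA-Z0-9]+):.*".r

  def guessDriverClass(url: String): Option[String] = url match {
    case Subprotocol("postgresql") => Some("org.postgresql.Driver")
    case Subprotocol("redshift")   => Some("com.amazon.redshift.jdbc41.Driver")
    case _                         => None // unsupported or malformed URL
  }
}
```

A caller would fall back to an explicit jdbcdriver setting (or raise the "Unsupported JDBC protocol" error) when the lookup returns None.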
