iceberg is not a valid Spark SQL Data Source #1756
This occurs because of a misleading error message in Spark. What it really means
is that you attempted to read the table using a DataSource V1 read path
when the datasource in question is a V2 one. This is occurring because you
are using the native Spark session catalog rather than the Iceberg V2
replacement session catalog.
So from here you have two options, I think:
You can override the session catalog with the Iceberg V2 session catalog
You can read the table using the V2 path explicitly, spark.read.table()
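As a sketch of the first option (assuming the Hive-backed session-catalog setup described in the Iceberg docs; not the exact flags used in this thread), the built-in session catalog can be replaced when launching `spark-sql`:

```shell
# Replace Spark's built-in session catalog with Iceberg's SparkSessionCatalog,
# so tables created with `USING iceberg` resolve through the DataSource V2 path
# instead of failing in the V1 DataSource.resolveRelation code path.
bin/spark-sql \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive
```

With this in place, a plain `SELECT * FROM ib_test;` should be routed through the Iceberg V2 reader rather than the V1 path that raises the "not a valid Spark SQL Data Source" error.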
On Thu, Nov 12, 2020, 5:47 AM chaiyuan2046 wrote:
spark : spark-3.0.1-bin-hadoop2.7
iceberg : iceberg-spark3-runtime-0.9.1.jar
==============================================
*Step one:*
bin/spark-sql
--conf spark.sql.warehouse.dir=hdfs://xxxx:8020/user/iceberg
*Step two:*
CREATE TABLE ib_test(id bigint, data string) USING iceberg; --success
*Step three:*
select * from ib_test; --The error below happens:
Caused by: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source.;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:421)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:256)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
... 96 more
Any ideas about this? Thanks.
@RussellSpitzer @chaiyuan2046 I am facing this issue as well with EMR Spark (EMR 6.2, Spark 3.0.1). The create succeeds but the insert fails with the same exception. Could you please point me to the right set of configs? My configuration is as below.
Are you referring to your tables as local.database.table?
@RussellSpitzer Thank you so much, your comment helped! I was actually referring to my tables incorrectly. I think this issue can be closed. For completeness, I will mention below the SQLs that worked with the configs I set.
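For reference, here is a minimal sketch of the kind of setup being described: a named Iceberg catalog plus fully qualified table names. The catalog name `local`, the database `db`, and the warehouse path are illustrative (taken from the Iceberg getting-started guide, not from this thread):

```shell
# Define a Hadoop-type Iceberg catalog named "local", then address tables
# with fully qualified names (catalog.database.table) so Spark resolves
# them through the Iceberg V2 catalog rather than the built-in one.
bin/spark-sql \
  --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.local.type=hadoop \
  --conf spark.sql.catalog.local.warehouse=hdfs://xxxx:8020/user/iceberg \
  -e "CREATE TABLE IF NOT EXISTS local.db.ib_test (id bigint, data string) USING iceberg;
      SELECT * FROM local.db.ib_test;"
```

The key point is that both the CREATE and the SELECT go through the `local` catalog; an unqualified `ib_test` would fall back to the native session catalog and reproduce the error.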
After this, go to the location (specified in the warehouse configuration).
I am new to Spark and I would like to know how to …
I think this will be done when you follow the steps in the Spark Getting Started guide: https://iceberg.apache.org/docs/latest/getting-started/
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.
This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.