
iceberg is not a valid Spark SQL Data Source #1756

Closed
chaiyuan2046 opened this issue Nov 12, 2020 · 8 comments
@chaiyuan2046

spark : spark-3.0.1-bin-hadoop2.7
iceberg : iceberg-spark3-runtime-0.9.1.jar

==============================================
Step one:
bin/spark-sql
--conf spark.sql.warehouse.dir=hdfs://xxxx:8020/user/iceberg

Step two:
CREATE TABLE ib_test(id bigint, data string) USING iceberg; --success

Step three:
select * from ib_test; -- the error below happens:

Caused by: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source.;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:421)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:256)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
... 96 more

Any ideas for this question? Thanks.

@RussellSpitzer
Member

RussellSpitzer commented Nov 12, 2020 via email

@codope
Member

codope commented Aug 6, 2021

@RussellSpitzer @chaiyuan2046 I am facing this issue as well with EMR Spark (EMR 6.2, Spark 3.0.1). The create succeeds, but the insert fails with the same exception. Could you please point me to the right set of configs? My configuration is below.

spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1\
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse

@RussellSpitzer
Member

Are you referring to your tables as local.database.table?

@codope
Member

codope commented Aug 6, 2021

> Are you referring to your tables as local.database.table?

@RussellSpitzer Thank you so much. Your comment helped! I was actually referring to my tables as local.db.table. I changed that to local.default.table because show current namespace listed default as the database. It worked!

I think this issue can be closed. For completeness, I will mention below the SQLs that worked with the configs I set.

CREATE TABLE local.default.t1 (id bigint, data string) USING iceberg;
INSERT INTO local.default.t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');

After this, go to the location specified in spark.sql.catalog.local.warehouse (on the namenode, if using HDFS); the data as well as the metadata should be visible under $PWD/warehouse/default/t1.
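For reference, the on-disk layout under the warehouse should look roughly like the sketch below (exact file names vary by Iceberg version; this is an illustration, not verbatim output):

```
warehouse/
└── default/
    └── t1/
        ├── data/        # Parquet data files written by the INSERT
        └── metadata/    # table metadata JSON, manifest lists, and manifest files
```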

@shiyuhang0

> This occurs because of a bad error message in Spark. What it really means is that you attempted to read the table using a DataSource v1 read path when the datasource in question is a v2 one. This is occurring because you are using the native Spark session catalog rather than the Iceberg v2 replacement session catalog. So from here you have two options, I think: you can override the session catalog with the Iceberg v2 session catalog, or you can read the table using the v2 path explicitly, spark.read.table().

I am new to Spark and would like to know how to override the session catalog with the Iceberg v2 session catalog.

@szehon-ho
Collaborator

I think this will be done when you follow the steps in the Spark Getting Started Guide: https://iceberg.apache.org/docs/latest/getting-started/
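Following that guide, overriding the built-in session catalog comes down to mapping spark_catalog to Iceberg's SparkSessionCatalog, so that unqualified table names (like the original ib_test) resolve through the Iceberg v2 path. A minimal sketch, assuming the same runtime version used earlier in the thread (the catalog type and package version are placeholders to adjust for your environment):

```shell
# Replace the built-in session catalog with Iceberg's SparkSessionCatalog.
# It handles Iceberg tables via the DataSource v2 path and falls back to
# Spark's built-in catalog for non-Iceberg tables.
spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive
```

With this in place, plain two-part names such as default.ib_test (or just ib_test in the current database) should read correctly, without needing a three-part catalog-qualified name.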


github-actions bot commented Mar 1, 2024

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Mar 1, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

@github-actions github-actions bot closed this as not planned (stale) Mar 16, 2024