
iceberg is not a valid Spark SQL Data Source #1756

Closed
chaiyuan2046 opened this issue Nov 12, 2020 · 8 comments
@chaiyuan2046

spark : spark-3.0.1-bin-hadoop2.7
iceberg : iceberg-spark3-runtime-0.9.1.jar

==============================================
Step one:
bin/spark-sql
--conf spark.sql.warehouse.dir=hdfs://xxxx:8020/user/iceberg

Step two:
CREATE TABLE ib_test(id bigint, data string) USING iceberg; --success

Step three:
select * from ib_test; -- the error below happens:

Caused by: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source.;
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:421)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.$anonfun$readDataSourceTable$1(DataSourceStrategy.scala:256)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
... 96 more

Any ideas for this question? Thanks.

@RussellSpitzer
Member

RussellSpitzer commented Nov 12, 2020 via email

@codope
Member

codope commented Aug 6, 2021

@RussellSpitzer @chaiyuan2046 I am facing this issue as well with EMR Spark (EMR 6.2, Spark 3.0.1). The create succeeds, but the insert fails with the same exception. Could you please point me to the right set of configs? My configuration is below.

spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1\
    --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
    --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
    --conf spark.sql.catalog.local.type=hadoop \
    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse

@RussellSpitzer
Member

Are you referring to your tables as local.database.table?

@codope
Member

codope commented Aug 6, 2021

> Are you referring to your tables as local.database.table?

@RussellSpitzer Thank you so much. Your comment helped! I was actually referring to my tables as local.db.table. I changed that to local.default.table because show current namespace listed default as the database. It worked!

I think this issue can be closed. For completeness, I will mention below the SQLs that worked with the configs I set.

CREATE TABLE local.default.t1 (id bigint, data string) USING iceberg;
INSERT INTO local.default.t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');

After this, go to the location specified in spark.sql.catalog.local.warehouse (on the namenode, if using HDFS); the data as well as the metadata should be visible under $PWD/warehouse/default/t1.
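For reference, the on-disk layout under the warehouse should look roughly like the sketch below (exact file names vary by Iceberg version; this is an illustration, not verbatim output):

```
warehouse/
└── default/
    └── t1/
        ├── data/        # Parquet data files written by the INSERT
        └── metadata/    # table metadata JSON, manifest lists, and manifest files
```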

@shiyuhang0

> This occurs because of a bad error message in Spark. What it really means is that you attempted to read the table using a DataSource v1 read path when the datasource in question is a v2 one. This is occurring because you are using the native Spark session catalog rather than the Iceberg v2 replacement session catalog. So from here you have two options, I think: you can override the session catalog with the Iceberg v2 session catalog, or you can read the table using the v2 path explicitly, spark.read.table().

I am new to Spark and would like to know how to override the session catalog with the Iceberg v2 session catalog.

@szehon-ho
Collaborator

I think this will be done when you follow the steps in the Spark Getting Started Guide: https://iceberg.apache.org/docs/latest/getting-started/
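Following that guide, overriding the built-in session catalog comes down to mapping spark_catalog to Iceberg's SparkSessionCatalog, so that unqualified table names (like the original ib_test) resolve through the Iceberg v2 path. A minimal sketch, assuming the same runtime version used earlier in the thread (the catalog type and package version are placeholders to adjust for your environment):

```shell
# Replace the built-in session catalog with Iceberg's SparkSessionCatalog.
# It handles Iceberg tables via the DataSource v2 path and falls back to
# Spark's built-in catalog for non-Iceberg tables.
spark-sql --packages org.apache.iceberg:iceberg-spark3-runtime:0.11.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hive
```

With this in place, plain two-part names such as default.ib_test (or just ib_test in the current database) should read correctly, without needing a three-part catalog-qualified name.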


github-actions bot commented Mar 1, 2024

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

@github-actions github-actions bot added the stale label Mar 1, 2024

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.

@github-actions github-actions bot closed this as not planned (stale) Mar 16, 2024