Describe the bug
While running a simple model that selects from a Parquet table and creates an Iceberg table, I get:
Caused by: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source.
I tried to follow #405 and apache/iceberg#1756, but without luck.
Steps To Reproduce
profile.yml, dbt_project.yml, and models/glue/dbt_tracks.sql (the model is shown below):
{{ config(
materialized='incremental',
incremental_strategy='append',
unique_key=['name'],
file_format='iceberg',
iceberg_expire_snapshots='False',
table_properties={'write.target-file-size-bytes': '268435456'}
) }}
WITH
parquet_tracks AS (
SELECT DISTINCT
id,
name
FROM iceberg_dev.stg_tracks
LIMIT 1
)
SELECT *
FROM parquet_tracks
Expected behavior
dbt run succeeds and creates the Iceberg table with the data.
Screenshots and log output
(glue-dbt) glue_dbt % dbt run
07:30:35 Running with dbt=1.9.1
07:30:35 Registered adapter: glue=1.9.0
07:30:35 Unable to do partial parsing because profile has changed
07:30:36 Found 1 model, 5 data tests, 515 macros
07:30:36
07:30:36 Concurrency: 50 threads (target='dev')
07:30:36
07:30:37 1 of 1 START sql incremental model iceberg_dev.dbt_tracks .............. [RUN]
07:30:38 Glue adapter: Parameter validation failed:
Invalid type for parameter Name, value: , type: <class 'jinja2.runtime.Undefined'>, valid types: <class 'str'>
07:32:39 Glue adapter: Glue returned `error` for statement None for code SqlWrapper2.execute('''/* {"app": "dbt", "dbt_version": "1.9.1", "profile_name": "glue_dbt", "target_name": "dev", "node_id": "model.glue_dbt.dbt_tracks"} */
insert into table iceberg_dev.dbt_tracks
select id, name from dbt_tracks_tmp
''', use_arrow=False, location='s3://<bucket>/lake/test_schema'), Py4JJavaError: An error occurred while calling o241.sql.
: java.util.concurrent.ExecutionException: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source.
at org.sparkproject.guava.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at org.sparkproject.guava.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at org.sparkproject.guava.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at org.sparkproject.guava.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at org.sparkproject.guava.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache.get(LocalCache.java:4789)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getCachedPlan(SessionCatalog.scala:212)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.org$apache$spark$sql$execution$datasources$FindDataSourceTable$$readDataSourceTable(DataSourceStrategy.scala:277)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:332)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anonfun$apply$2.applyOrElse(DataSourceStrategy.scala:329)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$2(AnalysisHelper.scala:199)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:77)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsDownWithPruning$1(AnalysisHelper.scala:199)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:353)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning(AnalysisHelper.scala:197)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsDownWithPruning$(AnalysisHelper.scala:193)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsDownWithPruning(LogicalPlan.scala:34)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning(AnalysisHelper.scala:128)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsWithPruning$(AnalysisHelper.scala:125)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsWithPruning(LogicalPlan.scala:34)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators(AnalysisHelper.scala:85)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperators$(AnalysisHelper.scala:84)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:34)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:329)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable.apply(DataSourceStrategy.scala:249)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:239)
at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
at scala.collection.immutable.List.foldLeft(List.scala:91)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeBatch$1(RuleExecutor.scala:236)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$6(RuleExecutor.scala:319)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.catalyst.rules.RuleExecutor$RuleExecutionContext$.withContext(RuleExecutor.scala:368)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5(RuleExecutor.scala:319)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$5$adapted(RuleExecutor.scala:309)
at scala.collection.immutable.List.foreach(List.scala:431)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:309)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:195)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:191)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeSameContext(Analyzer.scala:303)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$2(Analyzer.scala:299)
at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:216)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:299)
at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:245)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:182)
at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:108)
at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:182)
at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:270)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:360)
at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:269)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:93)
at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:219)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:277)
at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:711)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:277)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:276)
at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:93)
at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:90)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:82)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:102)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:639)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:901)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:630)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:660)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:569)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: org.apache.spark.sql.AnalysisException: iceberg is not a valid Spark SQL Data Source.
at org.apache.spark.sql.errors.QueryCompilationErrors$.invalidDataSourceError(QueryCompilationErrors.scala:1542)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:436)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:345)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:268)
at org.apache.spark.sql.execution.datasources.FindDataSourceTable$$anon$1.call(DataSourceStrategy.scala:255)
at org.sparkproject.guava.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4792)
at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
... 76 more
07:32:39 1 of 1 ERROR creating sql incremental model iceberg_dev.dbt_tracks ..... [ERROR in 121.93s]
07:32:40
07:32:40 Finished running 1 incremental model in 0 hours 2 minutes and 4.19 seconds (124.19s).
07:32:40
07:32:40 Completed with 1 error, 0 partial successes, and 0 warnings:
07:32:40
07:32:40 Database Error in model dbt_tracks (models/glue/dbt_tracks.sql)
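For what it's worth, the failing statement inserts into iceberg_dev.dbt_tracks without a catalog prefix, so I assume the two probes below, run manually in the same Glue session, would show whether the table resolves only through the Iceberg catalog. This is a guess at a diagnostic, not something the adapter runs; glue_catalog here is the catalog name from the conf variations listed under Additional context.
-- Resolved through the default Spark session catalog, the same path as the dbt-generated insert
SELECT id, name FROM iceberg_dev.dbt_tracks LIMIT 1;
-- Resolved explicitly through the Iceberg catalog configured in the session conf
SELECT id, name FROM glue_catalog.iceberg_dev.dbt_tracks LIMIT 1;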
System information
The output of dbt --version:
Core:
- installed: 1.9.1
- latest: 1.9.1 - Up to date!
Plugins:
- glue: 1.9.0 - Up to date!
- redshift: 1.9.0 - Up to date!
- postgres: 1.9.0 - Up to date!
- spark: 1.9.0 - Up to date!
The output of python --version: Python 3.13.1
Additional context
I tried a few variations of profile.yml, including (a representative sketch follows this list):
glue 4/5
with/without:
extra_jars
spark.sql.catalog.glue_catalog
spark.sql.defaultCatalog=glue_catalog
datalake_formats
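For illustration, this is roughly the shape of the profile I have been varying. The role ARN, region, worker settings, and warehouse path below are placeholders rather than my real values, and the exact option names should be checked against the dbt-glue and Iceberg documentation:
glue_dbt:
  target: dev
  outputs:
    dev:
      type: glue
      # Placeholder identity and sizing values
      role_arn: arn:aws:iam::123456789012:role/GlueInteractiveSessionRole
      region: eu-west-1
      glue_version: "4.0"
      workers: 2
      worker_type: G.1X
      schema: iceberg_dev
      location: "s3://<bucket>/lake/test_schema"
      datalake_formats: iceberg
      # Iceberg catalog wiring, tried with and without the entries listed above
      conf: >-
        spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
        --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog
        --conf spark.sql.catalog.glue_catalog.warehouse=s3://<bucket>/lake/warehouse
        --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog
        --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
        --conf spark.sql.defaultCatalog=glue_catalog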
The parquet table is an external table based on JSON.