
More than 5 concurrent inserts fail via Iceberg connector #21251

Closed
pratyakshsharma opened this issue Oct 26, 2023 · 2 comments
Labels
bug iceberg Apache Iceberg related

Comments

@pratyakshsharma
Contributor

pratyakshsharma commented Oct 26, 2023

#16983 aimed to fix concurrent inserts via the Iceberg connector when using the Hive metastore. However, the number of commit attempts is controlled by the table property commit.retry.num-retries, which defaults to 4. One initial attempt plus 4 retries gives 5 attempts in total, which matches the threshold of 5 concurrent inserts beyond which failures appear. To fix concurrent inserts to Iceberg tables, there are 2 scenarios:

  1. New table creation: this table property can be set to a custom value when the table is created. Fix concurrent insertions for Iceberg tables #21250 aims to address this.
  2. Existing tables: Presto does not currently support ALTER TABLE SET TBLPROPERTIES, unlike other engines such as Spark - https://iceberg.apache.org/docs/1.4.0/spark-ddl/#alter-table--set-tblproperties
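To make the retry arithmetic concrete, here is a minimal sketch of a simplified worst-case model. It is an illustration only, not Presto or Iceberg code, and the class and method names are hypothetical. The model assumes that all writers start from the same table version, that in every round exactly one pending writer wins the commit race while the rest fail and retry, and that a writer gives up after commit.retry.num-retries + 1 attempts. It ignores retry backoff and real timing.

```java
// Hypothetical worst-case model of commit contention; not Presto or Iceberg code.
class CommitRetryModel {
    /**
     * Returns how many of {@code writers} concurrent commits fail when each writer
     * gets one initial attempt plus {@code numRetries} retries, assuming that in
     * every round exactly one pending writer wins and all the others must retry.
     */
    static int failedCommits(int writers, int numRetries) {
        int maxAttempts = numRetries + 1; // initial attempt + retries
        int pending = writers;
        int attemptsUsed = 0; // all pending writers have used the same number of attempts
        while (pending > 0 && attemptsUsed < maxAttempts) {
            attemptsUsed++; // every pending writer tries to commit once more
            pending--;      // exactly one of them wins this round
        }
        return pending; // writers still pending have exhausted all attempts
    }

    public static void main(String[] args) {
        int numRetries = 4; // default value of commit.retry.num-retries
        for (int writers = 1; writers <= 8; writers++) {
            System.out.println(writers + " writers -> "
                    + failedCommits(writers, numRetries) + " failed commits");
        }
    }
}
```

With the default of 4 retries, this model starts failing at the sixth writer. Raising the retry count to at least the number of writers minus one avoids failures in the model.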

Your Environment

  • Presto version used: 0.284
  • Storage (HDFS/S3/GCS..): S3/HDFS
  • Data source and connector used: Iceberg
  • Deployment (Cloud or On-prem): Local
  • Pastebin link to the complete debug logs:

Expected Behavior

Any number of concurrent inserts should be able to go through without any errors.

Current Behavior

The following exception is thrown when more than 5 insert statements run concurrently:

org.apache.iceberg.exceptions.CommitFailedException: 111: Metadata location [hdfs://localhost:9000/user/hive/warehouse/iceberg_table1/metadata/00072-57523571-bc74-43f8-a677-aa2fceeb317e.metadata.json] is not same as table metadata location [hdfs://localhost:9000/user/hive/warehouse/iceberg_table1/metadata/00073-b9560ca4-b815-4de3-b4e8-bc9d6d7bb00d.metadata.json] for default.iceberg_table1
	at com.facebook.presto.iceberg.HiveTableOperations.commit(HiveTableOperations.java:275)
	at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$5(BaseTransaction.java:422)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:418)
	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:302)
	at com.facebook.presto.iceberg.IcebergAbstractMetadata.finishInsert(IcebergAbstractMetadata.java:270)
	at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:452)
	at com.facebook.presto.metadata.MetadataManager.finishInsert(MetadataManager.java:858)
	at com.facebook.presto.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$3(LocalExecutionPlanner.java:3392)
	at com.facebook.presto.operator.TableFinishOperator.getOutput(TableFinishOperator.java:289)
	at com.facebook.presto.operator.Driver.processInternal(Driver.java:428)
	at com.facebook.presto.operator.Driver.lambda$processFor$9(Driver.java:311)
	at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:732)
	at com.facebook.presto.operator.Driver.processFor(Driver.java:304)
	at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1079)
	at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:165)
	at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:603)
	at com.facebook.presto.$gen.Presto_null__testversion____20231010_093550_1.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
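The message appears to say that the metadata file the failing insert started from (the first location) is no longer the table's current metadata file (the second location), meaning another insert committed first. The following is a rough, hypothetical sketch of that optimistic-concurrency check. The class and exception names are invented, and this is not the code in HiveTableOperations. In the sketch, a commit swaps the table's metadata pointer only if the pointer still equals the writer's base location.

```java
// Hypothetical sketch of an optimistic-concurrency commit check; not Presto's HiveTableOperations.
class StaleMetadataException extends RuntimeException {
    StaleMetadataException(String message) {
        super(message);
    }
}

class MetadataPointer {
    private String currentLocation; // stands in for the current location recorded in the metastore

    MetadataPointer(String initialLocation) {
        this.currentLocation = initialLocation;
    }

    synchronized String refresh() {
        return currentLocation;
    }

    // Replaces the pointer only if no one else committed since baseLocation was read;
    // synchronized makes the check and the swap a single atomic step.
    synchronized void commit(String baseLocation, String newLocation) {
        if (!currentLocation.equals(baseLocation)) {
            throw new StaleMetadataException("Metadata location [" + baseLocation
                    + "] is not same as table metadata location [" + currentLocation + "]");
        }
        currentLocation = newLocation;
    }

    public static void main(String[] args) {
        MetadataPointer table = new MetadataPointer("v1.metadata.json");
        String baseA = table.refresh(); // both inserts start from v1
        String baseB = table.refresh();
        table.commit(baseA, "v2-a.metadata.json"); // insert A wins the race
        try {
            table.commit(baseB, "v2-b.metadata.json"); // insert B's base is now stale
        } catch (StaleMetadataException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The Tasks$Builder.runTaskWithRetry frames in the trace suggest that Iceberg wraps such a commit in a retry loop. On each conflict, the loop refreshes the base metadata, reapplies the change, and tries again. The size of that retry budget is what commit.retry.num-retries limits.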

Possible Solution

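Make the number of commit retries configurable, as described in the two scenarios above: allow commit.retry.num-retries to be set when creating a table (#21250), and support ALTER TABLE SET TBLPROPERTIES for existing tables.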

Steps to Reproduce

Set up JMeter to connect to a local Presto server and run an INSERT statement against the same Iceberg table from more than 5 threads concurrently.
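Below is a minimal sketch that mimics this contention pattern without Presto. It is a hypothetical in-memory stand-in, not a Presto client or JDBC reproducer. N threads are released at the same moment, and each tries to advance a shared table version with compare-and-set, allowed numRetries + 1 attempts. Thread.sleep(1) is an assumed placeholder for writing data and metadata files between reading the base version and committing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory stand-in for the JMeter setup; not a Presto client.
class ConcurrentInsertHarness {
    /** Runs {@code threads} concurrent commits, each allowed numRetries + 1 attempts; returns how many failed. */
    static int run(int threads, int numRetries) throws Exception {
        AtomicInteger tableVersion = new AtomicInteger(0); // stands in for the table's metadata version
        CountDownLatch start = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                results.add(pool.submit(() -> {
                    start.await(); // release all writers at once
                    for (int attempt = 0; attempt <= numRetries; attempt++) {
                        int base = tableVersion.get(); // read the current version
                        Thread.sleep(1);               // placeholder for writing files
                        if (tableVersion.compareAndSet(base, base + 1)) {
                            return true;               // commit succeeded
                        }
                    }
                    return false;                      // retries exhausted
                }));
            }
            start.countDown();
            int failed = 0;
            for (Future<Boolean> result : results) {
                if (!result.get()) {
                    failed++;
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("8 writers, 4 retries: " + run(8, 4) + " failed");
        System.out.println("8 writers, 7 retries: " + run(8, 7) + " failed");
    }
}
```

Because thread scheduling varies, the failure count with too few retries differs from run to run. With numRetries of at least threads - 1, however, every commit succeeds. Each failed attempt means another writer committed after this writer last read the version, and each writer commits at most once.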

Context

Real-world pipelines can have more than 5 concurrent inserts at a given point in time.

@elbinpallimalilibm
Contributor

@tdcmeehan Can you point to the PR which fixed this defect?

@pratyakshsharma
Contributor Author

@elbinpallimalilibm This is not fixed yet. You can track PR #21250.

Projects
Archived in project
Status: Done