I've run this in Kubernetes, with both Iceberg and Hive table inserts into AWS Glue. I'm unsure whether this is an AWS Glue problem or a Trino issue. It happens whenever I run the insert script concurrently:
Start two copies of the same insert script simultaneously, and they collide and cause this error.
Run a single script with more than one concurrent worker, and the error also occurs.
The error message is different for Iceberg tables, but it amounts to the same failure. Some blocking is coming from AWS, but the error codes are generic and hard to find a resolution for.
Question: Is this a limitation of AWS Glue, or a problem with Trino handling multiple connections to Glue?
MY CODE:
from trino.dbapi import connect
import concurrent.futures
from multiprocessing import freeze_support

conn = connect(
    host="localhost",
    port=8080,
    user="trino",
    catalog="iceberg",
    schema="order_data",
)

# create table called iceberg using the iceberg connector
cur = conn.cursor()


def doWork(i):
    print(i)
    try:
        cur.execute(
            "INSERT INTO order_test3 (order_id, order_date, order_customer_id, order_status) "
            "VALUES ('" + str(i) + "', '2021-01-01', '" + str(i) + "', 'COMPLETE')"
        )
        # print response
        response = cur.fetchall()
        # close
        print(response)
        print(i)
    except Exception as e:
        print(i)
        print("Error")
        print(e)
        # print detailed error
        print(e.args[0])


def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        for i in range(1500, 1510):
            print(i)
            executor.submit(doWork, i)


if __name__ == '__main__':
    # enable support for multiprocessing for main
    freeze_support()
    main()
THE ERROR:
TrinoQueryError(type=INTERNAL_ERROR, name=GENERIC_INTERNAL_ERROR, message="Failed to commit to Glue table: order_data.order_test3", query_id=20230328_033606_00016_fe4cq)
This error should be classified better: it should be translated to a TrinoException with error code TRANSACTION_CONFLICT. However, the root cause is the concurrency model used by Iceberg.
The table can only be updated by one writer at once. When there are multiple writers, Glue will reject the update for all but the first writer due to the table version being outdated. This is what saves you from losing updates. Iceberg will retry up to 4 times, but with a concurrency of 10, it's likely that one of your workers is always "unlucky" and fails each retry attempt.
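Until that reclassification happens, a client that wants to treat this failure as a retryable commit conflict can only match on the generic error name plus the message text. Below is a minimal sketch of that check, assuming the error_name and message attributes exposed by the trino Python client's TrinoQueryError and the connection settings from the reproduction above; the is_glue_commit_conflict helper is hypothetical.

from trino.dbapi import connect
from trino.exceptions import TrinoQueryError

conn = connect(host="localhost", port=8080, user="trino",
               catalog="iceberg", schema="order_data")
cur = conn.cursor()

def is_glue_commit_conflict(exc):
    # Hypothetical helper: today the conflict surfaces as GENERIC_INTERNAL_ERROR,
    # so the only signal is the error name plus the message text. With a proper
    # TRANSACTION_CONFLICT code this would reduce to checking error_name alone.
    return (
        isinstance(exc, TrinoQueryError)
        and exc.error_name == "GENERIC_INTERNAL_ERROR"
        and "Failed to commit to Glue table" in (exc.message or "")
    )

try:
    cur.execute(
        "INSERT INTO order_test3 (order_id, order_date, order_customer_id, order_status) "
        "VALUES ('1500', '2021-01-01', '1500', 'COMPLETE')"
    )
    cur.fetchall()
except TrinoQueryError as e:
    if is_glue_commit_conflict(e):
        pass  # a commit conflict: back off and retry, or restructure the writes as discussed below
    else:
        raise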
Iceberg is not designed for doing many small inserts from independent writers. Some options:
Coordinate among the workers so that only one is inserting at once.
Stage all the data somewhere, then insert it at once in one batch (see the sketch after this list).
Write the data to a queue such as Kafka or Kinesis, then load it from there.
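To make the staging option concrete, here is a minimal sketch of the second approach, reusing the connection settings and the order_test3 columns from the reproduction script (an assumption on my part, not the only way to batch): the rows are collected first and written by a single writer in one multi-row INSERT, so Glue sees exactly one commit.

from trino.dbapi import connect

# Same connection settings as the reproduction script above.
conn = connect(
    host="localhost",
    port=8080,
    user="trino",
    catalog="iceberg",
    schema="order_data",
)
cur = conn.cursor()

# Stage the rows first (here simply built in memory), then write them in a
# single statement: one INSERT is one Iceberg commit, so there is no
# concurrent table update for Glue to reject.
values = ", ".join(
    "('{0}', '2021-01-01', '{0}', 'COMPLETE')".format(i) for i in range(1500, 1510)
)
cur.execute(
    "INSERT INTO order_test3 (order_id, order_date, order_customer_id, order_status) "
    "VALUES " + values
)
print(cur.fetchall())  # same pattern as the original script: fetch to confirm the query completed

If the rows are produced by many processes, have them write to a shared staging location (or a queue, as in the third option) and let a single process run the INSERT.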