[Iceberg]Support setting warehouse data directory for Hadoop catalog #24397

hantangwangd · 2025-01-18T14:08:18Z

Description

This PR enable Presto Iceberg Hadoop catalog to specify an independent warehouse data directory to store table data files, in this way, we can manage metadata files on HDFS and store data files on Object Stores in a formal production environment.

See issue: #24383

Motivation and Context

Enabling Presto Iceberg to leverage the powerful capabilities of object storages

Impact

Hadoop catalog has the capability of leveraging object stores

Test Plan

Build an object storage environment base on MinIO docker, configure iceberg.catalog.warehouse to a local file path, and iceberg.catalog.hadoop.warehouse.datadir to a s3 path, fully run IcebergDistributedTestBase, IcebergDistributedSmokeTestBase, and TestIcebergDistributedQueries in CI test

Contributor checklist

Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
Documented new properties (with its default value), SQL syntax, functions, or other functionality.
If release notes are required, they follow the release notes guidelines.
Adequate tests were added if applicable.
CI passed.

Release Notes

== RELEASE NOTES ==

Iceberg Connector Changes
* Add table properties ``write.data.path`` to specify independent data write paths for Iceberg tables
* Add connector configuration property ``iceberg.catalog.hadoop.warehouse.datadir`` for Hadoop catalog to specify root data write path for its new created tables

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java

steveburnett

Thanks for the doc! Looks good, only a few nits only of punctuation and capitalization.

presto-docs/src/main/sphinx/connector/iceberg.rst

steveburnett

LGTM! (docs)

Pulled updated branch, new local doc build, looks good. Thanks!

hantangwangd · 2025-01-27T16:55:58Z

Thanks @steveburnett for your suggestion, fixed!

agrawalreetika

Thanks for the change. LGTM

presto-iceberg/pom.xml

imjalpreet

@hantangwangd Thanks for the PR! I took a first pass and had some minor comments.

imjalpreet · 2025-01-28T13:17:59Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

@@ -1023,6 +1040,62 @@ public void setTableProperties(ConnectorSession session, ConnectorTableHandle ta
        transaction.commitTransaction();
    }

+    protected Map<String, String> populateTableProperties(ConnectorTableMetadata tableMetadata, com.facebook.presto.iceberg.FileFormat fileFormat, ConnectorSession session, CatalogType catalogType)
+    {
+        ImmutableMap.Builder<String, String> propertiesBuilder = ImmutableMap.builderWithExpectedSize(5);


nit: should we update the builder size here? I think initially when we had 5 properties it was set to 5 but I can see now we have more than 5.

Sure, I have change it to 16, which is a little bigger than current properties count. Since some more table properties will be added in the future.

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

imjalpreet · 2025-01-28T13:27:09Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergConfig.java

+    }
+
+    @Config("iceberg.catalog.warehouse.datadir")
+    @ConfigDescription("Iceberg catalog default root data writing directory")


nit: should we mention that it is only supported for Hadoop catalog in the ConfigDescription?

Does it make sense to add a check to throw an error at the server startup if this config is present in the config file when the catalog type is not Hadoop?

Does it make sense to add a check to throw an error at the server startup if this config is present in the config file when the catalog type is not Hadoop?

I added this check in IcebergCommonModule, but I'm not quite sure if we should throw an error or just ignore this property in non-Hadoop catalogs. Currently, iceberg.catalog.warehouse will be just ignored instead of throw an error. Perhaps we can listen to what other reviewers are thinking @tdcmeehan @ZacBlanco @agrawalreetika @kiersten-stokes?

should we mention that it is only supported for Hadoop catalog in the ConfigDescription?

Sure, added!

imjalpreet · 2025-01-28T13:36:47Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergNativeCatalogFactory.java

@@ -52,6 +52,7 @@ public class IcebergNativeCatalogFactory
    private final String catalogName;
    protected final CatalogType catalogType;
    private final String catalogWarehouse;
+    private final String catalogWarehouseDataDir;


Should we move this to IcebergNativeMetadata since it's only being used in that class and not in this class?

IcebergNativeMetadata already holds this field. I think this is mainly a problem with parameter passing. IcebergNativeMetadata needs to be passed in the value of catalogWarehouseDataDir, so we need to hold catalogWarehouseDataDir in IcebergNativeCatalogFactory or IcebergNativeMetaFactory, so that we can pass the value of catalogWarehouseDataDir to create a new IcebergNativeMetadata when executing IcebergNativeMetadataFactory.create().

So I hold catalogWarehouseDir in IcebergNativeCatalogFactory and pass it to IcebergNativeMetadata through IcebergNativeCatalogFactory, because this is the way I believe the smallest changes will be made. Do you have a better way to pass catalogWarehouseDir into IcebergNativeMetadata?

imjalpreet · 2025-01-28T13:49:05Z

presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedSmokeTestBase.java

@@ -150,6 +152,34 @@ public void testShowCreateTable()
                        ")", schemaName, getLocation(schemaName, "orders")));
    }

+    @Test
+    public void testTableWithSpecifiedWriteDataLocation()


It looks like testTableWithSpecifiedWriteDataLocation and testShowCreateTableWithSpecifiedWriteDataLocation are the same. Can you please check?

I think they look the same mainly because they are both implementations which implies not supported by default in the base test class. And in the relevant subclass of Hadoop catalog, they will be overwritten with different implementation logic.

imjalpreet · 2025-01-28T13:49:49Z

presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedSmokeTestBase.java

+            throws IOException
+    {
+        String dataWriteLocation = Files.createTempDirectory("test_table_with_specified_write_data_location3").toAbsolutePath().toString();
+        assertQueryFails(String.format("create table test_table_with_specified_write_data_location3(a int, b varchar) with (write_data_path = '%s')", dataWriteLocation),


Should we attempt to create a partitioned table for this test?

Sure, done!

ZacBlanco

I just have a few high level comments. At the core, I understand what we're trying to solve, but I'm not sure yet if this is the right solution. What happens when we have a table which already exists at the warehouse directory but has a data directory which doesn't align with the new datadir property? Should we respect the property or the table in the case of an insert? This could be confusing for users.

And if the table already exists and doesn't doesn't have a metadata folder which is in a "safe" filesystem, should we error, or warn the user? Do we even have a way of knowing that a filesystem is "safe" (supports atomic renames)?

Correct me if I'm wrong, but in theory we could have already supported this use case within the iceberg connector by using SetTableProperty procedure and just setting "write.data.path" on the individual tables, right? From my understanding, all this change does is provide a default for new tables and makes it a viewable table property. I'm wondering if it might be better providing this as schema-level property that users can set, similar to how the hive connector has a schema-level "location" property. Then, we can set defaults for the data path on schema creation, but override it on a table-level if we prefer.

However, I don't believe the metadata path could be set that way since the HadoopCatalog relies on listing the warehouse metadata directories to get the schemas and tables.

Just some thoughts for discussion. I think I just want to refine our approach and make sure there isn't any ambiguous behavior from the user perspective.

ZacBlanco · 2025-01-29T18:34:28Z

presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergQueryRunner.java

+        icebergProperties.put("iceberg.file-format", format.name());
+        icebergProperties.putAll(getConnectorProperties(CatalogType.valueOf(catalogType), icebergDataDirectory));
+        icebergProperties.putAll(extraConnectorProperties);
+        queryRunner.createCatalog(ICEBERG_CATALOG, "iceberg", ImmutableMap.copyOf(icebergProperties));


Is there any reason to change this?

Yes, we need a way to let the properties explicitly set in the test class to override the default properties set by the underlying query runner. For example, we need to explicitly specify iceberg.catalog.warehouse in the test class.

presto-docs/src/main/sphinx/connector/iceberg.rst

ZacBlanco · 2025-01-29T18:39:36Z

presto-docs/src/main/sphinx/connector/iceberg.rst

@@ -370,6 +393,9 @@ Property Name                             Description
 ``location``                               Optionally specifies the file system location URI for
                                           the table.

+``write_data_path``                        Optionally specifies the file system location URI for


I would actually prefer if we used the same property name as in iceberg to make it less confusing to users. I know we haven't followed this before but I feel like it makes more sense than what we have now. It allows for more continuity and easier for users to look up the iceberg property reference as well.

Agree, I think this makes sense. Then how do you think we should handle the previous table properties?

We don't have to do it in this PR but here's a solution I think would work for deprecation of the old names:

Inside of IcebergTableProperties create a map of (new property name -> old property name) say, deprecatedProperties.

Then we update all existing table properties with iceberg-equivalents to the iceberg name (if one exists).

Inside of the constructor or getTableProperties method of IcebergTableProperties, we take the deprecatedProperties map and iterate over of the existing properties and if it has an entry in the deprecatedProperties map, add a new hidden property with the old name and same parameters as the new property.

In each of the getXXX methods, we would also need to fall back to checking for the old property name in the map. We would also need a method to add a warning to the session that the table property name is deprecated. We can leave it like that for a few releases and eventually remove the deprecated property names.

This could all be done in a separate PR

It sounds good. I have changed the newly added table property name to write.data.path, which is exactly the same as Iceberg's corresponding table property name. Please take a look when available, thanks!

hantangwangd · 2025-01-30T17:23:54Z

@ZacBlanco Thanks for your feedback, very glad to be able to clarify and refine the approch through discussion. Overall, I think the core point of this change is to separate the data write path and metadata path of Iceberg tables in Hadoop catalog, and provide a simpler and more convenient way for users to specify the data write path of tables. I will reply item by item below. Please let me know if you have any comments.

What happens when we have a table which already exists at the warehouse directory but has a data directory which doesn't align with the new datadir property? Should we respect the property or the table in the case of an insert?

Undoubtedly, when inserting data, the iceberg table's table property write.data.path will be respected. In general, iceberg.catalog.warehouse.datadir is a Hadoop catalog level configuration property used to define the default data write path on table creating. It will not affect the data write path of existing tables.

We can explicitly specify the write data path of a table when creating it, and later change the write data path at any time through the SetTableProperty procedure, so the write data path of a table does not need to be aligned with the warehouse data dir.
We can change the location of the warehouse data dir at any time, which will only affect newly created tables, and will not affect the data write path of existing tables.
When we want to change the data write path of a table, we should explicitly use the SetTableProperty procedure (which needs to support property write.data.location). And the strategy of the Iceberg table itself is also similar, that is, the existing data files will not be affected by the later modified write.data.location.

Do you think the above behaviors make sense?

And if the table already exists and doesn't doesn't have a metadata folder which is in a "safe" filesystem, should we error, or warn the user? Do we even have a way of knowing that a filesystem is "safe" (supports atomic renames)?

I think we could specify the requirements for the file system where the metadata is located in the document. But I don't think Presto engine is responsible for identifying and distinguishing specific file systems and reminding users. This is the user's own responsibility, because they have a better understanding of their own usage scenarios. For example, for a considerable number of users, they do not need to consider transaction consistency issues caused by cross engine interoperability.

in theory we could have already supported this use case within the iceberg connector by using SetTableProperty procedure and just setting "write.data.path" on the individual tables

SetTableProperty procedure and CreateTable can both specify and set table properties, but their usage scenarios are different. I don't think SetTableProperty can replace CreateTable to specify table properties, its main purpose is to alter table properties. And I think users may also not be able to accept the requirement to execute set properties procedure after creating a table in order to specify the data write path. What do you think?

I'm wondering if it might be better providing this as schema-level property that users can set, similar to how the hive connector has a schema-level "location" property. Then, we can set defaults for the data path on schema creation, but override it on a table-level if we prefer.

Firstly, for Hadoop catalog, we cannot arbitrarily set the value of location which also maintains metadata data like hive. Therefore, we should only set a property like data.location which only contains data files. Secondly, I don't think this schema-level concept you mentioned conflicts with the catalog level data dir. Their roles are similar, just at different level. As described above, they all used to define the default data write path on table creating, they will not affect the data write path of existing tables, and can be overridden on a table-level if we prefer.

Therefore, if necessary, we can add this schema-level data dir property in the future. I think it can work well with catalog-level data dir. What's your opinion?

steveburnett · 2025-02-03T14:56:01Z

New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.

:pr:`12345`

I have updated the Release Notes Guidelines to remove the examples of manually adding the PR link.

ZacBlanco · 2025-02-03T18:41:42Z

Thanks for your response. I did some more thinking around this and concluded that we don't really have a consistent way of specifying location for each connector. But I think we should strive to do so.

Some examples:

In the hive connector, we allow schemas to set the location property. It can also be set through a external_location table property, otherwise it is controlled by the HMS configuration (external to Presto).
In the iceberg connector when using the Hive catalog, the location is determined by either the HMS configuration (external to presto), or configured by the user specifying a location property.
In the Iceberg REST connector, table location is specified via the catalog-level iceberg.catalog.warehouse property.

My thoughts are that

We should definitely provide the ability to set both the iceberg write.data.location and write.metadata.location through the table properties in the CREATE TABLE statements. I think that we should use the iceberg property names directly rather than doing something like location or write_data_location and write_metadata_location.
I think providing this configuration in Presto's iceberg catalog configuration for the HadoopCatalog inthis PR is probably fine because the difference with the HMS and Hive connector is that it's an external process with its own configuration. With the hadoop catalog, we don't really have that option since it's all filesystem-based.
a. Though this makes me wonder whether the RestCatalog should even utilize iceberg.catalog.warehouse as I might expect RestCatalog implementations to have some kind of similar default data and metadata locations. I looked at Nessie's docs and saw that they have the ability to specify default locations. I think we might want to ensure that those configs are respected over any Presto configurations. We may even want to remove the ability to specify default locations for the Presto iceberg catalog when configured with Rest/Nessie. The Hive connector doesn't really allow it, and I think it wouldn't be clear to users as to which configuration is the correct one to respect.

ZacBlanco

Thanks for the great work - I do have a concern about the additional load on CI the two new test classes might have. Please let me know your thoughts

ZacBlanco · 2025-02-03T19:21:21Z

...eberg/src/test/java/com/facebook/presto/iceberg/hadoop/TestIcebergDistributedOnS3Hadoop.java

I'm a little concerned about the impact of runtime of the CI when adding this test class and the TestIcebergSmokeOnS3Hadoop test class. These classes take quite a bit of time to complete, so I think we should try to be prudent about adding new configurations permutations that run through the whole gamut of tests.

That being said, since the only real difference in the configuration is the warehouse default data dir, and the fact we're running on S3 I think we ought to enable just one version of this test. Either the default which operates directly on the local filesystem or this one. I would actually err towards using this class because it represents something closer to real-world use case while still exercising the Hadoop catalog.

It makes sense to enable just one version of these three test classes. And I agree with you that the new version tests on S3 are more suitable to be enabled, since they still exercising the Hadoop catalog.

I tried to ignore the original test classes, but led to a series of problems in the CI environment. Will make some further attempts. Would be great appreciate if you have any good ideas.

have you tried just putting @Test(enabled = false) at the class-level?

It doesn't seem to work. I previously used @ignore, but it also didn't work.

A feasible way is to declare them as abstract classes, this works well in CI environment, see #24508. Not so sure if there is a better way.

...eberg/src/test/java/com/facebook/presto/iceberg/hadoop/TestIcebergDistributedOnS3Hadoop.java

.../java/com/facebook/presto/iceberg/hadoop/TestIcebergHadoopCatalogOnS3DistributedQueries.java

...eberg/src/test/java/com/facebook/presto/iceberg/hadoop/TestIcebergDistributedOnS3Hadoop.java

presto-iceberg/src/test/java/com/facebook/presto/iceberg/hadoop/TestIcebergSmokeOnS3Hadoop.java

hantangwangd · 2025-02-05T14:57:01Z

@ZacBlanco Thanks for your investigation and perspective. I think a significant portion of the current inconsistency is due to Iceberg connector with Hive catalog. In presto, its configuration is closer to Hive connector than Iceberg lib's Hive catalog, because it is a presto native implementation based on presto-hive-metastore rather than Iceberg lib's Hive catalog.

As far as I can see, in Iceberg lib, warehouse is a commonly used catalog properties, see Iceberg catalog properties. It has a role in all catalogs: for Hadoop/Nessie/Jdbc catalogs, it is a mandatory property to be configured, while in Hive/Rest/InMemory catalogs, it is an optional property. So it seems that in presto, catalog property iceberg.catalog.warehouse (which maps to the catalog property warehouse of Iceberg lib) should also be a common catalog property in Iceberg connector and should not be removed when configured with Rest/Nessie.

As you said, the newly added configuration property is an external process with Iceberg's own configuration, and it should only be used for Hadoop catalog. Therefore, in order to avoid the confusion with property iceberg.catalog.warehouse and causing users to believe that it is also a commonly used catalog property, I think we can make it more clear by doing the following things:

1. Name this newly added configuration property iceberg.catalog.hadoop.warehouse.datadir instead of iceberg.catalog.warehouse.datadir.
1. Add a check to throw an error at the server startup if this config property is present when the catalog type is not Hadoop, as suggested by @imjalpreet .
1. Clearly state in the document that this is a Hadoop catalog specific configuration property.

Do you think this makes sense?

hantangwangd · 2025-02-05T15:08:34Z

New release note guidelines as of last week: PR #24354 automatically adds links to this PR to the release notes. Please remove the manual PR link in the following format from the release note entries for this PR.

Thanks @steveburnett for your message. I have fixed the release note and supplemented some content to the document, please take a look when convenient.

steveburnett

Thanks for the expanded documentation! I made a few minor revision suggestions to use imperative, to emphasize what the user should and should not do.

presto-docs/src/main/sphinx/connector/iceberg.rst

ZacBlanco · 2025-02-05T16:38:22Z

@ZacBlanco Thanks for your investigation and perspective. I think a significant portion of the current inconsistency is due to Iceberg connector with Hive catalog. In presto, its configuration is closer to Hive connector than Iceberg lib's Hive
As you said, the newly added configuration property is an external process with Iceberg's own configuration, and it should only be used for Hadoop catalog. Therefore, in order to avoid the confusion with property iceberg.catalog.warehouse and causing users to believe that it is also a commonly used catalog property, I think we can make it more clear by doing the following things:
* 1. Name this newly added configuration property `iceberg.catalog.hadoop.warehouse.datadir` instead of `iceberg.catalog.warehouse.datadir`.

* 2. Add a check to throw an error at the server startup if this config property is present when the catalog type is not Hadoop, as suggested by @imjalpreet .

* 3. Clearly state in the document that this is a Hdoop catalog specific configuration property.
Do you think this makes sense?

This approach sounds good to me

steveburnett

Thanks for the quick revision! I'm glad you rephrased my suggestion for technical accuracy. Just a few minor nits of formatting and spacing, but everything looks good.

presto-docs/src/main/sphinx/connector/iceberg.rst

ZacBlanco

Mostly nits. A couple questions for discussion

presto-docs/src/main/sphinx/connector/iceberg.rst

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

ZacBlanco · 2025-02-06T06:00:08Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

+    {
+        try {
+            String writeDataLocation = table.properties().get(WRITE_DATA_LOCATION);
+            return Optional.of(Strings.isNullOrEmpty(writeDataLocation) ? table.location() : writeDataLocation);


technically the default from the LocationProviders class in iceberg is `table.location() + "/data"

After rechecking this part of the code, I found that this method seems unnecessary. It's OK to just use tableLocation as before. Because when constructing the LocationProvider later, we passed both tableLocation and tableProperties into it. And the LocationProvider will find the appropriate write data location based on tableProperties and tableLocation.

I have refactored this part of code, please take a look when convenient, thanks.

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergNativeMetadata.java

presto-iceberg/src/test/java/com/facebook/presto/iceberg/IcebergDistributedSmokeTestBase.java

ZacBlanco · 2025-02-06T06:10:10Z

presto-iceberg/src/test/java/com/facebook/presto/iceberg/TestIcebergSystemTablesHadoop.java

+    }
+
+    @Test
+    public void testPropertiesTableWithSpecifiedDataWriteLocation()


Hmm this is getting me thinking that maybe we ought to have a test class which creates a query runner configured with all of the catalog types. Then we don't have to create new test classes which have the overhead of starting new query runners all the time.

I think this is fine for the PR, but I am concerned that we're creating a lot of overhead in testing for these different config scenarios. I think it might be worth an effort to try to consolidate to reduce the CI times.

This sounds like a good idea, , I think it worth trying. And the testing scenarios of TestIcebergSystemTables are relatively simple, make it more suitable for trying this.

ZacBlanco · 2025-02-06T06:12:33Z

presto-iceberg/src/test/java/com/facebook/presto/iceberg/hadoop/TestIcebergSmokeHadoop.java

+                            "   metadata_delete_after_commit = false,\n" +
+                            "   metadata_previous_versions_max = 100,\n" +
+                            "   metrics_max_inferred_column = 100,\n" +
+                            "   \"write.data.path\" = '%s'\n" +


I think it's ok to leave for now, but I noticed adding a new table property is quite cumbersome because we have to update all of these tests which look for the exact SQL create statement. It's a bit of a burden for maintenance. Do you have any thoughts about how we might be able to make it less fragile?

Perhaps we can extract a method that can verify the result of the show create table statement, which takes the basic information, columns, and properties fo the table as parameters. All relevant tests can utilize this method, which may result in fewer changes to the test cases when adding new table property in the future.

steveburnett

LGTM! (docs)

Pull updated branch, new local doc build, looks good.

ZacBlanco

Thanks for the updates. One final question

ZacBlanco · 2025-02-07T19:18:56Z

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergAbstractMetadata.java

+                propertiesBuilder.put(WRITE_DATA_LOCATION, writeDataLocation);
+            }
+            else {
+                throw new PrestoException(NOT_SUPPORTED, "Table property write.data.path is not supported with catalog type: " + catalogType);


Is this actually the case? If the property is set on the table correctly it will be reflected in the location provider, right? I don't think we need to disable it for other catalogs?

It's a good question. Enabling this table property directly on all catalogs may cause unexpected problems. For example, if we specify this property in a table on Hive file catalog, when we trying to drop the table, we would fail with the following message:

Table test_table contains Iceberg path override properties and cannot be dropped from Presto

That would lead in a lot of troubles for test on Iceberg hive catalog.

So I choose not to enable it on other catalogs for now, and will enable them after confirming that there are no issues. Do you think it's OK?

…ther than `write_data_path`

…warehouse.datadir` to `iceberg.catalog.hadoop.warehouse.datadir`

hantangwangd force-pushed the support_seperate_write_data_location branch from 0279ea7 to 2390fb0 Compare January 18, 2025 15:14

agrawalreetika reviewed Jan 20, 2025

View reviewed changes

presto-iceberg/src/main/java/com/facebook/presto/iceberg/IcebergUtil.java Outdated Show resolved Hide resolved

hantangwangd force-pushed the support_seperate_write_data_location branch from 2390fb0 to c3c00e1 Compare January 26, 2025 18:05

hantangwangd marked this pull request as ready for review January 26, 2025 19:07

hantangwangd requested review from steveburnett, elharo, ZacBlanco and a team as code owners January 26, 2025 19:07

hantangwangd requested review from presto-oss, tdcmeehan and agrawalreetika January 26, 2025 19:07

hantangwangd changed the title ~~[WIP][Iceberg]Support setting warehouse data directory for Hadoop catalog~~ [Iceberg]Support setting warehouse data directory for Hadoop catalog Jan 27, 2025

hantangwangd requested review from imjalpreet and kiersten-stokes January 27, 2025 01:04

hantangwangd force-pushed the support_seperate_write_data_location branch from c3c00e1 to d18cd8b Compare January 27, 2025 14:05

steveburnett requested changes Jan 27, 2025

View reviewed changes

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

hantangwangd force-pushed the support_seperate_write_data_location branch from d18cd8b to 5fd6b9c Compare January 27, 2025 16:01

steveburnett previously approved these changes Jan 27, 2025

View reviewed changes

agrawalreetika previously approved these changes Jan 28, 2025

View reviewed changes

imjalpreet reviewed Jan 28, 2025

View reviewed changes

presto-iceberg/pom.xml Show resolved Hide resolved

imjalpreet reviewed Jan 28, 2025

View reviewed changes

tdcmeehan self-assigned this Jan 29, 2025

ZacBlanco reviewed Jan 29, 2025

View reviewed changes

hantangwangd dismissed stale reviews from agrawalreetika and steveburnett via 19b2e45 January 30, 2025 16:52

hantangwangd force-pushed the support_seperate_write_data_location branch from 5fd6b9c to 19b2e45 Compare January 30, 2025 16:52

ZacBlanco reviewed Feb 3, 2025

View reviewed changes

ZacBlanco mentioned this pull request Feb 4, 2025

Use iceberg property names in IcebergTableProperties #24483

Open

hantangwangd force-pushed the support_seperate_write_data_location branch 3 times, most recently from f3c462f to f08edd5 Compare February 5, 2025 14:00

steveburnett requested changes Feb 5, 2025

View reviewed changes

hantangwangd force-pushed the support_seperate_write_data_location branch 2 times, most recently from b39e317 to 3356d39 Compare February 5, 2025 18:09

steveburnett requested changes Feb 5, 2025

View reviewed changes

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

presto-docs/src/main/sphinx/connector/iceberg.rst Show resolved Hide resolved

presto-docs/src/main/sphinx/connector/iceberg.rst Outdated Show resolved Hide resolved

hantangwangd force-pushed the support_seperate_write_data_location branch from 3356d39 to 93191b4 Compare February 6, 2025 00:52

ZacBlanco reviewed Feb 6, 2025

View reviewed changes

steveburnett previously approved these changes Feb 6, 2025

View reviewed changes

hantangwangd dismissed steveburnett’s stale review via b7206dc February 6, 2025 16:08

hantangwangd force-pushed the support_seperate_write_data_location branch from ce4d9fc to b7206dc Compare February 6, 2025 16:08

ZacBlanco reviewed Feb 7, 2025

View reviewed changes

hantangwangd added 8 commits February 8, 2025 08:10

[Iceberg]Enable setting separate data write location on table creation

9ef1489

[Iceberg]Support setting warehouse data directory for Hadoop catalog

582d617

[ToBeSquashed]Address imjalpreet's comments

5fc3029

[ToBeSquashed]Address ZacBlanco's comments - use write.data.path ra…

d25df82

…ther than `write_data_path`

[ToBeSquashed]Address ZacBlanco's comments - Rename `iceberg.catalog.…

c55519e

…warehouse.datadir` to `iceberg.catalog.hadoop.warehouse.datadir`

[ToBeSquashed]Address ZacBlanco's comments

5b02f90

[ToBeSquashed]Fix document

89e89dd

[ToBeSquashed]Address comments

d57de41

hantangwangd force-pushed the support_seperate_write_data_location branch from b7206dc to 0749403 Compare February 8, 2025 01:54

[ToBeSquashed]Fix problems introduced by rebase to latest commit list

acf8a00

hantangwangd force-pushed the support_seperate_write_data_location branch from 0749403 to acf8a00 Compare February 8, 2025 03:18

hantangwangd mentioned this pull request Feb 10, 2025

[Iceberg]Refactor validation for SHOW CREATE TABLE for relevant tests #24524

Draft

6 tasks

[Iceberg]Support setting warehouse data directory for Hadoop catalog #24397

Are you sure you want to change the base?

[Iceberg]Support setting warehouse data directory for Hadoop catalog #24397

Conversation

hantangwangd commented Jan 18, 2025 • edited Loading

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

Release Notes

steveburnett left a comment

Choose a reason for hiding this comment

steveburnett left a comment

Choose a reason for hiding this comment

hantangwangd commented Jan 27, 2025

agrawalreetika left a comment

Choose a reason for hiding this comment

imjalpreet left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZacBlanco left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZacBlanco Feb 3, 2025 • edited Loading

Choose a reason for hiding this comment

hantangwangd Feb 5, 2025 • edited Loading

Choose a reason for hiding this comment

hantangwangd commented Jan 30, 2025

steveburnett commented Feb 3, 2025

ZacBlanco commented Feb 3, 2025 • edited Loading

ZacBlanco left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hantangwangd commented Feb 5, 2025 • edited Loading

hantangwangd commented Feb 5, 2025

steveburnett left a comment

Choose a reason for hiding this comment

ZacBlanco commented Feb 5, 2025

steveburnett left a comment

Choose a reason for hiding this comment

ZacBlanco left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

steveburnett left a comment

Choose a reason for hiding this comment

ZacBlanco left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hantangwangd commented Jan 18, 2025 •

edited

Loading

ZacBlanco left a comment •

edited

Loading

ZacBlanco Feb 3, 2025 •

edited

Loading

hantangwangd Feb 5, 2025 •

edited

Loading

ZacBlanco commented Feb 3, 2025 •

edited

Loading

hantangwangd commented Feb 5, 2025 •

edited

Loading