-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configure root path in Catalog or FileIO? #488
Comments
cc: @liurenjie1024 @Xuanwo let me know what you think, I've shared my thoughts above. |
Thanks @fqaiser94 for raising this. In short, I'm also in favor of option 1 to align with java/python implementations. The root path is a property of underlying storage layer(Apache OpenDAL), and I've checked code that currently we haven't exposed it to user. Currently the root is always set to top most dir(see fs storage). Also as @fqaiser94 currently iceberg community has no support for relative path, so setting root to top most dir fits into this requirement best to me. |
Thanks for starting this discussion first. My reason for not including a My initial idea was to move So I visit the pyiceberg's logic and found that it has a def _get_default_warehouse_location(self, database_name: str, table_name: str) -> str:
database_properties = self.load_namespace_properties(database_name)
if database_location := database_properties.get(LOCATION):
database_location = database_location.rstrip("/")
return f"{database_location}/{table_name}"
if warehouse_path := self.properties.get(WAREHOUSE_LOCATION):
warehouse_path = warehouse_path.rstrip("/")
return f"{warehouse_path}/{database_name}.db/{table_name}"
raise ValueError("No default path is set, please specify a location when creating a table") I believe it makes more sense for us to share the implementation of the default warehouse location logic.
Maybe we can align with this behavior? |
Sounds reasonable to me. A more complete method is here: https://github.com/apache/iceberg-python/blob/055938d36d46efb94849b1d861cd8a8a111f6ae5/pyiceberg/catalog/__init__.py#L880
|
Thank you for the correction. This logic seems sound to me. |
Ack, let me see if I can refactor a little bit to match the suggested logic. |
Accidentally closed the previous PR (and discovered you can't reopen closed PRs after a while 🤯). New PR here: #511 |
There was an unresolved discussion about whether we should:
MemoryCatalog
FileIO
with aroot
path (this is the approach currently implemented in iceberg-rust code)FileIO
FileIO
implementations I looked at (S3FileIO
,GcsFileIO
) allow configuring aroot
path. I suspect this may affect interoperability with the Java ecosystem, not 100% sure though.Overall, I have concerns about option 2 (which is what is currently implemented) and am currently in favour of changing the code to follow option 1: allow users to configure a path as a root of the catalog and create tables underneath that.
Thoughts?
The text was updated successfully, but these errors were encountered: