-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: Deprecate ContentFile#path API and add location API which returns String #11092
API: Deprecate ContentFile#path API and add location API which returns String #11092
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice to keep the name, but overloading could not apply to the return type.
CharSequence path(); | ||
|
||
/** Return the fully qualified path to the file, suitable for constructing a Hadoop Path */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether we still want to keep the Hadoop Path
reference here as it quite evolved over the years and we do not necessarily depend on Hadoop anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I was largely concerned if we didn't preserve this property of being suitable for a hadoop path at the API level, once path
is deprecated, it may break the users that do rely on having this. But if we're saying that we don't really need to concern ourselves on maintaining this, we may not need to include it. cc @rdblue thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @rdblue @danielcweeks and here are some key points on why we shouldn't document anything related to hadoop:
- Hadoop pathing doesn't work with storage systems like GCS
- Locations are a super set of what can be represented in a hadoop path. Even today
path
API can return paths which simply cannot be represented in a hadoop Path. - The hadoop path normalization logic simply conflicts with absolute paths
So updated to remove this part of the comment. I'll leave the existing path
API comment as is since it's being deprecated anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support this change. Internally, we always use String
objects for serializability and most places that consume have explicit conversion to String
already.
95b75b5
to
963d321
Compare
Co-authored-by: Eduard Tudenhoefner <[email protected]>
963d321
to
ea7d2ec
Compare
THanks for reviewing @flyrain @nastra @aokolnychyi , I'll go ahead and merge |
…s String (apache#11092) Co-authored-by: Eduard Tudenhoefner <[email protected]>
ContentFile#path returns CharSequence instead of String. The original intention of this API returning CharSequence was to directly surface Avro UTF8 without any conversion, however for distributed cases the CharSequence needs to be converted into String since CharSequence itself isn't serializable. Furthermore, in terms of naming we tend to use location in other places like manifestListLocation or metadataLocation, also in FileIO. This change performs the following:
Over time up until 2.0 we can remove references to path and use location. After 2.0 implementations of ContentFile would need to implement location.