Request Support for viewfs:// #645

Closed
zhezh opened this issue Aug 25, 2021 · 3 comments · Fixed by #665

Comments

zhezh commented Aug 25, 2021

Problem description

Hi, smart_open currently supports hdfs://. viewfs:// is another Hadoop file system scheme and is similar to hdfs://.

I found that TensorFlow's tf.io.gfile.GFile supports both hdfs:// and viewfs://. Would it be possible for smart_open to add support for viewfs:// as well?

Thanks~

Collaborator

mpenkov commented Aug 25, 2021

It's possible, but only if someone steps up and implements it in a PR.

Author

zhezh commented Aug 26, 2021

Hi @mpenkov, I dug into the code a little. I found that opening hdfs:// is implemented by calling hdfs dfs -<cmd>.
viewfs:// is also supported by hdfs dfs, in the same way as hdfs://.
I assume we just need to parse viewfs:// and handle it the same way as the hdfs:// scheme.

However, I am not familiar with how smart_open works internally, and I have not managed to register viewfs://.
Could you please give me a hint about where I should change the code?
Thanks~
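
For illustration, the idea in this comment could be sketched roughly as below. This is a minimal sketch and not smart_open's actual code: the names HADOOP_SCHEMES, parse_hadoop_uri and build_cat_command are hypothetical, and it only assumes that the hdfs CLI accepts viewfs:// URIs for -cat the same way it accepts hdfs:// ones, as described above.

```python
import urllib.parse

# Hypothetical names for illustration only; not smart_open's internals.
# Schemes that are assumed to be readable through the `hdfs dfs` CLI.
HADOOP_SCHEMES = ("hdfs", "viewfs")


def parse_hadoop_uri(uri):
    """Split a Hadoop-style URI and reject schemes we do not handle."""
    parts = urllib.parse.urlsplit(uri)
    if parts.scheme not in HADOOP_SCHEMES:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {"scheme": parts.scheme, "netloc": parts.netloc, "path": parts.path}


def build_cat_command(uri):
    """Build the argument list that would read `uri` via the hdfs CLI."""
    parse_hadoop_uri(uri)  # validates the scheme; raises ValueError otherwise
    # The URI is passed through unchanged, so viewfs:// stays viewfs://.
    return ["hdfs", "dfs", "-cat", uri]
```

On a machine with the Hadoop client installed, the resulting list could be handed to subprocess.Popen(cmd, stdout=subprocess.PIPE) to stream the file contents. Keeping the accepted schemes in one tuple means a further scheme would only need one extra entry.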

ChandanChainani added a commit to ChandanChainani/smart_open that referenced this issue Oct 21, 2021
As stated by @zhezh that hdfs support viewfs
we just need to convert viewfs to hdfs uri internally
and it will work.
@ChandanChainani
Contributor

@mpenkov I have created a PR for this feature request. Could you review it?

mpenkov added a commit that referenced this issue Feb 19, 2022
* Added support for viewfs:// URLs. #645

As stated by @zhezh that hdfs support viewfs
we just need to convert viewfs to hdfs uri internally
and it will work.

* passed `split_uri.scheme` instead of hdfs

* Update hdfs.py

* rewrite tests in pytest style

* parameterize tests to cover viewfs schema

* disable hdfs tests under windows

Getting this error: [WinError 6] The handle is invalid

* flake8

* Update CHANGELOG.md

Co-authored-by: Michael Penkov <[email protected]>