-
Notifications
You must be signed in to change notification settings - Fork 14.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fsspec base Hook for filesystem operations #33578
Conversation
We are about to unsuspend yandex provider (which was the main reason why the PR failed) . You should rebase after we merge it: #33574 To be honest, I am not sure if we need it. I think there is a very little benefit from having such "common" provider. We already have (and continue having) a big share of problems resulting from common.sql extraction - and there was a good reason for that extraction - i.e. Open Lineage integration might benefit from it. With Filesstem, I am not sure what are the benefits. The key thing is -extracing code is cool (because of DRY) but it introduces coupling (i.e. all users of the common code become implicitly dependent). And it introduces all kinds of issues where change in the common code might lead to dubtle bugs in related providers. I yet fail to see what would be the benefits of introducing such common code. But maybe I am short-sited? |
I understand your point of view and agree that having such generic class mixed with a lot of other ones make changes hard to do and to predict all the impacts. What about a less "common" provider, more focused on
Having a
Do you think that this use case would benefit from introducing a FileSystem connection, or do I mis-conceived my operator and/or workflow ? |
I suggest you to do a different thing. Write a taskflow operator for your case using hooks. I wrote a blog post about it some time ago - and IMHO using this kind of approach is way better than trying to make a "generic" transfer operator https://medium.com/apache-airflow/generic-airflow-transfers-made-easy-5fe8e5e7d2c2 |
@@ -145,7 +146,7 @@ def wrapper(*args, **kwargs) -> T: | |||
return cast(T, wrapper) | |||
|
|||
|
|||
class S3Hook(AwsBaseHook): | |||
class S3Hook(AwsBaseHook, FsApiHook): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering, what kind of behaviour expected here?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Related: #8804, #3526
As explained in the related issue, it can be hard to cover all the
<provider>To<provider>
use cases. It could be nice to have a generic "file transfer" operator which can handle most of the storage provider (local filesystem as well as cloud blob storages). Such kind of storage wrapper already exists as thefsspec
library which is already an Airflow dependency.This PR aims at providing a base "class" (it's more like a mixin) to mix with blob storage provider (such as S3, GCS, etc.) to allow easy creation of fsspec.FileSystem from such hooks.
I am not very fluent with the Hook/Connection API, so I am not sure that this PR make sense.