Add a Google Cloud Storage filesystem/target. #999
Aw dammit, I hoped this would be PR 1000. Oh well. :)
I'm preparing to send https://github.com/spotify/luigi/compare/master...vine:bq?expand=1 too, so... maybe then? :) I still have to write a test for it though, since we don't have an internal test for that either :X BTW, would you guys prefer to use enum34 in luigi? There's a bunch of enums in bigquery and I'm not sure whether to just go with "static field in object" or import enum34.
Looks good except for the build errors.
By the way, you know you can run the pep8 check locally, right? See CONTRIBUTING.md :)
I've been doing that, but it seems I made a change afterwards :X. This thing is giving me python3 headaches; let me see if I can nail it down.
In theory, GCS is supported by boto, which would be a better option. Unfortunately Google's API authentication is a mess - the boto GCS implementation basically only supports the CLI use case (it expects a boto config file to house all the configuration). It has minimal support for OAuth tokens, but there is no way to use a service account, which is what Google recommends for automated tasks. You can see it at https://github.com/GoogleCloudPlatform/gcs-oauth2-boto-plugin .

Because of this insanity we use the google-api-python-client package to speak to GCS via its JSON protocol. This is...not easy. Either way, this change does work and passes quite a few tests. Unfortunately, since the API is auto-generated, you can be sure that there are no fake implementations of it (nothing like moto, for example). Hence the test is an integration test and requires credentials on your box to access GCS.

You can get creds by going through the gcloud tools setup guide. After that, just create a project, pick a bucket, and everything should work :)
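For readers who have not used the JSON API client, here is a rough sketch of the approach described above: authenticate as a service account, build a `storage` `v1` client with google-api-python-client, and check whether an object exists. This is not the code from this PR. The helper names (`split_gcs_path`, `build_gcs_client`, `object_exists`) are made up for illustration, and the credential loading assumes the newer google-auth package rather than whatever auth library the PR itself uses.

```python
# Read/write scope for the Cloud Storage JSON API.
GCS_SCOPE = "https://www.googleapis.com/auth/devstorage.read_write"


def split_gcs_path(path):
    """Split 'gs://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical helper, not part of luigi.
    """
    prefix = "gs://"
    if not path.startswith(prefix):
        raise ValueError("not a gs:// path: %r" % (path,))
    bucket, _, key = path[len(prefix):].partition("/")
    if not bucket:
        raise ValueError("missing bucket in path: %r" % (path,))
    return bucket, key


def build_gcs_client(key_file):
    """Build a JSON API client authenticated as a service account.

    Assumes google-auth and google-api-python-client are installed.
    The imports are local so this module loads without them.
    """
    from google.oauth2 import service_account
    from googleapiclient import discovery

    credentials = service_account.Credentials.from_service_account_file(
        key_file, scopes=[GCS_SCOPE])
    return discovery.build("storage", "v1", credentials=credentials)


def object_exists(client, path):
    """Return True if the object at a gs:// path exists, False on a 404."""
    from googleapiclient import errors

    bucket, key = split_gcs_path(path)
    try:
        client.objects().get(bucket=bucket, object=key).execute()
    except errors.HttpError as exc:
        if exc.resp.status == 404:
            return False
        raise  # Any other HTTP error is a real failure.
    return True
```

Only `split_gcs_path` runs without network access; the other two need a real service account key file and a bucket, which is the same reason the test in this PR is an integration test.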
OK, I think this build should be good. PTAL
@@ -0,0 +1,422 @@
# -*- coding: utf-8 -*-
#
# Copyright 2015 Twitter Inc
Do you really work at Twitter? I thought you worked for wine?
Both are true - Vine is owned by Twitter.
@alexvanboxel This is merged now, so go ahead and touch functionality if you want. :)