Add a Google Cloud Storage filesystem/target. #999

Merged 1 commit on Jun 9, 2015

@mikekap
Contributor

mikekap commented Jun 8, 2015

In theory, GCS is supported by boto, which would be a better option. Unfortunately, Google's API authentication is a
mess: the boto GCS implementation basically only supports the CLI use case (it expects a boto config to house all
the configuration). It has minimal support for OAuth tokens, but there is no way to use a service account,
which is what Google recommends for automated tasks. You can see the plugin at https://github.com/GoogleCloudPlatform/gcs-oauth2-boto-plugin.

Because of this insanity, we use the google-api-python-client package to speak to GCS via its JSON API. This
is...not easy. Either way, this change does work and passes quite a few tests. Unfortunately, since the client API is
auto-generated, you can be sure that there are no fake implementations of it (nothing like what moto is for boto).
Hence the test is an integration test and requires credentials on your box to access GCS.

You can get creds by going through the gcloud tools setup guide. After that just create a project & pick a bucket
and everything should work :)
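
For readers unfamiliar with the approach described above, here is a minimal sketch of talking to the GCS JSON API as a service account. It is not the code from this PR: the function names (`split_gcs_path`, `build_storage_service`, `list_keys`), the key-file argument, and the use of the google-auth package for credentials are all assumptions made for illustration.

```python
from urllib.parse import urlsplit


def split_gcs_path(path):
    """Split 'gs://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical helper; the name is not taken from the PR.
    """
    parts = urlsplit(path)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError("expected a gs://bucket/key path, got %r" % (path,))
    return parts.netloc, parts.path.lstrip("/")


def build_storage_service(key_file):
    """Build a JSON API client authenticated as a service account.

    Assumes the google-auth and google-api-python-client packages are
    installed. They are imported here so that the path helper above
    works without them.
    """
    from google.oauth2 import service_account
    from googleapiclient import discovery

    credentials = service_account.Credentials.from_service_account_file(
        key_file,
        scopes=["https://www.googleapis.com/auth/devstorage.read_write"],
    )
    return discovery.build("storage", "v1", credentials=credentials)


def list_keys(service, path):
    """Return object names under a gs:// prefix (first page of results only)."""
    bucket, prefix = split_gcs_path(path)
    response = service.objects().list(bucket=bucket, prefix=prefix).execute()
    return [item["name"] for item in response.get("items", [])]
```

Only `split_gcs_path` runs without credentials; the other two functions need a real service account key and a bucket, which is the same constraint that makes the PR's test an integration test.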

@Tarrasch
Contributor

Tarrasch commented Jun 8, 2015

Aw, dammit, I hoped this would be PR 1000. Oh well. :)

@mikekap
Contributor Author

mikekap commented Jun 8, 2015

I'm preparing to send https://github.com/spotify/luigi/compare/master...vine:bq?expand=1 too so...maybe then? :) I still have to write a test for it though since we don't have an internal test for that either :X

BTW, would you guys prefer to use enum34 in luigi? There are a bunch of enums in BigQuery, and I'm not sure whether to just go with "static field in object" or import enum34.
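
To make the two options concrete, here is a small sketch of each. The class and function names are hypothetical and not taken from the BigQuery branch; the member values follow BigQuery's write disposition setting and are used only as an illustration. On Python 3.4+ `enum` is in the standard library, while on Python 2 the same import would come from the enum34 backport.

```python
import enum


class WriteDispositionConst(object):
    """Option 1: "static field in object" (plain class attributes)."""
    WRITE_TRUNCATE = "WRITE_TRUNCATE"
    WRITE_APPEND = "WRITE_APPEND"
    WRITE_EMPTY = "WRITE_EMPTY"


class WriteDisposition(enum.Enum):
    """Option 2: a real enum (stdlib on 3.4+, enum34 on Python 2)."""
    WRITE_TRUNCATE = "WRITE_TRUNCATE"
    WRITE_APPEND = "WRITE_APPEND"
    WRITE_EMPTY = "WRITE_EMPTY"


def to_api_value(disposition):
    """Convert an enum member to the string sent in a request body."""
    if not isinstance(disposition, WriteDisposition):
        raise TypeError("expected a WriteDisposition, got %r" % (disposition,))
    return disposition.value
```

The plain-attribute version needs no extra dependency, but any string is accepted wherever a value is expected. The enum version rejects unknown values at lookup time (`WriteDisposition("BOGUS")` raises `ValueError`), at the cost of an extra package on Python 2.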

@erikbern
Contributor

erikbern commented Jun 8, 2015

Looks good, except for the build errors.

@Tarrasch
Contributor

Tarrasch commented Jun 8, 2015

By the way, you know you can run the pep8 check locally, right? See CONTRIBUTING.md :)

@mikekap
Contributor Author

mikekap commented Jun 8, 2015

I've been doing that, but it seems I made a change afterwards :X. This thing is giving me Python 3 headaches; let me see if I can nail it down.

@mikekap
Contributor Author

mikekap commented Jun 8, 2015

Ok I think this build should be good. PTAL

@@ -0,0 +1,422 @@
# -*- coding: utf-8 -*-
#
# Copyright 2015 Twitter Inc
Contributor

Do you really work at Twitter? I thought you worked for Vine?

Contributor Author

Both are true - Vine is owned by Twitter.

Tarrasch added a commit that referenced this pull request Jun 9, 2015
Add a Google Cloud Storage filesystem/target.
@Tarrasch Tarrasch merged commit b3be368 into spotify:master Jun 9, 2015
@Tarrasch
Contributor

Tarrasch commented Jun 9, 2015

@alexvanboxel This is merged now, so go ahead and touch functionality if you want. :)
