BigQuery authentication on remote servers #8489

Closed
andrioni opened this issue Oct 6, 2014 · 17 comments

@andrioni

andrioni commented Oct 6, 2014

Hello, I can't get the BigQuery auth to work on remote servers:

WARNING:root:This function, oauth2client.tools.run(), and the use of the gflags library are deprecated and will be removed in a future version of the library.
Your browser has been opened to visit:

    big-url-here

If your browser is on a different machine then exit and re-run
this application with the command-line parameter

  --noauth_local_webserver

Even if I try to open that URL on my workstation, it tries to redirect me to a local server. The documentation says that:

Should the browser not be available, or fail to launch, a code will be provided to complete the process manually.

however, I've yet to see this alternative.

Any help?

@jreback
Contributor

jreback commented Oct 7, 2014

cc @jacobschaer

@andrewryno

What I wound up doing to get around this was to run a basic query in an IPython notebook, authenticate with Google BigQuery, and then copy the resulting bigquery_credentials.dat file to the remote servers. Initially I just had to rename the .bigquery.v2.token file that the old client used. It's annoying, but it works, and since the file contains a refresh token I don't need to touch it again. It would be nice to have a better flow.
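A small sketch for checking, on the remote server, that a copied credentials file will keep working without a browser. It assumes the file is the JSON written by oauth2client's file storage and that it carries a `refresh_token` field; the helper name `has_refresh_token` is hypothetical.

```python
import json
from pathlib import Path

# Assumption: oauth2client's file storage writes the credentials as a JSON
# object that includes a "refresh_token" key.
CREDENTIALS_FILE = "bigquery_credentials.dat"


def has_refresh_token(path: str = CREDENTIALS_FILE) -> bool:
    """Return True if the credentials file holds a non-empty refresh token."""
    p = Path(path)
    if not p.is_file():
        return False
    try:
        data = json.loads(p.read_text())
    except (OSError, ValueError):
        # Unreadable or not JSON: treat the file as unusable.
        return False
    return isinstance(data, dict) and bool(data.get("refresh_token"))
```

A file without a refresh token would only work until its access token expires, which is why the refresh token is what makes the copy-the-file workaround durable.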

@jacobschaer
Contributor

@andrewryno Just curious, what would a good flow be for you? I haven't had this use case in a while... When we originally wrote the module it would yield a token string to enter into your browser. What version of the bigquery libraries are you using?

@andrewryno

@jacobschaer I've thought about it a bit but haven't come up with a great solution. As long as I can do it in a REPL on a remote server without a browser, that's fine. I can't remember which tool it was (possibly an old version of bq), but it printed the URL for you to visit manually and set the redirect_uri to urn:ietf:wg:oauth:2.0:oob, so Google gives you a code to enter into the CLI. It won't try to open a browser automatically; you open the URL yourself to finish the flow. It's the installed application flow instead of a user flow.
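A minimal sketch of the first half of the installed-application flow described above, using only the Python standard library: it builds an authorization URL with the out-of-band redirect_uri, so Google displays a code instead of redirecting to localhost. The endpoint is the one oauth2client used at the time, the client ID is a placeholder, and exchanging the code for tokens is omitted. Google has since deprecated the OOB redirect, so treat this as an illustration of the flow as it worked when this thread was written.

```python
from urllib.parse import urlencode

# Google OAuth 2.0 authorization endpoint as used by oauth2client at the time.
AUTH_URI = "https://accounts.google.com/o/oauth2/auth"
# Out-of-band redirect: Google shows the code on a page instead of redirecting.
OOB_REDIRECT_URI = "urn:ietf:wg:oauth:2.0:oob"
BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"


def build_oob_authorize_url(client_id: str, scope: str = BIGQUERY_SCOPE) -> str:
    """Return a URL the user can open on any machine to obtain an auth code."""
    params = {
        "client_id": client_id,
        "redirect_uri": OOB_REDIRECT_URI,
        "response_type": "code",
        "scope": scope,
        # Request a refresh token so the credentials outlive the first access token.
        "access_type": "offline",
    }
    return AUTH_URI + "?" + urlencode(params)
```

On the remote server you would print this URL, open it in a browser on your workstation, and paste the displayed code back into the REPL.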

@parthea
Contributor

parthea commented Sep 13, 2015

This issue still exists in the latest code (0.17.0). When I run df.to_gbq() without the bigquery_credentials.dat file, the code seems to fail silently. I see the following message: "Your browser has been opened to visit:" When I click the link, it redirects me to http://localhost:8080/, which fails because the IPython server is remote.

I would like to try and develop a feature to improve the authentication flow on remote servers if no one else is working on it.

@parthea
Contributor

parthea commented Sep 13, 2015

I see there is already a PR for this, #8590, although I haven't had a chance to try it. The PR was closed and the code wasn't merged. Should we re-open #8590 or create a new PR?

@jreback
Contributor

jreback commented Sep 13, 2015

@parthea You could certainly pull that out and repurpose it (e.g., incorporate or copy parts of it) and then just open a new PR.

@parthea
Contributor

parthea commented Oct 12, 2015

I have a new proposal in #11141. Let me know if it is ok and I will implement.

@meetwudi

My suggestion would be not to use Google Account credentials, which are very personal and should not be present on any public server. Please consider using a Service Account instead.

The Google OAuth 2.0 system supports server-to-server interactions such as those between a web application and a Google service. For this scenario you need a service account, which is an account that belongs to your application instead of to an individual end user.

So, to sum up, the reasons to use a Service Account are:

  1. SAFETY. Binding services to an individual's Google Account can lead to severe security issues and might expose your service to malicious attacks, because you cannot restrict how a personal account is used.
  2. Easy process. All you need to do is download a JSON key file from the Developer Console and read it in your app.

Also note that a Service Account is designed for your app to access Google APIs without requiring a user to sign in. If you want users to access their personal data (with their own permission), use the browser flow instead.
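A rough sketch of what "read the JSON key file in your app" leads to: the service account flow builds a JWT whose claims name the account (`iss`), the requested scope, the token endpoint (`aud`), and a lifetime of at most one hour. The sketch assumes the key file's `type` and `client_email` fields and Google's current token endpoint; signing the result with the key's `private_key` (RS256) and POSTing it to the token endpoint are left out, since they need an RSA library. The helper names are hypothetical.

```python
import base64
import json
import time
from typing import Optional

TOKEN_URI = "https://oauth2.googleapis.com/token"
BIGQUERY_SCOPE = "https://www.googleapis.com/auth/bigquery"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def load_key_file(path: str) -> dict:
    """Read a service account JSON key file downloaded from the Developer Console."""
    with open(path) as f:
        key = json.load(f)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key


def unsigned_jwt(client_email: str, scope: str = BIGQUERY_SCOPE,
                 now: Optional[int] = None) -> str:
    """Build the header.claims part of the JWT; the RS256 signature is omitted."""
    iat = int(time.time()) if now is None else now
    header = {"alg": "RS256", "typ": "JWT"}
    claims = {
        "iss": client_email,  # the service account's e-mail address
        "scope": scope,
        "aud": TOKEN_URI,
        "iat": iat,
        "exp": iat + 3600,    # Google accepts a lifetime of at most one hour
    }
    return b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(claims).encode())
```

No user ever signs in here: possession of the key file is the credential, which is also why the key file itself must be protected.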

meetwudi added a commit to meetwudi/pandas that referenced this issue Oct 15, 2015
It is recommended by Google that we use Service Account credentials to make
API calls from server. See [this issue
comment](pandas-dev#8489 (comment))
for details.

Signed-off-by: John Wu <[email protected]>

@meetwudi

To further illustrate my solution, see my rough implementation in #11335.

@parthea
Contributor

parthea commented Oct 15, 2015

So to sum up, reasons to use Service Account are:

  1. SAFETY. Binding services with individual's Google Account can lead to severe security issues and might expose your service to malicious attack. This is because you cannot restrict the usage of personal account.
  2. Easy process. All you need to do is downloading JSON Key File from Developer Console, and read it in your app.

Using JupyterHub may reduce this concern. See JupyterHub.

From JupyterHub GettingStarted:
Single User Server: a dedicated, single-user, Jupyter Notebook is started for each user on the system when they log in. The object that starts these processes is called a Spawner.

I'm not too familiar with JupyterHub. I'm going to start playing around with it to see if there is a better (secure) solution for authenticating users using the browser flow. JupyterHub looks very promising.

@meetwudi

@parthea I am not familiar with JupyterHub either, but I don't think it would be a good idea to rely on other tools. Scenarios differ, and we cannot make any assumptions about the kind of environment pandas will be run in.

@parthea
Contributor

parthea commented Oct 15, 2015

IPython (Jupyter) is listed in the pandas ecosystem under IDE.

@meetwudi

@parthea It's still just one of them, right? So it's an assumption, which I personally don't agree with. In our use case, for example, we don't use it at all. Sure, the problem can be solved if you use Jupyter together with JupyterHub specifically, but that is not a common setup for pandas users. :)

@parthea
Contributor

parthea commented Oct 16, 2015

It sounds like supporting both service account and user account is the best option. In that case, it is up to the user to select which configuration is the most appropriate for their specific setup.
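One way the "support both" idea could look, as a sketch under stated assumptions: a hypothetical helper (not the pandas API) that chooses service account auth when the caller supplies a key, given either as a path to the JSON key file or as the JSON text itself, and otherwise falls back to the user-account browser flow.

```python
import json
import os
from typing import Optional


def select_auth_mode(private_key: Optional[str] = None) -> str:
    """Pick an authentication mode (hypothetical helper, not the pandas API).

    private_key may be a path to a service account JSON key file or the JSON
    contents themselves; without it, fall back to the user-account browser flow.
    """
    if private_key is None:
        return "user_account"
    if os.path.isfile(private_key):
        return "service_account"
    try:
        key = json.loads(private_key)
    except ValueError:
        raise ValueError("private_key must be a key file path or a JSON string") from None
    if isinstance(key, dict) and key.get("type") == "service_account":
        return "service_account"
    raise ValueError("private_key does not look like a service account key")
```

Keeping the browser flow as the default preserves existing behaviour, while a server deployment opts in to the service account by passing the key.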

@meetwudi

+1 for supporting both of them. My reason would be backward compatibility, though.

@tworec
Contributor

tworec commented Dec 22, 2015

At RTBHouse we have been using service account auth since May. Yesterday I finally decided to create a pull request; see #11881. I think it will fulfill the requirements mentioned here.

@jreback jreback modified the milestones: 0.18.0, Next Major Release Dec 29, 2015
@jreback jreback closed this as completed in 6a32f10 Feb 1, 2016
7 participants