Support for azure authentication mechanisms. #28

Closed
superdupershant opened this issue Dec 5, 2021 · 9 comments
Labels
enhancement New feature or request

Comments

@superdupershant
Collaborator

Describe the feature

Beyond simple PAT tokens, supporting some of the Azure AD (AAD) based authentication mechanisms would be great.

Additional context

dbt-sqlserver is a good example of how to get valid auth tokens, and databricks-sql-connector already supports taking auth_token arguments.

Who will this benefit?

Users trying to use AAD based SSO or other features.

@superdupershant superdupershant added the enhancement New feature or request label Dec 5, 2021
@github-actions

github-actions bot commented Jun 4, 2022

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 4, 2022
@dataders
Contributor

dataders commented Jun 5, 2022

I'm a big supporter of this feature, and I'd love to help out, as I originally "assisted" with adding AAD auth to dbt-sqlserver. 👀: @JCZuurmond

@superdupershant
Collaborator Author

Awesome to hear that, Anders! At least for Databricks, it would be pretty straightforward if we had an easy way to pull in all the auth code from dbt-sqlserver. Whichever AAD auth mode the user selects, they eventually get a token back. That token would just be used in place of the password in the Databricks connection, and if the workspace is AAD-enabled the connection will succeed.

@github-actions github-actions bot removed the Stale label Jun 6, 2022
@dataders
Contributor

dataders commented Jun 6, 2022

In dbt-sqlserver, we're doing exactly what you say: we use the azure-identity package from the Azure Python SDK to get a token, then send it along when connecting with pyodbc. Here are the helper functions that do the bulk of the work right now. There's some opportunity to clean this up, and potentially publish it as a single class in a standalone PyPI package that both dbt-sqlserver and dbt-databricks could make use of.

For more info, see our guide on how to authenticate using AAD and dbt.
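
As a rough sketch of what such a shared helper class might look like: the code below assumes only that the credential object exposes `get_token(*scopes)` and returns something with `token` and `expires_on` (epoch seconds) attributes, which is the shape azure-identity credentials provide. The class name `AadTokenProvider`, the 5-minute refresh margin, and the `/.default` scope built from the Databricks resource ID quoted elsewhere in this thread are illustrative assumptions, not code taken from dbt-sqlserver.

```python
import time
from typing import Any, Callable, Optional

# Resource ID for Azure Databricks as quoted elsewhere in this thread;
# appending "/.default" to form a scope is an assumption of this sketch.
DATABRICKS_SCOPE = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default"


class AadTokenProvider:
    """Hypothetical helper: caches a token and refreshes it shortly before expiry."""

    def __init__(
        self,
        credential: Any,
        scope: str = DATABRICKS_SCOPE,
        refresh_margin_s: int = 300,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._credential = credential  # any object with get_token(*scopes)
        self._scope = scope
        self._margin = refresh_margin_s
        self._clock = clock  # injectable so expiry logic can be tested
        self._cached: Optional[Any] = None

    def get_token(self) -> str:
        """Return a bearer token string, fetching a new one only when needed."""
        cached = self._cached
        if cached is None or cached.expires_on - self._clock() < self._margin:
            cached = self._cached = self._credential.get_token(self._scope)
        return cached.token
```

With azure-identity installed, something like `AadTokenProvider(DefaultAzureCredential()).get_token()` would presumably yield the string to pass wherever the adapter currently expects a PAT.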

The hardest pill for me to swallow is drawing a dependency on the Azure CLI. As Scott Henderson writes:

to use Azure, you typically need to use the az command line utility. This authenticates you to Azure, and allows you management access to practically all their APIs. It’s a wonderful, functional tool, written in Python and provided as open source.

However, if you want to install this command - beware! It’s an absolute monster, weighing in at over a gigabyte in its current incarnation. This problem has been known about since a bug was raised in 2018, and as a user back then it was nowhere near as bad - maybe a few hundred meg at that stage.

The root cause is the Azure Python APIs, which are horrendously bloated. Microsoft’s backward compatibility is legendary, of course, but what has happened in the Python API is that each incompatible change has caused an in-API code fork to occur - exploring the repo is an Inception-like experience, with each subdirectory looking much the same as the others, all alike.

Plus, in order to support these old APIs, Microsoft has taken to packaging an entire python runtime with the utility, to ensure it runs correctly. And all the python bytecode cache. The only thing missing is the kitchen sink.

@bilalaslamseattle
Collaborator

@ueshin this issue is almost a year old now. It seems like the most bitter pill to swallow is the Azure CLI (yikes).

@sdebruyn

For those authenticating using Azure, you can use Az CLI to get a valid token and then use that in the regular config:

  1. Sign in with Az CLI: az login
  2. Fetch an access token for Databricks (the ID is static for all Databricks workspaces): aad_token_response=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
  3. Parse the access token (you need jq installed): aad_token=$(jq .accessToken -r <<< "$aad_token_response")
  4. Store the token in an environment variable: export DATABRICKS_AAD_TOKEN=$aad_token

Then in your profiles.yml you can authenticate using the stored token: token: "{{ env_var('DATABRICKS_AAD_TOKEN') }}"
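
For anyone who would rather script the steps above than run them by hand, here is a minimal Python sketch of the same flow. It shells out to the same `az account get-access-token` command, so it still needs the Azure CLI and a prior `az login`. The function names `parse_access_token`, `fetch_aad_token`, and `run_dbt` are made up for illustration.

```python
import json
import os
import subprocess
from typing import List

# Same resource ID as in step 2 above.
DATABRICKS_RESOURCE_ID = "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"


def parse_access_token(az_output: str) -> str:
    """Extract accessToken from the JSON printed by `az account get-access-token`."""
    token = json.loads(az_output).get("accessToken")
    if not token:
        raise ValueError("no accessToken field in az output")
    return token


def fetch_aad_token(resource: str = DATABRICKS_RESOURCE_ID) -> str:
    """Run the az command from step 2 and return the token (replaces the jq step)."""
    result = subprocess.run(
        ["az", "account", "get-access-token", "--resource", resource, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_access_token(result.stdout)


def run_dbt(args: List[str], token: str) -> subprocess.CompletedProcess:
    """Run dbt with DATABRICKS_AAD_TOKEN set only for the child process."""
    env = dict(os.environ, DATABRICKS_AAD_TOKEN=token)
    return subprocess.run(["dbt", *args], env=env, check=True)
```

Calling `run_dbt(["debug"], fetch_aad_token())` would then be the scripted equivalent of steps 2 to 4 followed by a dbt invocation. The token is passed through the child's environment rather than exported globally, which is a design choice of this sketch, not something the steps above require.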

@sdebruyn

sdebruyn commented Mar 20, 2023

But as discussed above, it would be nicer to use the azure-identity package to retrieve a token automatically so that you can also use this in setups with managed identity, service principals etc. without having to use the CLI.

@andrefurlan-db
Collaborator

Support for Azure AD OAuth has been added in #327.

@AkhilGNair

AkhilGNair commented Mar 4, 2024

Just a note on @sdebruyn's answer (very useful, thanks)

az account get-access-token --resource $DATABRICKS_RESOURCE_ID --query accessToken -otsv

also works, rather than hopping over to jq (with $DATABRICKS_RESOURCE_ID set to the resource ID from the steps above).


6 participants