Connection test: [ERROR] - dbt-databricks behind proxy #111

Closed
thuanvan opened this issue Jun 10, 2022 · 19 comments · Fixed by #311
Labels
bug Something isn't working

Comments

@thuanvan

Describe the bug

Running dbt debug gives this error:
Connection test: [ERROR]

1 check failed:
dbt was unable to connect to the specified database.
The database returned the following error:

Runtime Error
Database Error
failed to connect

Environment variables set:
HTTP_PROXY
HTTPS_PROXY

It does not seem that the proxy environment variables are being used.
A curl request to host/http_path succeeds.
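
To narrow down a report like this, it can help to confirm what Python itself reads from the environment, separately from whether the connector honors it. The following is a minimal sketch using only the standard library. describe_proxies is a hypothetical helper name, not part of dbt or dbt-databricks, and its output says nothing about whether the Thrift transport actually uses these values.

```python
import urllib.request
from urllib.parse import urlsplit


def describe_proxies():
    """Summarize the HTTP(S) proxies that Python's stdlib reads from the environment."""
    # getproxies_environment() inspects *_PROXY variables (upper or lower case).
    detected = urllib.request.getproxies_environment()
    summary = {}
    for scheme in ("http", "https"):
        url = detected.get(scheme)
        if not url:
            continue
        parts = urlsplit(url)
        # Report only host and port so that credentials are never printed.
        summary[scheme] = f"{parts.hostname}:{parts.port}"
    return summary
```

Calling print(describe_proxies()) in the same shell that runs dbt debug shows which proxy host and port, if any, the interpreter picks up. An empty dict means neither variable is visible to Python.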

Steps To Reproduce


dbt debug

Expected behavior

connection test OK

System information

The output of dbt --version:
Core:

  • installed: 1.1.0
  • latest: 1.1.0 - Up to date!

Plugins:

  • databricks: 1.1.0 - Up to date!
  • spark: 1.1.0 - Up to date!

The operating system you're using:
ubuntu
The output of python --version:
Python 3.8.10

@thuanvan thuanvan added the bug Something isn't working label Jun 10, 2022
@bilalaslamseattle
Collaborator

@thuanvan thanks for filing this. Investigating whether our Python connector supports HTTP proxies. Will get back to you!

@thuanvan
Author

thuanvan commented Jun 15, 2022

I'm getting this in the curl test:
curl --netrc -v https://adb-**REDACTED**.azuredatabricks.net:443/sql/protocolv1/o/**REDACTED**/**REDACTED**

Error 500 javax.servlet.ServletException: org.apache.thrift.transport.TTransportException

@bilalaslamseattle
Collaborator

@thuanvan I verified that we don't support proxy yet. We'll get this prioritized on our roadmap.

@thuanvan
Author

Thanks for confirming, and thank you for prioritizing it. We'll look into how to get a proxy bypass.

@thuanvan
Author

That's odd, since we have working instances where we go through a proxy. Can you elaborate?

@bilalaslamseattle
Collaborator

@thuanvan I'm waiting for the engineer to come back from vacation. He'll look into it.

@xg1990

xg1990 commented Jul 14, 2022

I did some analysis previously. The dbt-databricks adapter is based on the Thrift protocol, which is RPC, not REST, and I could not find an easy way to make it support a proxy.

Our team's workaround is to use Databricks IP whitelisting to protect the Databricks workspace.
Privatelink for Databricks is still a beta feature, so it might take a while to become GA.

@bilalaslamseattle
Collaborator

@xg1990 @thuanvan we are going to add proxy support to the Python connector first (databricks/databricks-sql-python#22). Then we will add support to dbt-databricks.

@thuanvan
Author

Once databricks/databricks-sql-python#22 is fixed, will this issue be fixed as well?

@bilalaslamseattle
Collaborator

@thuanvan we're still waiting for databricks/databricks-sql-python#22 to land.

@susodapop

Hey folks, just letting you know that we have a fix for this under review in databricks-sql-connector. The fix will be included in connector version 2.3.1. If you want to test it in the interim, we have a dev version that you can install with pip install databricks-sql-connector==2.3.1.dev1.

@alexdiem

@susodapop I have just encountered this problem. Unfortunately, your suggestion did not fix it. I just get a lot of "Hey I was called!" messages before ending up with "failed to connect".

@susodapop

@alexdiem thanks for the report! The "Hey I was called!" message won't be present in the final release 😄

Your message doesn't contain enough information to reproduce the issue. What values did you use for your proxy environment variables? Of course, redact any sensitive information.

@alexdiem

It is the exact same problem as Thuan has (we are colleagues in the same office), and several others have it as well. I have set:
export HTTP_PROXY="http://test:test@**REDACTED**:8080"
export HTTPS_PROXY="http://test:test@**REDACTED**:8080"
It is very odd because it used to work, and I did not make any changes to the proxy settings.
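
Proxy URLs of this shape embed a username and password in addition to the host and port. As a rough illustration, the standard library splits such a URL as shown here. split_proxy_url is a hypothetical helper name and proxy.example.test is a placeholder host, not the redacted one from this thread.

```python
from urllib.parse import unquote, urlparse


def split_proxy_url(proxy_url):
    """Split a proxy URL into host, port, and percent-decoded credentials."""
    parsed = urlparse(proxy_url)
    return {
        "host": parsed.hostname,
        "port": parsed.port,
        # Credentials in URLs may be percent-encoded (for example %40 for "@").
        "username": unquote(parsed.username) if parsed.username else None,
        "password": unquote(parsed.password) if parsed.password else None,
    }
```

For a URL without credentials, username and password come back as None, which is a quick way to tell whether a given proxy setting requires any proxy authentication at all.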

@msdotnetclr

msdotnetclr commented Mar 24, 2023

Looks like the problem is in thrift.transport.THttpClient:

Problem code:

    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap).strip()
        return "Basic " + cr

In my test, the values of the HTTP(S)_PROXY environment variables are captured correctly, but ap is a regular str rather than a bytes object, so the base64.b64encode() call fails.
Fix:

    @staticmethod
    def basic_proxy_auth_header(proxy):
        if proxy is None or not proxy.username:
            return None
        ap = "%s:%s" % (urllib.parse.unquote(proxy.username),
                        urllib.parse.unquote(proxy.password))
        cr = base64.b64encode(ap.encode()).decode().strip()
        return "Basic " + cr

However, since the problem is in the thrift package, we can't simply fix it in this project...
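
The str-versus-bytes behavior described above can be reproduced outside of Thrift. This is a standalone sketch using only the standard library. Both function names are made up for illustration, and neither is the actual code from Thrift or from databricks-sql-connector.

```python
import base64


def broken_basic_auth_header(username, password):
    """Mirrors the failing pattern: passes a str to base64.b64encode."""
    credentials = "%s:%s" % (username, password)
    # In Python 3 this raises TypeError because b64encode needs bytes-like input.
    return "Basic " + base64.b64encode(credentials)


def basic_auth_header(username, password):
    """Mirrors the proposed fix: encode to bytes first, decode the result to str."""
    credentials = "%s:%s" % (username, password)
    token = base64.b64encode(credentials.encode()).decode()
    return "Basic " + token
```

With the test:test credentials mentioned earlier in the thread, basic_auth_header("test", "test") returns "Basic dGVzdDp0ZXN0", while the broken variant raises TypeError before any request is sent.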

@susodapop

@msdotnetclr we've actually fixed this in databricks-sql-connector without needing to update the upstream thrift dependency. The fix needs to merge and be deployed to PyPI; then we'll update the dbt-databricks dependency and proxies will work.

@susodapop

Here's the PR that fixes it in databricks-sql-connector: databricks/databricks-sql-python#81

@msdotnetclr

Ah, nice. I only started to play around with the connector this morning and did not get to look into other linked issues. Good to know there is a better fix already!

@susodapop

susodapop commented Apr 15, 2023

The fix has merged into databricks-sql-connector and is part of release v2.5.0.

I'll open a PR here that bumps the dependency so we pick up the proxy fix.
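
To check whether an environment already has a connector release that contains the fix, one option is to compare the installed version against 2.5.0. This is a minimal sketch, assuming the distribution name databricks-sql-connector used in the pip command earlier in this thread. has_proxy_fix and installed_connector_has_fix are hypothetical helper names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def has_proxy_fix(version_string, minimum=(2, 5, 0)):
    """Return True if the numeric release part of version_string is >= minimum."""
    # Keep only the leading numeric segment, e.g. "2.3.1.dev1" -> "2.3.1".
    # Pre-release suffixes are ignored, so "2.5.0.dev1" counts as 2.5.0 here.
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        return False
    release = tuple(int(part) for part in match.group(0).split("."))
    # Pad short versions such as "2.5" so they compare as (2, 5, 0).
    release += (0,) * (len(minimum) - len(release))
    return release >= minimum


def installed_connector_has_fix():
    """Check the installed databricks-sql-connector, if any."""
    try:
        return has_proxy_fix(version("databricks-sql-connector"))
    except PackageNotFoundError:
        return False
```

The comparison is deliberately simple and uses only the standard library. A project that already depends on the packaging library could compare parsed versions with it instead.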

6 participants