buffer overflow in search_for_pool #128
Trace: [ODBC][8][1672885744.938200][__handles.c][499] [ODBC][8][1672885744.938341][SQLAllocHandle.c][531] |
I compiled with address sanitization: ================================================================= |
Oof. Looks like there is an inherent limit of 1024 characters for the connection string that is assumed pretty much all over the place in the code. We are running into the same issue that was reported to Microsoft here: dotnet/runtime#63630. After initially declining to fix the issue, they did so after the customer let them know it was for accessing Databricks via the Simba Spark ODBC driver. We have to attach a JWT token to the connection string and that itself is well over 1024 characters. |
My first guess is it's this structure in driver_manager.h:
typedef struct connection_pool_head
{
struct connection_pool_head *next;
char driver_connect_string[ 1024 ];
int dsn_length;
char server[ 128 ];
int server_length;
char user[ 128 ];
int user_length;
char password[ 128 ];
int password_length;
int num_entries; /* always at least 1 */
CPOOLENT *entries;
} CPOOLHEAD;
Could you try changing your local unixODBC copy to increase that and
check whether it fixes your problem. If so, it may be worth making
driver_connect_string a char * and allocating it rather than keeping it
as a fixed buffer.
|
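For anyone reading along, here is a minimal sketch of the kind of change being suggested, assuming the connection string length is known when the pool entry is built. The set_pool_connect_string helper is hypothetical and is not the patch that was actually committed:

#include <stdlib.h>
#include <string.h>

/* Sketch: the fixed 1024-byte buffer becomes a pointer sized to fit
   whatever connection string is passed in. */
typedef struct connection_pool_head
{
    struct connection_pool_head *next;
    char *driver_connect_string;    /* was: char driver_connect_string[ 1024 ] */
    int dsn_length;
    /* ... remaining members unchanged ... */
} CPOOLHEAD;

static int set_pool_connect_string( CPOOLHEAD *head, const char *conn_str, int len )
{
    /* calloc zeroes the buffer, so the terminating NUL comes for free */
    head->driver_connect_string = calloc( 1, len + 1 );
    if ( !head->driver_connect_string )
        return 0;

    memcpy( head->driver_connect_string, conn_str, len );
    head->dsn_length = len;
    return 1;
}

The trade-off is that the pool-teardown path then needs a matching free(), but the hard-coded 1024 limit disappears entirely.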
Looking a bit further, there is also a limit of 1k in SQLDriverConnect
when the connection string is generated by the GUI, but I don't think
that's going to happen now, as the GUI is AFAIK no longer used.
|
Moved conversation to here: #129 |
I have some code to commit that should work dynamically. I will commit
it and you can check; it should do the same as your patch.
|
This needs to be dynamically allocated as well: https://github.com/lurcher/unixODBC/blob/46b117957c5402316e91797a01ed1b6a1183ac57/DriverManager/drivermanager.h#L384
|
Good catch. Though I am not sure that member is used now. Further
changes pushed.
|
Your fix didn't work for me. My compiler must not work the same way yours does. The definition of calloc is implementation-specific if the size is 0, per the few sites I read. I'd assume that's the issue. It's been ages since I've done this stuff. I had to do calloc(1, <size of allocation>) or swap the params; the first seems to be the standard used in other parts of the code. Second, the additional member discussed is most certainly used. Here is the stack from sanitization without the second commit you made:
Here is the diff I needed to get me going based on current master:
Note: I removed the +1 in the case of a defined connection_string_length because copy_nts uses memcpy for that. No additional null needed. |
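As a side note on the calloc point above, here is a small, self-contained illustration of the behaviour being described (dup_connection_string is an illustrative helper, not code from the repository):

#include <stdlib.h>
#include <string.h>

/* If either argument to calloc() is 0 the behaviour is
   implementation-defined: it may return NULL or a unique pointer that
   must not be dereferenced.  calloc( 1, len + 1 ) always requests
   len + 1 zeroed bytes, so the terminating NUL comes for free and a
   memcpy-style copy (as copy_nts does) needs no extra null appended. */
static char *dup_connection_string( const char *src, size_t len )
{
    char *dst = calloc( 1, len + 1 );
    if ( dst )
        memcpy( dst, src, len );
    return dst;
}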
That was just stupid code from me, fix committed. Sorry.
|
That fixes my issue! Thanks. Any idea when you will be cutting a release? I notice you don't do them very often. |
I prefer stability over lots of releases. It takes some time for the
distros to update anyway. So no real plans but always up for being
convinced.
|
This is exactly my concern. Databricks is really pushing their SQL offering as a convenient way to access the output of spark processes that are generated from the spark jobs (it's what got me going down this path). Most enterprise users are going to want to use the OAuth token-based interface as opposed to a PAT/secret since it's way easier to manage and support. Tokens themselves are over 1k in length. If it takes months for this to make its way into the distros and into conda feeds, releasing it once critical mass starts running into it will mean even longer delays and ugly workarounds. Selfishly, I don't know how I'm going to solve this problem for the general case. If one of our quants/data scientists installs our internal access package it pulls in unixodbc from conda. I don't exactly know how I'm going to override the packaged versions of unixodbc with a custom build. |
Does it have any SQLSetConnectAttr option to pass in the token instead
of the connection string?
|
It looks like yes, but only if you initially passed in a token in the connection string. I also don't know how to do this from pyodbc, which would be one of the most common use cases, I'd assume. It's also not really documented anywhere: https://learn.microsoft.com/en-us/azure/databricks/integrations/jdbc-odbc-bi#--authentication-parameters |
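For completeness, a rough sketch of what an attribute-based token hand-off could look like. SQL_ATTR_SPARK_ACCESS_TOKEN and the DSN name below are made up purely for illustration; the Simba Spark driver may expose something different (or nothing at all) for this:

#include <sql.h>
#include <sqlext.h>

#define SQL_ATTR_SPARK_ACCESS_TOKEN 20000   /* hypothetical driver-specific attribute */

static SQLRETURN connect_with_token( SQLHDBC dbc, const char *token )
{
    /* Hand the (possibly > 1024 byte) token to the driver directly,
       keeping the connection string itself short. */
    SQLRETURN ret = SQLSetConnectAttr( dbc, SQL_ATTR_SPARK_ACCESS_TOKEN,
                                       (SQLPOINTER) token, SQL_NTS );
    if ( !SQL_SUCCEEDED( ret ))
        return ret;

    return SQLDriverConnect( dbc, NULL,
                             (SQLCHAR *) "DSN=Databricks;", SQL_NTS,
                             NULL, 0, NULL, SQL_DRIVER_NOPROMPT );
}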
FWIW I looked through the pyodbc code, and even if that were a supported method (removing the Auth_AccessToken causes the driver to throw, saying it's required), I don't see how I could do this in the pyodbc space. |
I just went and reminded myself how we (Easysoft) did it with OAuth and
Spark. We allow the access token to be in the DSN as well as the
connection string. But I didn't notice the problem as the access tokens
we had were only 20 characters or so.
|
That (DSN in odbc.ini) also wouldn't solve my issue. We need to make sure users in a shared compute environment are using their own credentials. Our shared library abstracts this from our users so they can basically do ourdatabricks.read(select blah); under the hood it gets an OAuth token for the user and crafts a custom connection string for them. |
Fair enough. Sounds like a job for Kerberos, but anyhow, there is the
fix now in git.
|
My life was so much easier before oauth! |
I'm sorry if this is not the correct place to start with this issue, but this library happens to be at the bottom of the offending stack.
There appears to be some sort of incompatibility between pyodbc, unixodbc, connection pooling, and long connection strings.
Here is the stack trace when running under valgrind
You can reproduce this by running:
docker run jabberaa/spark-odbc-issue:latest
The repro source code is here:
https://github.com/jabbera/spark-odbc-issue
Happy to try any other techniques you can suggest to get to the bottom of the issue.
Thanks for your help.