-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use SQLBindCol when possible #1322
base: master
Are you sure you want to change the base?
Conversation
Wow. That would be a really big performance improvement. It's a lot of code to go through, but I'll try to get to is soon. |
Cool, thanks! Unfortunately I couldn't get native uuids to work on 64bit windows (seems to work fine everywhere else), so I switched back to GetData for that. The point at which it fails is the Py_BuildValue call in GetUUID. I tried a few things, including using the exact same buffer (in heap) for both cases (getting data via SQLGetData vs SQLBindCol+Fetch) and it looks to me like we're getting the same bytes both ways. Maybe there is some memory alignment issue I'm not understanding. I guess I'm also not sure why PYSQLGUID is defined the way it is and not just 16 bytes, maybe that has something to do with it. |
Hello, I can add some note to performance .... for last couple of months I'm trying to solve performance issue when downloading the data from IBM iSeries (AS400/ IBM i). IBM support analyzed ODBC trace and missing SQLBindCol resulting in "In the end about 80% of the time is spent in the application, about 5% on the IBM i and the rest is the network". Looking forward for new version :) |
Thank you for the benchmark! I suppose a cursor attribute to track how many columns were bound would be useful. I've rewritten GetUUID to build the tuple manually instead of using Py_BuildValue, since that was behaving in funny ways. Now binding works for that too. I've also tried another approach to integrate SQLBindCol in the BindCol_rowwise branch. The idea is that since we need to pre calculate all of the c types/buffer sizes for SQLBindCol anyway, we can store it inside ColumnInfo and abstract out SQLGetData to GetData from the individual Get* Functions. As a consequence, most of the Get* functions collapse to just a few lines. I think that this way, SQLBindCol and SQLGetData play together in a more natural way, but it's definitely more change to existing code. This way it was also fairly straight forward to implement fetching arrays in case all columns can be bound (using rowwise binding - felt more natural, since we're returning rows). This seems to mostly just affect narrow fetches though and the performance gain definitely less than binding in the first place. Curious to hear any thoughts. |
- Track encoding ctypes and trigger rebinding on changes - Prevent losing rows when switching from fetchmany to fetchone and then triggering a rebind - Add tests for the above - Avoid unnecessary dict copy when checking if rebind is necessary - Minor improvements to diagnostic variables and error handling
- Store metadata like c_types and text encodings in ColumnInfo instead of computing it for every GetData call - compute said metadata in the first call to cur.fetch* and check if it changed before subsequent fetch* calls - Abstract out SQLGetData and the decision to use it or a bound column as well as the null check to GetData from the individual Get* Functions - Use row-wise column binding - Add cursor attributes for diagnostics (bound_columns_count, bound_buffer_rows) and configuration (bind_cell_cap, bind_byte_cap)
When all columns can be bound and the bandwidth is good enough, this should provide a decent performance improvement, as SQLGetData seems to use a bunch of CPU power. I didn't do extensive benchmarks, but in a small scale cloud web app connected to a decently powerful Azure SQL DB (same region), I got up to a 50% lower execution time on a cursor.execute(...).fetchall() call. Probably 30% is more realistic.
Unfortunately the API allows changing native_uuid and output converters between fetch calls, so the code checks for changes and rebinds when necessary. I've added a test for it.
Some config variables that could be added are a cap on total memory that can be allocated for binding and the threshold at which we use SQLGetData for variable length data. But since only one row is bound this might be unnecessary.