-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
execute
does not kick off query on Trino Server
#232
Comments
It's expected behaviour at this point since the Trino REST API won't send/execute query until the A possible solution to this is to use thread pool to keep calling fetch on the resource and populate a buffer - then the Cursor's fetch methods can operate on the buffer instead of actual results. That has the problem that even for clients who only want to read few rows at a time the buffer will still occupy memory and depending on implementation lead into OOM issues with large result sets. |
Appreciate the quick response and clarification! I more or less implemented your suggested solution. Agree that it's not ideal but it works well enough for our use case. Thanks again. Feel free to close. |
I actually agree with @benrifkind. This is counter-intuitive and seems definitely not a standard behaviour of other dbapi implementations. The Trino REST API should be abstracted as much as possible through a standard and intuitive dbapi implementation. dbapi protocol is not the same as Trino protocol. It is true that dbapi doesn't prescribe the return value of execute, however it makes sense to follow the other dbms so that we stay compatible with abstraction frameworks like sqlalchemy. Note that this behavior also causes the flaky sqlalchemy tests. sqlalchemy expects execute to block until the DML operation completes and only expects result sets to be returned through the RETURNING clause. See sqlalchemy source code.
Some other dbapi examples: sqlite
No need to fetch the result of a DML query. psycopg
Again no need to call fetch* functions mysql
Again no need for fetch* cc @hovaesco |
I do mean that we should change the behaviour (and that's why didn't close this issue yet) to hide the protocol details from user but it involves some decisions and more work than you point out here:
Note that it also requires change in the engine to not return result sets in some cases. |
I think all this decisions are not really relevant to the issue but rather separate improvements. The double buffer introduced in #220 has nothing to do with what is done on The proposed fix for this issue is not different from what was there before in a sense that we just do a
This is two layered: results are buffered on the server until the query abandonment limit kicks in. In the client #220 introduces a double buffer similar to the JDBC implementation. It is up to the user to keep calling fetch or cancel the query.
How many rows that reside in the buffer depends on first what is being returned by the Trino in one call to the REST api and second on if we want to split result retrieval from serialization. On the first point I don't think this is part of this change. Currently Trino doesn't support setting fetchsize. On the second point see the next question.
I think background thread(s) is probably mostly useful to parallelise the results retrieval in the |
Expected behavior
The call to
client.execute
does not start the query on the Trino cluster.Actual behavior
The query only starts on the Trino cluster once
client.fetch
is called.Steps To Reproduce
Not sure if this is an issue or just expected behavior but I noticed that the query is not actually run by the Trino cluster until data is requested by the client.
So this code will not actually run the query
Nothing happens on the Trino server at this point.
The query only runs when we call fetch. This line is what kicks off the query.
Basically it seems like the initial post request does not actual kick off the query. This seems unintuitive to me. Since I would have thought that execute would actually run the query. As an example if I am trying to insert data into a table, I don't feel the need to call
fetch
because there is no data to fetch. I just want the query to run and I want to be able to wait until it is successful. I guess I'm wondering what the reasoning is behind the current behavior. I believe the behavior of PyHive was to actually start the query whenexecute
was called. Thanks!Log output
No response
Operating System
macOS Monterey
Trino Python client version
0.309.0
Trino Server version
388
Python version
3.8.6
Are you willing to submit PR?
The text was updated successfully, but these errors were encountered: