Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(bigquery): Remove ReadAPI bypass in executeSelect() #3624

Merged
merged 8 commits into from
Jan 15, 2025

Conversation

whuffman36
Copy link
Contributor

@whuffman36 whuffman36 commented Jan 6, 2025

Currently, the ReadAPI is being bypassed in the executeSelect() function when requests are sent using fast query. This change decides whether to use the ReadAPI based on the query result size, and enables the use of both fast query and ReadAPI together.

This change also updates queryRpc() to use the table data from the first page of the query results instead of the empty results of the unfinished job.

Finally, this change adds a subfield in the QueryStatistics field in the query response to keep track of whether or not the ReadAPI is used. This field is essentially only useful for testing to ensure that another regression is avoided in the future.

Fixes #2764

@product-auto-label product-auto-label bot added size: m Pull request size is medium. api: bigquery Issues related to the googleapis/java-bigquery API. labels Jan 6, 2025
Move readAPI condition check from getExecuteSelectResponse() to
queryRpc(). This allows fast query to be used with ReadAPI.
@product-auto-label product-auto-label bot added size: s Pull request size is small. and removed size: m Pull request size is medium. labels Jan 7, 2025
@whuffman36 whuffman36 marked this pull request as ready for review January 7, 2025 22:04
@whuffman36 whuffman36 requested a review from a team as a code owner January 7, 2025 22:04
@whuffman36 whuffman36 requested review from mrfaizal, shollyman and PhongChuong and removed request for mrfaizal January 7, 2025 22:04
@whuffman36 whuffman36 self-assigned this Jan 7, 2025
@whuffman36
Copy link
Contributor Author

Still working on ideas for a testing strategy to make sure the requests are going to the endpoints that we want, but I wanted to get your opinions on the fix.

A Boolean field is added to keep track of whether or not the high
throughput ReadAPI is used. This is mostly for testing to avoid
another regression in the future.
@product-auto-label product-auto-label bot added size: m Pull request size is medium. and removed size: s Pull request size is small. labels Jan 9, 2025
Copy link
Contributor

@PhongChuong PhongChuong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of comments; overall PR looks good.

@@ -476,22 +476,29 @@ private BigQueryResult queryRpc(
}

// Query finished running and we can paginate all the results
if (results.getJobComplete() && results.getSchema() != null) {
// Results should be read using the high throughput read API if sufficiently large.
boolean shouldUseReadApi =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldUseReadApi is very similar to the method: useReadAPI which is used in getSubsequentQueryResultsWithJob. In addition, even when shouldUseReadApi == true here, the useReadAPI() can return false in the subsequent getSubsequentQueryResultsWithJob call. Perhaps, changing the name to a negative condition is sufficient. E.g., shouldNotUseReadApi (or something else better).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think of resultsLargeEnoughForReadApi?

@whuffman36 whuffman36 merged commit fadd992 into main Jan 15, 2025
17 checks passed
@whuffman36 whuffman36 deleted the execute-select branch January 15, 2025 19:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the googleapis/java-bigquery API. size: m Pull request size is medium.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reading data using the executeSelect API is slow
2 participants