Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(c/driver/sqlite): Fix parameter binding when inferring types and when retrieving #742

Merged
merged 3 commits into from
Jun 12, 2023

Conversation

krlmlr
Copy link
Collaborator

@krlmlr krlmlr commented Jun 8, 2023

Needs tests on the C side and perhaps also on the R side. Please advise.

Closes #734.

library(adbcdrivermanager)
# pkgload::load_all()

# Use the driver manager to connect to a database
db <- adbc_database_init(adbcsqlite::adbcsqlite(), uri = ":memory:")
con <- adbc_connection_init(db)

# Write a table
flights <- nycflights13::flights
# (timestamp not supported yet)
flights$time_hour <- NULL

stmt <- adbc_statement_init(con, adbc.ingest.target_table = "flights")
adbc_statement_bind(stmt, flights)
adbc_statement_execute_query(stmt)
#> [1] 336776
adbc_statement_release(stmt)

# March flights
stmt <- adbc_statement_init(con)
adbc_statement_set_sql_query(stmt, "SELECT * from flights WHERE month = 3 LIMIT 2")
stream <- nanoarrow::nanoarrow_allocate_array_stream()
adbc_statement_execute_query(stmt, stream)
#> [1] -1


result <- tibble::as_tibble(stream)
adbc_statement_release(stmt)

result
#> # A tibble: 2 × 18
#>    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>   <dbl> <dbl> <dbl>    <dbl>          <dbl>     <dbl>    <dbl>          <dbl>
#> 1  2013     3     1        4           2159       125      318             56
#> 2  2013     3     1       50           2358        52      526            438
#> # ℹ 10 more variables: arr_delay <dbl>, carrier <chr>, flight <dbl>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>


# March flights with a parameter, not passing parameter
stmt <- adbc_statement_init(con)
adbc_statement_set_sql_query(stmt, "SELECT * from flights WHERE month = ? LIMIT 2")
stream <- nanoarrow::nanoarrow_allocate_array_stream()
adbc_statement_execute_query(stmt, stream)
#> [1] -1
result <- tibble::as_tibble(stream)
adbc_statement_release(stmt)

result
#> # A tibble: 0 × 18
#> # ℹ 18 variables: year <dbl>, month <dbl>, day <dbl>, dep_time <dbl>,
#> #   sched_dep_time <dbl>, dep_delay <dbl>, arr_time <dbl>,
#> #   sched_arr_time <dbl>, arr_delay <dbl>, carrier <dbl>, flight <dbl>,
#> #   tailnum <dbl>, origin <dbl>, dest <dbl>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>

# March flights with a parameter
stmt <- adbc_statement_init(con)
adbc_statement_set_sql_query(stmt, "SELECT * from flights WHERE month = ? LIMIT 2")
adbc_statement_bind_stream(stmt, data.frame(a = 3))
stream <- nanoarrow::nanoarrow_allocate_array_stream()
adbc_statement_execute_query(stmt, stream)
#> [1] -1
result <- tibble::as_tibble(stream)
adbc_statement_release(stmt)

result
#> # A tibble: 2 × 18
#>    year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>   <dbl> <dbl> <dbl>    <dbl>          <dbl>     <dbl>    <dbl>          <dbl>
#> 1  2013     3     1        4           2159       125      318             56
#> 2  2013     3     1       50           2358        52      526            438
#> # ℹ 10 more variables: arr_delay <dbl>, carrier <chr>, flight <dbl>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>

# Many March flights with multiple parameters
stmt <- adbc_statement_init(con)
adbc_statement_set_sql_query(stmt, "SELECT * from flights WHERE month = ?")
adbc_statement_bind_stream(stmt, data.frame(a = 2:4))
stream <- nanoarrow::nanoarrow_allocate_array_stream()
adbc_statement_execute_query(stmt, stream)
#> [1] -1
result <- tibble::as_tibble(stream)
adbc_statement_release(stmt)

result
#> # A tibble: 24,951 × 18
#>     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
#>    <dbl> <dbl> <dbl>    <dbl>          <dbl>     <dbl>    <dbl>          <dbl>
#>  1  2013     2     1      456            500        -4      652            648
#>  2  2013     2     1      520            525        -5      816            820
#>  3  2013     2     1      527            530        -3      837            829
#>  4  2013     2     1      532            540        -8     1007           1017
#>  5  2013     2     1      540            540         0      859            850
#>  6  2013     2     1      552            600        -8      714            715
#>  7  2013     2     1      552            600        -8      919            910
#>  8  2013     2     1      552            600        -8      655            709
#>  9  2013     2     1      553            600        -7      833            815
#> 10  2013     2     1      553            600        -7      821            825
#> # ℹ 24,941 more rows
#> # ℹ 10 more variables: arr_delay <dbl>, carrier <chr>, flight <dbl>,
#> #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
#> #   hour <dbl>, minute <dbl>

# Clean up
adbc_connection_release(con)
adbc_database_release(db)

Created on 2023-06-08 with reprex v2.0.2

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jun 8, 2023

What should happen if parameters are required but not provided? What if there's a mismatch (parameter count or parameter names)?

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jun 8, 2023

@lidavidm: Triage rights here and on nanoarrow for Nicolas and me would be great so that we can request reviews.

@lidavidm
Copy link
Member

lidavidm commented Jun 8, 2023

See #743

@lidavidm lidavidm changed the title fix(r): Fix parameter binding when inferring types and when retrieving fix(c/driver/sqlite): Fix parameter binding when inferring types and when retrieving Jun 8, 2023
@lidavidm
Copy link
Member

lidavidm commented Jun 8, 2023

It seems reasonable, what we could do is add a new test to adbc_validation.cc

That said it seems possibly we already have tests that cover this and were written wrongly? Or this is causing tests to fail otherwise

@krlmlr
Copy link
Collaborator Author

krlmlr commented Jun 9, 2023

Interesting test failures. I can take a look at what's causing them, unless you do that first.

@lidavidm
Copy link
Member

Ah, I think what's happening now is that the batch size is also limited by the number of rows emitted from binding a parameter. Before it would (accidentally) aggregate rows from multiple parameters into a single batch.

@lidavidm
Copy link
Member

See krlmlr#1

@lidavidm lidavidm merged commit 0e03c92 into apache:main Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

r: Understand parameter binding
2 participants