Support querying files directly in posit connect with duckdb #197
Comments
So the good news is I'm able to query parquet files now. The bad news is that it seems to be making 1 fs.exists call and 7 fs.info calls? Because the Posit API requires us to make many API calls to convert the human-friendly paths (e.g.
Note that because we have a cache wrapped around the filesystem, 2 of the calls also open the file (likely the first to read the parquet metadata, and the second to fetch the relevant data?). Here's the full log:
# fs.exists ----
2023-04-12 12:30:34,755 - pins.rsconnect.fs - DEBUG - exists
2023-04-12 12:30:34,756 - pins.rsconnect.fs - DEBUG - info: michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:34,756 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:34,910 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:35,063 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:35,241 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:35,424 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
# fs.info ----
2023-04-12 12:30:35,597 - pins.rsconnect.fs - DEBUG - info: michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:35,598 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:35,768 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:35,927 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:36,119 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:36,321 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
2023-04-12 12:30:36,500 - pins.cache - INFO - cache file: /Users/machow/Library/Caches/pins-py/rsc_0c1c9f784f62118a2f2361e8cff9105dfe5066d03ce45997d504c6950e8f5b6b/michael.chow+mtcars3/72103/mtcars3.parquet
# fs.info ----
2023-04-12 12:30:36,502 - pins.rsconnect.fs - DEBUG - info: rsc://michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:36,503 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:36,656 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:36,832 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:37,037 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:37,226 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
# fs.info ----
2023-04-12 12:30:37,385 - pins.rsconnect.fs - DEBUG - info: rsc://michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:37,386 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:37,537 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:37,704 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:37,881 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:38,088 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
# fs.info ----
2023-04-12 12:30:38,243 - pins.rsconnect.fs - DEBUG - info: rsc://michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:38,244 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:38,397 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:38,548 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:38,739 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:38,952 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
# fs.info ----
2023-04-12 12:30:39,124 - pins.rsconnect.fs - DEBUG - info: michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:39,125 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:39,286 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:39,436 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:39,622 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:39,821 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
2023-04-12 12:30:39,982 - pins.cache - INFO - cache file: /Users/machow/Library/Caches/pins-py/rsc_0c1c9f784f62118a2f2361e8cff9105dfe5066d03ce45997d504c6950e8f5b6b/michael.chow+mtcars3/72103/mtcars3.parquet
# fs.info ----
2023-04-12 12:30:40,048 - pins.rsconnect.fs - DEBUG - info: rsc://michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:40,048 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:40,193 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:40,344 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:40,532 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:40,736 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
# fs.info ----
2023-04-12 12:30:40,909 - pins.rsconnect.fs - DEBUG - info: rsc://michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:40,910 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:41,079 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:41,232 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:41,423 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:41,615 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
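To put numbers on the log above, here is a minimal sketch that tallies log lines by kind. It assumes the `time - logger - level - message` layout seen in the log. The name `tally_calls` is made up for illustration, and `SAMPLE` is just the `fs.exists` section copied from above.

```python
from collections import Counter

# The fs.exists section of the log above, copied verbatim.
SAMPLE = """\
2023-04-12 12:30:34,755 - pins.rsconnect.fs - DEBUG - exists
2023-04-12 12:30:34,756 - pins.rsconnect.fs - DEBUG - info: michael.chow/mtcars3/72103/mtcars3.parquet
2023-04-12 12:30:34,756 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True}}
2023-04-12 12:30:34,910 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/users -- {'params': {'prefix': 'michael.chow', 'walk_pages': True, 'page_number': 2}}
2023-04-12 12:30:35,063 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content -- {'params': {'owner_guid': 'c31bd134-4d4a-4275-92b1-3e7f8046c03a', 'name': 'mtcars3'}}
2023-04-12 12:30:35,241 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/__api__/v1/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/bundles/72103 -- {}
2023-04-12 12:30:35,424 - pins.rsconnect.api - DEBUG - RSConnect API GET: https://colorado.posit.co/rsc/content/6e1b8ea7-aafb-462a-b644-d4a62951ec85/_rev72103/mtcars3.parquet -- {}
"""


def tally_calls(log_text):
    """Count filesystem-level and HTTP-level events in a pins debug log."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split(" - ", 3)
        if len(parts) != 4:
            continue  # skip hand-added markers such as "# fs.info ----"
        _, logger, _, message = parts
        if logger == "pins.rsconnect.fs":
            # The message is either "exists" or "info: <path>".
            counts["fs." + message.split(":", 1)[0]] += 1
        elif logger == "pins.rsconnect.api" and message.startswith("RSConnect API GET"):
            counts["api.GET"] += 1
    return counts


print(tally_calls(SAMPLE))
```

On this excerpt the tally is one `exists`, one `info`, and five GET requests: the `fs.exists` call does an `info` lookup internally, and each lookup costs five requests. Counting the full log by hand the same way gives eight `info` lines (the seven direct calls plus the one inside `exists`) and 40 GET requests, spread over roughly seven seconds.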
Since pins-python uses fsspec under the hood, users are able to query pins data directly using duckdb's fsspec integration.
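While #193 allows duckdb to query CSV pins on Posit Connect, parquet pins cannot be queried yet. This is likely because duckdb needs to read the parquet metadata, which is stored in the file footer, so it has to seek within the file rather than read it front to back.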
Below, I provide examples. But first, here is a snippet to enable logging to stdout:
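The original snippet is not reproduced here, so what follows is a minimal stand-in using only the standard library. The format string is chosen to match the log lines shown in this thread, and the logger name `pins` is inferred from the logger names that appear there (`pins.rsconnect.fs`, `pins.rsconnect.api`, `pins.cache`). The helper name `enable_pins_logging` is made up.

```python
import logging
import sys


def enable_pins_logging(level=logging.DEBUG, stream=None):
    """Send log records from the "pins" logger hierarchy to a stream."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
    )
    logger = logging.getLogger("pins")
    logger.addHandler(handler)
    logger.setLevel(level)
    # Returned so the caller can later call logger.removeHandler(handler).
    return handler


enable_pins_logging()
```

Child loggers such as `pins.rsconnect.fs` inherit the level and propagate to this handler, so a single call covers the filesystem, API, and cache messages.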
Querying parquet pins on s3 (for reference)
First, here is how you connect to a temporary s3 board, and return info on a file:
Next, we'll add a parquet pin
Finally, we'll query directly in duckdb
Querying parquet in pins
I think the issue has to do with how we're returning info on the file.
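One reason the info result could matter (an assumption on my part, not something confirmed in this issue): a parquet reader locates the file metadata by reading backwards from the end of the file, so it depends on the `size` entry that fsspec filesystems conventionally report from `info()`. The sketch below uses only the standard library and a fake byte buffer. `footer_span` and `read_range` are made-up names; the layout they rely on (a 4-byte little-endian footer length followed by the magic bytes `PAR1`) comes from the Parquet file format.

```python
import struct

MAGIC = b"PAR1"


def footer_span(read_range, size):
    """Return (start, end) byte offsets of the footer metadata.

    read_range(start, end) returns the bytes in [start, end); size is the
    total file size, i.e. the value a filesystem's info() would report.
    """
    tail = read_range(size - 8, size)
    if tail[4:] != MAGIC:
        raise ValueError("not a parquet file, or the reported size is wrong")
    (footer_len,) = struct.unpack("<I", tail[:4])
    end = size - 8
    return end - footer_len, end


# Fake file: magic, column data, footer metadata, footer length, magic.
metadata = b"fake-footer-metadata"
blob = MAGIC + b"column-data" + metadata + struct.pack("<I", len(metadata)) + MAGIC


def read_range(start, end):
    return blob[start:end]


start, end = footer_span(read_range, len(blob))
assert blob[start:end] == metadata
```

If `info()` reported a missing or incorrect size, the reader would look for the footer in the wrong place. A CSV reader that scans from the start of the file would not hit that failure, which would fit CSV pins working while parquet pins do not.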