feat: support writing multiple dataframes/objects to the same pin #311

Closed
dbkegley opened this issue Aug 26, 2024 · 4 comments · Fixed by #319
Labels
.enhancement New feature or request

Comments

@dbkegley

I spoke to some pins users at posit::conf who are interested in reading and writing multiple dataframes to the same pin. The primary use case is board_connect. Because of the ACL controls imposed by Connect, a user who wants to store more than one related dataframe on Connect must use multiple pins. This is cumbersome because they must also maintain ACLs for multiple content items. My recommendation for now is to use groups in Connect and update group membership, but YMMV depending on the auth provider configured in Connect.

I'm not that familiar with how pins stores data, but my guess is that some of this is already possible with the json storage type in Python or the json/rds types in R, although the user would need to combine their dataframes first.
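
A minimal sketch of that combine-first workaround, assuming pandas dataframes and the existing json pin type. The helper names frames_to_payload and payload_to_frames are hypothetical, not part of pins. JSON does not preserve every dtype (datetimes and categoricals in particular), so treat this as a stopgap rather than a substitute for per-dataframe parquet files.

```python
import json


def frames_to_payload(frames):
    """Combine {"name": DataFrame} into a single JSON-serializable dict.

    Hypothetical helper. It relies on pandas' DataFrame.to_json, which
    returns a JSON string when no path is given.
    """
    return {
        name: json.loads(df.to_json(orient="records"))
        for name, df in frames.items()
    }


def payload_to_frames(payload):
    """Rebuild one DataFrame per key from a payload made by frames_to_payload."""
    import pandas as pd  # imported lazily so the first helper has no hard dependency

    return {name: pd.DataFrame(rows) for name, rows in payload.items()}


# Usage with a board (needs a configured Connect server, so shown as comments):
#   board = pins.board_connect()
#   payload = frames_to_payload({"sales": tidy_sales_data, "other": my_other_dataframe})
#   board.pin_write(payload, "dbkegley/sales-summary", type="json")
#   dfs = payload_to_frames(board.pin_read("dbkegley/sales-summary"))
```

Everything lives in one pin, so there is only one content item to manage ACLs for, at the cost of losing columnar storage.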

It would be nice if pins supported APIs for writing multiple dataframes/objects to the same pin. I'm envisioning something like this:

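import pins

# Proposed API, not something pins supports today: pass a dict of named dataframes.
board = pins.board_connect()
board.pin_write({"sales": tidy_sales_data, "other": my_other_dataframe}, "dbkegley/sales-summary", type="parquet")

# Reading the pin back would return the same dict of dataframes.
dfs = board.pin_read("dbkegley/sales-summary")
sales_data = dfs["sales"]
other_data = dfs["other"]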

This would store 2 separate parquet files (or CSVs) under the hood, one for each dataframe.

@isabelizimm
Collaborator

isabelizimm commented Aug 30, 2024

Thank you for the report! I know something like this is available on the R side using board.pin_upload() to upload multiple files. Right now, pins for Python can upload one file at a time, but not several. I think a reasonable first step would be to implement pin_upload() for multiple files; that would enable people to do something like: board.pin_upload([tidy_sales_data.to_parquet(), other.to_parquet()]).

Does that match what you would expect in this scenario, at least partially?
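
For illustration, here is a rough sketch of what the calling side could look like. pin_upload() works with file paths, and DataFrame.to_parquet() called without a path returns bytes, so the sketch writes each dataframe to its own file first. The helper name write_parquet_files is hypothetical, and passing a list of paths to pin_upload() is the behaviour being requested here, not something assumed to exist already.

```python
from pathlib import Path


def write_parquet_files(frames, out_dir):
    """Write each dataframe in {"name": DataFrame} to <out_dir>/<name>.parquet.

    Hypothetical helper. Returns the file paths in the order of the dict keys.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, df in frames.items():
        path = out / f"{name}.parquet"
        df.to_parquet(path)  # pandas writes to the given path (needs pyarrow or fastparquet)
        paths.append(str(path))
    return paths


# Usage once multi-file upload exists (assumption, shown as comments):
#   paths = write_parquet_files(
#       {"sales": tidy_sales_data, "other": my_other_dataframe}, "staging"
#   )
#   board.pin_upload(paths, "dbkegley/sales-summary")
```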

@dbkegley
Author

dbkegley commented Sep 3, 2024

I think that would work! Thanks @isabelizimm

@isabelizimm added the .enhancement (New feature or request) label on Sep 23, 2024

@jeffkeller-einc

I second the request for multi-file pins via board.pin_upload(). I often do this on the R side to store a collection of files together (e.g., various related model artifacts).

My workaround for the single-file limitation in Python board.pin_upload() is to tar the files into a single archive and pin that.
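
The tar workaround can be sketched with the standard library alone. The helper names bundle_files and unpack_files are hypothetical, and the pin calls are left as comments because they need a configured board.

```python
import tarfile
from pathlib import Path


def bundle_files(paths, archive_path):
    """Pack several files into one gzip-compressed tar archive."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        for p in map(Path, paths):
            # arcname drops the directory part, so the archive is flat
            tar.add(str(p), arcname=p.name)
    return str(archive_path)


def unpack_files(archive_path, dest_dir):
    """Extract the regular files from an archive made by bundle_files."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(str(archive_path), "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Keep only the base name so a member cannot escape dest_dir
            target = dest / Path(member.name).name
            with tar.extractfile(member) as src:
                target.write_bytes(src.read())
            extracted.append(str(target))
    return extracted


# Usage with a board (shown as comments; the pin name is illustrative):
#   archive = bundle_files(["model.pkl", "metrics.json"], "artifacts.tar.gz")
#   board.pin_upload(archive, "user/model-artifacts")
#   downloaded = board.pin_download("user/model-artifacts")  # paths to the cached file(s)
#   files = unpack_files(downloaded[0], "artifacts")
```

The downside is that readers on the R side have to know to untar the pin as well, which is part of why native multi-file support would be nicer.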

It's also worth noting that if you write a multi-file pin from R and attempt to download it from Python with board.pin_download(), you will only get one file. So there is some work there as well.

@isabelizimm
Collaborator

> It's also worth noting that if you write a multi-file pin from R and attempt to download it from Python with board.pin_download(), you will only get one file. So there is some work there as well.

I'm going to break this out into a separate issue for tracking purposes. Thank you for the feedback on use cases and rough edges!

4 participants
@dbkegley @isabelizimm @jeffkeller-einc and others