DaskR prototype with rpy2 and reticulate #2254
Labels
discussion
Discussing a topic with no specific actions yet
enhancement
Improve existing functionality or make things work better
@quasiben has been playing with Dask and R with reticulate and rpy2 with the objective of providing Dask's concurrent.futures API to R (from which they could presumably build other systems). This is somewhat tricky because you have both a Python and R session living side-by-side and need to move things between them from time to time (hopefully infrequently) using reticulate or rpy2. There are a variety of ways to do this, I thought I'd lay out the way that makes the most sense to me.
From R we need ways to get proxy objects in Python that point to our local objects in R without doing any clever/lossy conversion (like R dataframes to Pandas dataframes). Below I use rpy2 as this proxy object, but anything would do.
From within Python we need ways to serialize and deserialize these proxy objects, presumably this involves calling R's
serialize
function on the R side and then bringing those bytes back to Python, and then doing the reverse with deserialize.On the Dask side we need to implement the following interface for
rpy2
objects: https://distributed.readthedocs.io/en/latest/serialization.html#dask-serialization-familyAlso on the Python side we need to implement the
sizeof
protocol` to quickly estimate how large an object is in bytesWe'll need to develop a basic API in R. If we're starting with the concurrent.futures interface (which seems simplest) then this probably means
submit
,gather
,as_completed
,wait
, andmap
(is anything else critical?). We should check in with people familiar with R though to learn if there is a more native API that R users would find more familiar.The text was updated successfully, but these errors were encountered: