make_cluster(): subsetting with [ not working #83
-
Hi Henrik, thanks for spotting this. I would not otherwise have encountered As for hitting the same node, I am not currently aware that this is a feature of clusters - functions such as However, the current state of affairs represents only the initial implementation, so it is possible that what you describe becomes possible down the line. And there are multiple ways to achieve this theoretically. In the meantime, one alternative I thought might work for you is to do everything in one At the end of the day, I am thinking that it is probably best to focus on
-
Hi Henrik, coming back to this. I must have missed your opening line here and just jumped to the detail:
I think that makes perfect sense to enable I just tried something like:
library(mirai)
library(parallel)
cl <- make_cluster(2)
setDefaultCluster(cl)
getDefaultCluster()
#> < miraiCluster >
#> - cluster ID: `0`
#> - nodes: 2
#> - active: TRUE
library(future)
plan("cluster")
f <- future({
a <- 7
b <- 3
c <- 2
a * b * c
})
value(f)
#> [1] 42
Do let me know what you think.
-
I think one must be extremely careful when making such decisions. When gluing major frameworks together, it's important to make sure one is not putting a square peg in a round hole. To me, it's not clear whether that is happening here or what the shape of the hole is, but it's important to know that before moving forward. For instance, incorrect use affects the maintainer and future of the parallel package, e.g. it might lock them into a path they didn't intend to take. And, as I said before, it can hit users negatively out there without their or our knowledge. It could very well be that the parallel API needs refinement and clarification (I guess this discussion shows that). If parallel provides a round hole, and we have a square peg, then we might have to work with parallel to expand its API. To be clear, I'm re-re-evaluating my view of the parallel API yet again.
-
Another attempt at this. The original issue was the print method for a subset cluster giving an error:
cl <- mirai::make_cluster(2)
cl_kk <- cl[1]
print(cl_kk)
#> Error in ..[[attr(x, "id")]] :
#> wrong arguments for subsetting an environment
I have now changed the print method to give a more meaningful error instead:
cl <- mirai::make_cluster(2)
cl_kk <- cl[1]
print(cl_kk)
#> Error in print.miraiCluster(cl_kk) :
#> cluster of type 'mirai' is not subsettable
This should be an improvement. Not much else I can do at this stage, as
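The two errors above can be illustrated in base R alone. This is a minimal sketch using a hypothetical `toyCluster` class and a made-up `id` value; it is not mirai's implementation. It relies on the documented behaviour that default `[` subsetting keeps only names, dim and dimnames, so a custom attribute such as `id` is dropped, and it then shows one way a `[` method could refuse subsetting with an explicit message.

```r
# Hypothetical toy object standing in for a cluster; NOT mirai's code.
toy <- structure(list("node-a", "node-b"), id = "abc")

# Default `[` keeps only names, dim and dimnames, so the custom
# "id" attribute is lost on the subset.
sub <- toy[1]
id_after_subset <- attr(sub, "id")  # NULL

# A conservative `[` method (assumed name and message) that refuses
# to subset instead of returning an object without its "id".
`[.toyCluster` <- function(x, ...) {
  stop("cluster of type 'toy' is not subsettable", call. = FALSE)
}
class(toy) <- "toyCluster"

# Capture the error message rather than aborting the script.
msg <- tryCatch(toy[1], error = function(e) conditionMessage(e))
```

Failing early in `[` rather than later in `print()` would surface the problem at the point where the unsupported operation is attempted; whether that suits mirai is a design decision for its maintainer.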
-
Low-level issue
I attempted:
Looking at:
I see that cl_kk lost the id attribute. To fix this, add:
With this, the following works:
The bigger picture
I spotted the above while trying to add support for plan(cluster, workers = cl) with cl <- mirai::make_cluster(2) to future.
Part of the generic implementation of cluster futures is that it, regardless of actual implementation, communicates with a specific cluster node (cl_kk <- cl[kk]) multiple times. The gist is (it's done differently):
clusterExport(cl_kk, globals)
clusterEvalQ(cl_kk, <expr>)
While doing this, I realized that although I use the same cl_kk in both steps, they do not necessarily reach the same cluster node (R process). Here's a minimal reproducible example using just mirai:
If you run this a few times, you'll see that despite using the same cl[kk], clusterEvalQ() ends up running on different R processes.
I think this behavior does not meet the expectations of how parallel clusters should work. Of course, a first, conservative approach would be to have [.miraiCluster give an error saying miraiCluster does not support subsetting, but I think the real goal should be to support hitting the same R process when using a single cluster node.
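For comparison, the expectation described above can be checked against a plain PSOCK cluster from base R's parallel package. This is a sketch only: it uses `parallel::makeCluster()` rather than mirai, and the variable names (`x`, `pid1`, `pid2`) are illustrative. It exports a value to a single-node subset and then evaluates on that same subset twice.

```r
library(parallel)

# Start a plain PSOCK cluster with two worker processes (base R only).
cl <- makeCluster(2)

# Subset to a single node; for PSOCK clusters this wraps the
# connection to one specific worker process.
kk <- 1
cl_kk <- cl[kk]

# Step 1: export a variable to that node.
x <- 42
clusterExport(cl_kk, "x", envir = environment())

# Step 2: evaluate on the same subset; the exported value is visible.
val <- clusterEvalQ(cl_kk, x)[[1]]

# Repeated calls on the same subset report the same process ID.
pid1 <- clusterEvalQ(cl_kk, Sys.getpid())[[1]]
pid2 <- clusterEvalQ(cl_kk, Sys.getpid())[[1]]
same_process <- identical(pid1, pid2)

stopCluster(cl)
```

With a PSOCK cluster each element of `cl` holds its own socket connection, which is why the export and the evaluation land in the same R process.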