Distributed cache strategy #5
Are you already familiar with dask's distributed scheduler at http://distributed.readthedocs.org/en/latest/ ?
Hello Matt. Yes, I have seen distributed. What I am after is some sort of key-value storage for results that is available locally on multiple nodes. Is there something like that inside distributed?
The distributed scheduler maintains distributed data, indexed by key, in the active memory of multiple nodes. It doesn't persist to disk, but easily could by running some function against the data. I would not recommend using distributed as a persistent key-value store or as a database, but rather as a staging ground for distributed computations. For persistence I would generally recommend some sort of parallel storage solution like HDFS, Ceph, or a more traditional POSIX file system. It tends to be fairly easy to write ingest functions to read from these with
In [1]: from distributed import Executor
In [2]: e = Executor('127.0.0.1:8786')
In [3]: futures = e.scatter(range(10))
In [4]: futures
Out[4]:
[<Future: status: finished, key: f1b37646b1331806c57001fca979724e>,
<Future: status: finished, key: 8068024105615db5ba4cbd263d42a6db>,
<Future: status: finished, key: d1d6feb3b70239a6081e64b7272d18c8>,
<Future: status: finished, key: 613f9ec13ceecaed260a81ade5f37311>,
<Future: status: finished, key: 205604040f767e399cb8b3c9322abdaa>,
<Future: status: finished, key: 7b4b5c735acdd7ee3b2fafd41bc5be78>,
<Future: status: finished, key: 3ab29504c46cb29e851081c84b5e4249>,
<Future: status: finished, key: f7aa9090f9d4af3b10066e98993388de>,
<Future: status: finished, key: 25e70ec51f367dbbff88d67a9684a5a6>,
<Future: status: finished, key: 42cb136fea0b4175909749d31c9a6c84>]
In [5]: e.has_what()
Out[5]:
{'192.168.1.141:33088': ['d1d6feb3b70239a6081e64b7272d18c8',
'25e70ec51f367dbbff88d67a9684a5a6',
'8068024105615db5ba4cbd263d42a6db',
'613f9ec13ceecaed260a81ade5f37311',
'f7aa9090f9d4af3b10066e98993388de',
'205604040f767e399cb8b3c9322abdaa',
'7b4b5c735acdd7ee3b2fafd41bc5be78',
'3ab29504c46cb29e851081c84b5e4249'],
'192.168.1.141:52787': ['f1b37646b1331806c57001fca979724e',
'42cb136fea0b4175909749d31c9a6c84']}
In [6]: def inc(x):
   ...:     return x + 1
   ...:
In [7]: futures2 = e.map(inc, futures)
In [8]: e.has_what()
Out[8]:
{'192.168.1.141:33088': ['inc-6a9bbed6c5c042aac23ac3739b3080ab',
'inc-80a8293859f4e85484b3fa9f5ae37e66',
'3ab29504c46cb29e851081c84b5e4249',
'inc-a684480530c8117e48025ba71b62a8fb',
'25e70ec51f367dbbff88d67a9684a5a6',
'd1d6feb3b70239a6081e64b7272d18c8',
'inc-9824ed6eb573348151189779d5fa084a',
'8068024105615db5ba4cbd263d42a6db',
'613f9ec13ceecaed260a81ade5f37311',
'inc-585e542b4025afe01b51bbab1e738cf6',
'f7aa9090f9d4af3b10066e98993388de',
'inc-b7ea7bc3d40dfc9dbaaab71d5a9f7a0a',
'inc-7348f0bb324a145903cbe7c249e062e4',
'205604040f767e399cb8b3c9322abdaa',
'7b4b5c735acdd7ee3b2fafd41bc5be78',
'inc-4ade3c6e95e45a1afc31f887a57733f5'],
'192.168.1.141:52787': ['f1b37646b1331806c57001fca979724e',
'42cb136fea0b4175909749d31c9a6c84',
'inc-b9caf9d500d82c4b712e016a1263da32',
'inc-80d3fd1f19423d863391d5d468b29e54']}
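The remark above that in-memory results could be persisted "by running some function against the data" can be sketched roughly as follows. This is only an illustration and not part of distributed: `save_result`, `load_result`, and `persist_all` are hypothetical names, and the standard-library `ThreadPoolExecutor` stands in for a cluster executor so that the sketch runs on a single machine. On a real cluster the directory would need to live on shared storage such as the parallel file systems mentioned above.

```python
import os
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor


def save_result(key, value, directory):
    """Pickle one value to <directory>/<key>.pkl and return the path written."""
    path = os.path.join(directory, f"{key}.pkl")
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return path


def load_result(key, directory):
    """Read back a value previously written by save_result."""
    with open(os.path.join(directory, f"{key}.pkl"), "rb") as f:
        return pickle.load(f)


def inc(x):
    return x + 1


def persist_all(results, directory):
    """Run save_result against every (key, value) pair in parallel."""
    keys = list(results)
    values = [results[k] for k in keys]
    # Assumption: a local thread pool stands in for the cluster here.
    with ThreadPoolExecutor(max_workers=4) as pool:
        paths = list(pool.map(save_result, keys, values, [directory] * len(keys)))
    return dict(zip(keys, paths))


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    # Hypothetical keys; real keys would come from the scheduler.
    results = {f"inc-{i}": inc(i) for i in range(10)}
    paths = persist_all(results, out_dir)
    print(load_result("inc-3", out_dir))
```

The function being mapped only needs a key, a value, and a destination, so the same shape should carry over to other storage backends by swapping the body of `save_result`.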
Thank you. I was thinking about plugging in something like Tarantool or Riak.
That could be a very fun combination.
Matt, I am interested in your opinion. I was inspired by http://mxnet.readthedocs.org/en/latest/distributed_training.html#how-to-write-a-distributed-program-on-mxnet
Sorry to necro this old thread, but I figured this might be worth mentioning. Zarr recently added a Redis store (zarr-developers/zarr-python#372), which is an in-memory key-value store. It implements the
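To make the key-value idea discussed in this thread a little more concrete, here is a minimal sketch of a result cache that sits behind a dict-like interface. Everything named here (`ResultCache`, `make_key`, `get_or_compute`) is hypothetical and not part of cachey, distributed, or Zarr. The plain `dict` backend is a placeholder; the assumption is that a networked store exposing the same mapping interface could be swapped in to share results across nodes. The key format only loosely resembles the `inc-...` keys shown earlier and is not dask's actual scheme.

```python
import hashlib
import pickle
from collections.abc import MutableMapping
from typing import Optional


def make_key(func, args):
    """Build a key from the function name and a hash of the pickled arguments.

    Assumption: the arguments pickle to the same bytes every time, which holds
    for simple values like ints and strings but not for every object.
    """
    digest = hashlib.md5(pickle.dumps(args)).hexdigest()
    return f"{func.__name__}-{digest}"


class ResultCache:
    """Memoize function results in any mapping that stores bytes by string key."""

    def __init__(self, store: Optional[MutableMapping] = None):
        # A plain dict is a local placeholder for a shared key-value store.
        self.store = {} if store is None else store

    def get_or_compute(self, func, *args):
        key = make_key(func, args)
        if key in self.store:
            # Cache hit: deserialize the stored bytes instead of recomputing.
            return pickle.loads(self.store[key])
        value = func(*args)
        self.store[key] = pickle.dumps(value)
        return value


def inc(x):
    return x + 1


if __name__ == "__main__":
    cache = ResultCache()
    print(cache.get_or_compute(inc, 1))  # computed and stored
    print(cache.get_or_compute(inc, 1))  # served from the store
```

Storing serialized bytes rather than live objects keeps the cache independent of the backend, at the cost of a pickle round trip on every hit.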
Hello Fellows,
I am looking at using dask for distributed computation, and I was wondering what your strategy is for making cachey into a horizontally scalable cache and common calculation space.
Regards,
Alex