Experiment rechunking cupy array on DGX #59

Closed
mrocklin opened this issue May 29, 2019 · 15 comments
Comments

@mrocklin
Contributor

Using the DGX branch and the tom-ucx distributed branch, I'm playing with rechunking a large 2-D array from row-wise chunks to column-wise chunks:

from dask_cuda import DGX
cluster = DGX(CUDA_VISIBLE_DEVICES=[0,1,2,3])
from dask.distributed import Client
client = Client(cluster)
import cupy, dask.array as da, numpy as np
rs = da.random.RandomState(RandomState=cupy.random.RandomState)
x = rs.random((40000, 40000), chunks=(None, '1 GiB')).persist()
y = x.rechunk(('1 GiB', -1)).persist()
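
Rough back-of-the-envelope numbers for that chunking (my own arithmetic, not from the snippet above):

# 40000 x 40000 float64 values in total:
total_bytes = 40000 * 40000 * 8      # 12_800_000_000 bytes, ~11.9 GiB
col_bytes = 40000 * 8                # one full float64 column is 320 kB
cols_per_chunk = 2**30 // col_bytes  # ~3355 columns fit in a '1 GiB' chunk
print(total_bytes, cols_per_chunk)   # so x has roughly a dozen column blocks,
                                     # and the rechunk reshuffles all of them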

This is a fun experiment because it's a common operation, stresses UCX a bit, and is currently quite fast (when it works).

I've run into the following problems:

  1. Spilling to disk when I run out of device memory (I don't have any spill-to-disk mechanism enabled at the moment)

  2. Sometimes I get this error from the Dask UCX comm code:

      File "/home/nfs/mrocklin/distributed/distributed/comm/ucx.py", line 134, in read
        nframes, = struct.unpack("Q", obj[:8])  # first eight bytes for number of frames
  3. Sometimes CURAND seems to dislike me

distributed.protocol.pickle - INFO - Failed to deserialize b'\x80\x04\x95[\x00\x00\x00\x00\x00\x00\x00\x8c\x10cupy.cuda.curand\x94\x8c\x0bCURANDError\x94\x93\x94\x8c!CURAND_STATUS_PREEXISTING_FAILURE\x94\x85\x94R\x94}\x94\x8c\x06status\x94K\xcasb.'
Traceback (most recent call last):
  File "/home/nfs/mrocklin/distributed/distributed/worker.py", line 3193, in apply_function
    result = function(*args, **kwargs)
  File "/home/nfs/mrocklin/dask/dask/array/random.py", line 411, in _apply_random
    return func(*args, size=size, **kwargs)
  File "/raid/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/cupy/random/generator.py", line 516, in random_sample
    out = self._random_sample_raw(size, dtype)
  File "/raid/mrocklin/miniconda/envs/ucx/lib/python3.7/site-packages/cupy/random/generator.py", line 505, in _random_sample_raw
    func(self._generator, out.data.ptr, out.size)
  File "cupy/cuda/curand.pyx", line 155, in cupy.cuda.curand.generateUniformDouble
  File "cupy/cuda/curand.pyx", line 159, in cupy.cuda.curand.generateUniformDouble
  File "cupy/cuda/curand.pyx", line 83, in cupy.cuda.curand.check_status
cupy.cuda.curand.CURANDError: CURAND_STATUS_PREEXISTING_FAILURE

I don't plan to investigate these personally at the moment, but I wanted to record the experiment somewhere (and this currently seems to be the best place?). I think it might be useful to have someone like @madsbk or @pentschev look into this after the UCX and DGX work gets cleaned up a bit more.

@pentschev
Member

@mrocklin during the weekend I was running some SVD benchmarks again and I came across a very similar issue that I think may be related to memory spilling. Could you confirm whether workers start to die when they run out of memory? That's exactly what was happening to me.

@pentschev
Member

Sorry, I meant to say when the worker's GPU runs out of memory.

@mrocklin
Contributor Author

For the first kind of error, yes. I haven't plugged in the DeviceHostDisk spill mechanism yet.
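
For reference, a minimal sketch of what enabling spilling typically looks like, assuming the dask_cuda.LocalCUDACluster API with its device_memory_limit and memory_limit parameters (the DGX class may expose this differently):

from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Workers spill device memory to host once a GPU exceeds device_memory_limit,
# and host memory to disk once a worker exceeds memory_limit.
cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES=[0, 1, 2, 3],
    device_memory_limit="25 GB",  # per-GPU threshold before device-to-host spilling
    memory_limit="50 GB",         # per-worker threshold before host-to-disk spilling
)
client = Client(cluster)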

@pentschev
Member

I was getting those errors even with DeviceHostDisk, unless I had some messed-up configuration I didn't notice. That said, it may be that there's a bug and we need to test it better; I will do that soon.

@jakirkham
Member

FWIW I've also had some very similar pains with rechunking (particularly in cases where an array needs to be flattened out). I needed a ravel implementation that avoided rechunking entirely to bypass the issue. I'd be happy to try out the current UCX work to see if it helps (or to point someone else playing with this to a good test case).
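
As an illustration of the kind of workaround I mean, here's a rough sketch (blockwise_ravel is a hypothetical helper, not the actual implementation; the element order only matches a true ravel when each chunk spans whole rows):

import numpy as np
import dask.array as da

def blockwise_ravel(x):
    # Flatten each block independently and concatenate the 1-D pieces, so no
    # inter-block data movement (rechunk) is required. The result matches
    # x.ravel() elementwise only when the array is chunked along axis 0 alone.
    flat_blocks = [x.blocks[idx].reshape(-1) for idx in np.ndindex(*x.numblocks)]
    return da.concatenate(flat_blocks)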

@pentschev
Member

I may be wrong, but I think the issue here is not directly related to rechunking arrays, but rather to running out of device memory.

@jakirkham
Member

Yes, I run out of device memory immediately after starting a computation that follows rechunking. Happy to dive into it further with you if it is of interest.

@mrocklin
Contributor Author

Let me clean things up a bit and write down installation instructions. Then it'd be good to have people dive in. My thought was that @pentschev or @madsbk might be a better fit so that you don't get taken away from driving imaging applications.

@pentschev
Member

I will definitely dive into that, since I have a strong feeling that the memory spilling mechanism may not be working properly, or not active at all. How urgent is this for both of you?

@mrocklin
Contributor Author

mrocklin commented May 29, 2019 via email

@pentschev
Member

This is very likely related to #57; in fact, it's probably the same bug in device memory spilling.

@pentschev
Member

So I was checking this, and I can't reproduce any cuRAND errors. What I ultimately get instead is an out of memory error:

Traceback (most recent call last):
  File "dask-cuda-59.py", line 10, in <module>
    y.compute()
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/dask/base.py", line 156, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/dask/base.py", line 399, in compute
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/dask/base.py", line 399, in <listcomp>
    return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/dask/array/core.py", line 828, in finalize
    return concatenate3(results)
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/dask/array/core.py", line 3607, in concatenate3
    return _concatenate2(arrays, axes=list(range(x.ndim)))
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/dask/array/core.py", line 228, in _concatenate2
    return concatenate(arrays, axis=axes[0])
  File "/home/nfs/pentschev/miniconda3/envs/rapids-0.7/lib/python3.7/site-packages/cupy/manipulation/join.py", line 49, in concatenate
    return core.concatenate_method(tup, axis)
  File "cupy/core/_routines_manipulation.pyx", line 563, in cupy.core._routines_manipulation.concatenate_method
  File "cupy/core/_routines_manipulation.pyx", line 608, in cupy.core._routines_manipulation.concatenate_method
  File "cupy/core/_routines_manipulation.pyx", line 637, in cupy.core._routines_manipulation._concatenate
  File "cupy/core/core.pyx", line 134, in cupy.core.core.ndarray.__init__
  File "cupy/cuda/memory.pyx", line 518, in cupy.cuda.memory.alloc
  File "cupy/cuda/memory.pyx", line 1085, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 1106, in cupy.cuda.memory.MemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 934, in cupy.cuda.memory.SingleDeviceMemoryPool.malloc
  File "cupy/cuda/memory.pyx", line 949, in cupy.cuda.memory.SingleDeviceMemoryPool._malloc
  File "cupy/cuda/memory.pyx", line 697, in cupy.cuda.memory._try_malloc
cupy.cuda.memory.OutOfMemoryError: out of memory to allocate 12800000000 bytes (total 38400000000 bytes)

After some checking, I was able to confirm that dask-cuda has only ~5 GB in the device LRU; all the rest is temporary CuPy memory (over 30 GB). I'm not sure what we can do to make such cases work, or whether we have an option at all. In this particular case, the amount of memory it tries to allocate is exactly the problem size, 40000 * 40000 * 8 = 12800000000 bytes, which CuPy may be allocating for the final result. But unless we can control CuPy internally to spill device memory, I'm not sure we'll be able to support such problem sizes.
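
For reference, the arithmetic and a rough way to inspect the split between dask-cuda's LRU and CuPy's pool (using CuPy's default memory pool API):

import cupy

# The failing allocation is exactly the size of the full 40000 x 40000
# float64 result:
print(40000 * 40000 * 8)  # 12800000000 bytes, ~12.8 GB

# How much device memory CuPy's pool currently holds; this is separate from
# the objects dask-cuda tracks in its device LRU:
pool = cupy.get_default_memory_pool()
print(pool.used_bytes(), pool.total_bytes())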

I will think a bit more about this; if you have any suggestions, please let me know.

@mrocklin
Contributor Author

To make sure I understand, the temporary CuPy memory here is likely from some sort of memory manager?

@pentschev
Member

No, I also tried disabling it. The temporary memory could be any intermediate buffers that are needed, for example for concatenating multiple arrays or for any other function that can't write to its input memory (and thus requires additional memory to store the output).
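
A toy illustration of that point (my own example, not from the run above):

import cupy

a = cupy.zeros((10000, 10000), dtype=cupy.float64)  # ~800 MB
b = cupy.zeros((10000, 10000), dtype=cupy.float64)  # ~800 MB

# concatenate cannot write into its inputs, so it allocates a fresh output
# buffer of the combined size; peak device usage is inputs plus output.
c = cupy.concatenate([a, b], axis=0)
print(c.nbytes)  # 1600000000 bytes, ~1.6 GB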

@pentschev
Member

There has been great progress on that over the last year or so; I'm closing this as I don't think it's an issue anymore.
