
Add distributed ndarray #7881

Closed

Conversation

@shino16 (Contributor) commented Sep 26, 2023:

Adds array.DistributedArray to cupyx.distributed.

It provides initial support for

  • conversion from/to ndarray
  • element-wise operations (ufunc, ElementwiseKernel)
  • reduction (max/min/sum/prod)
  • matrix multiplication (matmul)

in a multi-GPU setting.
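
For illustration, here is a minimal usage sketch. It assumes two GPUs; the module path and the distributed_array signature follow the diff quoted later in this conversation, while the shapes and slices are made up for the example.

import cupy
from cupyx.distributed.array import distributed_array

# Shard a 1000x1000 array row-wise across two devices (illustrative split).
a = cupy.arange(1000 * 1000, dtype=cupy.float32).reshape(1000, 1000)
index_map = {
    0: slice(500),        # device 0 holds rows [0, 500)
    1: slice(500, None),  # device 1 holds rows [500, 1000)
}
d_a = distributed_array(a, index_map, mode='replica')

# Element-wise operations and reductions dispatch to the per-device chunks.
d_b = d_a * 2
total = d_b.sum()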

@@ -581,6 +581,10 @@ cdef class _SimpleReductionKernel(_AbstractReductionKernel):
    def __call__(self, object a, axis=None, dtype=None, _ndarray_base out=None,
                 bint keepdims=False):

        if hasattr(a, '__cupy_override_reduction_kernel__'):

Member:
This currently only works for SimpleReductionKernel.
There are other, more generic reduction kernels; I wonder if we can easily support them?

@shino16 (Contributor Author):
__cupy_override_reduction_kernel__ has to be called here, before the type checks that rule out DistributedArray.
I believe we can support ReductionKernel in the same way, by adding this hook in its __call__ method.

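For context, the hook being discussed follows this pattern, shown here as a simplified Python sketch (the real code is Cython, and the exact signature of the override method is an assumption rather than quoted from the PR):

class _SimpleReductionKernel:
    def __call__(self, a, axis=None, dtype=None, out=None, keepdims=False):
        # Let array-like subclasses such as DistributedArray take over the
        # reduction before the ndarray-only type checks reject them.
        if hasattr(a, '__cupy_override_reduction_kernel__'):
            return a.__cupy_override_reduction_kernel__(
                self, axis, dtype, out, keepdims)
        ...  # regular single-GPU reduction path
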
@@ -995,6 +995,9 @@ cdef class _ndarray_base:
            :meth:`numpy.ndarray.max`

        """
        if hasattr(self, '__cupy_override_reduction_kernel__'):

Member:
I would prefer to have this check only in the ReductionKernel machinery. Do you think it would be possible to do that? Thanks!

@shino16 (Contributor Author) commented Sep 28, 2023:
If we remove this check, cupy._core._routines_statistics._ndarray_max tries CUB and cuTENSOR before falling back to cupy.max.
So this check should happen before reaching cupy.max.__call__; I wonder where it should be placed.
https://github.com/shino16/cupy/blob/main/cupy/_core/_routines_statistics.pyx#L25-L43

Member:
That's a good call. Maybe we should rewrite those checks to make it feasible,
e.g. by having an attribute on the ndarray that derived classes override:
array.support_cub_for_routines
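
A rough sketch of that idea (hypothetical names; as the next reply notes, the PR ultimately took a different route):

class ndarray:
    # Plain ndarrays may use the CUB/cuTENSOR fast paths for routines.
    support_cub_for_routines = True

class DistributedArray(ndarray):
    # Derived classes that cannot use those backends opt out here.
    support_cub_for_routines = False

def _ndarray_max(a, axis, out, keepdims):
    if a.support_cub_for_routines:
        ...  # try the CUB / cuTENSOR accelerated paths first
    ...  # otherwise fall back to the generic reduction kernel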

@shino16 (Contributor Author):
Now DistributedArray overrides all the methods of ndarray, including max/min/sum/prod, so those hooks in cupy._core.core.ndarray are removed. Still, your idea sounds reasonable.

@@ -43,6 +43,18 @@
_nccl_ops = {}


def _get_nccl_dtype_and_count(array, count=None):

Member:
nice!

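The excerpt above only shows the signature of this helper. A plausible sketch of what such a helper does is given below; the key point, which is an assumption here rather than quoted code, is that NCCL has no complex dtypes, so complex arrays are typically sent as the matching real dtype with the element count doubled. The _nccl_dtypes table is hypothetical.

from cupy.cuda import nccl

# Hypothetical lookup table, presumably kept next to _nccl_ops above.
_nccl_dtypes = {
    'float32': nccl.NCCL_FLOAT32,
    'float64': nccl.NCCL_FLOAT64,
    'int32': nccl.NCCL_INT32,
    'int64': nccl.NCCL_INT64,
}

def _get_nccl_dtype_and_count(array, count=None):
    if count is None:
        count = array.size
    name = array.dtype.name
    if name == 'complex64':    # NCCL has no complex types: send as 2x float32
        return nccl.NCCL_FLOAT32, 2 * count
    if name == 'complex128':   # ... and complex128 as 2x float64
        return nccl.NCCL_FLOAT64, 2 * count
    return _nccl_dtypes[name], count
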
shape (tuple of ints): Length of axes.
dtype: Data type. It must be an argument of :class:`numpy.dtype`.
mode (str or mode object): Mode that determines how overlaps of
chunks are interpreted.

Member:
I would like a description of the available modes in the docstring under a .. note:: section :)
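
Something along these lines could satisfy that request. The 'replica' name comes from the distributed_array signature quoted later in this conversation; treating 'sum' as an example of a reduction mode is an assumption:

def distributed_array(array, index_map, mode='replica'):
    """...

    .. note::
        In ``'replica'`` mode, regions where chunks overlap hold identical,
        replicated values. In a reduction mode such as ``'sum'``, the logical
        value of an element is the reduction of the values stored by all
        chunks that cover it.
    """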

@mergify (bot) commented Sep 29, 2023:

This pull request is now in conflict. Could you fix it @shino16? 🙏

@emcastillo (Member):

/test mini

@emcastillo (Member):

/test mncore

@shino16 marked this pull request as ready for review on September 30, 2023 at 15:59.

@leofang (Member) left a comment:
Quick drive-by comments 🙂

Comment on lines 11 to 19
INDEX_MAP_A = {
    0: (slice(200)),       # arr[:200]
    1: (slice(200, None)), # arr[200:]
}

INDEX_MAP_B = {
    0: (slice(None), slice(None, None, 2)),  # arr[:, ::2]
    1: (slice(None), slice(1, None, 2)),     # arr[:, 1::2]
}

Member:
It'd be nice to comment on why we need these index maps; currently this sample assumes that users already have the concept of sharding, which may or may not be the case.

Comment on lines 28 to 34
with Device(0):
    s0 = Stream()
    s0.use()

with Device(1):
    s1 = Stream()
    s1.use()

Member:
Quality-of-life change: this sample assumes there are 2 GPUs; we need a device count check at the top and an early exit.
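
A minimal version of that check, assuming the sample is a standalone script:

import sys
import cupy

# Skip the sample early when fewer than two GPUs are available.
if cupy.cuda.runtime.getDeviceCount() < 2:
    print('This sample requires at least 2 GPUs; exiting.')
    sys.exit(0)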

Comment on lines 14 to 19
#      0        M/3         M
#   0  +--------+-----------+
#      |  0 1   |     2     |
# N/2  +--------+-----------+
#      |  0 3   |    1 2    |
#   N  +--------+-----------+

Member:
Q: Does this sample require 4 GPUs?

Member:
Yeah, we need to add a runtime check here! Great catch :)

@asi1024 added this to the v13.0.0rc1 milestone on Oct 11, 2023.
corresponding to slices of the original array. Note that one device can
hold multiple chunks.

This direct constructor is designed for internal calls. Users should create

Member:
Suggested change:
- This direct constructor is designed for internal calls. Users should create
+ This direct constructor is intended as a private API. Users should create


obj._streams = {}
obj._comms = comms if comms is not None else {}

Member:
Several of the assignments here could be moved to __init__.

def distributed_array(
    array: ArrayLike,
    index_map: dict[int, Any],
    mode: str = 'replica',

Member:
maybe better to use enum?
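
A sketch of what that could look like (hypothetical enum; the PR kept string-based modes, and 'sum' is assumed as a second mode purely for illustration):

import enum

class Mode(enum.Enum):
    REPLICA = 'replica'
    SUM = 'sum'

def distributed_array(array, index_map, mode=Mode.REPLICA):
    # Accept both the enum and the original strings for compatibility.
    mode = mode if isinstance(mode, Mode) else Mode(mode)
    ...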


    if isinstance(array, (numpy.ndarray, ndarray)):
        if mode != 'replica':
            array = array.copy()

Member:
TODO: If the array is already on the device, making a copy can cause an OOM; in some cases we may be able to copy only the chunk needed on each device if the array is contiguous.
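
Roughly what that optimization could look like, as a sketch only: it assumes array is a contiguous cupy.ndarray, that index_map maps a device id to the index of its chunk, and that moving a sliced view with cupy.asarray under the target device is acceptable here. This is not part of the PR.

for dev, idx in index_map.items():
    chunk_view = array[idx]               # a view; nothing is copied yet
    with cupy.cuda.Device(dev):
        # Materialize only this chunk on device `dev` instead of copying
        # the whole source array.
        chunk = cupy.asarray(chunk_view)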



class _ArrayPlaceholder:
# Mocks ndarray

Member:
Self note: this is used while we are waiting for chunks to arrive at the current device.

@emcastillo (Member):

/test full

@emcastillo (Member):

@shino16 can you re-push your changes and reopen the PR? Thanks!

@kmaehashi (Member):

This pull request has been merged as #7942. Thanks again @shino16 for working on this!

@asi1024 modified the milestones (v13.0.0rc1 → Closed PRs) on Dec 6, 2023.