addressing the case when output region for repeat operation is too big #386
base: branch-24.03
Conversation
```python
if repeats < 0:
    return ValueError(
        "'repeats' should not be negative: {}".format(repeats)
```
Please use f-strings in new code
And use `raise` to raise an exception, instead of `return`.
```python
# Check if the size of the output array is less than 8GB. In this case
# we can use output regions; otherwise we will use a statically
# allocated array.
if size < 8589934592 / 2:
```
A bunch of comments about this, going from lower- to higher-level:

- This seems to be testing for 4GB, not 8GB?
- This should be a named constant, and ideally written as `8 * (2 ** 30)`, so it's obvious we're talking about 8GB.
- This limit does not consider the available memory. 8GB may be too large or too little depending on the machine; this number should probably be a percentage of the relevant available memory.
- This considers the full size of the array, not the size of each chunk. E.g. 16GB may be totally fine if split across 8 GPUs.
- It seems to me that the only real decision we're making here is whether to perform the operation using an eager output or a deferred output. Therefore, we also want to be querying the (relative) sizes of the eager and deferred pools. Ideally we would also consider the current/projected load on each pool, which is not possible right now, but might be possible in the future, if legate.core takes over more instance-management responsibilities.
- Finally, AFAIK the unification of the eager and deferred pools is on the Legion roadmap. If that happens, then we could safely always use the more efficient eager implementation. @lightsighter, how far in the future do you think this is? If nobody has complained about this issue, we may want to wait until unification lands.
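The first two points could be sketched as follows. Note the memory-fraction heuristic here is hypothetical (real code would query the runtime for the relevant pool sizes, which the review notes is not fully possible today):

```python
# Named constant, written so the magnitude is obvious at a glance.
EAGER_OUTPUT_LIMIT = 8 * (2 ** 30)  # 8 GiB

def use_eager_output(result_size_bytes: int, available_bytes: int,
                     fraction: float = 0.5) -> bool:
    # Hypothetical heuristic: cap the eager output at a fraction of the
    # relevant available memory, never exceeding the fixed limit.
    budget = min(EAGER_OUTPUT_LIMIT, int(available_bytes * fraction))
    return result_size_bytes < budget
```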
```python
for i in range(a._shape[axis]):
    a_index[axis] = slice(i, i + 1)
    res_index[axis] = slice(offset, offset + repeats)
    result[res_index] = array[a_index]
    offset += repeats
```
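For reference, this loop computes the same result as `np.repeat` with a scalar `repeats` along `axis`. A self-contained NumPy sketch of the slice-assignment approach (the original mixes the names `a` and `array`; the sketch assumes both refer to the same input):

```python
import numpy as np

def repeat_by_slices(a: np.ndarray, repeats: int, axis: int) -> np.ndarray:
    # Allocate the output with the repeated extent along `axis`.
    shape = list(a.shape)
    shape[axis] = shape[axis] * repeats
    result = np.empty(shape, dtype=a.dtype)
    a_index = [slice(None)] * a.ndim
    res_index = [slice(None)] * a.ndim
    offset = 0
    # One slice assignment per input index along `axis`; the width-1
    # input slice broadcasts across the width-`repeats` output slice.
    for i in range(a.shape[axis]):
        a_index[axis] = slice(i, i + 1)
        res_index[axis] = slice(offset, offset + repeats)
        result[tuple(res_index)] = a[tuple(a_index)]
        offset += repeats
    return result
```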
I don't love this. We are emitting a separate operation for each slice. It would be more efficient if we could manually partition the `result` array on `axis` using a custom coloring, following a partitioning of `a` on the same axis:

```python
p_a = axis_equal_partition(a, axis, colors)
for c in colors:
    result_coloring[c] = Rect(
        lo=p_a[c].domain.lo * repeats,
        hi=p_a[c].domain.hi * repeats,
    )
p_result = partition_by_coloring(result, result_coloring)
for c in colors:
    repeat(p_result[c], p_a[c])  # index launch
```

However, this would require support for manual-coloring partitioning from the core. @magnatelee is this something that's reasonable?
I don't think anything that does coloring would be any more scalable than the original code, and it is in some sense worse, as it takes away the core's ability to reason about the partitioning. For example, if the core saw multiple tasks operating on disjoint parts of the same array, it could potentially partition and map them in a way that distributes those tasks in a balanced manner.
@ipdemes What is the status of this PR?
The 'slices' implementation is taken from CuPy.