addressing the case when output region for repeat operation is too big #386

Open
wants to merge 2 commits into base: branch-24.03

Conversation

@ipdemes (Contributor) commented May 31, 2022

The 'slices' implementation is taken from CuPy.

if repeats < 0:
    return ValueError(
        "'repeats' should not be negative: {}".format(repeats)
    )
Contributor (review comment):

Please use f-strings in new code

Contributor (review comment):

and use raise to raise an exception instead of return
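
A minimal sketch of what the snippet would look like with both suggestions above applied (illustrative only, not the committed change):

if repeats < 0:
    # Raise the exception rather than returning it, and build the message
    # with an f-string.
    raise ValueError(f"'repeats' should not be negative: {repeats}")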

# check if the size of the output array is less than 8GB. In this case we can
# use output regions; otherwise we will use a statically allocated
# array
if size < 8589934592 / 2:
Contributor (review comment):

A bunch of comments about this, going from lower- to higher-level:

This seems to be testing for 4GB, not 8GB?

This should be a named constant, and ideally written as 8 * (2 ** 30), so it's obvious we're talking about 8GB (see the sketch after this comment).

This limit does not take the available memory into account: 8GB may be too large or too small depending on the machine. This number should probably be a percentage of the relevant available memory.

This considers the full size of the array, not the size of each chunk. E.g. 16GB may be totally fine if split across 8 GPUs.

It seems to me that the only real decision we're making here is whether to perform the operation using an eager output or a deferred output. Therefore, we also want to query the (relative) sizes of the eager and deferred pools. Ideally we would also consider the current/projected load on each pool, which is not possible right now, but might be possible in the future if legate.core takes over more instance-management responsibilities.

Finally, AFAIK the unification of the eager and deferred pools is on the Legion roadmap. If that happens, we could safely always use the more efficient eager implementation. @lightsighter how far in the future do you think this is? If nobody has complained about this issue, we may want to wait until unification lands.
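
A minimal sketch of the named-constant suggestion above; EAGER_OUTPUT_LIMIT and get_available_memory() are hypothetical names used only for illustration, not an existing cuNumeric or legate.core API:

EAGER_OUTPUT_LIMIT = 8 * (2 ** 30)  # 8 GiB = 8589934592 bytes; note that 8589934592 / 2 is only 4 GiB

if size < EAGER_OUTPUT_LIMIT:
    # small enough: use output regions, as the original comment intends
    ...
else:
    # otherwise fall back to the statically allocated array
    ...

# The "percentage of available memory" point could look roughly like:
#     EAGER_OUTPUT_LIMIT = int(0.5 * get_available_memory())
# where get_available_memory() is a hypothetical per-chunk memory query.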

Comment on lines +2126 to +2130
for i in range(a._shape[axis]):
    a_index[axis] = slice(i, i + 1)
    res_index[axis] = slice(offset, offset + repeats)
    result[res_index] = array[a_index]
    offset += repeats
Contributor (review comment):

I don't love this. We are emitting a separate operation for each slice. It would be more efficient if we could manually partition the result array along axis using a custom coloring, following a partitioning of a along the same axis:

p_a = axis_equal_partition(a, axis, colors)
for c in colors:
  result_coloring[c] = Rect(
    lo=p_a[c].domain.lo * repeats,
    hi=p_a[c].domain.hi * repeats
  )
p_result = partition_by_coloring(result, result_coloring)
for c in colors:
  repeat(p_result[c], p_a[c])  # index launch

However, this would require support for manual-coloring partitioning from the core. @magnatelee is this something that's reasonable?

Contributor (review comment):

I don't think anything that does coloring would be any more scalable than the original code, and it is in some sense worse, as it takes away the core's ability to reason about the partitioning. For example, if the core saw multiple tasks operating on disjoint parts of the same array, it could potentially partition and map them in a way that distributes those tasks in a balanced manner.

@marcinz (Collaborator) commented Jan 26, 2023

@ipdemes What is the status of this PR?

@marcinz marcinz changed the base branch from branch-22.07 to branch-23.07 May 18, 2023 20:31
@marcinz marcinz changed the base branch from branch-23.07 to branch-23.09 July 18, 2023 15:45
@marcinz marcinz changed the base branch from branch-23.09 to branch-23.11 September 26, 2023 00:39
@marcinz marcinz changed the base branch from branch-23.11 to branch-24.01 November 9, 2023 17:15
@marcinz marcinz changed the base branch from branch-24.01 to branch-24.03 February 22, 2024 01:08