Features/560 reshape speedup #1125
Conversation
Codecov Report
@@ Coverage Diff @@
## main #1125 +/- ##
==========================================
+ Coverage 91.79% 91.81% +0.01%
==========================================
Files 72 72
Lines 10497 10490 -7
==========================================
- Hits 9636 9631 -5
+ Misses 861 859 -2
These changes look fine to me.
Thank you for the PR!
Review for PR#560 "reshape speedup"
This PR contains some improvements to the performance of `reshape`. The plots indicate a significant improvement of "new" compared to "main" on up to 2 cores.
In essence, the changes of this PR can be divided into two categories: first, the `int` datatype is replaced by the more appropriate `int64` in some places; second, the actual `reshape` is modified. As far as I can judge, the code is correct and well commented. However, I would recommend further numerical experiments on more cores (if not already done) in order to ensure that the proposed idea "resplit -> reshape in place -> resplit" (as far as I understand it) also scales better than, or at least as well as, the implementation in "main".
I can run the additional scaling tests.
Hi @mrfh92, thanks for the review, I actually ran the scaling tests of this implementation. I will run comparison tests with the released
It looks like the "old" reshape is significantly slower than the "new" one (in particular for a small number of processes), but the "old" reshape scales well when increasing the number of processes; the "new" reshape does not seem to scale well (I believe that is due to the usage of resplit, which will always have a runtime proportional to the number of processes...). Nevertheless, this does not necessarily imply that the "new" reshape is not an improvement over the "old" one; on the contrary, the "new" version is so much faster for a small number of processes that it still outperforms the "old" one there. However, we should determine whether there is some break-even point at which the "old" one becomes better.
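A minimal sketch of what such a break-even scan could look like (illustrative only, not part of this PR; it assumes the same script is run once against the released Heat and once against this branch, at increasing process counts):

```python
# Hypothetical break-even scan: run with increasing process counts, e.g.
# `mpirun -n <procs> python reshape_scaling.py`, once per Heat version, and compare.
import time

from mpi4py import MPI
import heat as ht

comm = MPI.COMM_WORLD
A = ht.zeros((1000, 500000), split=1)  # distributed along a non-zero axis

comm.Barrier()  # synchronize before timing
start = time.perf_counter()
B = ht.reshape(A, (10000000, -1), new_split=1)
comm.Barrier()
elapsed = time.perf_counter() - start

if comm.rank == 0:
    print(f"heat={ht.__version__} procs={comm.size} reshape_time={elapsed:.4f}s")
```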
Thanks a lot @mrfh92 for all the tests. I went back and finished the tests I had started on the cluster, adding test runs on the release ( |
The new plots in the PR description show that my worries about possible performance degradation w.r.t. scalability were actually unjustified. Therefore, I now recommend merging if the CI runs through.
Description
While testing Heat+torch 2.0 on Apple MPS (#1053) I noticed a major bottleneck in `ht.reshape`, even on 1 process. We had been discussing this some time ago (#560, #874). The main slow-down arises in setting up the `Alltoallv` communication necessary when the input DNDarray is distributed along non-zero axes (`ht.reshape(x, new_shape, new_split)` with `x.split > 0` and potentially `new_split > 0`).
I ran some experiments and realized that resplitting in place (input split to 0, and then, if necessary, 0 to `new_split`), while inelegant, still performs much better than the current implementation. This PR implements a pragmatic solution based on `dndarray.resplit_` and `dndarray.redistribute_`.
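In rough pseudo-code, the idea boils down to something like the sketch below (simplified for illustration only; the actual implementation additionally uses `dndarray.redistribute_` and takes care of shapes and corner cases):

```python
# Simplified sketch of the "resplit -> reshape -> resplit" idea; not the actual code.
import heat as ht


def reshape_via_resplit(x, new_shape, new_split=None):
    # Move the distribution to axis 0 first (in place, i.e. the input is modified),
    # since reshaping a split-0 array avoids the expensive Alltoallv setup.
    if x.split is not None and x.split != 0:
        x.resplit_(0)
    # Reshape the split-0 array (the cheap path).
    y = ht.reshape(x, new_shape)
    # Finally move the result to the requested output distribution, if any.
    if new_split is not None and new_split != y.split:
        y.resplit_(new_split)
    return y
```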
Below are some plots from tests on my laptop - the matrices are still relatively small and 1-process performance is still better than 2 processes; I will replace these with scaling tests on HDF ML later.
UPDATED: Below are some tests run on HDF ML, CPU only, 2 CPUs per node, 12 threads per CPU.
Initial matrix: `A = ht.zeros((1000, size), split=key[0])`
Timed operation: `B = ht.reshape(A, (10000000, -1), new_split=key[1])`
Sampled sizes: `sizes = [500000*n for n in [1, 2, 4, 8, 16, 32, 64, 128, 256]]`
Sampled split combinations: `[[0, 0], [1, 1], [1, 0], [0, 1]]` (only showing plots for split combinations 0-0 and 1-1)
Minimum speed-up: ~8.5 for split 0-0, ~3 for split 1-1, which requires more communication.
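For completeness, a minimal reconstruction of the benchmark loop (only the array setup and the reshape call follow the description above; the timing and output handling are assumptions):

```python
# Reconstruction of the benchmark sweep; only the array setup and the reshape call
# are taken from the description above, the timing/printing is an assumption.
import time

from mpi4py import MPI
import heat as ht

comm = MPI.COMM_WORLD
sizes = [500000 * n for n in [1, 2, 4, 8, 16, 32, 64, 128, 256]]
split_combinations = [[0, 0], [1, 1], [1, 0], [0, 1]]

for size in sizes:
    for key in split_combinations:
        A = ht.zeros((1000, size), split=key[0])
        comm.Barrier()
        start = time.perf_counter()
        B = ht.reshape(A, (10000000, -1), new_split=key[1])
        comm.Barrier()
        if comm.rank == 0:
            print(f"size={size} splits={key} time={time.perf_counter() - start:.4f}s")
```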
Issue/s resolved: #560, #874
Changes proposed:
- replace the `Alltoallv` call with resplitting in place when necessary
Type of change
Performance enhancement
Memory requirements
In progress
Performance
In progress, also see plots
Due Diligence
Does this change modify the behaviour of other functions? If so, which?
no