
Revert "Propagate reshapes through generics with reduction… #18968

Merged
merged 1 commit into from
Oct 31, 2024


IanWood1
Contributor

@IanWood1 IanWood1 commented Oct 31, 2024

…(#18857)"

This regresses sdxl int8 perf by increasing the dimensionality of `attention` ops, which interferes with the attention spec. Revert this for now and reland once `CollapseDimensionsPass` can handle attention.

This reverts commit 78481a6.

@IanWood1 IanWood1 force-pushed the revert_prop_through_reduction branch 2 times, most recently from caece27 to ac014c1 Compare October 31, 2024 17:54
@IanWood1
Contributor Author

Update: this should still be reverted, but @Groverkss is also going to work on making the attention spec more robust.

@ScottTodd ScottTodd removed their request for review October 31, 2024 20:48
@IanWood1 IanWood1 changed the title Revert "Propagate reshapes through generics with reduction iterators … Revert "Propagate reshapes through generics with reduction… Oct 31, 2024
@IanWood1 IanWood1 merged commit 8d3faf8 into iree-org:main Oct 31, 2024
36 checks passed
IanWood1 added a commit that referenced this pull request Jan 8, 2025
Reland after fixing sdxl int8 regressions via
#19012.

Running CI revealed further performance regressions that have pending
patches: #19325 and
#19326.

This reverts commit 8d3faf8.

---------

Signed-off-by: Ian Wood <[email protected]>
@MaheshRavishankar
Contributor

MaheshRavishankar commented Jan 9, 2025

Uh oh, this caused quite a big regression in llama_8b_fp16.

At this commit (a5c3879)

-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
BM_prefill_bs4/process_time/real_time              2511 ms         5351 ms            1 items_per_second=0.398218/s
BM_prefill_bs4/process_time/real_time              1593 ms         3223 ms            1 items_per_second=0.62773/s
BM_prefill_bs4/process_time/real_time              1577 ms         3192 ms            1 items_per_second=0.633948/s
BM_prefill_bs4/process_time/real_time              1576 ms         3192 ms            1 items_per_second=0.634657/s
BM_prefill_bs4/process_time/real_time              1575 ms         3191 ms            1 items_per_second=0.634738/s
BM_prefill_bs4/process_time/real_time              1568 ms         3177 ms            1 items_per_second=0.637676/s
BM_prefill_bs4/process_time/real_time              1567 ms         3172 ms            1 items_per_second=0.638253/s
BM_prefill_bs4/process_time/real_time              1572 ms         3184 ms            1 items_per_second=0.636193/s
BM_prefill_bs4/process_time/real_time_mean         1692 ms         3460 ms            8 items_per_second=0.605177/s
BM_prefill_bs4/process_time/real_time_median       1576 ms         3191 ms            8 items_per_second=0.634698/s
BM_prefill_bs4/process_time/real_time_stddev        331 ms          764 ms            8 items_per_second=0.0836861/s
BM_prefill_bs4/process_time/real_time_cv          19.55 %         22.08 %             8 items_per_second=13.83%

Before this commit (80cbf6b)

-------------------------------------------------------------------------------------------------------
Benchmark                                             Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------
BM_prefill_bs4/process_time/real_time               960 ms         2209 ms            1 items_per_second=1.04156/s
BM_prefill_bs4/process_time/real_time              45.6 ms         88.7 ms            1 items_per_second=21.9521/s
BM_prefill_bs4/process_time/real_time              46.0 ms         91.8 ms            1 items_per_second=21.7303/s
BM_prefill_bs4/process_time/real_time              59.3 ms          118 ms            1 items_per_second=16.8629/s
BM_prefill_bs4/process_time/real_time              46.1 ms         92.3 ms            1 items_per_second=21.6801/s
BM_prefill_bs4/process_time/real_time              46.1 ms         92.6 ms            1 items_per_second=21.6903/s
BM_prefill_bs4/process_time/real_time              46.0 ms         90.5 ms            1 items_per_second=21.7216/s
BM_prefill_bs4/process_time/real_time              46.0 ms         92.4 ms            1 items_per_second=21.7274/s
BM_prefill_bs4/process_time/real_time_mean          162 ms          359 ms            8 items_per_second=18.5508/s
BM_prefill_bs4/process_time/real_time_median       46.1 ms         92.4 ms            8 items_per_second=21.706/s
BM_prefill_bs4/process_time/real_time_stddev        323 ms          747 ms            8 items_per_second=7.27908/s
BM_prefill_bs4/process_time/real_time_cv         199.22 %        207.93 %             8 items_per_second=39.24%
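
For a rough sense of scale, the per-iteration `real_time` values from the two tables above can be compared directly. This is only a sketch: the `slowdown` helper is hypothetical (not part of IREE or Google Benchmark), and it compares medians rather than means because the first iteration in each run looks like a warm-up outlier.

```python
from statistics import median

# Per-iteration real_time values in ms, copied from the two tables above.
after_ms = [2511, 1593, 1577, 1576, 1575, 1568, 1567, 1572]   # at a5c3879
before_ms = [960, 45.6, 46.0, 59.3, 46.1, 46.1, 46.0, 46.0]   # at 80cbf6b


def slowdown(before, after):
    """Hypothetical helper: ratio of median latencies (after / before)."""
    return median(after) / median(before)


if __name__ == "__main__":
    ratio = slowdown(before_ms, after_ms)
    print(f"median prefill latency is about {ratio:.1f}x slower")
```

The mean would understate the gap here, since the slow first iteration (960 ms) dominates the "before" average of 162 ms.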

@IanWood1
Contributor Author

IanWood1 commented Jan 9, 2025

I'd suspect that it's causing reduction/matmul ops to get expanded, but they can't be collapsed again because they are dynamic. Do you have the traces and/or dispatch counts for the runs?

@MaheshRavishankar
Contributor

I can give you a full repro in the morning

MaheshRavishankar added a commit that referenced this pull request Jan 9, 2025
…19647)

This reverts commit a5c3879.

Seems like there is a regression due to dynamic shapes

#18968 (comment)

Signed-off-by: MaheshRavishankar <[email protected]>