Fix blocks number calculation for Flat PA #269

iboiko-habana · 2024-09-11T09:48:12Z

Fix blocks number calculation for Flat PA via adding empty table_block (#158)

madamczykhabana · 2024-09-17T05:37:19Z

vllm/worker/habana_model_runner.py

+                if len(block_table) == 0:
+                    block_number = 0
+                    block_table = []
+                else:
+                    block_number = block_table[position // self.block_size]
                if block_number == _PAD_BLOCK_ID:
                    slot = next(dummy_slots)


Please combine those two if statements. Something like:

    if len(block_table) == 0:
        block_number = _PAD_BLOCK_ID
        block_table = []
        slot = next(dummy_slots)
    else:
        block_number = block_table[position // self.block_size]
        block_offset = position % self.block_size
        slot = block_number * self.block_size + block_offset
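For readers following the suggestion, here is a minimal standalone sketch of the combined branch. It is not the code from `habana_model_runner.py`: the function name `compute_slot`, the block size of 4, and the way `dummy_slots` is built (a cycle over one block's worth of slot ids) are assumptions made only for illustration.

```python
import itertools


def compute_slot(block_table, position, block_size, dummy_slots):
    """Return the slot index for a token position (hypothetical helper).

    An empty block table marks a dummy (padding) sequence, so it is
    given the next dummy slot instead of indexing into the table.
    """
    if len(block_table) == 0:
        return next(dummy_slots)
    # Which block the position falls into, and the offset inside it.
    block_number = block_table[position // block_size]
    block_offset = position % block_size
    return block_number * block_size + block_offset


# Assumed setup: dummy slots cycle over the slot ids of a single block.
BLOCK_SIZE = 4
dummy_slots = itertools.cycle(range(0, BLOCK_SIZE))

real_slot = compute_slot([7, 9], 5, BLOCK_SIZE, dummy_slots)
pad_slot = compute_slot([], 5, BLOCK_SIZE, dummy_slots)
```

With a single branch, the empty-table case never indexes into `block_table`, which is the situation the original two-step check was guarding against.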

madamczykhabana

LGTM

hlin99 · 2024-09-19T11:06:33Z

    seq_group_metadata_list.extend(
        self.create_dummy_seq_group_metadata(0, 0, is_prompt)
        for _ in range(batch_size_padding))

This piece of code creates the dummy metadata inside a loop, and we observe a 10% perf drop. Is this code change intentional?
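To make the question concrete, the sketch below contrasts creating one dummy entry per padding slot with creating a single entry and repeating it. The `DummySeqGroup` class and both function names are hypothetical stand-ins, not the vLLM types, and sharing one object is only safe under the assumption that later code never mutates the padding entries.

```python
from dataclasses import dataclass, field


@dataclass
class DummySeqGroup:
    """Hypothetical stand-in for a dummy sequence-group metadata object."""
    group_id: int
    seq_len: int
    is_prompt: bool
    block_table: list = field(default_factory=list)


def pad_per_iteration(batch, padding, is_prompt):
    # One new object is constructed for every padding slot.
    batch.extend(DummySeqGroup(0, 0, is_prompt) for _ in range(padding))
    return batch


def pad_shared(batch, padding, is_prompt):
    # One object is constructed and referenced `padding` times.
    dummy = DummySeqGroup(0, 0, is_prompt)
    batch.extend([dummy] * padding)
    return batch
```

Whether the construction cost matters in practice depends on how expensive the real metadata object is to build, so this is best treated as something to measure rather than assume.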

iboiko-habana · 2024-09-20T07:15:58Z

@hlin99 Thanks. Please re-check perf with #301

Fix blocks number calculation for Flat PA via adding empty table_block (HabanaAI#158)

hlin99 · 2024-09-20T10:48:39Z

> @hlin99 Thanks. Please re-check perf with #301

Unfortunately, performance has not improved: the data looks identical before and after applying the patch. It seems that dummy creation and list extension are not the root cause of the performance drop. Instead, the issue appears to stem from the changes to the dummy metadata itself, which alter the call path taken afterwards.

iboiko-habana added 2 commits September 11, 2024 12:33

Fix blocks number calculation for Flat PA

a4e3722

Add format check

041659d

madamczykhabana requested changes Sep 17, 2024


Update habana_model_runner.py

0907f38

madamczykhabana approved these changes Sep 18, 2024


madamczykhabana merged commit b62fba8 into habana_main Sep 18, 2024
14 checks passed

iboiko-habana mentioned this pull request Sep 19, 2024

[Bug]: Unexpected decode graph compilation after preemption #158

Closed

zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024

Fix blocks number calculation for Flat PA (HabanaAI#269)

4a182d5

Fix blocks number calculation for Flat PA via adding empty table_block (HabanaAI#158)

zhouyu5 pushed a commit to zhouyu5/vllm-fork that referenced this pull request Sep 20, 2024

Fix blocks number calculation for Flat PA (HabanaAI#269)

5999224

Fix blocks number calculation for Flat PA via adding empty table_block (HabanaAI#158)
