I'm working on returning the softmax of the LLM's attention layers. As far as I know, only the transformers library supports this feature, but it is not efficient enough. I saw a "return_softmax" option in vllm_flash_attn, but it seems it cannot be used. Would you be able to add this feature?
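For reference, this is how the attention softmax can be obtained today with Hugging Face transformers (a minimal sketch; the model choice and tensor shapes are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    # output_attentions=True returns the softmax-normalized attention
    # weights of every layer alongside the logits.
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len).
print(len(outputs.attentions), outputs.attentions[0].shape)
```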
Motivation
The attention softmax is important evidence for humans to understand how an LLM works: it lets us determine which input tokens were important for inferring each output token. Further features, such as visualization of the model's evidence, can be built on top of it.
Pitch
For this RFC in particular, we propose the following changes:
1. Add a new function llm.forward() that accepts requests, runs prefill, and returns the attention softmax (see the sketch below).
2. Modify the prefill kernels to support outputting the softmax.
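A minimal sketch of what the proposed interface could look like. Note that llm.forward(), the return_softmax flag, and the attention_softmax field are hypothetical names introduced here for illustration, not existing vLLM API:

```python
# Hypothetical usage sketch for the proposed API. llm.forward(),
# return_softmax, and output.attention_softmax do not exist in vLLM
# today; the names are placeholders for this RFC.
from vllm import LLM

llm = LLM(model="meta-llama/Llama-2-7b-hf")

# Run prefill on the prompt and return the attention softmax in
# addition to the usual outputs.
output = llm.forward(
    prompts=["The quick brown fox jumps over the lazy dog."],
    return_softmax=True,  # proposed flag
)

# attention_softmax (proposed field): one tensor per layer with shape
# (num_heads, prompt_len, prompt_len), so a human can inspect which
# prompt tokens each position attended to.
for layer_idx, probs in enumerate(output.attention_softmax):
    print(f"layer {layer_idx}: {tuple(probs.shape)}")
```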
Alternatives
No response
Additional context
No response