[Feature]: Return softmax of attention layer. #6424

Closed
DouHappy opened this issue Jul 14, 2024 · 3 comments

Comments

@DouHappy
Contributor

🚀 The feature, motivation and pitch

Feature

I'm working on returning the softmax of the LLM's attention layer. As far as I know, only the Hugging Face transformers library supports this feature, but transformers is not efficient enough. I saw "return_softmax" in vllm_flash_attn, but it seems it can't actually be used. Would you be able to add this feature?
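For reference, the existing (but slow) transformers route looks roughly like the sketch below; the model name and prompt are placeholders, and `output_attentions=True` is the relevant flag:

```python
# Minimal sketch of the existing route via Hugging Face transformers;
# model name and prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The quick brown fox", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions holds one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len), containing the softmaxed attention weights.
print(len(outputs.attentions), outputs.attentions[0].shape)
```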

Motivation

The attention softmax is important evidence for understanding how an LLM works: it lets humans determine which input tokens were important for inferring each output token. More features could also be developed on top of it, such as visualizing the model's evidence.

Pitch

For this RFC in particular, we propose the following changes (a hypothetical usage sketch follows the list):

  • Add a new function, llm.forward(), that receives prefill requests, runs prefill, and returns the attention softmax.
  • Modify the prefill kernels to support outputting the softmax.
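
A hypothetical usage sketch of the proposed API (llm.forward() and the attention_softmax field do not exist in vLLM; the names are illustrative only):

```python
# Hypothetical sketch only: llm.forward() and output.attention_softmax are
# proposed names from this issue, not existing vLLM APIs.
from vllm import LLM

llm = LLM(model="facebook/opt-125m")

# Proposed: run prefill for the prompts and return the per-layer attention
# softmax computed by the prefill kernels.
outputs = llm.forward(["The capital of France is"])

for output in outputs:
    for layer_idx, attn in enumerate(output.attention_softmax):
        # Proposed shape per layer: (num_heads, prompt_len, prompt_len)
        print(layer_idx, attn.shape)
```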

Alternatives

No response

Additional context

No response

@DarkLight1337
Member

#6260 may be able to achieve this by returning arbitrary outputs directly from the model.

@AllenDou
Contributor

Yes, I also think #6260 could achieve this feature; a small refactor is needed.

@DouHappy
Contributor Author

> Yes, I also think #6260 could achieve this feature; a small refactor is needed.

Thank you for your excellent work. I will close this issue.
