Implement ANPD (3x speedup, lossless) #6813

Closed
trudnorx opened this issue Apr 21, 2024 · 6 comments
@trudnorx

A new paper describes ANPD (Adaptive N-gram Parallel Decoding).
According to the paper, ANPD can speed up an LLM by 3x without any drop in generation quality.
The paper also lists multiple advantages of ANPD over speculative techniques that may already be found in llama.cpp.

@arnfaldur

Work based on the same principle has been ongoing in #5479.
The authors of this paper describe more success than seems to have been found in that PR.
The N-gram model described in the paper is a little different to what the PR has revolved around.

I will try to adjust the existing speculative sampling code to match that of the paper to see if it results in an improvement.
I have not worked on llama.cpp before so don't hold your breath for any results.

Contributor

github-actions bot commented Jun 8, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Jun 8, 2024
@trudnorx
Author

trudnorx commented Sep 3, 2024

@ggerganov Can you reopen this and set it to not be auto-closed?

@ggerganov
Member

We can reopen if there is any indication of progress. Otherwise, the issue is indeed stale so no need to keep it open.

@trudnorx
Author

trudnorx commented Sep 19, 2024

Another big advantage of ANPD over the speculative method used by llama.cpp is that it doesn't need any extra memory compared to non-speculative generation. ANPD removes the need for a draft model, which can lower drafting quality and limit the speculative speedup depending on how compatible it is with the target model. This increases speed compared to other speculative techniques. The paper's authors cite a >2x speed improvement in one test case relative to a 2023 technique that I believe already had advantages over llama.cpp's technique.
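To make the "no draft model" idea concrete, here is a minimal sketch of n-gram drafting with greedy verification. It is not the paper's exact algorithm and not llama.cpp code: the names `update_ngrams`, `draft`, `generate`, and `target_next` are hypothetical, the n-gram table is a plain dictionary built from the tokens seen so far, and the batched verification pass of a real LLM is simulated by calling `target_next` once per position.

```python
from typing import Callable, Dict, List, Tuple

Token = int
NgramTable = Dict[Tuple[Token, ...], Token]


def update_ngrams(table: NgramTable, tokens: List[Token], n: int = 2) -> None:
    """Map every (n-1)-token context to the most recent token that followed it."""
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context] = tokens[i + n - 1]


def draft(table: NgramTable, tokens: List[Token], k: int, n: int = 2) -> List[Token]:
    """Propose up to k tokens by repeatedly looking up the last n-1 tokens."""
    proposal: List[Token] = []
    context = list(tokens)
    for _ in range(k):
        key = tuple(context[-(n - 1):])
        if key not in table:
            break
        proposal.append(table[key])
        context.append(table[key])
    return proposal


def generate(
    target_next: Callable[[List[Token]], Token],  # stand-in for greedy decoding
    prompt: List[Token],
    max_new: int,
    k: int = 4,
    n: int = 2,  # assumed to be >= 2
) -> Tuple[List[Token], int]:
    """Return (tokens, number of verification rounds)."""
    tokens = list(prompt)
    table: NgramTable = {}
    update_ngrams(table, tokens, n)
    rounds = 0
    while len(tokens) - len(prompt) < max_new:
        proposal = draft(table, tokens, k, n)
        rounds += 1  # one round stands in for one batched forward pass
        accepted: List[Token] = []
        for tok in proposal:
            # Keep a drafted token only if the target would have chosen it.
            if tok != target_next(tokens + accepted):
                break
            accepted.append(tok)
        # The target's own token at the first unverified position is always
        # kept, so every round makes progress even with an empty draft.
        accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
        update_ngrams(table, tokens, n)
    return tokens[:len(prompt) + max_new], rounds


def toy_model(tokens: List[Token]) -> Token:
    """Toy deterministic 'model' that counts modulo 5."""
    return (tokens[-1] + 1) % 5


if __name__ == "__main__":
    out, rounds = generate(toy_model, [0, 1, 2, 3, 4, 0], max_new=10)
    print(out, rounds)
```

In this sketch every emitted token is one the target would have produced under greedy decoding, which is why the output matches plain greedy generation, and the only extra state is the n-gram table rather than a second model's weights.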

@ggerganov I think it may be best to keep issues which could lead to a significant improvement open, categorized, and prioritized in some way, depending on the size of that potential improvement, as determined by available research. That would allow them to be better differentiated from a big list of closed issues which don't matter much, or won't be fixed.

That is not to say anyone should be forced to work on them, but doing this could keep ideas better organized than they are now. Discussion of some feature requests based on new AI research has been all over the place: spread across multiple issues, some open and others closed, with inconsistent timing, intruding on threads for other issues, and sometimes split across several threads devoted to one and the same issue. That discussion includes ongoing implementation progress, implementation theory, general discussion of the research, speculation, and more. The way these issues are tracked could have contributed to that.

So I believe that either what I propose, or some different system for more organized idea discussion and tracking should be used.

@ggerganov
Member

> @ggerganov I think it may be best to keep issues which could lead to a significant improvement open, categorized, and prioritized in some way, depending on the size of that potential improvement, as determined by available research. That would allow them to be better differentiated from a big list of closed issues which don't matter much, or won't be fixed.

Determining the potential improvements from an approach, even if it is published research, is still very subjective and the baselines vary dramatically. So I don't think such categorization is really possible.

> So I believe that either what I propose, or some different system for more organized idea discussion and tracking should be used.

Yes, if there are volunteers to help with triage in some way, they are welcome to start doing it. I don't feel strongly about the current issue management. The main advantage of closing stale issues is to filter out topics that are not active and focus on high-priority things such as bugs and feature requests.

From my PoV, I don't see any advantage in reopening this stale issue, as I don't think anyone is working on it.

3 participants