Implement ANPD (3x speedup, lossless) #6813

trudnorx · 2024-04-21T21:02:33Z

A new paper has described ANPD.
According to the paper, ANPD can speed up an LLM by 3x without any drop in generation quality.
The paper also lists multiple advantages of ANPD over speculative techniques that may already be found in llama.cpp.

arnfaldur · 2024-04-25T00:17:07Z

Work based on the same principle has been ongoing in #5479.
The authors of this paper describe more success than seems to have been found in that PR.
The N-gram model described in the paper differs slightly from the one the PR has revolved around.

I will try to adjust the existing speculative sampling code to match that of the paper to see if it results in an improvement.
I have not worked on llama.cpp before, so don't hold your breath for any results.

github-actions · 2024-06-08T01:06:55Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

trudnorx · 2024-09-03T19:05:08Z

@ggerganov Can you reopen this and set it to not be auto-closed?

ggerganov · 2024-09-04T06:37:37Z

We can reopen if there is any indication of progress. Otherwise, the issue is indeed stale so no need to keep it open.

trudnorx · 2024-09-19T02:45:27Z

Another big advantage of ANPD over the speculative method used by llama.cpp is that it doesn't need any extra memory compared to non-speculative generation. ANPD removes the need for a draft model, which would lead to lower drafting quality and limit the speculative performance boost depending on how compatible it is with the target model. This increases speed compared to other speculative techniques. The paper's authors cite a >2x speed improvement in one test case relative to a 2023 technique that I believe already had advantages over llama.cpp's technique.
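To make the draft-model-free idea concrete, here is a minimal Python sketch of n-gram drafting with greedy verification. It is an illustration under stated assumptions, not the paper's exact algorithm and not llama.cpp code: `next_token` is a hypothetical stand-in for one greedy step of the target model, the n-gram table simply remembers the most recent successor of each (n-1)-token prefix, and the names `update_ngrams`, `draft`, and `generate` are invented for this example. A real implementation would verify all drafted positions in a single batched forward pass, which is where the speedup would come from; the toy loop below only counts those passes.

```python
from typing import Callable, Dict, List, Tuple

Token = int
# Hypothetical stand-in for one greedy decoding step of the target model.
NextTokenFn = Callable[[List[Token]], Token]
NGramTable = Dict[Tuple[Token, ...], Token]


def update_ngrams(table: NGramTable, tokens: List[Token], n: int, start: int = 0) -> None:
    """Map each (n-1)-token prefix to the token that most recently followed it."""
    for i in range(max(start, n - 1), len(tokens)):
        table[tuple(tokens[i - n + 1:i])] = tokens[i]


def draft(table: NGramTable, tokens: List[Token], n: int, k: int) -> List[Token]:
    """Propose up to k tokens by chaining n-gram lookups; no draft model involved."""
    ctx = list(tokens)
    proposal: List[Token] = []
    for _ in range(k):
        key = tuple(ctx[-(n - 1):])
        if key not in table:
            break
        proposal.append(table[key])
        ctx.append(table[key])
    return proposal


def generate(next_token: NextTokenFn, prompt: List[Token], max_new: int,
             n: int = 3, k: int = 4) -> Tuple[List[Token], int]:
    """Greedy generation with n-gram drafts. Assumes n >= 2.

    Returns the tokens and the number of verification passes. Each pass
    stands in for one batched forward pass of the target model.
    """
    tokens = list(prompt)
    table: NGramTable = {}
    update_ngrams(table, tokens, n)
    target_len = len(prompt) + max_new
    passes = 0
    while len(tokens) < target_len:
        proposal = draft(table, tokens, n, k)
        old_len = len(tokens)
        passes += 1
        for guess in proposal:
            actual = next_token(tokens)  # the target model's own choice
            tokens.append(actual)        # always keep the model's token
            if actual != guess or len(tokens) >= target_len:
                break                    # first mismatch ends the pass
        else:
            # Every drafted token matched (or none was drafted): the same
            # pass also yields one extra token from the target model.
            tokens.append(next_token(tokens))
        # Adapt the table to the text generated so far.
        update_ngrams(table, tokens, n, start=old_len)
    return tokens, passes
```

Because only the target model's own tokens are ever kept, the output matches plain greedy decoding; the drafts affect only how many passes are needed. The extra state is a small lookup table rather than a second model.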

@ggerganov I think it may be best to keep issues which could lead to a significant improvement open, categorized, and prioritized in some way, depending on the size of that potential improvement, as determined by available research. That would allow them to be better differentiated from a big list of closed issues which don't matter much, or won't be fixed.

That is not to say anyone should be forced to work on them, but doing this could keep ideas better organized than they are now. Discussion of some feature requests based on new AI research has been all over the place: spanning multiple issues, some open and others closed, with inconsistent timing, intruding on threads for other issues, and spread over multiple threads even when centered on one and the same issue. That discussion covers ongoing implementation progress, implementation theory, general discussion of the research, speculation, and more. The way these issues are tracked could have contributed to that.

So I believe that either what I propose or some different system for more organized idea discussion and tracking should be used.

ggerganov · 2024-09-19T07:42:07Z

> @ggerganov I think it may be best to keep issues which could lead to a significant improvement open, categorized, and prioritized in some way, depending on the size of that potential improvement, as determined by available research. That would allow them to be better differentiated from a big list of closed issues which don't matter much, or won't be fixed.

Determining the potential improvements from an approach, even if it is published research, is still very subjective and the baselines vary dramatically. So I don't think such categorization is really possible.

> So I believe that either what I propose, or some different system for more organized idea discussion and tracking should be used.

Yes, if there are volunteers to help with triaging in some way, they are welcome to start doing it. I don't feel strongly about the current issue management. The main advantage of closing stale issues is to filter out topics that are not active and focus on high-priority things such as bugs and feature requests.

From my PoV, I don't see any advantage in reopening this stale issue, as I don't think anyone is working on it.

arnfaldur mentioned this issue Apr 25, 2024

Server: enable lookup decoding #6828

Open

github-actions bot added the stale label May 25, 2024

github-actions bot closed this as completed Jun 8, 2024
