Less Sampler Allocations #735
…g passing in a temporary blob of memory to work in.
- Using this new method in `LLamaContext` and `BaseSamplingPipeline`.
- Using `Guidance` method in guidance example, instead of low level one working directly on logits.
- Fixed `Guidance` method passing incorrectly sized span.
@Lyrcaxis I'd appreciate your review on this since you've been looking at sampling things recently.
Overall this looks good, apart from two concerns. This PR should clearly improve performance, which is encouraging!
Nice!
Reduced the number of allocations required for sampling by allowing the caller to pass in a temporary blob of memory to work in.
- Using this new method in `LLamaContext` and `BaseSamplingPipeline`.
- Using the `Guidance` method in the guidance example, instead of the low-level one working directly on logits.
- Fixed the `Guidance` method passing an incorrectly sized span.
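
For readers unfamiliar with the pattern, here is a minimal sketch of "pass in a temporary blob of memory to work in", assuming nothing about the actual signatures in this PR. `ScratchSamplingSketch`, `SampleGreedy` and `SampleWithPooledScratch` are hypothetical names made up for illustration; only `ArrayPool<T>` and `Span<T>` are real .NET APIs. The sampler does its intermediate work in a caller-supplied span, and the caller rents that span from a pool instead of allocating a new array per token.

```csharp
using System;
using System.Buffers;

public static class ScratchSamplingSketch
{
    // Hypothetical sampler (not LLamaSharp's API): scales the logits by a
    // temperature inside caller-provided scratch memory and returns the
    // index of the largest scaled value. No array is allocated here.
    public static int SampleGreedy(ReadOnlySpan<float> logits, float temperature, Span<float> scratch)
    {
        if (logits.IsEmpty)
            throw new ArgumentException("Logits must not be empty.", nameof(logits));
        if (scratch.Length < logits.Length)
            throw new ArgumentException("Scratch buffer is smaller than the logits.", nameof(scratch));

        // Work on exactly logits.Length elements, even if the scratch is larger.
        var work = scratch.Slice(0, logits.Length);
        for (var i = 0; i < logits.Length; i++)
            work[i] = logits[i] / temperature;

        var best = 0;
        for (var i = 1; i < work.Length; i++)
        {
            if (work[i] > work[best])
                best = i;
        }
        return best;
    }

    // Caller side: rent the scratch memory from the shared pool and return it
    // afterwards, so repeated sampling calls reuse the same backing arrays.
    public static int SampleWithPooledScratch(ReadOnlySpan<float> logits, float temperature)
    {
        var rented = ArrayPool<float>.Shared.Rent(logits.Length);
        try
        {
            // Rent may hand back a larger array than requested, so slice it
            // to the size the sampler actually needs.
            return SampleGreedy(logits, temperature, rented.AsSpan(0, logits.Length));
        }
        finally
        {
            ArrayPool<float>.Shared.Return(rented);
        }
    }
}
```

Slicing the rented array matters because `ArrayPool<T>.Shared.Rent` only guarantees a minimum length; handing the whole array to code that expects exactly one element per logit would give it a span of the wrong size.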