Releases: SciSharp/LLamaSharp
0.10.0 - Phi2
Major Changes
- Update binaries feb 2024 by @martindevans in #479
- Add CLBLAST native library to native libraries build by @jasoncouture in #468
- Introduced a new BatchedExecutor by @martindevans in #503
- Swapped StatelessExecutor to use llama_decode! by @martindevans in #445
- LLamaToken Struct by @martindevans in #404
Bug Fixes
- KernelMemory EmbeddingMode bug correction by @zsogitbe in #485
- Normalize Embeddings by @martindevans in #507
- StreamingTextDecoder Fix & Tests by @martindevans in #428
- Tokenizer Fixes For Issue 430 by @martindevans in #433
Other Changes
- Use llama instead of libllama in [DllImport] by @jasoncouture in #465
- Updated Examples by @vikramvee in #502
- Added new file types to quantisation by @martindevans in #495
- Smaller Unit Test Model by @martindevans in #496
- Using AddRange in LLamaEmbedder by @martindevans in #499
- Small KV Cache Handling Improvements by @martindevans in #500
- Added increment and decrement operators to LLamaPos by @martindevans in #501
- Swapped GetEmbeddings to llama_decode by @martindevans in #474
- kv_cache_instance_methods by @martindevans in #454
- Removed IModelParams and IContextParams setters. by @martindevans in #472
- Managed LLamaBatch by @martindevans in #442
- Check Model Path Exists by @martindevans in #437
- Model Metadata Loading Cleanup by @martindevans in #438
- Added a check for EOS token in LLamaStatelessExecutor by @martindevans in #434
- Update README.md by @Oceania2018 in #427
- Gpu layer count change by @Kaotic3 in #424
- Improved exceptions in IModelParams for unknown KV override types. by @martindevans in #416
New Contributors
- @Kaotic3 made their first contribution in #424
- @Oceania2018 made their first contribution in #427
- @jasoncouture made their first contribution in #465
- @zsogitbe made their first contribution in #485
- @vikramvee made their first contribution in #502
Full Changelog: 0.9.1...v0.10.0
0.9.1 - Mixtral!
Major Changes
- Rebuilt ChatSession class by @philippjbauer in #344
- Custom Sampling Pipelines by @martindevans in #348
- Updated Binaries December 2023 by @martindevans in #361
- Added LLamaWeights.Metadata property by @martindevans in #380
Bug Fixes
- Fix documentation to reflect changes in ChatSession API by @asmirnov82 in #366
- Added missing field to LLamaModelQuantizeParams by @martindevans in #367
- Fix broken references in docs by @asmirnov82 in #378
- Updated & Fixed WebAPI by @scotmcc in #377
- Fixed loading of very large metadata values by @martindevans in #384
- Update compile.yml to fix not building for windows by @edgett in #386
- Metadata Fixes by @martindevans in #385
- Fix typos in SemanticKernel README file by @asmirnov82 in #408
Other Changes
- Context Set Seed by @martindevans in #368
- Update README.md by @martindevans in #335
- ci: fix error in auto-release. by @AsakusaRinne in #334
- Update README.md by @markvantilburg in #339
- 🔧 Refactor Semantic Kernel chat completion implementation by @xbotter in #341
- build(deps): bump xunit.runner.visualstudio from 2.5.4 to 2.5.5 by @dependabot in #353
- build(deps): bump xunit from 2.6.2 to 2.6.3 by @dependabot in #352
- Added AVX and AVX2 to MacOS x86_64 builds by @martindevans in #360
- Upgrade unittest target framework to .NET 8.0 by @xbotter in #358
- Clone Grammar by @martindevans in #370
- Renamed llama_sample_temperature to llama_sample_temp by @martindevans in #369
- Reset Custom Sampling Pipeline by @martindevans in #372
- Improved support for AVX512 by @martindevans in #373
- bump sk to 1.0.1 & km to 0.18 by @xbotter in #356
- build(deps): bump xunit from 2.6.3 to 2.6.4 by @dependabot in #389
- build(deps): bump xunit.runner.visualstudio from 2.5.5 to 2.5.6 by @dependabot in #391
- build(deps): bump Swashbuckle.AspNetCore from 6.4.0 to 6.5.0 by @dependabot in #388
- build(deps): bump Microsoft.KernelMemory.Abstractions from 0.18.231209.1-preview to 0.24.231228.5 by @dependabot in #397
- build(deps): bump Microsoft.KernelMemory.Core and Microsoft.KernelMemory.Abstractions by @dependabot in #396
- Code cleanup driven by R# suggestions by @martindevans in #400
- Removed some unnecessary uses of unsafe by @martindevans in #401
- Safer Model Handle Creation by @martindevans in #402
- Extra ModelParams Checking by @martindevans in #403
New Contributors
- @markvantilburg made their first contribution in #339
- @asmirnov82 made their first contribution in #366
- @scotmcc made their first contribution in #377
- @edgett made their first contribution in #386
Thank you so much for all the contributions! 😻
Full Changelog: v0.8.1...0.9.1
v0.8.1 - Major BUG fix and better feature detection
Breaking changes
- Change NativeLibraryConfig.Default to NativeLibraryConfig.Instance.
Major features and fixes
- MinP Sampler by @martindevans in #277
- CPU Feature Detection 2 by @martindevans in #281
- AntipromptProcessor access by @saddam213 in #288
- StreamingTextDecoder in LLamaExecutorBase by @martindevans in #293
- November Binary Update by @martindevans in #316
- Update KernelMemory Package by @xbotter in #325
- fix: Chinese encoding error with gb2312. by @AsakusaRinne in #326
- feat: allow customized search path for native library loading. by @AsakusaRinne in #333
Other changes
- Add targets in Web project by @SignalRT in #286
- Update examples by @xbotter in #295
- dotnet8.0 by @martindevans in #292
- progress_callback in LLamaModelParams by @martindevans in #303
- Added Obsolete markings to all Eval overloads by @martindevans in #304
- Improved test coverage. by @martindevans in #311
- Removed Obsolete ModelParams Constructor by @martindevans in #312
- Better TensorSplitsCollection Initialisation by @martindevans in #310
- Add DefaultInferenceParams to Kernel Memory by @xbotter in #307
- Added a converter similar to the Open AI one by @futzy314 in #315
New Contributors
Thank you so much for all the contributions!
Full Changelog: v0.8.0...v0.8.1
v0.8.0: performance improvement, cuda feature detection and kernel-memory integration
What's Changed
- fix: binary not copied on MAC platform. by @AsakusaRinne in #238
- docs: add related repos. by @AsakusaRinne in #240
- docs: add example models for v0.7.0. by @AsakusaRinne in #243
- Adapts to SK Kernel Memory by @xbotter in #226
- CodeQL Pointer Arithmetic by @martindevans in #246
- build(deps): bump xunit from 2.5.0 to 2.6.1 by @dependabot in #233
- build(deps): bump xunit.runner.visualstudio from 2.5.0 to 2.5.3 by @dependabot in #234
- build(deps): bump Swashbuckle.AspNetCore from 6.2.3 to 6.5.0 by @dependabot in #235
- build(deps): bump Microsoft.SemanticKernel from 1.0.0-beta1 to 1.0.0-beta4 by @dependabot in #231
- feat(kernel-memory): avoid loading model twice. by @AsakusaRinne in #248
- GitHub Action Pipeline Improvements by @martindevans in #245
- Update README.md by @hswlab in #252
- Removed some CI targets by @martindevans in #253
- Removed Old Targets From CI matrix by @martindevans in #254
- Align with llama.cpp b1488 by @SignalRT in #249
- Enhance framework compatibility by @Uralstech in #259
- Update LLama.Examples using Spectre.Console by @xbotter in #255
- Context Size Autodetect by @martindevans in #263
- Prevent duplication of user prompts / chat history in ChatSession. by @philippjbauer in #266
- build: add package for kernel-memory integration. by @AsakusaRinne in #244
- Exposed YaRN scaling parameters in IContextParams by @martindevans in #257
- Update ToLLamaSharpChatHistory extension method to be public and support semantic-kernel author roles by @kidkych in #274
- Runtime detection MacOS by @SignalRT in #258
- feat: cuda feature detection. by @AsakusaRinne in #275
New Contributors
- @dependabot made their first contribution in #233
- @hswlab made their first contribution in #252
- @Uralstech made their first contribution in #259
- @philippjbauer made their first contribution in #266
- @kidkych made their first contribution in #274
Full Changelog: v0.7.0...v0.8.0
v0.7.0 - improve performance
This release fixes the performance problem in v0.6.0, so upgrading to this version is strongly recommended. Many thanks to @lexxsoft for catching this problem and to @martindevans for the fix!
What's Changed
- RoundTrip Tokenization Errors by @martindevans in #205
- Fixed Broken Text Decoding by @martindevans in #219
- Multi GPU by @martindevans in #202
- New Binaries & Improved Sampling API by @martindevans in #223
Full Changelog: v0.6.0...v0.7.0
v0.6.0 - follow major llama.cpp changes
What's Changed
- Better Antiprompt Testing by @martindevans in #150
- Simplified LLamaInteractExecutor antiprompt matching by @martindevans in #152
- Changed OpenOrCreate to Create by @martindevans in #153
- Beam Search by @martindevans in #155
- ILogger implementation by @saddam213 in #158
- Removed GenerateResult by @martindevans in #159
- GetState() fix by @martindevans in #160
- llama_get_kv_cache_token_count by @martindevans in #164
- better_instruct_antiprompt_checking by @martindevans in #165
- skip_empty_tokenization by @martindevans in #167
- SemanticKernel API Update by @drasticactions in #169
- Removed unused properties of InferenceParams & ModelParams by @martindevans in #149
- Coding assistant example by @Regenhardt in #172
- Remove non-async by @martindevans in #173
- MacOS default build now is metal llama.cpp #2901 by @SignalRT in #163
- CPU Feature Detection by @martindevans in #65
- make InferenceParams a record so we can use "with" by @redthing1 in #175
- fix opaque GetState (fixes #176) by @redthing1 in #177
- Extensions Method Unit Tests by @martindevans in #179
- Async Stateless Executor by @martindevans in #182
- Fixed GitHub Action by @martindevans in #190
- GrammarRule Tests by @martindevans in #192
- More Tests by @martindevans in #194
- Support SemanticKernel 1.0.0-beta1 by @DVaughan in #193
- Major llama.cpp API Change by @martindevans in #185
- Cleanup by @martindevans in #196
- Update WebUI inline with v5.0.x by @saddam213 in #197
- More Logging by @martindevans in #198
- chore: Update LLama.Examples and LLama.SemanticKernel by @xbotter in #201
- ci: add auto release workflow. by @AsakusaRinne in #204
New Contributors
- @Regenhardt made their first contribution in #172
- @redthing1 made their first contribution in #175
- @DVaughan made their first contribution in #193
Full Changelog: v0.5.1...v0.6.0
v0.5.1 - GGUF, grammar and semantic-kernel integration
What's Changed
- Remove native libraries from LLama.csproj and replace it with a targets file. by @drasticactions in #32
- Update libllama.dylib by @SignalRT in #36
- update webapi example by @xbotter in #39
- MacOS metal support by @SignalRT in #47
- Basic ASP.NET Core website example by @saddam213 in #48
- fix breaking change in llama.cpp; bind to latest version llama.cpp to… by @fwaris in #51
- Documentation Spelling/Grammar by @martindevans in #52
- XML docs fixes by @martindevans in #53
- Cleaned up unnecessary extension methods by @martindevans in #55
- Memory Mapped LoadState/SaveState by @martindevans in #56
- Larger states by @martindevans in #57
- Instruct & Stateless web example implemented by @saddam213 in #59
- Fixed Multiple Enumeration by @martindevans in #54
- Fixed More Multiple Enumeration by @martindevans in #63
- Low level new loading system by @martindevans in #64
- Fixed Memory pinning in Sampling API by @martindevans in #68
- Fixed Spelling Mirostate -> Mirostat by @martindevans in #69
- Fixed Mirostate Sampling by @martindevans in #72
- GitHub actions by @martindevans in #74
- Update llama.cpp binaries to 5f631c2 and align the LlamaContext by @SignalRT in #77
- Expose some native classes by @saddam213 in #80
- feat: update the llama backends. by @AsakusaRinne in #78
- ModelParams & InferenceParams abstractions by @saddam213 in #79
- Cleaned up multiple enumeration in FixedSizeQueue by @martindevans in #83
- Improved Tensor Splits by @martindevans in #81
- fix: antiprompt does not work in stateless executor. by @AsakusaRinne in #84
- Access to IModelParamsExtensions by @saddam213 in #86
- Utils Cleanup by @martindevans in #82
- Fixed ToLlamaContextParams using the wrong parameter for use_mmap by @martindevans in #89
- Fix serialization error due to NaN by @martindevans in #88
- Add native logging output by @saddam213 in #95
- Minor quantizer improvements by @martindevans in #96
- Improved NativeApi file a bit by @martindevans in #99
- Logger Comments by @martindevans in #100
- llama_sample_classifier_free_guidance by @martindevans in #101
- Potential fix for .Net Framework issues by @zombieguy98 in #103
- Add missing semi-colon to README sample code by @zerosoup in #104
- Multi Context by @martindevans in #90
- Updated Demos by @martindevans in #105
- renamed some arguments in ModelParams constructor so that class can be serialized easily by @erinloy in #108
- Stateless Executor Fix by @martindevans in #107
- Grammar basics by @martindevans in #102
- Re-renaming some arguments to allow for easy deserialization from appsettings.json. by @erinloy in #111
- Added native symbol for CFG by @martindevans in #112
- Minor Code Cleanup by @martindevans in #114
- Changed type conversion by @zombieguy98 in #116
- OldVersion obsoletion notices by @martindevans in #117
- Embedder Test by @martindevans in #97
- Improved Cloning by @martindevans in #119
- ModelsParams record class by @martindevans in #115
- ReSharper code warnings cleanup by @martindevans in #120
- Two small improvements to the native sampling API by @martindevans in #124
- Removed unnecessary parameters from some low level sampler methods by @martindevans in #125
- Dependency Building In Github Action by @martindevans in #126
- Fixed paths by @martindevans in #127
- Fixed cuda paths again by @martindevans in #130
- Linux cublas by @martindevans in #131
- Fixed linux cublas filenames by @martindevans in #132
- fixed linux cublas paths in final step by @martindevans in #133
- Fixed the cublas linux paths again by @martindevans in #134
- Fixed those cublas paths again by @martindevans in #135
- Translating the grammar parser by @Mihaiii in #136
- Higher Level Grammar System by @martindevans in #137
- Enable Semantic kernel support by @drasticactions in #138
- grammar_exception_types by @martindevans in #140
- GGUF by @martindevans in #122
- docs: update the docs to follow new version. by @AsakusaRinne in #141
- Update MacOS Binaries by @SignalRT in #143
- Remove LLamaNewlineTokens from InteractiveExecutorState by @martindevans in #144
- refactor: remove old version files. by @AsakusaRinne in #142
- Disable test parallelism by @martindevans in #145
- Removed duplicate llama_sample_classifier_free_guidance method by @martindevans in #146
- Swapped to llama-7b-chat by @martindevans in #147
New Contributors
- @drasticactions made their first contribution in #32
- @xbotter made their first contribution in #39
- @saddam213 made their first contribution in #48
- @fwaris made their first contribution in #51
- @martindevans made their first contribution in #52
- @zombieguy98 made their first contribution in #103
- @zerosoup made their first contribution in #104
- @erinloy made their first contribution in #108
- @Mihaiii made their first contribution in #136
Full Changelog: v0.4.0...v0.5.0
v0.4.2-preview: new backends
What's Changed
- update webapi example by @xbotter in #39
- MacOS metal support by @SignalRT in #47
- Basic ASP.NET Core website example by @saddam213 in #48
- fix breaking change in llama.cpp; bind to latest version llama.cpp to… by @fwaris in #51
- Documentation Spelling/Grammar by @martindevans in #52
- XML docs fixes by @martindevans in #53
- Cleaned up unnecessary extension methods by @martindevans in #55
- Memory Mapped LoadState/SaveState by @martindevans in #56
- Larger states by @martindevans in #57
- Instruct & Stateless web example implemented by @saddam213 in #59
- Fixed Multiple Enumeration by @martindevans in #54
- Fixed More Multiple Enumeration by @martindevans in #63
- Low level new loading system by @martindevans in #64
- Fixed Memory pinning in Sampling API by @martindevans in #68
- Fixed Spelling Mirostate -> Mirostat by @martindevans in #69
- Fixed Mirostate Sampling by @martindevans in #72
- GitHub actions by @martindevans in #74
- Update llama.cpp binaries to 5f631c2 and align the LlamaContext by @SignalRT in #77
- Expose some native classes by @saddam213 in #80
- feat: update the llama backends. by @AsakusaRinne in #78
New Contributors
- @xbotter made their first contribution in #39
- @saddam213 made their first contribution in #48
- @fwaris made their first contribution in #51
- @martindevans made their first contribution in #52
Full Changelog: v0.4.1-preview...v0.4.2-preview
v0.4.1-preview - follow up llama.cpp latest commit
This is a preview version that follows the latest modifications to llama.cpp.
For some reason the CUDA backend is not working yet; we'll release v0.4.1 once that is resolved.
v0.4.0 - Executor and ChatSession
Version 0.4.0 introduces many breaking changes. However, we strongly recommend upgrading to 0.4.0, because the refactored framework provides better abstractions and stability. The v0.3.0 and v0.3.1 backends still work with LLamaSharp v0.4.0.
The main changes:
- Add three-level abstractions: LLamaModel, LLamaExecutor and ChatSession.
- Fix the bug in saving and loading state.
- Support saving/loading chat session directly.
- Add more flexible APIs in the chat session.
- Add detailed documentation: https://scisharp.github.io/LLamaSharp/0.4/
Acknowledgements
Many thanks to @TheTerrasque for their help during development; their fork inspired much of this work. Many thanks also to the following contributors!
- MacOS Arm64 support by @SignalRT in #24
- Fixed a typo in FixedSizeQueue by @mlof in #25
- Document interfaces by @mlof in #26