
Merge EricLBuehler/mistral.rs into spiceai. #16

Merged: 79 commits merged into spiceai on Jan 30, 2025

Conversation

@Jeadie Jeadie commented Jan 23, 2025

🤔 Concerns

haricot and others added 30 commits December 12, 2024 07:20
* Adding streaming function to mistralrs server.

* Adding simple_stream example
* Add a forward_autocast method

* Add a to_gguf_quant method for bnb

* Handle blocksizes

* Maybe cast

* Add QuantMethod::dequantize_w

* Debug

* Debug

* Debug

* Fix the bug maybe???

* Fix the bug maybe???

* Clippy
* More vllama optimizations

* Oops

* Use addmm metal

* Make some progress

* Conditional

* Conditional

* No crossattn quant

* Fix loading from uqff
* Update docs

* Update deps
* Work on prefix cacher

* It works

* Clippy

* Enable partial matches
* Add --cpu flag to `mistralrs-server`

* Update lib.rs

* Update main.rs

* Update lib.rs

* Update lib.rs

* Update lib.rs
* Separate cuda paged attention impl

* Sketch necessary functions

* Implement swap blocks and copy blocks

* Implement reshape_and_cache

* Add the kernels

* Add the normal sdpa kernel as a basis

* Wire things up a bit

* Correct inputs

* Implement the pagedattention kernel

* Kernels compile

* Instantiate!

* Add kernel call code

* Implement the op for v1

* Implement the v2 kernel and op

* Clippy

* Correct memory info for metal

* Fixes

* Fix cuda

* Fix cuda

* Fix cuda

* Fix

* Debugging

* 🚀 It works!

* Use faster vector implementation
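The paged attention commits above revolve around a block-table indirection: each sequence's logical KV-cache positions are mapped onto physical cache blocks, which is what makes `swap_blocks`, `copy_blocks`, and `reshape_and_cache` possible. A minimal sketch of that indirection, with illustrative names rather than the actual mistral.rs API:

```rust
// Sketch of the block-table indirection behind paged attention.
// `BLOCK_SIZE`, `BlockTable`, and the method names are illustrative,
// not the real mistral.rs types.
const BLOCK_SIZE: usize = 16;

/// Maps a sequence's logical token positions to physical cache blocks.
struct BlockTable {
    blocks: Vec<usize>, // physical block id for each logical block
}

impl BlockTable {
    fn new() -> Self {
        Self { blocks: Vec::new() }
    }

    /// Record a newly allocated physical block for the next logical block.
    fn push_block(&mut self, physical_id: usize) {
        self.blocks.push(physical_id);
    }

    /// Translate a logical token index into (physical block, offset).
    fn locate(&self, token_idx: usize) -> (usize, usize) {
        let logical_block = token_idx / BLOCK_SIZE;
        (self.blocks[logical_block], token_idx % BLOCK_SIZE)
    }
}

fn main() {
    let mut table = BlockTable::new();
    table.push_block(7); // tokens 0..16 live in physical block 7
    table.push_block(3); // tokens 16..32 live in physical block 3
    assert_eq!(table.locate(0), (7, 0));
    assert_eq!(table.locate(20), (3, 4));
    println!("block table ok");
}
```

Because the attention kernel only ever sees (block, offset) pairs, blocks can be swapped between GPU and CPU memory without moving the rest of the sequence, which is what the swap/copy commits implement.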

* Fix bug for kernel cache & Phi3 inference (EricLBuehler#1003)

* Fix bug for kernel cache & Phi3 inference

* Remove the leftover used in nightly debugging

* Fix warning

* Add (deactivated) bfloat support w/ simd

* Tune num_threads

* Update docs

* Disable paged attn by default on metal

---------

Co-authored-by: Guoqing Bao <[email protected]>
* Support for normal cache for mllama, phi3v, qwen2vl

* Clippy
…models (EricLBuehler#1009)

* Support BF16 kvcache & attention for GGUF/GGML quantization

* Fix clippy

* Pass dtype to xlora gguf/ggml model

* Remove the hardcoded fix for the literal chat template (side effect: the model cannot terminate itself for running GGUF file)

* Pass dtype to Lora GGUF/GGML models
* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Move start_offsets_kernel to correct device

* Update starcoder2.rs

* Support device mapping

* Support device mapping

* Support device mapping

* Support device mapping

* Support device mapping

* format

* Support device mapping

* remove mut

* remove mut

* Add get_unique_devices method

* Move tensor for device mapping

* Add DeviceMapper

* Fix wrong RotaryEmbedding import

* Fix wrong RotaryEmbedding import

* Remove unnecessary tensor copies

* Add DeviceMapper

* Add DeviceMapper

* Add DeviceMapper

* Add device mapping

* Create tensor copies for each device for pa

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* Add device mapper

* add device mapper

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* Remove unnecessary tensor move

* format

* format

* format

* clippy

* format
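The long run of device-mapping commits above assigns each model layer to a device and exposes the set of devices in use (see the `get_unique_devices` commit). A hypothetical sketch of that policy, with simplified stand-ins for the real mistral.rs `DeviceMapper`:

```rust
// Illustrative per-layer device mapping; `Device` and the round-robin
// policy are simplified stand-ins, not the actual mistral.rs DeviceMapper.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Device {
    Cpu,
    Gpu(usize),
}

struct LayerDeviceMapper {
    per_layer: Vec<Device>, // device assigned to each transformer layer
}

impl LayerDeviceMapper {
    /// Place the first `gpu_layers` layers round-robin across `num_gpus`
    /// GPUs and keep the remainder on CPU.
    fn new(num_layers: usize, num_gpus: usize, gpu_layers: usize) -> Self {
        let per_layer = (0..num_layers)
            .map(|i| {
                if i < gpu_layers {
                    Device::Gpu(i % num_gpus)
                } else {
                    Device::Cpu
                }
            })
            .collect();
        Self { per_layer }
    }

    fn device_for_layer(&self, layer: usize) -> Device {
        self.per_layer[layer]
    }

    /// Mirrors the `get_unique_devices` idea: distinct devices in use,
    /// so tensor copies (e.g. for paged attention) are made once per device.
    fn unique_devices(&self) -> Vec<Device> {
        let mut seen = Vec::new();
        for &d in &self.per_layer {
            if !seen.contains(&d) {
                seen.push(d);
            }
        }
        seen
    }
}

fn main() {
    let mapper = LayerDeviceMapper::new(4, 2, 2);
    assert_eq!(mapper.device_for_layer(0), Device::Gpu(0));
    assert_eq!(mapper.device_for_layer(3), Device::Cpu);
    assert_eq!(mapper.unique_devices().len(), 3);
    println!("mapper ok");
}
```

The "remove unnecessary tensor move" commits are the flip side of this: once a tensor is already on the layer's assigned device, re-moving it is a no-op that should be skipped.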
* Fixes for prefix cache + llama vision

* Fix for vllama
* Initial steps toward supporting deepseekv2

* Implement the attention forward

* Add the mlp

* Implement the moe gate and forward

* Fixes

* Forward pass runs

* Clippy

* Update

* It works

* Use faster rope

* Use normal cache

* Add framework for paged attn

* Fixes

* Support isq

* Add moqe support, residual tensors

* Add examples, python API, docs

* Fix tests

* Update deps
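The DeepSeek-V2 commits above mention an MoE gate: a router that softmaxes expert logits and forwards each token only to the top-k experts. A hedged sketch of that routing step (plain Rust, not the actual mistral.rs implementation):

```rust
// Hypothetical top-k MoE gating, as used by DeepSeek-style models:
// softmax over expert logits, then route to the k highest-weight experts.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtract the max for numerical stability before exponentiating.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Indices of the `k` experts with the highest gate weight, best first.
fn top_k_experts(logits: &[f32], k: usize) -> Vec<usize> {
    let weights = softmax(logits);
    let mut idx: Vec<usize> = (0..weights.len()).collect();
    idx.sort_by(|&a, &b| weights[b].partial_cmp(&weights[a]).unwrap());
    idx.truncate(k);
    idx
}

fn main() {
    // Experts 2 and 0 carry the largest logits, so they are selected.
    let routed = top_k_experts(&[1.5, -0.3, 2.0, 0.1], 2);
    assert_eq!(routed, vec![2, 0]);
    println!("routing ok");
}
```

Only the selected experts' MLPs run for a given token, which is why MoE-aware quantization ("moqe" in the commits) can treat expert weights separately from the shared layers.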
* Use float8 mistralrs_cudarc_fork feature

* Fix
cdoko and others added 19 commits January 18, 2025 05:35
…er#1071)

* pass mapper

* pass mapper

* pass mapper

* pass mapper

* pass mapper

* pass mapper

* pass mapper
* Fixing idefics3 and idefics2

* Fixing idefics3
* Improve handling of activations in device map

* Log sub models that are not device mapped

* Reduce defaults

* Register sub models for the rest of the vision models
…er#1077)

* Implement the deepseekv3 model

* Update apis and docs
* handle assistant messages with 'tool_calls' when used in chat_template

* linting

* add better methods for using tools in  and update examples

* fixes

* Update interactive_mode.rs

* Don't print GGUF model metadata when silent=true
… `Usage`. (EricLBuehler#1078)

* handle assistant messages with 'tool_calls' when used in chat_template

* linting

* add better methods for using tools in  and update examples

* fixes

* Update interactive_mode.rs

* add Usage to ChatCompletionChunkResponse

* add usage telemetry to streaming messages

* clippy
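The "add Usage to ChatCompletionChunkResponse" commits above attach token-usage telemetry to streaming responses, typically populated on the final chunk. A sketch of the shape, following the OpenAI-style schema; field and type names are illustrative, not the exact mistral.rs types:

```rust
// Illustrative token-usage telemetry on a streaming chunk; names follow
// the OpenAI-style schema, not the exact mistral.rs response types.
#[derive(Debug, Clone)]
struct Usage {
    prompt_tokens: usize,
    completion_tokens: usize,
    total_tokens: usize,
}

struct ChunkResponse {
    delta: String,         // incremental text for this chunk
    usage: Option<Usage>,  // populated only on the final chunk
}

/// Build the terminal chunk of a stream, carrying the usage totals so
/// streaming clients get the same telemetry as non-streaming ones.
fn finish_stream(prompt_tokens: usize, completion_tokens: usize) -> ChunkResponse {
    ChunkResponse {
        delta: String::new(),
        usage: Some(Usage {
            prompt_tokens,
            completion_tokens,
            total_tokens: prompt_tokens + completion_tokens,
        }),
    }
}

fn main() {
    let last = finish_stream(12, 30);
    assert!(last.delta.is_empty());
    assert_eq!(last.usage.unwrap().total_tokens, 42);
    println!("usage chunk ok");
}
```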
* Add siglip, configs

* Fix siglip

* Implement the resampler

* Implement the rest of the vision model

* Add the image processor

* Implement the processor

* Clippy

* A few fixes

* Even more fixes

* It works

* ISQ support

* Fix cuda

* Major refactor of rope

* Fix

* Fix resampler pos embed

* Complete merge

* Small optimization

* Add docs and examples

* Implement residual tensors

* Update docs
@Jeadie Jeadie self-assigned this Jan 23, 2025

Code Metrics Report
===============================================================================
 Language            Files        Lines         Code     Comments       Blanks
===============================================================================
 C Header                2           35           28            0            7
 Dockerfile              1           41           22           10            9
 JSON                   12          105          104            0            1
 Python                 63         2706         2339           70          297
 Shell                   1           57           22           18           17
 Plain Text              3         3723            0         2413         1310
 TOML                   18          612          546            2           64
 YAML                    2           21           19            2            0
-------------------------------------------------------------------------------
 Jupyter Notebooks       4            0            0            0            0
 |- Markdown             2           77           32           31           14
 |- Python               2          205          178            1           26
 (Total)                            282          210           32           40
-------------------------------------------------------------------------------
 Markdown               43         3324            0         2520          804
 |- BASH                 6          101           98            0            3
 |- JSON                 1           12           12            0            0
 |- Python               7          121          109            0           12
 |- Rust                12          406          344            0           62
 |- TOML                 2           75           63            0           12
 (Total)                           4039          626         2520          893
-------------------------------------------------------------------------------
 Rust                  287        87687        78742         1808         7137
 |- Markdown           140         1499           25         1362          112
 (Total)                          89186        78767         3170         7249
===============================================================================
 Total                 436        98311        81822         6843         9646
===============================================================================
  

@Jeadie Jeadie merged commit d4e2702 into spiceai Jan 30, 2025
1 check passed

9 participants