Fix MSVC build; and associated merge #4

Merged · 225 commits · Jan 22, 2025

Commits
4a52aeb
bert attention mask (#1934)
lz1998 Aug 1, 2024
8696cf6
Enable the affine kernel for u8/u32. (#2376)
LaurentMazare Aug 1, 2024
fea46cb
Metal bgemm min changes (#2364)
ivarflakstad Aug 1, 2024
bd80078
Fix log_sum_exp to handle large positive/negative inputs (#2367)
yunjhongwu Aug 1, 2024
1ba87a9
Use BF16 on metal when possible. (#2378)
LaurentMazare Aug 1, 2024
ce90287
Add get_ids to GradStore (#2379)
spaghetti-source Aug 1, 2024
957d604
Enable BF16 on metal. (#2380)
LaurentMazare Aug 1, 2024
d4b6f6e
Add a minimal test for the metal bf16 matmul. (#2381)
LaurentMazare Aug 1, 2024
ac51f47
Add Hiera vision model. (#2382)
janimo Aug 1, 2024
2e9c010
Jina Bert Example fix and more configuration (#2191)
JoanFM Aug 1, 2024
9ca277a
Fix cargo fmt. (#2383)
LaurentMazare Aug 1, 2024
6991a37
update: LSTMState and GRUState fields to be public (#2384)
singjc Aug 1, 2024
0fcb40b
Revert the bf16 gemm metal changes for now. (#2386)
LaurentMazare Aug 1, 2024
19db6b9
Add the flux model for image generation. (#2390)
LaurentMazare Aug 4, 2024
aa7ac18
Simplify handling of flux modulations. (#2394)
LaurentMazare Aug 4, 2024
c0a559d
optimize gradient for silu a bit (#2393)
MilkFather Aug 4, 2024
89eae41
Support the flux-dev model too. (#2395)
LaurentMazare Aug 4, 2024
2be9bd2
Support for mistral-nemo. (#2396)
LaurentMazare Aug 4, 2024
1a48767
Add sdpa function with cublaslt
EricLBuehler Aug 4, 2024
7bbcf00
Update docs
EricLBuehler Aug 4, 2024
1bf7101
Add matmul_bias_and_scale
EricLBuehler Aug 4, 2024
d6d3d18
Rename
EricLBuehler Aug 4, 2024
e20d85a
Add a simple test and fix for cpu
EricLBuehler Aug 4, 2024
8d2f32a
Update sdpa function
EricLBuehler Aug 4, 2024
9f144d6
Add matmul_alpha
EricLBuehler Aug 4, 2024
c830f26
Use matmul_with_alpha in sdpa
EricLBuehler Aug 4, 2024
86d0876
Add it to mistral
EricLBuehler Aug 5, 2024
8d8889c
Add it to q llama
EricLBuehler Aug 5, 2024
d18eb13
Add attention benches
EricLBuehler Aug 5, 2024
500c9f2
add models support and example for THUDM/glm-4 (#2362)
donjuanplatinum Aug 5, 2024
d71b7d7
Fixes
EricLBuehler Aug 5, 2024
dfdce2b
Add the MMDiT model of Stable Diffusion 3 (#2397)
Czxck001 Aug 5, 2024
59bbc0d
Add the import script for the T5 tokenizer. (#2399)
LaurentMazare Aug 5, 2024
412e9f4
Merge commit 'd71b7d78396a944817876c56f1677bd17633234d'
EricLBuehler Aug 5, 2024
b7d9af0
fix: usage of `actions/checkout@v2` (#2403)
hamirmahal Aug 6, 2024
27ca77e
Simplify things a bit
EricLBuehler Aug 7, 2024
7ad6494
Mistral.rs GPTQ dev PR (#14)
EricLBuehler Aug 9, 2024
6e6c1c9
Fix issues in the encodec example README.md (#2407)
jnises Aug 10, 2024
14db029
Soft Non-Maximum Suppression (#2400)
onichmath Aug 10, 2024
d3fe989
Add documentation examples for `Tensor::i` and `Tensor::narrow` metho…
csicar Aug 10, 2024
35e5f31
Add Based LLM from Hazy Research. (#2411)
janimo Aug 12, 2024
68aa9c7
Fix the device for the bert attention mask. (#2414)
LaurentMazare Aug 14, 2024
53ce65f
Clippy fixes. (#2415)
LaurentMazare Aug 14, 2024
6f0e190
Fix on metal
EricLBuehler Aug 14, 2024
ec55f58
Add the flux model for image generation. (#2390)
LaurentMazare Aug 4, 2024
0a146d7
Simplify handling of flux modulations. (#2394)
LaurentMazare Aug 4, 2024
0f55c37
optimize gradient for silu a bit (#2393)
MilkFather Aug 4, 2024
aef4eba
Support the flux-dev model too. (#2395)
LaurentMazare Aug 4, 2024
c301efa
Support for mistral-nemo. (#2396)
LaurentMazare Aug 4, 2024
fd0e933
add models support and example for THUDM/glm-4 (#2362)
donjuanplatinum Aug 5, 2024
f8e2b36
Add the MMDiT model of Stable Diffusion 3 (#2397)
Czxck001 Aug 5, 2024
0e78d29
Add the import script for the T5 tokenizer. (#2399)
LaurentMazare Aug 5, 2024
1b796b9
fix: usage of `actions/checkout@v2` (#2403)
hamirmahal Aug 6, 2024
c9cdd54
Fix issues in the encodec example README.md (#2407)
jnises Aug 10, 2024
283a5cf
Soft Non-Maximum Suppression (#2400)
onichmath Aug 10, 2024
de719a2
Add documentation examples for `Tensor::i` and `Tensor::narrow` metho…
csicar Aug 10, 2024
2e72a3d
Add Based LLM from Hazy Research. (#2411)
janimo Aug 12, 2024
d7a9bd0
Fix the device for the bert attention mask. (#2414)
LaurentMazare Aug 14, 2024
3d40ffc
Clippy fixes. (#2415)
LaurentMazare Aug 14, 2024
c5c5d49
Update flash_fwd_launch_template.h with fix for kernels (#16)
joshpopelka20 Aug 14, 2024
2386e4e
Build fixes
EricLBuehler Aug 14, 2024
a38053f
Merge branch 'sdpa'
EricLBuehler Aug 14, 2024
2b75dd9
Fix build issue in EOS Token in llama-multiprocess (#2420)
hadilq Aug 16, 2024
69fdcfe
Apply rustfmt. (#2421)
LaurentMazare Aug 16, 2024
c1b9e07
Add support for gemma-2. (#2425)
LaurentMazare Aug 17, 2024
b75ef05
Fix the marian tokenizer importer. (#2426)
LaurentMazare Aug 17, 2024
7cff589
Support Minus(u) for arbitrary values of u, e.g. Minus(3). (#2428)
LaurentMazare Aug 17, 2024
736d8eb
Stream tensor (#2429)
LaurentMazare Aug 17, 2024
58197e1
parler-tts support (#2431)
LaurentMazare Aug 18, 2024
236b29f
Add the DAC model. (#2433)
LaurentMazare Aug 19, 2024
31a1075
onnx: implement LSTM op (#2268)
shua Aug 19, 2024
14fd2d9
Add a readme for the parler-tts example. (#2434)
LaurentMazare Aug 19, 2024
b47c0bc
Update README.md (#2435)
LaurentMazare Aug 19, 2024
1b1974e
Add GGUF BF16 support (#17)
EricLBuehler Aug 21, 2024
36bd9f9
Merge remote-tracking branch 'upstream/main'
EricLBuehler Aug 22, 2024
6fbddd6
Complete merge
EricLBuehler Aug 22, 2024
6070278
Bump the version to 0.6.1. (#2438)
LaurentMazare Aug 22, 2024
a8288b7
onnx: workaround pow with negative base (#2439)
shua Aug 22, 2024
1e96b8b
onnx: support negative index in Gather (#2440)
shua Aug 22, 2024
f706ef2
Add softcapping support to flash attention (#18)
EricLBuehler Aug 22, 2024
e3c146a
silero-vad v5 example (#2321)
shua Aug 22, 2024
2ec8729
Fix for parler-tts, do not add the last slice of padding tokens. (#2442)
LaurentMazare Aug 22, 2024
ccdbe87
Add FastViT model. (#2444)
janimo Aug 23, 2024
fdc2622
fix: qwen2 lm_head loading #2443 (#2445)
ilookee Aug 23, 2024
aafa24e
Update cudarc to 0.12. (#2451)
LaurentMazare Aug 27, 2024
29e25c4
FastViT fixes. (#2452)
janimo Aug 28, 2024
86613c0
MobileCLIP models S1 and S2 (#2454)
janimo Aug 29, 2024
c02b7c3
Fix FLUX.1 weights (#2457)
eugenehp Aug 29, 2024
3c8e120
Update kernels for metal bf16 (#19)
EricLBuehler Sep 2, 2024
014f140
fix(metal/accelerate): f64-f32 type mismatch (#20)
sammcj Sep 5, 2024
e326121
Clippy fixes for 1.81.0. (#2461)
LaurentMazare Sep 5, 2024
f317df8
Bump the version to 0.6.1. (#2438)
LaurentMazare Aug 22, 2024
8a9d2be
onnx: workaround pow with negative base (#2439)
shua Aug 22, 2024
a7142d3
onnx: support negative index in Gather (#2440)
shua Aug 22, 2024
f62d7e8
silero-vad v5 example (#2321)
shua Aug 22, 2024
ceab78e
Fix for parler-tts, do not add the last slice of padding tokens. (#2442)
LaurentMazare Aug 22, 2024
5b4c593
Add FastViT model. (#2444)
janimo Aug 23, 2024
ef9649c
fix: qwen2 lm_head loading #2443 (#2445)
ilookee Aug 23, 2024
7412bd0
Update cudarc to 0.12. (#2451)
LaurentMazare Aug 27, 2024
8e39086
FastViT fixes. (#2452)
janimo Aug 28, 2024
8632a2f
MobileCLIP models S1 and S2 (#2454)
janimo Aug 29, 2024
f492c04
Fix FLUX.1 weights (#2457)
eugenehp Aug 29, 2024
91e0c6e
Clippy fixes for 1.81.0. (#2461)
LaurentMazare Sep 5, 2024
ad84486
Improve candle_core::Error to make it more ergonomic (#21)
EricLBuehler Sep 11, 2024
7f5a470
Add API to get current device seed (#22)
EricLBuehler Sep 11, 2024
13b2a8a
Complete the missing backticks in the comments (#2469)
hongmengning Sep 11, 2024
5635650
Integrate the MLX gemm kernels (#2468)
LaurentMazare Sep 11, 2024
afb6575
Use the new MLX kernels to handle the BF16 matmul. (#2470)
LaurentMazare Sep 11, 2024
0cb0bd1
Add some metal gemm benchark. (#2471)
LaurentMazare Sep 11, 2024
72d6490
Hook the MLX matmul kernels in candle-core. (#2473)
LaurentMazare Sep 12, 2024
b60faeb
Missing metal kernels. (#2474)
LaurentMazare Sep 12, 2024
9240d03
Add QStorage::data for cuda and metal (#23)
EricLBuehler Sep 13, 2024
c09afc2
Fix for metal tanh. (#2475)
LaurentMazare Sep 13, 2024
8a99f7c
Fix build error with seed (#25)
EricLBuehler Sep 13, 2024
ebf722b
Export TensorIndexer public to candle users (#2477)
h1994st Sep 13, 2024
9e31a19
Add the i16 dtype (2) (#26)
ro99 Sep 15, 2024
6eea45a
Add a couple cast metal kernels. (#2479)
LaurentMazare Sep 15, 2024
382c6b5
Improve error message (#2485)
ivnsch Sep 20, 2024
c58c5d5
Add the mimi audio-tokenizer. (#2488)
LaurentMazare Sep 20, 2024
5fc4f17
Adding Granite 7b Instruct model example (#2487)
atilag Sep 21, 2024
af21040
Metal commands refactoring (#2489)
LaurentMazare Sep 21, 2024
844d45c
Bugfix for the metal elu kernel. (#2490)
LaurentMazare Sep 21, 2024
c2fca0c
Bump the crate version. (#2491)
LaurentMazare Sep 21, 2024
829dcfa
Update cudarc to 0.12.1. (#2494)
LaurentMazare Sep 22, 2024
8097559
Move the candle version to 0.7.1. (#2495)
LaurentMazare Sep 22, 2024
d01207d
Add a RotatingKVCache. (#2493)
LaurentMazare Sep 23, 2024
10d4718
Quantized version of flux. (#2500)
LaurentMazare Sep 26, 2024
a0184a4
move CI/Cuda runner
glegendre01 Sep 26, 2024
c3c392f
Merge pull request #2507 from huggingface/ci-move
glegendre01 Sep 26, 2024
ad8a4c5
Add some llama-3.2 examples. (#2508)
LaurentMazare Sep 26, 2024
ed48f54
Expand split ops (#2505)
stevenlovegrove Sep 26, 2024
2c25754
Clippy fixes for onnx + fix a broken test. (#2510)
LaurentMazare Sep 26, 2024
62525e8
Remove some extra whitelines. (#2513)
LaurentMazare Sep 28, 2024
261ed65
Add the SigLIP model. (#2515)
LaurentMazare Sep 28, 2024
3a3c48b
Bump the crate version to 0.7.2. (#2517)
LaurentMazare Sep 29, 2024
0ebb388
Paligemma siglip vision config (#2518)
LaurentMazare Sep 29, 2024
2f49e1b
Add PaliGemma. (#2519)
LaurentMazare Sep 29, 2024
683ab69
Add Pixtral. (#2521)
LaurentMazare Sep 30, 2024
dfe9a00
Pixtral polishing. (#2522)
LaurentMazare Sep 30, 2024
7246504
Yet another cuda qmm padding fix. (#2509)
LaurentMazare Sep 30, 2024
aa35bf2
Add/lstm direction (#2455)
singjc Sep 30, 2024
6110ad8
Refactor the whisper microphone example. (#2523)
LaurentMazare Sep 30, 2024
888d886
Add ColPali (#2524)
akshayballal95 Oct 1, 2024
def4c6c
Cuda quantized mmv bugfix. (#2526)
LaurentMazare Oct 1, 2024
a2bcc22
Efficient implementation of `Tensor::ones()` for `metal` (#2512)
AnubhabB Oct 1, 2024
d08212c
Merge remote-tracking branch 'upstream/main'
EricLBuehler Oct 2, 2024
fd08d3d
Tweak some metal tests. (#2528)
LaurentMazare Oct 2, 2024
f479840
Add a seed to the flux example. (#2529)
LaurentMazare Oct 2, 2024
c04861d
Should compile now on metal
EricLBuehler Oct 2, 2024
156ebd1
Fix dtype cast
EricLBuehler Oct 2, 2024
9363006
Add whisper large-v3 turbo to the example. (#2531)
LaurentMazare Oct 2, 2024
7b60bda
Add support for cuda streams. (#2532)
LaurentMazare Oct 2, 2024
90d04ff
Support whisper large-v3 turbo in the whisper-microphone example. (#2…
LaurentMazare Oct 2, 2024
6faecaa
Fix for cudnn bf16 conv2d. (#2535)
LaurentMazare Oct 2, 2024
20a57c4
Fix set_dtype
EricLBuehler Oct 3, 2024
56aacb0
Make the RNN configs accessible from the models. (#2541)
LaurentMazare Oct 4, 2024
410c89f
Add required feature for whisper example in Readme (#2539)
dengelt Oct 4, 2024
d2e4329
Tensor tools print all (#2543)
LaurentMazare Oct 5, 2024
f856b5c
pyo3 update. (#2545)
LaurentMazare Oct 6, 2024
e4a96f9
Switch to using the MLX matmul by default. (#2547)
LaurentMazare Oct 6, 2024
edf7668
improve (#2548)
jorgeantonio21 Oct 7, 2024
937e8ed
Add BertForMaskedLM to support SPLADE Models (#2550)
akshayballal95 Oct 7, 2024
0d96ec3
feat: intergrate chinese clip and add example (#2555)
SethWen Oct 10, 2024
ca7cf5c
Add Stable Diffusion 3 Example (#2558)
Czxck001 Oct 13, 2024
6eab6b5
Fix the guide to gain access to Stable Diffusion 3 Medium (#2559)
Czxck001 Oct 13, 2024
41ade77
fix: Allow marian configs to deserialize from json. (#2556)
Mikarific Oct 13, 2024
f553ab5
Adds support for Stella_en_v5 embedding model - 1.5B variant (#2551)
AnubhabB Oct 13, 2024
3d1dc06
Enable stable-diffusion 3 on metal. (#2560)
LaurentMazare Oct 14, 2024
a01aa89
onnx: ReduceMin/Max Ops (#2563)
AnubhabB Oct 15, 2024
dcd8333
Testcases (#2567)
AnubhabB Oct 17, 2024
fa4902f
Add initial f8 e4m3 dtype (#31)
EricLBuehler Oct 17, 2024
d050b60
Remove .vscode
EricLBuehler Oct 17, 2024
6287750
Fix some metal warnings
EricLBuehler Oct 17, 2024
7c09215
ONNX: GatherElements, Xor (#2568)
AnubhabB Oct 17, 2024
a2e9d41
use softmax_last_dim (metal and cuda kernel) in llama attention layer…
zackangelo Oct 23, 2024
1f8a28a
Sync ggml metal kernels (#33)
EricLBuehler Oct 25, 2024
3699c1a
Fix the repo name for llama 3.1. (#2576)
LaurentMazare Oct 26, 2024
07849aa
Update README.md (#2577)
sashaphmn Oct 26, 2024
522531d
Add some fast Metal MLX SDPA kernels (#32)
EricLBuehler Oct 26, 2024
37e0ab8
Stable diffusion 3.5 support. (#2578)
LaurentMazare Oct 27, 2024
594d984
Support for UG kernels. (#2579)
LaurentMazare Oct 27, 2024
0e2c8c1
UG metal integration. (#2580)
LaurentMazare Oct 27, 2024
41324ef
Merge remote-tracking branch 'upstream/main'
EricLBuehler Oct 27, 2024
aa93235
Conditional compilation for bf16
EricLBuehler Oct 28, 2024
629ec72
Conditional compilation for bf16
EricLBuehler Oct 28, 2024
498bc2c
Release the mmdit model earlier to reduce memory usage. (#2581)
LaurentMazare Oct 28, 2024
139ff56
Reduce memory usage for sd 3.5. (#2582)
LaurentMazare Oct 28, 2024
2d3df4a
Patch missing seed value
EricLBuehler Oct 29, 2024
d232e13
Support sd3.5 medium and MMDiT-X (#2587)
Czxck001 Oct 30, 2024
7ac0de1
Lazy upcasting for t5. (#2589)
LaurentMazare Oct 30, 2024
530ab96
Support Skip Layer Guidance (SLG) for Stable Diffusion 3.5 Medium (#2…
Czxck001 Nov 1, 2024
3fba2b5
Add the SmolLM2 models. (#2595)
LaurentMazare Nov 3, 2024
6454597
Improved launch config for layer-norm/rms-norm. (#2591)
LaurentMazare Nov 4, 2024
e2b6b36
Add some fast Metal MLX SDPA kernels (#2584)
EricLBuehler Nov 5, 2024
2e17ebd
Fix metal sdpa for v stride (#37)
EricLBuehler Nov 10, 2024
11495ab
Fix cpu map for i32
EricLBuehler Nov 11, 2024
3769206
Update docs (#2553)
zachcp Nov 11, 2024
9453cc3
Bump the crate version to 0.8.0. (#2612)
LaurentMazare Nov 12, 2024
06350c3
Add some missing index-select metal kernels. (#2613)
LaurentMazare Nov 12, 2024
be54a9a
Complete merge
EricLBuehler Nov 13, 2024
855fe38
Complete merge
EricLBuehler Nov 13, 2024
77a6cc6
Attention-optimized softmax for prompts (#38)
EricLBuehler Nov 14, 2024
6be03dd
Metal qmatmul mat-mat product (#39)
EricLBuehler Nov 14, 2024
cb8082b
Metal: Use mtl resource shared to avoid copy (#40)
EricLBuehler Nov 17, 2024
e97177b
Dont always compile fp8, bf16 for CUDA (#42)
EricLBuehler Nov 20, 2024
6b10eac
F8E4M3 support on Metal (#43)
EricLBuehler Nov 24, 2024
823a83a
Integrate fast MLX kernel for SDPA with long seqlen (#45)
EricLBuehler Nov 26, 2024
8742354
General Metal bf16 support (#46)
EricLBuehler Nov 28, 2024
e5685ce
Experimental imatrix and I- quants support (#47)
EricLBuehler Nov 29, 2024
84c89a6
Fix q3k imatrix quantization (#48)
EricLBuehler Dec 1, 2024
db0e646
Ensure support to cuda cc 53 (#49)
EricLBuehler Dec 1, 2024
ce17ba6
Fix Metal F8E4M3 impl (#50)
EricLBuehler Dec 2, 2024
a3814f5
Fix duplicate cuda cast instantiations (#51)
EricLBuehler Dec 2, 2024
df172aa
Add inplace softmax
EricLBuehler Dec 10, 2024
2c7408b
Add varbuilder get_unchecked (#52)
EricLBuehler Dec 10, 2024
394ec76
Merge branch 'main' into inplace_softmax
EricLBuehler Dec 10, 2024
c7bd96d
inplace_attn_softmax_last_dim
EricLBuehler Dec 10, 2024
e573895
Fix cuda
EricLBuehler Dec 11, 2024
0e8e8cb
Format
EricLBuehler Dec 11, 2024
6800496
Merge pull request #53 from EricLBuehler/inplace_softmax
EricLBuehler Dec 11, 2024
c0c2b23
Metal addmm support (#54)
EricLBuehler Dec 14, 2024
af655eb
Use cudarc fork to fix windows build (#58)
EricLBuehler Jan 7, 2025
f524bc6
Use float8 mistralrs_cudarc_fork feature (#59)
EricLBuehler Jan 7, 2025
f7d9f06
Begin to remove ug (#60)
EricLBuehler Jan 7, 2025
fd28f08
Fix Windows (msvc) build
sgrebnov Jan 18, 2025
3 changes: 2 additions & 1 deletion .github/workflows/ci_cuda.yaml
@@ -9,7 +9,8 @@ jobs:
     concurrency:
       group: ${{ github.workflow }}-${{ github.job }}-${{ github.head_ref || github.run_id }}
       cancel-in-progress: true
-    runs-on: [single-gpu, nvidia-gpu, t4, ci]
+    runs-on:
+      group: aws-g4dn-2xlarge
     container:
       image: nvidia/cuda:12.3.1-devel-ubuntu22.04
       options: --gpus 0
6 changes: 3 additions & 3 deletions .github/workflows/python.yml
@@ -18,9 +18,9 @@ jobs:
     strategy:
       matrix:
         os: [ubuntu-latest] # For now, only test on Linux
-    steps:
+    steps:
     - name: Checkout repository
-      uses: actions/checkout@v2
+      uses: actions/checkout@v4

    - name: Install Rust
      uses: actions-rs/toolchain@v1
@@ -65,4 +65,4 @@ jobs:
       working-directory: ./candle-pyo3
       run: |
         source .env/bin/activate
-        python -m pytest -s -v tests
+        python -m pytest -s -v tests
12 changes: 6 additions & 6 deletions .github/workflows/rust-ci.yml
@@ -1,6 +1,6 @@
-on:
+on:
   push:
-    branches:
+    branches:
       - main
   pull_request:

@@ -15,7 +15,7 @@ jobs:
         os: [ubuntu-latest, windows-latest, macOS-latest]
         rust: [stable]
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - uses: actions-rs/toolchain@v1
         with:
           profile: minimal
@@ -34,7 +34,7 @@ jobs:
         os: [ubuntu-latest, windows-latest, macOS-latest]
         rust: [stable]
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - uses: actions-rs/toolchain@v1
         with:
           profile: minimal
@@ -49,7 +49,7 @@ jobs:
     name: Rustfmt
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - uses: actions-rs/toolchain@v1
         with:
           profile: minimal
@@ -65,7 +65,7 @@ jobs:
     name: Clippy
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v2
+      - uses: actions/checkout@v4
       - uses: actions-rs/toolchain@v1
         with:
           profile: minimal
6 changes: 6 additions & 0 deletions .gitignore
@@ -40,3 +40,9 @@ candle-wasm-examples/*/package-lock.json
 candle-wasm-examples/**/config*.json
 .DS_Store
 .idea/*
+__pycache__
+out.safetensors
+out.wav
+bria.mp3
+bria.safetensors
+bria.wav
12 changes: 0 additions & 12 deletions .vscode/settings.json

This file was deleted.

23 changes: 12 additions & 11 deletions Cargo.toml
@@ -20,7 +20,7 @@ exclude = [
 resolver = "2"

 [workspace.package]
-version = "0.6.0"
+version = "0.8.0"
 edition = "2021"
 description = "Minimalist ML framework."
 repository = "https://github.com/huggingface/candle"
@@ -33,21 +33,22 @@ ab_glyph = "0.2.23"
 accelerate-src = { version = "0.3.2" }
 anyhow = { version = "1", features = ["backtrace"] }
 byteorder = "1.4.3"
-candle = { path = "./candle-core", package = "candle-core", version = "0.6.0" }
-candle-datasets = { path = "./candle-datasets", version = "0.6.0" }
-candle-flash-attn = { path = "./candle-flash-attn", version = "0.6.0" }
-candle-kernels = { path = "./candle-kernels", version = "0.6.0" }
-candle-metal-kernels = { path = "./candle-metal-kernels", version = "0.6.0" }
-candle-nn = { path = "./candle-nn", version = "0.6.0" }
-candle-onnx = { path = "./candle-onnx", version = "0.6.0" }
-candle-transformers = { path = "./candle-transformers", version = "0.6.0" }
+candle = { path = "./candle-core", package = "candle-core", version = "0.8.0" }
+candle-datasets = { path = "./candle-datasets", version = "0.8.0" }
+candle-flash-attn = { path = "./candle-flash-attn", version = "0.8.0" }
+candle-kernels = { path = "./candle-kernels", version = "0.8.0" }
+candle-metal-kernels = { path = "./candle-metal-kernels", version = "0.8.0" }
+candle-nn = { path = "./candle-nn", version = "0.8.0" }
+candle-onnx = { path = "./candle-onnx", version = "0.8.0" }
+candle-transformers = { path = "./candle-transformers", version = "0.8.0" }
 clap = { version = "4.2.4", features = ["derive"] }
 criterion = { version = "0.5.1", default-features=false }
-cudarc = { version = "=0.11.6", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-version-from-build-system", "dynamic-linking"], default-features=false }
+cudarc = { package = "mistralrs_cudarc_fork", version = "0.12.2", features = ["std", "cublas", "cublaslt", "curand", "driver", "nvrtc", "f16", "cuda-version-from-build-system", "dynamic-linking"], default-features=false }
 fancy-regex = "0.13.0"
 gemm = { version = "0.17.0", features = ["wasm-simd128-enable"] }
-hf-hub = "0.3.0"
+hf-hub = { version = "0.3.3", package = "candle-hf-hub" }
 half = { version = "2.3.1", features = ["num-traits", "use-intrinsics", "rand_distr"] }
+float8 = { version = "0.1.2", features = ["num-traits", "rand_distr"] }
 hound = "3.5.1"
 image = { version = "0.25.2", default-features = false, features = ["jpeg", "png"] }
 imageproc = { version = "0.24.0", default-features = false }
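
One thing keeps this diff small: both swaps use Cargo's dependency renaming (`package = "..."`), so the crates still compile under their original names and `use cudarc::...` call sites need no changes (the candle-book update further down is the exception, since it names the replacement crate explicitly). A minimal sketch of downstream code that is unaffected, assuming the fork preserves cudarc's public `driver` API:

```rust
// Cargo resolves `cudarc` to the fork because of the rename above:
//   cudarc = { package = "mistralrs_cudarc_fork", version = "0.12.2", ... }
// so an existing import like this keeps compiling unchanged.
use cudarc::driver::CudaDevice;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let dev = CudaDevice::new(0)?; // ordinal 0; assumes a CUDA-capable machine
    println!("initialized CUDA device ordinal {}", dev.ordinal());
    Ok(())
}
```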
15 changes: 11 additions & 4 deletions README.md
@@ -2,7 +2,8 @@
 [![discord server](https://dcbadge.vercel.app/api/server/hugging-face-879548962464493619)](https://discord.gg/hugging-face-879548962464493619)
 [![Latest version](https://img.shields.io/crates/v/candle-core.svg)](https://crates.io/crates/candle-core)
 [![Documentation](https://docs.rs/candle-core/badge.svg)](https://docs.rs/candle-core)
-![License](https://img.shields.io/crates/l/candle-core.svg)
+[![License](https://img.shields.io/github/license/base-org/node?color=blue)](https://github.com/huggingface/candle/blob/main/LICENSE-MIT)
+[![License](https://img.shields.io/badge/license-Apache%202.0-blue?style=flat-square)](https://github.com/huggingface/candle/blob/main/LICENSE-APACHE)

 **This is an optimized implementation by Eric Buehler.**

@@ -65,7 +66,9 @@ We also provide some command line based examples using state of the art models
 - [LLaMA v1, v2, and v3](./candle-examples/examples/llama/): general LLM, includes
   the SOLAR-10.7B variant.
 - [Falcon](./candle-examples/examples/falcon/): general LLM.
-- [Gemma](./candle-examples/examples/gemma/): 2b and 7b general LLMs from Google Deepmind.
+- [Codegeex4](./candle-examples/examples/codegeex4-9b/): code completion, code interpreter, web search, function calling, repository-level
+- [GLM4](./candle-examples/examples/glm4/): open multilingual multimodal chat LMs by THUDM
+- [Gemma v1 and v2](./candle-examples/examples/gemma/): 2b and 7b+/9b general LLMs from Google Deepmind.
 - [RecurrentGemma](./candle-examples/examples/recurrent-gemma/): 2b and 7b
   Griffin based models from Google that mix attention with a RNN like state.
 - [Phi-1, Phi-1.5, Phi-2, and Phi-3](./candle-examples/examples/phi/): 1.3b,
@@ -120,6 +123,8 @@ We also provide some command line based examples using state of the art models
   model using residual vector quantization.
 - [MetaVoice](./candle-examples/examples/metavoice/): foundational model for
   text-to-speech.
+- [Parler-TTS](./candle-examples/examples/parler-tts/): large text-to-speech
+  model.
 - [T5](./candle-examples/examples/t5), [Bert](./candle-examples/examples/bert/),
   [JinaBert](./candle-examples/examples/jina-bert/) : useful for sentence embeddings.
 - [DINOv2](./candle-examples/examples/dinov2/): computer vision model trained
@@ -185,6 +190,7 @@ And then head over to
 - [`candle-sampling`](https://github.com/EricLBuehler/candle-sampling): Sampling techniques for Candle.
 - [`gpt-from-scratch-rs`](https://github.com/jeroenvlek/gpt-from-scratch-rs): A port of Andrej Karpathy's _Let's build GPT_ tutorial on YouTube showcasing the Candle API on a toy problem.
 - [`candle-einops`](https://github.com/tomsanbear/candle-einops): A pure rust implementation of the python [einops](https://github.com/arogozhnikov/einops) library.
+- [`atoma-infer`](https://github.com/atoma-network/atoma-infer): A Rust library for fast inference at scale, leveraging FlashAttention2 for efficient attention computation, PagedAttention for efficient KV-cache memory management, and multi-GPU support. It is OpenAI api compatible.

 If you have an addition to this list, please submit a pull request.

@@ -208,7 +214,7 @@ If you have an addition to this list, please submit a pull request.
 - StarCoder, StarCoder2.
 - Phi 1, 1.5, 2, and 3.
 - Mamba, Minimal Mamba
-- Gemma 2b and 7b.
+- Gemma v1 2b and 7b+, v2 2b and 9b.
 - Mistral 7b v0.1.
 - Mixtral 8x7b v0.1.
 - StableLM-3B-4E1T, StableLM-2-1.6B, Stable-Code-3B.
@@ -236,9 +242,10 @@ If you have an addition to this list, please submit a pull request.
 - Whisper, multi-lingual speech-to-text.
 - EnCodec, audio compression model.
 - MetaVoice-1B, text-to-speech model.
+- Parler-TTS, text-to-speech model.
 - Computer Vision Models.
   - DINOv2, ConvMixer, EfficientNet, ResNet, ViT, VGG, RepVGG, ConvNeXT,
-    ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4.
+    ConvNeXTv2, MobileOne, EfficientVit (MSRA), MobileNetv4, Hiera, FastViT.
 - yolo-v3, yolo-v8.
 - Segment-Anything Model (SAM).
 - SegFormer.
8 changes: 4 additions & 4 deletions candle-book/src/inference/hub.md
@@ -11,8 +11,8 @@ Then let's start by downloading the [model file](https://huggingface.co/bert-bas

 ```rust
 # extern crate candle_core;
-# extern crate hf_hub;
-use hf_hub::api::sync::Api;
+# extern crate candle_hf_hub;
+use candle_hf_hub::api::sync::Api;
 use candle_core::Device;

 let api = Api::new().unwrap();
@@ -50,8 +50,8 @@ Now that we have our weights, we can use them in our bert architecture:
 ```rust
 # extern crate candle_core;
 # extern crate candle_nn;
-# extern crate hf_hub;
-# use hf_hub::api::sync::Api;
+# extern crate candle_hf_hub;
+# use candle_hf_hub::api::sync::Api;
 #
 # let api = Api::new().unwrap();
 # let repo = api.model("bert-base-uncased".to_string());
3 changes: 2 additions & 1 deletion candle-core/Cargo.toml
@@ -18,6 +18,7 @@ metal = { workspace = true, optional = true}
 cudarc = { workspace = true, optional = true }
 gemm = { workspace = true }
 half = { workspace = true }
+float8 = { workspace = true }
 intel-mkl-src = { workspace = true, optional = true }
 libc = { workspace = true, optional = true }
 memmap2 = { workspace = true }
@@ -39,7 +40,7 @@ criterion = { workspace = true }

 [features]
 default = []
-cuda = ["cudarc", "dep:candle-kernels"]
+cuda = ["cudarc", "dep:candle-kernels", "float8/mistralrs_cudarc_fork"]
 cudnn = ["cuda", "cudarc/cudnn"]
 mkl = ["dep:libc", "dep:intel-mkl-src"]
 accelerate = ["dep:libc", "dep:accelerate-src"]
7 changes: 5 additions & 2 deletions candle-core/benches/benchmarks/mod.rs
@@ -20,13 +20,16 @@ impl BenchDevice for Device {
             Device::Cpu => Ok(()),
             Device::Cuda(device) => {
                 #[cfg(feature = "cuda")]
-                return Ok(device.synchronize()?);
+                {
+                    use cuda::WrapErr;
+                    return Ok(device.synchronize().w()?);
+                }
                 #[cfg(not(feature = "cuda"))]
                 panic!("Cuda device without cuda feature enabled: {:?}", device)
             }
             Device::Metal(device) => {
                 #[cfg(feature = "metal")]
-                return Ok(device.wait_until_completed()?);
+                return device.wait_until_completed();
                 #[cfg(not(feature = "metal"))]
                 panic!("Metal device without metal feature enabled: {:?}", device)
             }
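
The cuda arm now routes the driver error through a `.w()` adapter instead of relying on a `From` impl — this ties into the error-ergonomics work in commit ad84486. The trait itself isn't shown in this diff; below is a self-contained reconstruction of the idiom as I understand it (an illustrative assumption, not candle's exact definition):

```rust
// Hypothetical sketch of the WrapErr idiom: adapt any displayable error
// type into the local error enum at the call site via a one-method trait.
#[derive(Debug)]
enum Error {
    Msg(String),
}

type Result<T> = std::result::Result<T, Error>;

trait WrapErr<T> {
    fn w(self) -> Result<T>;
}

impl<T, E: std::fmt::Display> WrapErr<T> for std::result::Result<T, E> {
    fn w(self) -> Result<T> {
        self.map_err(|e| Error::Msg(e.to_string()))
    }
}

fn main() {
    let r: std::result::Result<i32, std::num::ParseIntError> = "42".parse();
    let v = r.w().unwrap(); // the error, if any, is now the local Error type
    println!("{v}");
}
```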
17 changes: 16 additions & 1 deletion candle-core/src/backend.rs
@@ -89,9 +89,23 @@ pub trait BackendStorage: Sized {
         _: usize,
     ) -> Result<Self>;

-    fn matmul(
+    #[allow(clippy::too_many_arguments)]
+    fn matmul_with_alpha_beta(
+        &self,
+        _: &Self,
+        _: &mut Self,
+        _: Option<f64>,
+        _: (usize, usize, usize, usize),
+        _: &Layout,
+        _: &Layout,
+        _: &Layout,
+    ) -> Result<()>;
+
+    #[allow(clippy::too_many_arguments)]
+    fn matmul_with_alpha(
         &self,
         _: &Self,
+        _: Option<f64>,
         _: (usize, usize, usize, usize),
         _: &Layout,
         _: &Layout,
@@ -144,6 +158,7 @@ pub trait BackendDevice: Sized + std::fmt::Debug + Clone {
     fn rand_normal(&self, _: &Shape, _: DType, _: f64, _: f64) -> Result<Self::Storage>;

     fn set_seed(&self, _: u64) -> Result<()>;
+    fn get_current_seed(&self) -> Result<u64>;

     /// Synchronize should block until all the operations on the device are completed.
     fn synchronize(&self) -> Result<()>;
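
A note on semantics, since the hunk only shows signatures: going by the BLAS/cuBLASLt-style naming used throughout the commit list ("Add sdpa function with cublaslt", "Add matmul_alpha"), `matmul_with_alpha_beta` presumably accumulates a scaled product into the buffer passed as `&mut Self` (hence `Result<()>`), while `matmul_with_alpha` returns fresh storage; `None` for the optional scalar would default the scale to 1, and the `beta` in the name suggests the accumulator term with β fixed at 1 given the single `Option<f64>`. This reading is inferred from the signatures, not stated in the diff:

```latex
% Assumed contract for the two new backend entry points, scale \alpha optional:
% matmul_with_alpha_beta writes into the &mut Self accumulator,
% matmul_with_alpha returns new storage.
\[
C \leftarrow \alpha\,(A B) + C, \qquad D = \alpha\,(A B)
\]
```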
9 changes: 7 additions & 2 deletions candle-core/src/backprop.rs
@@ -623,9 +623,9 @@ impl Tensor {
                 }
                 Op::Unary(arg, UnaryOp::Silu) => {
                     let sum_grad = grads.or_insert(arg)?;
-                    // d/dx silu = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
+                    // d/dx silu = sigmoid(x) * (1 + x * (1 - sigmoid(x))) = sigmoid(x) * (1 - node) + node
                     let sigmoid_arg = (arg.neg()?.exp()? + 1.)?.recip()?;
-                    let silu_grad = (&sigmoid_arg * (1. + (arg * (1. - &sigmoid_arg)?)?)?)?;
+                    let silu_grad = &sigmoid_arg * (1. - *node) + *node;
                     *sum_grad = sum_grad.add(&(&grad * silu_grad)?)?
                 }
                 Op::Elu(arg, alpha) => {
@@ -756,4 +756,9 @@ impl GradStore {
         };
         Ok(grad)
     }
+
+    /// Get the tensor ids of the stored gradient tensors
+    pub fn get_ids(&self) -> impl Iterator<Item = &TensorId> {
+        self.0.keys()
+    }
 }
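
The SiLU change above is the point of commit c0a559d ("optimize gradient for silu a bit"): with y = silu(x) = x·σ(x) already computed as the node's forward output, the gradient can reuse it instead of re-deriving x(1 − σ(x)). The identity being relied on, written out (my expansion, not part of the diff):

```latex
\frac{d}{dx}\,\bigl(x\,\sigma(x)\bigr)
  = \sigma(x) + x\,\sigma(x)\bigl(1 - \sigma(x)\bigr)  % product rule, \sigma' = \sigma(1 - \sigma)
  = \sigma(x)\bigl(1 + x(1 - \sigma(x))\bigr)          % the old comment's form
  = \sigma(x)\,(1 - y) + y                             % substitute y = x\,\sigma(x)
```

The new `GradStore::get_ids` accessor in the second hunk can be exercised like this — a minimal sketch assuming the fork keeps upstream candle's `Var`/`backward` API:

```rust
use candle_core::{Device, Tensor, Var};

fn main() -> candle_core::Result<()> {
    let x = Var::from_tensor(&Tensor::new(&[1f32, 2., 3.], &Device::Cpu)?)?;
    let loss = x.as_tensor().sqr()?.sum_all()?;
    let grads = loss.backward()?;
    // get_ids exposes the keys of the internal map, i.e. every tensor
    // for which a gradient was accumulated during this backward pass.
    for id in grads.get_ids() {
        println!("gradient stored for tensor id {id:?}");
    }
    Ok(())
}
```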
16 changes: 16 additions & 0 deletions candle-core/src/convert.rs
@@ -1,5 +1,6 @@
 //! Implement conversion traits for tensors
 use crate::{DType, Device, Error, Tensor, WithDType};
+use float8::F8E4M3;
 use half::{bf16, f16, slice::HalfFloatSliceExt};
 use std::convert::TryFrom;

@@ -130,6 +131,16 @@ impl Tensor {
                     f.write_u32::<LittleEndian>(v)?
                 }
             }
+            DType::I16 => {
+                for v in vs.to_vec1::<i16>()? {
+                    f.write_i16::<LittleEndian>(v)?
+                }
+            }
+            DType::I32 => {
+                for v in vs.to_vec1::<i32>()? {
+                    f.write_i32::<LittleEndian>(v)?
+                }
+            }
             DType::I64 => {
                 for v in vs.to_vec1::<i64>()? {
                     f.write_i64::<LittleEndian>(v)?
@@ -139,6 +150,11 @@ impl Tensor {
                 let vs = vs.to_vec1::<u8>()?;
                 f.write_all(&vs)?;
             }
+            DType::F8E4M3 => {
+                for v in vs.to_vec1::<F8E4M3>()? {
+                    f.write_u8(v.to_bits())?
+                }
+            }
         }
         Ok(())
     }
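
Since `F8E4M3` occupies a single byte, `to_bits` needs no endianness handling, unlike the `write_i16`/`write_i32` arms above. A quick round-trip sketch against the `float8` crate — `to_bits` appears in the hunk, while `from_bits`/`from_f32`/`to_f32` are assumed from the crate's half-style API:

```rust
use float8::F8E4M3;

fn main() {
    let v = F8E4M3::from_f32(1.5);
    let byte = v.to_bits(); // one u8; exactly what the writer above emits
    let back = F8E4M3::from_bits(byte);
    // 1.5 is exactly representable in e4m3, so the round-trip is lossless.
    assert_eq!(v.to_f32(), back.to_f32());
    println!("0x{byte:02x} round-trips to {}", back.to_f32());
}
```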