
Fix MSVC build and associated merge #4

Merged 225 commits, Jan 22, 2025
Conversation

@Jeadie commented Jan 22, 2025

🗣 Description

🔨 Related Issues

🤔 Concerns

lz1998 and others added 30 commits August 1, 2024 08:26
* bert attention mask

* Allow for using None as a mask.

* Revert part of the changes so that the proper default mask applies.

* Cosmetic change.

* Another cosmetic tweak.

---------

Co-authored-by: Laurent <[email protected]>
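The first commit group above lets callers pass `None` as an attention mask, with a proper default applied in that case. A minimal sketch of that behavior, assuming an additive mask applied to scores before softmax (names here are illustrative, not candle's API):

```rust
// Hypothetical sketch: an optional additive attention mask. When the mask is
// `None`, the default is a no-op (equivalent to an all-zero mask); masked-out
// positions carry -inf so they get ~0 probability after softmax.
fn masked_softmax(scores: &[f32], mask: Option<&[f32]>) -> Vec<f32> {
    let masked: Vec<f32> = match mask {
        Some(m) => scores.iter().zip(m).map(|(s, m)| s + m).collect(),
        None => scores.to_vec(), // default: no masking
    };
    // Numerically stable softmax: subtract the running max before exp.
    let max = masked.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = masked.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}
```

Reverting to an all-zero default (rather than skipping masking logic entirely) keeps the `Some` and `None` code paths numerically identical.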
* Add updated mfa metallib

* Add bgemm and tests
* fix: fix jina bert example logic

* feat: enable jina embeddings de

* feat: allow more flexibility on Jina Bert
* Fix cargo fmt.

* Clippy fix.

* Cosmetic tweaks.
* Add the flux autoencoder.

* Add the encoder down-blocks.

* Upsampling in the decoder.

* Sketch the flow matching model.

* More flux model.

* Add some of the positional embeddings.

* Add the rope embeddings.

* Add the sampling functions.

* Add the flux example.

* Fix the T5 bits.

* Proper T5 tokenizer.

* Clip encoder path fix.

* Get the clip embeddings.

* No configurable weights in layer norm.

* More weights related fixes.

* Yet another shape fix.

* DType fix.

* Fix a couple more shape issues.

* DType fixes.

* Fix the latent dims.

* Fix more shape issues.

* Autoencoder fixes.

* Get some generations out.

* Bugfix.

* T5 padding.

* Clippy fix.

* Add the decode only mode.

* Fix.

* More fixes.

* Finally get some generations to work.

* Add readme.
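Among the flux commits above is "Add the rope embeddings." As a hedged sketch of the underlying idea (not the actual candle implementation): rotary position embeddings rotate each pair of channels by an angle proportional to the token position, so relative offsets become rotations. The function name and the `theta` parameter below are illustrative assumptions:

```rust
// Illustrative RoPE sketch: rotate one (x0, x1) channel pair by pos * theta.
// A rotation preserves the vector's norm; only its phase encodes position.
fn rope_pair(x0: f32, x1: f32, pos: f32, theta: f32) -> (f32, f32) {
    let (sin, cos) = (pos * theta).sin_cos();
    (x0 * cos - x1 * sin, x0 * sin + x1 * cos)
}
```

At position 0 the rotation is the identity, which is why RoPE needs no special handling for the first token.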
* add models support and example for THUDM/glm-4

* fix the ci report

* fmt

* fix

* Update README.org

* Update README.org

* fmt

* Update README.org

* README.md add codegeex4

* README.md add glm4

* Typo.

* change expect into ?

---------

Co-authored-by: Laurent Mazare <[email protected]>
EricLBuehler and others added 26 commits November 12, 2024 20:18
* Add attn softmax

* Add some docs

* Add bf16

* Update kernels for f16

* All tests pass

* Fix cpu clippy

* Fix doc
* Test passes

* All tests pass

* Now all the tests really pass

* Try out always using mm

* Mirror llama.cpp metric

* Mirror llama.cpp metric

* Update test
* Don't always compile fp8

* Correct includes
* Add bf16, f32 conversions for f8e4m3 on metal

* Support storage from slice
* Metal fast sdpa for long seqlen

* Add a test

* Add softcapping test
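The "softcapping test" commit above refers to attention logit softcapping. As a rough sketch of the technique (the cap value and function name are illustrative, not taken from this PR): scores are squashed through a scaled tanh before softmax, bounding them to (-cap, cap) while staying near-identity for small values.

```rust
// Minimal softcapping sketch: bounds any score to (-cap, cap).
// For |score| << cap, tanh(x) ~= x, so small scores pass through unchanged.
fn softcap(score: f32, cap: f32) -> f32 {
    cap * (score / cap).tanh()
}
```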
* General Metal bf16 support

* Fix compilation

* Fix compilation

* Math

* Math unary

* Math affine

* Maybe disambiguate

* Define hugevalbf

* Define matrix

* Pare back?

* Fix

* Format

* Clippy

* Clippy
* Add q4k quantization with imatrix

* Sketch some imatrix generation

* Fixes

* Add quantize_imatrix_onto

* Support loading the imatrix file

* Fix load_imatrix

* Implement imatrix quantization for q2k

* Implement imatrix quantization for q3k

* Fix build on cuda

* Add imatrix q5k, q6k quants
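The imatrix commits above weight quantization error by an importance matrix gathered from calibration activations. A simplified sketch of that idea, under stated assumptions: this searches a scalar scale minimizing importance-weighted round-trip error, whereas the real q2k–q6k formats are block-wise with sub-scales, and the candidate-scale sweep here is invented for illustration.

```rust
// Hedged sketch of imatrix-style quantization: pick the scale that minimizes
// the importance-weighted squared reconstruction error, so weights with high
// calibration importance are quantized more faithfully.
fn best_scale(weights: &[f32], importance: &[f32], qmax: i32) -> f32 {
    let amax = weights.iter().fold(0.0f32, |a, &w| a.max(w.abs()));
    if amax == 0.0 {
        return 1.0; // nothing to quantize; any scale round-trips zeros
    }
    let mut best = (f32::INFINITY, amax / qmax as f32);
    // Sweep a few candidate scales downward from amax / qmax.
    for k in 0..20 {
        let scale = amax / qmax as f32 * (1.0 - 0.02 * k as f32);
        if scale <= 0.0 {
            break;
        }
        let err: f32 = weights
            .iter()
            .zip(importance)
            .map(|(&w, &im)| {
                let q = (w / scale).round().clamp(-(qmax as f32), qmax as f32);
                im * (w - q * scale).powi(2) // importance-weighted error
            })
            .sum();
        if err < best.0 {
            best = (err, scale);
        }
    }
    best.1
}
```

With uniform importance this degenerates to plain least-squares scale search; the imatrix only changes which errors the search cares about.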
* Add inplace softmax

* inplace_attn_softmax_last_dim

* Fix cuda

* Format
* Metal addmm support

* Format
* Begin to remove ug

* Begin to remove ug

* Begin to remove ug

* Begin to remove ug
@Jeadie Jeadie self-assigned this Jan 22, 2025
@Jeadie Jeadie merged commit 296f14f into spiceai Jan 22, 2025
15 of 22 checks passed
@sgrebnov commented Jan 22, 2025

@Jeadie - I think moving forward, instead of merging upstream into spiceai, we should create a new spiceai-0.8.2 branch based on the upstream version/tag and propagate our specific fixes onto it. Ideally there should be none (everything should be upstreamed and released as part of the official candle library).
