
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC #7191

Merged · 6 commits · May 16, 2024

Conversation

@max-krasnyansky (Collaborator) commented May 10, 2024

Currently, Windows ARM64 builds are not properly optimized, which results in low token
rates on Windows ARM64 platforms such as the upcoming Snapdragon X-Elite & Plus.

This update adds and resolves the following:

  • Fixes MSVC & Clang warnings & errors in the logging code
  • Adds proper MatMul-INT8 support detection when building with MSVC for ARM64
  • Fixes errors in MatMul-INT8 when compiled with MSVC, which also fixes warnings with Clang,
    and improves MatMul-INT8 NEON intrinsics usage in general
  • Adds CMake toolchain files for Windows ARM64 MSVC and LLVM builds
    (a sketch of what these look like follows this list);
    we're using the LLVM 16.x included in MS Visual Studio 2022
  • Updates GitHub Actions build workflow to produce optimized Windows ARM64 builds
    All Windows cmake build targets now explicitly say x64 or arm64
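
For readers who haven't used CMake toolchain files before, here is a minimal sketch of what the LLVM one looks like conceptually (illustrative only; the actual cmake/arm64-windows-llvm.cmake in this PR is authoritative):

# sketch of an arm64-windows-llvm toolchain file (illustrative, not the PR's exact contents)
set(CMAKE_SYSTEM_NAME      Windows)
set(CMAKE_SYSTEM_PROCESSOR arm64)

# use the clang shipped with Visual Studio, targeting the MSVC ARM64 ABI
set(CMAKE_C_COMPILER   clang)
set(CMAKE_CXX_COMPILER clang++)
set(CMAKE_C_COMPILER_TARGET   arm64-pc-windows-msvc)
set(CMAKE_CXX_COMPILER_TARGET arm64-pc-windows-msvc)

# armv8.7-a implies +dotprod and +i8mm, which the MatMul-INT8 path needs
set(CMAKE_C_FLAGS_INIT   "-march=armv8.7-a")
set(CMAKE_CXX_FLAGS_INIT "-march=armv8.7-a")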

Here are some before/after token rates from a Snapdragon X-Elite-based laptop.

llama-v2-7B, q4_0, CPU backend, 6 threads

Prebuilt Release (master)   | prompt-eval: 34-35 t/s | eval:   4-6 t/s
This PR (MSVC)              | prompt-eval: 60-62 t/s | eval: 10-11 t/s
This PR (LLVM/Clang)        | prompt-eval: 70-72 t/s | eval: 20-21 t/s

Here is how to build with LLVM/Clang using CMake Presets:

# from an MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-llvm-release
...
src\llama.cpp> cmake --build build-arm64-windows-llvm-release
...
src\llama.cpp> cmake --install build-arm64-windows-llvm-release --prefix pkg-arm64-windows-llvm

Here is how to build with MSVC

# from an MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-msvc-release
...
src\llama.cpp> cmake --build build-arm64-windows-msvc-release
...
src\llama.cpp> cmake --install build-arm64-windows-msvc-release --prefix pkg-arm64-windows-msvc

This all works with MS Visual Studio 2022 Community Edition.
One just needs to enable the native ARM64-related features and install the LLVM/Clang add-on.
Hosted GitHub CI runners already include all of that.
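
A quick way to sanity-check that an optimized build took effect is the system_info line the binaries print at startup. On a properly built ARM64 binary it should look roughly like the following (a sketch; the exact set of fields varies by version, and I'm assuming the MATMUL_INT8 flag is present in this build):

system_info: n_threads = 6 / 12 | NEON = 1 | ARM_FMA = 1 | MATMUL_INT8 = 1 | ...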

@mofosyne added the labels "Review Complexity : High" (generally require in-depth knowledge of LLMs or GPUs) and "devops" (improvements to build systems and GitHub Actions) on May 10, 2024
github-actions bot (Contributor) commented May 10, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 541 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8614.13ms p(95)=20803.78ms fails=, finish reason: stop=489 truncated=52
  • Prompt processing (pp): avg=96.96tk/s p(95)=402.6tk/s
  • Token generation (tg): avg=71.42tk/s p(95)=47.97tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=win-arm64-build commit=ece01fc2e99570f240ecc9a65f3e4f3df216e827

[Benchmark charts: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing, for llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 (duration=10m, 541 iterations).]

ggml-quants.c: 3 review threads (outdated, resolved)
@hmartinez82 commented

OMG, thank you for this. Do you think the 8cx Gen 3 will benefit from these changes? Also, would supporting QNN for Windows be too complicated?

@max-krasnyansky (Collaborator, Author) commented May 13, 2024

@ggerganov
Thanks for fixing up q8_0_q8_0 (good eyes, it was a cut&paste error that I missed and CI didn't catch).
Should be good to merge now. I have more updates coming for the README, and further ARM64 optimizations, but I'm waiting to merge this basic build/fix work first.
Rebased / retested on top of the latest master.

@hmartinez82 commented May 13, 2024

@max-krasnyansky Understood.
Just for reference, I have an 8cx Gen 3. I was able to get matmul working by using -march=armv8.3-a+dotprod+i8mm, and I did notice a jump in prompt-eval speed.

@max-krasnyansky (Collaborator, Author) commented

@hmartinez82

> 8cx Gen 3

Interesting. I didn't know int8 matmul works on the 8cx Gen 3. That's great!
Can you please try running the armv8.7-a compiled binaries as-is? It might just work, since we're technically not using other extensions (at least not explicitly).
If that doesn't work (i.e. you get a segfault due to unsupported instructions), please try
-march=armv8.4-a+dotprod+i8mm
If that works, we could use that instead of armv8.7-a as the common set.
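
Something like this should work from the dev shell (an untested sketch; the extra -D flags may need reconciling with what the toolchain file already sets, and the build directory name is just an example):

# from an MS Visual Studio dev shell; hypothetical flag override, unverified
src\llama.cpp> cmake --preset arm64-windows-llvm-release -B build-armv84 ^
    -DCMAKE_C_FLAGS="-march=armv8.4-a+dotprod+i8mm" ^
    -DCMAKE_CXX_FLAGS="-march=armv8.4-a+dotprod+i8mm"
src\llama.cpp> cmake --build build-armv84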

@hmartinez82 commented May 13, 2024

Well. I built it with -march=armv8.7-a and it worked with llama3, but not llama2 😑
When loading llama2, it crashes with:

llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
Illegal instruction

I'm trying -march=armv8.4-a+dotprod+i8mm now

@hmartinez82 commented May 13, 2024

@max-krasnyansky OK, bad news. I should have stuck with llama2 while testing.

I don't understand why (this is completely out of my league), but llama3 works even when compiling with armv8.7-a. It crashes when using llama2 as the model, even with -march=armv8.3-a+dotprod+i8mm. If I remove +i8mm, then llama2 works.

In other words, my lack of domain knowledge here led me to speak too soon. I couldn't imagine that different models would lead to different CPU instructions being used 😓

@max-krasnyansky (Collaborator, Author) commented

@hmartinez82
If you use the same quantization (q4_0) for both llama 2 and 3, they would both use MatMul-INT8 (if enabled).
It's probably crashing in some other code path / on some other instruction emitted by the compiler.
Did you try -march=armv8.2-a+dotprod+i8mm ?
That would also be good enough to get full rates on the X-Elite (with llama 2 and 3 in q4_0).
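
For context, the i8mm code path only exists when the compiler advertises the feature. Roughly, the guard pattern looks like this (a simplified sketch, not the actual ggml-quants.c code; the helper names are made up):

#include <arm_neon.h>

// Clang/GCC define __ARM_FEATURE_MATMUL_INT8 when -march includes +i8mm.
// MSVC defines no such macro, which is why this PR has to wire up the
// detection for MSVC ARM64 builds explicitly.
#if defined(__ARM_FEATURE_MATMUL_INT8)
// SMMLA: accumulate a 2x2 tile of int8 dot products in one instruction
static inline int32x4_t i8mm_tile(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vmmlaq_s32(acc, a, b);
}
#elif defined(__ARM_FEATURE_DOT_PRODUCT)
// fallback: plain SDOT dot products (armv8.2-a+dotprod), no i8mm required
static inline int32x4_t i8_dot(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vdotq_s32(acc, a, b);
}
#endif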

@hmartinez82 commented

Here's my llama2

llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
llm_load_print_meta: general.name     = LLaMA v2

and here's my llama3

llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.58 GiB (4.89 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct-imatrix

I'm going to download Q4_0 of llama3.

But anyway, -march=armv8.2-a+dotprod+i8mm still crashes. I guess this confirms it: the 8cx Gen 3 does not support +i8mm.

@max-krasnyansky (Collaborator, Author) commented

@ggerganov
Any objections to merging this? Please let me know if you have any questions/suggestions.

@max-krasnyansky max-krasnyansky requested a review from ggerganov May 15, 2024 03:51
ggml-quants.c: review thread (outdated, resolved)
@slaren (Collaborator) commented May 15, 2024

Could you add some documentation about how to use the CMakePresets.json file? A comment in the PR description is enough. If I understand correctly, this is not being used in any of the CI builds, but rather is meant to provide a set of presets for people building with MSVC. Is that correct?

@max-krasnyansky (Collaborator, Author) commented

> Could you add some documentation about how to use the CMakePresets.json file? A comment in the PR description is enough. If I understand correctly, this is not being used in any of the CI builds, but rather is meant to provide a set of presets for people building with MSVC. Is that correct?

Ah, I'm going to add a full section to the README on how to build native Windows ARM64.
And yes, you are correct. I was going to use the presets in CI as well, but figured that to start it's more consistent to just explicitly specify CMAKE_TOOLCHAIN and such.
If you guys like the CMakePresets, I have them for ubuntu-x64, macos, etc.

The LLVM/Clang and MSVC build steps with CMake Presets are the same as in the PR description above: cmake --preset arm64-windows-llvm-release (or arm64-windows-msvc-release), then cmake --build and cmake --install on the resulting build directory. This all works with MS Visual Studio 2022 Community Edition; one just needs to enable the native ARM64-related features and install the LLVM/Clang add-on. Hosted GitHub CI runners already include all of that.

@hmartinez82 commented May 15, 2024

@max-krasnyansky Now who's going to be the good samaritan and add support for the 8cx NPU 😅 It has matmul support, I think.

@max-krasnyansky (Collaborator, Author) commented

@slaren Please don't forget to hit that merge button :)
It would be good to avoid further rebases while all checks are passing.
I want to retest the release binaries and will then submit the README and further updates.

@mofosyne mofosyne merged commit 13ad16a into ggerganov:master May 16, 2024
67 checks passed
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 17, 2024
… MSVC (ggerganov#7191)

* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS

* build: add CMake Presets and toolchain files for Windows ARM64

* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings

* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM

* matmul-int8: fixed typos in q8_0_q8_0 matmuls

Co-authored-by: Georgi Gerganov <[email protected]>

* matmul-int8: remove unnecessary casts in q8_0_q8_0

---------

Co-authored-by: Georgi Gerganov <[email protected]>
@teleprint-me (Contributor) commented May 28, 2024

The CMakePresets.json file has been giving me issues. Visual Studio Code is available on all OSes, and this file is set up specifically for Windows. I'm now greeted with a prompt for it every time, and Visual Studio Code attempts to overwrite it, which creates conflicts. System-specific configurations should be separated or inclusive.

@hmartinez82 commented

Yes, same here. It forces you to select one of the presets, right?

@teleprint-me (Contributor) commented

Yes, it does, every time. Once I name one, it overwrites the file, and the branch is left modified as a result.

@max-krasnyansky (Collaborator, Author) commented

> The CMakePresets.json file has been giving me issues. Visual Studio Code is available on all OSes, and this file is set up specifically for Windows. I'm now greeted with a prompt for it every time, and Visual Studio Code attempts to overwrite it, which creates conflicts. System-specific configurations should be separated or inclusive.

Odd. I don't use Visual Studio Code, but that seems to me like a settings issue.
CMake Presets is a standard CMake feature that has nothing to do with IDEs / UIs:
https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html
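
For example, the presets added in this PR can be listed and used entirely from the command line, no IDE involved (output abbreviated; preset names match the CMakePresets.json in this PR):

src\llama.cpp> cmake --list-presets
Available configure presets:

  "arm64-windows-llvm-debug"
  "arm64-windows-llvm-release"
  "arm64-windows-msvc-debug"
  "arm64-windows-msvc-release"
  ...

And if the VS Code CMake extension wants machine-local presets, those belong in CMakeUserPresets.json, which CMake intends to stay untracked.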

Comment on lines +20 to +25
"name": "arm64-windows-msvc", "hidden": true,
"architecture": { "value": "arm64", "strategy": "external" },
"toolset": { "value": "host=x86_64", "strategy": "external" },
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-msvc.cmake"
}
@teleprint-me (Contributor) commented May 28, 2024
@max-krasnyansky This is Windows-specific.

Comment on lines +29 to +33
"name": "arm64-windows-llvm", "hidden": true,
"architecture": { "value": "arm64", "strategy": "external" },
"toolset": { "value": "host=x86_64", "strategy": "external" },
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
@teleprint-me (Contributor) commented May 28, 2024
@max-krasnyansky This is Windows-specific.

Comment on lines +37 to +43
{ "name": "arm64-windows-llvm-debug" , "inherits": [ "base", "arm64-windows-llvm", "debug" ] },
{ "name": "arm64-windows-llvm-release", "inherits": [ "base", "arm64-windows-llvm", "release" ] },
{ "name": "arm64-windows-llvm+static-release", "inherits": [ "base", "arm64-windows-llvm", "release", "static" ] },

{ "name": "arm64-windows-msvc-debug" , "inherits": [ "base", "arm64-windows-msvc", "debug" ] },
{ "name": "arm64-windows-msvc-release", "inherits": [ "base", "arm64-windows-msvc", "release" ] },
{ "name": "arm64-windows-msvc+static-release", "inherits": [ "base", "arm64-windows-msvc", "release", "static" ] }
@teleprint-me (Contributor) commented May 28, 2024
@max-krasnyansky This is Windows-specific.

@teleprint-me (Contributor) commented

@max-krasnyansky These are usually auto-generated, but can be hand-crafted.

@max-krasnyansky (Collaborator, Author) commented

> @max-krasnyansky These are usually auto-generated, but can be hand-crafted.

Please see the CMake documentation link I included above.

And yes, the things you listed are Windows-specific; that's the whole point: we added a native Windows ARM64 build ;-)
I will submit additional Ubuntu, Android, and macOS presets later.

@teleprint-me (Contributor) commented

I did read it. It doesn't change the fact that these settings are system-specific. This file should be ignored.

@slaren (Collaborator) commented May 28, 2024

I am not sure that we need to make changes to accommodate what seems to be a buggy or misconfigured VS Code extension. FWIW, I use VS Code, but not the CMake extension, because I always found it more annoying than useful.

@hmartinez82 commented

@slaren I have to concur with you. The CMake extension should not force us to use the presets just because they happen to be in the file system.

@teleprint-me (Contributor) commented

@slaren These are system specific settings. They are settings geared towards ARM builds on Microsoft Windows. While the settings can be inclusive, it doesn't change the current state of the file. I respect your opinion and input. I have nothing left to say or add to this discussion. I stand by what I've said.
