
Add support for properly optimized Windows ARM64 builds with LLVM and MSVC #7191

Merged · 6 commits · May 16, 2024

Conversation

@max-krasnyansky (Collaborator) commented May 10, 2024

Currently, Windows ARM64 builds are not properly optimized, which results in low token
rates on Windows ARM64 platforms such as the upcoming Snapdragon X-Elite & Plus.

This update adds and resolves the following:

  • Fixes MSVC & Clang warnings & errors in the logging code
  • Adds proper MatMul-INT8 support detection when building with MSVC for ARM64
  • Fixes errors in MatMul-INT8 when compiled with MSVC, which also fixes warnings with Clang,
    and improves MatMul-INT8 NEON intrinsics usage in general
  • Adds CMake toolchain files for Windows ARM64 MSVC and LLVM builds
    (a sketch of what these look like follows this list);
    we're using the LLVM 16.x included in MS Visual Studio 2022
  • Updates GitHub Actions build workflow to produce optimized Windows ARM64 builds
    All Windows cmake build targets now explicitly say x64 or arm64
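
For readers who haven't used CMake toolchain files before, here is a minimal sketch of what the LLVM one looks like conceptually (illustrative only; the actual cmake/arm64-windows-llvm.cmake in this PR is authoritative):

# sketch of an arm64-windows-llvm toolchain file (illustrative, not the PR's exact contents)
set(CMAKE_SYSTEM_NAME      Windows)
set(CMAKE_SYSTEM_PROCESSOR arm64)

# use the clang shipped with Visual Studio, targeting the MSVC ARM64 ABI
set(CMAKE_C_COMPILER   clang)
set(CMAKE_CXX_COMPILER clang++)
set(CMAKE_C_COMPILER_TARGET   arm64-pc-windows-msvc)
set(CMAKE_CXX_COMPILER_TARGET arm64-pc-windows-msvc)

# armv8.7-a implies +dotprod and +i8mm, which the MatMul-INT8 path needs
set(CMAKE_C_FLAGS_INIT   "-march=armv8.7-a")
set(CMAKE_CXX_FLAGS_INIT "-march=armv8.7-a")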

Here are some before/after token rates from a Snapdragon X-Elite-based laptop.

llama-v2-7B, q4_0, CPU backend, 6 threads

Prebuilt Release (master)   | prompt-eval: 34-35 t/s | eval:   4-6 t/s
This PR (MSVC)              | prompt-eval: 60-62 t/s | eval: 10-11 t/s
This PR (LLVM/Clang)        | prompt-eval: 70-72 t/s | eval: 20-21 t/s

Here is how to build with LLVM/Clang using CMake Presets:

# from an MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-llvm-release
...
src\llama.cpp> cmake --build build-arm64-windows-llvm-release
...
src\llama.cpp> cmake --install build-arm64-windows-llvm-release --prefix pkg-arm64-windows-llvm

Here is how to build with MSVC

# from an MS Visual Studio dev shell
src\llama.cpp> cmake --preset arm64-windows-msvc-release
...
src\llama.cpp> cmake --build build-arm64-windows-msvc-release
...
src\llama.cpp> cmake --install build-arm64-windows-msvc-release --prefix pkg-arm64-windows-msvc

This all works with MS Visual Studio 2022 Community Edition.
One just needs to enable the native ARM64-related features and install the LLVM/Clang add-on.
Hosted GitHub CI runners already include all of that.
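
A quick way to sanity-check that an optimized build took effect is the system_info line the binaries print at startup. On a properly built ARM64 binary it should look roughly like the following (a sketch; the exact set of fields varies by version, and I'm assuming the MATMUL_INT8 flag is present in this build):

system_info: n_threads = 6 / 12 | NEON = 1 | ARM_FMA = 1 | MATMUL_INT8 = 1 | ...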

@mofosyne added the labels "Review Complexity : High" (generally require in-depth knowledge of LLMs or GPUs) and "devops" (improvements to build systems and GitHub Actions) on May 10, 2024
github-actions bot (Contributor) commented May 10, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 541 iterations 🚀

Details:
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8614.13ms p(95)=20803.78ms fails=, finish reason: stop=489 truncated=52
  • Prompt processing (pp): avg=96.96tk/s p(95)=402.6tk/s
  • Token generation (tg): avg=71.42tk/s p(95)=47.97tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=win-arm64-build commit=ece01fc2e99570f240ecc9a65f3e4f3df216e827

[Benchmark charts: prompt_tokens_seconds, predicted_tokens_seconds, kv_cache_usage_ratio, and requests_processing, for llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 (duration=10m, 541 iterations).]

ggml-quants.c: 3 review threads (outdated, resolved)
@hmartinez82 commented

OMG, thank you for this. Do you think the 8cx Gen 3 will benefit from these changes? Also, would supporting QNN for Windows be too complicated?

@max-krasnyansky (Collaborator, Author) commented May 13, 2024

@ggerganov
Thanks for fixing up q8_0_q8_0 (good eyes, it was a cut&paste error that I missed and CI didn't catch).
Should be good to merge now. I have more updates coming for the README, and further ARM64 optimizations, but I'm waiting to merge this basic build/fix work first.
Rebased / retested on top of the latest master.

@hmartinez82 commented May 13, 2024

@max-krasnyansky Understood.
Just for reference, I have an 8cx Gen 3. I was able to get matmul working by using -march=armv8.3-a+dotprod+i8mm, and I did notice a jump in prompt-eval speed.

@max-krasnyansky (Collaborator, Author) commented

@hmartinez82

> 8cx Gen 3

Interesting. I didn't know int8 matmul works on the 8cx Gen 3. That's great!
Can you please try running the armv8.7-a compiled binaries as-is? It might just work, since we're technically not using other extensions (at least not explicitly).
If that doesn't work (i.e. you get a segfault due to unsupported instructions), please try
-march=armv8.4-a+dotprod+i8mm
If that works, we could use that instead of armv8.7-a as the common set.
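
Something like this should work from the dev shell (an untested sketch; the extra -D flags may need reconciling with what the toolchain file already sets, and the build directory name is just an example):

# from an MS Visual Studio dev shell; hypothetical flag override, unverified
src\llama.cpp> cmake --preset arm64-windows-llvm-release -B build-armv84 ^
    -DCMAKE_C_FLAGS="-march=armv8.4-a+dotprod+i8mm" ^
    -DCMAKE_CXX_FLAGS="-march=armv8.4-a+dotprod+i8mm"
src\llama.cpp> cmake --build build-armv84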

@hmartinez82 commented May 13, 2024

Well. I built it with -march=armv8.7-a and it worked with llama3, but not llama2 😑
When loading llama2, it crashes with:

llama_new_context_with_model: graph nodes  = 1030
llama_new_context_with_model: graph splits = 1
Illegal instruction

I'm trying -march=armv8.4-a+dotprod+i8mm now

@hmartinez82 commented May 13, 2024

@max-krasnyansky OK, bad news. I should have stuck with llama2 while testing.

I don't understand why (this is completely out of my league), but llama3 works even when compiling with armv8.7-a. It crashes when using llama2 as the model, even with -march=armv8.3-a+dotprod+i8mm. If I remove +i8mm, then llama2 works.

In other words, my lack of domain knowledge here led me to speak too soon. I couldn't imagine that different models would lead to different CPU instructions being used 😓

@max-krasnyansky (Collaborator, Author) commented

@hmartinez82
If you use the same quantization (q4_0) for both llama 2 and 3, they would both use MatMul-INT8 (if enabled).
It's probably crashing in some other code path / on some other instruction emitted by the compiler.
Did you try -march=armv8.2-a+dotprod+i8mm ?
That would also be good enough to get full rates on the X-Elite (with llama 2 and 3 in q4_0).
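
For context, the i8mm code path only exists when the compiler advertises the feature. Roughly, the guard pattern looks like this (a simplified sketch, not the actual ggml-quants.c code; the helper names are made up):

#include <arm_neon.h>

// Clang/GCC define __ARM_FEATURE_MATMUL_INT8 when -march includes +i8mm.
// MSVC defines no such macro, which is why this PR has to wire up the
// detection for MSVC ARM64 builds explicitly.
#if defined(__ARM_FEATURE_MATMUL_INT8)
// SMMLA: accumulate a 2x2 tile of int8 dot products in one instruction
static inline int32x4_t i8mm_tile(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vmmlaq_s32(acc, a, b);
}
#elif defined(__ARM_FEATURE_DOT_PRODUCT)
// fallback: plain SDOT dot products (armv8.2-a+dotprod), no i8mm required
static inline int32x4_t i8_dot(int32x4_t acc, int8x16_t a, int8x16_t b) {
    return vdotq_s32(acc, a, b);
}
#endif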

@hmartinez82 commented

Here's my llama2

llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW)
llm_load_print_meta: general.name     = LLaMA v2

and here's my llama3

llm_load_print_meta: model type       = 8B
llm_load_print_meta: model ftype      = Q4_K - Medium
llm_load_print_meta: model params     = 8.03 B
llm_load_print_meta: model size       = 4.58 GiB (4.89 BPW)
llm_load_print_meta: general.name     = Meta-Llama-3-8B-Instruct-imatrix

I'm going to download Q4_0 of llama3.

But anyway, -march=armv8.2-a+dotprod+i8mm still crashes. I guess this confirms it: the 8cx Gen 3 does not support +i8mm.

@max-krasnyansky (Collaborator, Author) commented

@ggerganov
Any objections to merging this? Please let me know if you have any questions/suggestions.

@max-krasnyansky max-krasnyansky requested a review from ggerganov May 15, 2024 03:51
ggml-quants.c: review thread (outdated, resolved)
@slaren (Collaborator) commented May 15, 2024

Could you add some documentation about how to use the CMakePresets.json file? A comment in the PR description is enough. If I understand correctly, this is not being used in any of the CI builds, but rather is meant to provide a set of presets for people building with MSVC. Is that correct?

@max-krasnyansky (Collaborator, Author) commented

> Could you add some documentation about how to use the CMakePresets.json file? A comment in the PR description is enough. If I understand correctly, this is not being used in any of the CI builds, but rather is meant to provide a set of presets for people building with MSVC. Is that correct?

Ah, I'm going to add a full section to the README on how to build native Windows ARM64.
And yes, you are correct. I was going to use the presets in CI as well, but figured that to start it's more consistent to just explicitly specify CMAKE_TOOLCHAIN and such.
If you guys like the CMakePresets, I have them for ubuntu-x64, macos, etc.

The LLVM/Clang and MSVC build steps with CMake Presets are the same as in the PR description above: cmake --preset arm64-windows-llvm-release (or arm64-windows-msvc-release), then cmake --build and cmake --install on the resulting build directory. This all works with MS Visual Studio 2022 Community Edition; one just needs to enable the native ARM64-related features and install the LLVM/Clang add-on. Hosted GitHub CI runners already include all of that.

@hmartinez82 commented May 15, 2024

@max-krasnyansky Now who's going to be the good samaritan and add support for the 8cx NPU 😅 It has matmul support, I think.

@max-krasnyansky (Collaborator, Author) commented

@slaren Please don't forget to hit that merge button :)
It would be good to avoid further rebases while all checks are passing.
I want to retest the release binaries and will then submit the README and further updates.

@mofosyne mofosyne merged commit 13ad16a into ggerganov:master May 16, 2024
67 checks passed
teleprint-me pushed a commit to teleprint-me/llama.cpp that referenced this pull request May 17, 2024
… MSVC (ggerganov#7191)

* logging: add proper checks for clang to avoid errors and warnings with VA_ARGS

* build: add CMake Presets and toolchain files for Windows ARM64

* matmul-int8: enable matmul-int8 with MSVC and fix Clang warnings

* ci: add support for optimized Windows ARM64 builds with MSVC and LLVM

* matmul-int8: fixed typos in q8_0_q8_0 matmuls

Co-authored-by: Georgi Gerganov <[email protected]>

* matmul-int8: remove unnecessary casts in q8_0_q8_0

---------

Co-authored-by: Georgi Gerganov <[email protected]>
@teleprint-me (Contributor) commented May 28, 2024

The CMakePresets.json file has been giving me issues. Visual Studio Code is available on all OSes, and this file is set up specifically for Windows. I'm now greeted with a prompt for it every time, and Visual Studio Code attempts to overwrite it, which creates conflicts. System-specific configurations should be separated or inclusive.

@hmartinez82 commented

Yes, same here. It forces you to select one of the presets, right?

@teleprint-me (Contributor) commented

Yes, it does, every time. Once I name one, it overwrites the file, and the branch is left modified as a result.

@max-krasnyansky (Collaborator, Author) commented

> The CMakePresets.json file has been giving me issues. Visual Studio Code is available on all OSes, and this file is set up specifically for Windows. I'm now greeted with a prompt for it every time, and Visual Studio Code attempts to overwrite it, which creates conflicts. System-specific configurations should be separated or inclusive.

Odd. I don't use Visual Studio Code, but that seems to me like a settings issue.
CMake Presets is a standard CMake feature that has nothing to do with IDEs / UIs:
https://cmake.org/cmake/help/latest/manual/cmake-presets.7.html
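
For example, the presets added in this PR can be listed and used entirely from the command line, no IDE involved (output abbreviated; preset names match the CMakePresets.json in this PR):

src\llama.cpp> cmake --list-presets
Available configure presets:

  "arm64-windows-llvm-debug"
  "arm64-windows-llvm-release"
  "arm64-windows-msvc-debug"
  "arm64-windows-msvc-release"
  ...

And if the VS Code CMake extension wants machine-local presets, those belong in CMakeUserPresets.json, which CMake intends to stay untracked.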

Comment on lines +20 to +25
"name": "arm64-windows-msvc", "hidden": true,
"architecture": { "value": "arm64", "strategy": "external" },
"toolset": { "value": "host=x86_64", "strategy": "external" },
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-msvc.cmake"
}
@teleprint-me (Contributor) commented May 28, 2024
@max-krasnyansky This is Windows-specific.

Comment on lines +29 to +33
"name": "arm64-windows-llvm", "hidden": true,
"architecture": { "value": "arm64", "strategy": "external" },
"toolset": { "value": "host=x86_64", "strategy": "external" },
"cacheVariables": {
"CMAKE_TOOLCHAIN_FILE": "${sourceDir}/cmake/arm64-windows-llvm.cmake"
@teleprint-me (Contributor) commented May 28, 2024
@max-krasnyansky This is Windows-specific.

Comment on lines +37 to +43
{ "name": "arm64-windows-llvm-debug" , "inherits": [ "base", "arm64-windows-llvm", "debug" ] },
{ "name": "arm64-windows-llvm-release", "inherits": [ "base", "arm64-windows-llvm", "release" ] },
{ "name": "arm64-windows-llvm+static-release", "inherits": [ "base", "arm64-windows-llvm", "release", "static" ] },

{ "name": "arm64-windows-msvc-debug" , "inherits": [ "base", "arm64-windows-msvc", "debug" ] },
{ "name": "arm64-windows-msvc-release", "inherits": [ "base", "arm64-windows-msvc", "release" ] },
{ "name": "arm64-windows-msvc+static-release", "inherits": [ "base", "arm64-windows-msvc", "release", "static" ] }
@teleprint-me (Contributor) commented May 28, 2024
@max-krasnyansky This is Windows-specific.

@teleprint-me (Contributor) commented

@max-krasnyansky These are usually auto-generated, but can be hand-crafted.

@max-krasnyansky (Collaborator, Author) commented

> @max-krasnyansky These are usually auto-generated, but can be hand-crafted.

Please see the CMake documentation link I included above.

And yes, the things you listed are Windows-specific; that's the whole point: we added a native Windows ARM64 build ;-)
I will submit additional Ubuntu, Android, and macOS presets later.

@teleprint-me (Contributor) commented

I did read it. It doesn't change the fact that these settings are system-specific. This file should be ignored.

@slaren (Collaborator) commented May 28, 2024

I am not sure that we need to make changes to accommodate what seems to be a buggy or misconfigured VS Code extension. FWIW, I use VS Code, but not the CMake extension, because I always found it more annoying than useful.

@hmartinez82 commented

@slaren I have to concur with you. The CMake extension should not force us to use the presets just because they happen to be in the file system.

@teleprint-me (Contributor) commented

@slaren These are system specific settings. They are settings geared towards ARM builds on Microsoft Windows. While the settings can be inclusive, it doesn't change the current state of the file. I respect your opinion and input. I have nothing left to say or add to this discussion. I stand by what I've said.
