feat: bump llama.rn version to enable runtime repacking for Q4_0 #160

a-ghorbani · 2025-01-06T18:32:23Z

Description

This PR adds the -DLM_GGML_USE_CPU_AARCH64 compile definition to the Android CMake build, which enables runtime tensor repacking for Q4_0-quantized models on ARM64.
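
For readers unfamiliar with how such a definition reaches the compiler, the fragment below is a minimal sketch of one way to set it in a CMake build. It is not the actual llama.rn change: the target name rnllama_example is hypothetical, and restricting the definition to the arm64-v8a ABI is an assumption made for illustration. ANDROID_ABI is the variable set by the Android NDK CMake toolchain.

```cmake
# Sketch only: "rnllama_example" is a hypothetical target that is assumed
# to have been created earlier with add_library().
# Restricting the definition to arm64-v8a is an assumption, not taken from the PR.
if(ANDROID_ABI STREQUAL "arm64-v8a")
    # Equivalent to passing -DLM_GGML_USE_CPU_AARCH64 on the compiler command line.
    target_compile_definitions(rnllama_example PRIVATE LM_GGML_USE_CPU_AARCH64)
endif()
```

Using target_compile_definitions keeps the definition scoped to a single target rather than applying it to every target in the build.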

[Screenshot: PocketPal benchmark results before and after setting -DLM_GGML_USE_CPU_AARCH64]

Platform Affected

iOS
Android

Checklist

Necessary comments have been made.
I have tested this change on:
- iOS Simulator/Device
- Android Emulator/Device
Unit tests and integration tests pass locally.

feat: bump llama.rn version to enable runtime repacking for Q4_0

92c0b45

a-ghorbani marked this pull request as ready for review January 6, 2025 18:32

a-ghorbani merged commit 6c24871 into main Jan 6, 2025
1 check passed

a-ghorbani deleted the feat/android-aarch64-runtime-repack branch January 6, 2025 18:35
