Skip to content

Commit

Permalink
Support multiple GPUs (split mode) on SYCL backend (#5806)
Browse files Browse the repository at this point in the history
* suport multiple cards: split-mode - layer|row

* rm warning

* rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test

* update news

* fix merge error

* update according to review comments
  • Loading branch information
NeoZhangJianyu authored Mar 2, 2024
1 parent 9bf297a commit 7156413
Show file tree
Hide file tree
Showing 8 changed files with 1,534 additions and 842 deletions.
21 changes: 21 additions & 0 deletions README-sycl.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
# llama.cpp for SYCL

- [Background](#background)
- [News](#news)
- [OS](#os)
- [Intel GPU](#intel-gpu)
- [Docker](#docker)
Expand All @@ -25,6 +26,21 @@ The llama.cpp for SYCL is used to support Intel GPUs.

For Intel CPU, recommend to use llama.cpp for X86 (Intel MKL building).

## News

- 2024.3
- Support multiple cards: **--split-mode**: [none|layer]; not support [row], it's on developing.
- Support to assign main GPU by **--main-gpu**, replace $GGML_SYCL_DEVICE.
- Support detecting all GPUs with level-zero and same top **Max compute units**.
- Support OPs
- hardsigmoid
- hardswish
- pool2d

- 2024.1
- Create SYCL backend for Intel GPU.
- Support Windows build

## OS

|OS|Status|Verified|
Expand Down Expand Up @@ -449,6 +465,7 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device
|-|-|-|
|GGML_SYCL_DEVICE|0 (default) or 1|Set the device id used. Check the device ids by default running output|
|GGML_SYCL_DEBUG|0 (default) or 1|Enable log function by macro: GGML_SYCL_DEBUG|
|ZES_ENABLE_SYSMAN| 0 (default) or 1|Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory.<br>Recommended to use when --split-mode = layer|

## Known Issue

Expand All @@ -458,6 +475,10 @@ Using device **0** (Intel(R) Arc(TM) A770 Graphics) as main device

Solution: add **--no-mmap** or **--mmap 0**.

- Split-mode: [row] is not supported

It's on developing.

## Q&A

- Error: `error while loading shared libraries: libsycl.so.7: cannot open shared object file: No such file or directory`.
Expand Down
4 changes: 4 additions & 0 deletions common/common.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -640,6 +640,10 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
} else if (arg_next == "layer") {
params.split_mode = LLAMA_SPLIT_MODE_LAYER;
} else if (arg_next == "row") {
#ifdef GGML_USE_SYCL
fprintf(stderr, "warning: The split mode value:[row] is not supported by llama.cpp with SYCL. It's developing.\nExit!\n");
exit(1);
#endif // GGML_USE_SYCL
params.split_mode = LLAMA_SPLIT_MODE_ROW;
} else {
invalid_param = true;
Expand Down
17 changes: 6 additions & 11 deletions examples/llama-bench/llama-bench.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -123,20 +123,15 @@ static std::string get_gpu_info() {
}
#endif
#ifdef GGML_USE_SYCL
int device_list[GGML_SYCL_MAX_DEVICES];
ggml_sycl_get_gpu_list(device_list, GGML_SYCL_MAX_DEVICES);

for (int i = 0; i < GGML_SYCL_MAX_DEVICES; i++) {
if (device_list[i] >0 ){
char buf[128];
ggml_sycl_get_device_description(i, buf, sizeof(buf));
id += buf;
int count = ggml_backend_sycl_get_device_count();
for (int i = 0; i < count; i++) {
char buf[128];
ggml_sycl_get_device_description(i, buf, sizeof(buf));
id += buf;
if (i < count - 1) {
id += "/";
}
}
if (id.length() >2 ) {
id.pop_back();
}
#endif
// TODO: other backends
return id;
Expand Down
2 changes: 1 addition & 1 deletion examples/sycl/ls-sycl-device.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

#include "ggml-sycl.h"

int main(int argc, char ** argv) {
int main() {
ggml_backend_sycl_print_sycl_devices();
return 0;
}
17 changes: 12 additions & 5 deletions examples/sycl/run-llama2.sh
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,19 @@ INPUT2="Building a website can be done in 10 simple steps:\nStep 1:"
source /opt/intel/oneapi/setvars.sh

if [ $# -gt 0 ]; then
export GGML_SYCL_DEVICE=$1
GGML_SYCL_DEVICE=$1
else
export GGML_SYCL_DEVICE=0
GGML_SYCL_DEVICE=0
fi
echo GGML_SYCL_DEVICE=$GGML_SYCL_DEVICE
echo "use $GGML_SYCL_DEVICE as main GPU"
#export GGML_SYCL_DEBUG=1
./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0
#./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 5 -e -ngl 33 -t 1 -s 0


#ZES_ENABLE_SYSMAN=1, Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory. Recommended to use when --split-mode = layer.

#use all GPUs with same max compute units
ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0

#use main GPU only
#ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0 -mg $GGML_SYCL_DEVICE -sm none

Loading

0 comments on commit 7156413

Please sign in to comment.