
RuntimeError: CUDA error: device-side assert triggered when using Llama 2 from HF #38

Closed
andreasbinder opened this issue Sep 16, 2023 · 3 comments


@andreasbinder

andreasbinder commented Sep 16, 2023

Good day!
I tried to run the GSM8K example with the model from HF as you described (I only adjusted the log and prompt paths):

  CUDA_VISIBLE_DEVICES=0,1 python examples/rap_gsm8k/inference.py --base_lm hf --hf_path meta-llama/Llama-2-70b-hf --hf_peft_path None --hf_quantized 'nf4'

However, I receive the following error:

    RuntimeError: CUDA error: device-side assert triggered
    CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I think this is related to the warning also mentioned in the log trace:

    llm-reasoners/reasoners/lm/hf_model.py:137: UserWarning: the eos_token '\n' is encoded into [29871, 13] with length != 1, using 13 as the eos_token_id
      warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '

From searching on GitHub, I think it is related to an input mismatch caused by incorrect tokenisation (several similar issues describe this).
Did you also encounter this problem, and if so, how did you work around it?
I will try the other versions of Llama in the meantime.

I am using transformers 4.33.1.

Thanks!

@Ber666
Collaborator

Ber666 commented Sep 18, 2023

Hi, for the CUDA error, could you try following the message and rerun with CUDA_LAUNCH_BLOCKING=1 for debugging?
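
For reference, that just means rerunning the same command with the environment variable prepended, which makes kernel launches synchronous so the error is reported at the actual failing call instead of asynchronously:

    CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0,1 python examples/rap_gsm8k/inference.py --base_lm hf --hf_path meta-llama/Llama-2-70b-hf --hf_peft_path None --hf_quantized 'nf4'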

The warning you showed shouldn't matter; it's expected in this example. We want the generation to stop at \n, and 13 is the token id of \n. For some reason, it's encoded into two tokens ([29871, 13]), so we just use 13 as the eos_token_id.
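
For reference, here is a minimal sketch of that behaviour (assuming the transformers AutoTokenizer for the same checkpoint and a 4.33.x-era tokenizer; the repo is gated, but other Llama-2 tokenizers behave the same):

    from transformers import AutoTokenizer

    # Minimal sketch: inspect how the Llama-2 tokenizer encodes '\n'.
    tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")

    ids = tok.encode("\n", add_special_tokens=False)
    print(ids)                              # [29871, 13] -- SentencePiece prefix piece plus the newline byte token
    print(tok.convert_ids_to_tokens(ids))   # ['▁', '<0x0A>']

    # The generation loop needs a single stopping id, so the last token (13) is taken as the eos_token_id.
    eos_token_id = ids[-1]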

@Ber666 Ber666 closed this as completed Sep 18, 2023
@Ber666 Ber666 reopened this Sep 18, 2023
@Rem1L
Collaborator

Rem1L commented Oct 6, 2023

Please send us more detailed information about the error; the RuntimeError and the warning alone don't give us enough to go on. We'd be delighted to help you get our work running :p

@andreasbinder
Author

Hi! I am sorry for the late reply :(
I have worked with TheBloke/Llama-2-13B-GPTQ for most experiments so far. I now tried Llama 2 again, and this time I did not run into the problem ^^
If I encounter the error again and find the corresponding solution, I will let you know!
