Good Day!
I tried to run the GSM8k example with the model from HF as you described (only adjusting the log and prompt paths):
CUDA_VISIBLE_DEVICES=0,1 python examples/rap_gsm8k/inference.py --base_lm hf --hf_path meta-llama/Llama-2-70b-hf --hf_peft_path None --hf_quantized 'nf4'
However, I receive the following error:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I think this is related to the warning also mentioned in the log trace:
llm-reasoners/reasoners/lm/hf_model.py:137: UserWarning: the eos_token '\n' is encoded into [29871, 13] with length != 1, using 13 as the eos_token_id
warnings.warn(f'the eos_token {repr(token)} is encoded into {tokenized} with length != 1, '
When searching on GitHub, I think it is related to an input mismatch caused by some incorrect tokenisation (1, 2, 3).
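A minimal sanity check for this hypothesis would be something like the following (just a sketch, assuming the same meta-llama/Llama-2-70b-hf checkpoint and a placeholder prompt; input ids that fall outside the embedding table are a common cause of this device-side assert):

# Sketch: verify that no input id falls outside the model's embedding table.
# The checkpoint name and the prompt below are placeholders for this example.
from transformers import AutoConfig, AutoTokenizer

name = "meta-llama/Llama-2-70b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
config = AutoConfig.from_pretrained(name)

ids = tokenizer("Question: an example GSM8k problem", return_tensors="pt").input_ids
print(ids.min().item(), ids.max().item(), config.vocab_size)
assert int(ids.max()) < config.vocab_size, "token id outside the embedding range"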
Did you also encounter this problem or how did you go about it?
I will try the other versions of Llama in the meantime.
I am using transformers 4.33.1
Thx!
Hi, for the CUDA error, could you try following the message? For debugging consider passing CUDA_LAUNCH_BLOCKING=1
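For reference, that just means prefixing the original command with the variable, e.g. (same arguments as in the issue):

CUDA_LAUNCH_BLOCKING=1 CUDA_VISIBLE_DEVICES=0,1 python examples/rap_gsm8k/inference.py --base_lm hf --hf_path meta-llama/Llama-2-70b-hf --hf_peft_path None --hf_quantized 'nf4'

With blocking launches, the stack trace should point at the call that actually fails instead of a later, unrelated API call.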
The warning you showed shouldn't matter; it's expected in this example. We want the generation to stop at \n, and 13 is the token id of \n. For some reason it's encoded into two tokens ([29871, 13]), so we just use 13 as the eos_token_id.
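To illustrate (a sketch, assuming the Llama 2 tokenizer from transformers; the leading 29871 appears to be the SentencePiece prefix-space piece, which is why the encoding has length 2):

# Sketch: reproduce the warning's observation about the eos_token '\n'.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-70b-hf")
ids = tokenizer.encode("\n", add_special_tokens=False)
print(ids)              # expected: [29871, 13]
eos_token_id = ids[-1]  # 13, the actual newline token, used to stop generation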
Please send us more detailed information about the error, since the RuntimeError and the warning alone don't give us enough to go on. We are delighted to help you with our work :p
Hi! I am sorry for the late reply :(
I worked with TheBloke/Llama-2-13B-GPTQ for most experiments so far. I now tried Llama 2 again, and I did not run into a problem this time ^^
In case I encounter the error again and find the corresponding solution, I will let you know!