Integration of llama3.1 fixes #197
Comments
I will look into this today as well; see also:
I staged some changes on my local repo, and when the PR for optimum is finished, I will update my fork and make a PR to update the dependencies.
I created a fork and was able to get Llama 3.1 8B Instruct working. It reports that some of the token IDs are wrong, but inference appears to work correctly; see e.g. #199 (comment). However, I have not yet gotten Llama 3.1 405B FP8 working.
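A minimal sanity check, assuming the "wrong token ID" reports stem from Llama 3.1 declaring several EOS token IDs in its config (my guess; not confirmed in this thread):

```python
# Compare the tokenizer's EOS token against the model config. Llama 3.1
# Instruct lists multiple EOS token IDs in config.json, so code that
# expects a single integer may flag a mismatch even though generation
# still stops correctly.
from transformers import AutoConfig, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

print("config.eos_token_id:   ", config.eos_token_id)     # a list for Llama 3.1 Instruct
print("tokenizer.eos_token_id:", tokenizer.eos_token_id)  # a single int
```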
Hi, this branch is for debugging. There was a push today (huggingface/optimum-habana#1163 (comment)); I will make a new Docker container based on it.
Thanks a lot! Can I ask when the new TGI Docker container will be ready? I may want to try that one directly.
I have fixed the dependencies and built the Docker container.
Great! Where can I find the ready Docker container? Is there a link on Docker Hub? Thanks a lot!
I just pushed it to endomorphosis/tgi_gaudi as per your request. Note: I have not yet fixed the quantization bug in huggingface/optimum-habana (a JSON configuration key mismatch), I have not yet validated whether I can quantize Llama 3.1 405B on a single node using parameter offloading, and I do not have multiple Gaudi machines to quantize Llama 405B for Habana; the Llama 3.1 405B FP8 Hugging Face repository will load its weights as BF16 right now. Please ask the OPEA team whether they can assist with the quantization effort, so that I can then try to add speculative decoding with Llama 3.1 8B as the draft model (see the sketch below).
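For illustration only, a sketch of that speculative-decoding idea using transformers' assisted generation, not the tgi-gaudi implementation; the model IDs are placeholders (70B stands in for 405B):

```python
# Speculative decoding: a small draft model proposes tokens that the
# large target model then verifies in a single forward pass.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "meta-llama/Meta-Llama-3.1-70B-Instruct"  # stand-in for the 405B target
draft_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"    # draft model (shares the tokenizer)

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
draft = AutoModelForCausalLM.from_pretrained(
    draft_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(target.device)
out = target.generate(**inputs, assistant_model=draft, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```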
Thanks a lot for your Docker container, I will download it and have a look.
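A quick way to smoke-test the container once it is up, assuming it serves TGI on localhost:8080 (host and port are assumptions; adjust to your setup):

```python
# Send one generation request to a running TGI endpoint.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
output = client.text_generation(
    "What is the capital of France?",
    max_new_tokens=32,
)
print(output)
```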
I haven't tested it in a while. I gave up on trying to get Llama 405B onto a single node because of the dependency problems that come with any method of quantization, but I assume that any half-precision models should work.
We tested Llama 3.1-8B and Llama 3.1-70B in BF16 and FP8; Llama 3.1-70B ran on 8 cards.
You shouldn't need 8 cards; two cards should be sufficient.
That could work; I just haven't tested it.
Quick question: when is an update to optimum-habana that includes huggingface/optimum-habana#1154 (the fix for rope_scaling in the Llama 3.1 family) planned?
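For context, the Llama 3.1 checkpoints ship a new rope_scaling schema in config.json, which is what the linked fix teaches the loader to accept. A sketch of the entry as published in the Llama 3.1 configs (values copied from the upstream config; treat as reference only):

```python
# The new "llama3" rope_scaling schema from the Llama 3.1 config.json.
# Older loaders that expect only {"type", "factor"} reject this dict,
# which is the failure huggingface/optimum-habana#1154 addresses.
rope_scaling = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
```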