
Integration of llama3.1 fixes #197

Closed · Feelas opened this issue Jul 29, 2024 · 17 comments

Feelas commented Jul 29, 2024

Quick question: when is an update to an optimum-habana version that includes huggingface/optimum-habana#1154 (the rope_scaling fix for the llama3.1 family) planned?
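(Background, not from the thread: llama 3.1 ships a new-style rope_scaling block in its config.json that pre-fix optimum-habana/transformers releases reject. A quick way to look at that block, assuming transformers >= 4.43 and access to the gated meta-llama repo with HF_TOKEN exported:)

# Hedged sketch: print the rope_scaling config that the #1154 fix targets.
python -c "from transformers import AutoConfig; print(AutoConfig.from_pretrained('meta-llama/Meta-Llama-3.1-8B-Instruct').rope_scaling)"

This should print a dict with rope_type "llama3" plus the extra low_freq_factor / high_freq_factor keys that older config validation did not expect.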

@endomorphosis

I will look into this today as well; see also HabanaAI/vllm-fork#140.

@endomorphosis

I staged some changes on my local repo, and when the PR for optimum is finished, I will update my fork and make a PR to update the dependencies.


endomorphosis commented Aug 6, 2024

I created a fork and was able to get llama3.1 8b instruct working. It reports that some of the token IDs are wrong, but inference appears to work correctly; see e.g.

#199 (comment)
https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer

However, for the time being I have not yet gotten llama 3.1 405b fp8 working.


XinyaoWa commented Aug 8, 2024

> I created a fork and was able to get llama3.1 8b instruct working. It reports that some of the token IDs are wrong, but inference appears to work correctly; see e.g.
>
> #199 (comment) https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer
>
> However, for the time being I have not yet gotten llama 3.1 405b fp8 working.

Hi,
I'm trying to run llama3.1_8b with your repo https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer, but I meet some issues when building the docker image; it seems the packages conflict. Could you please help take a look? Thanks a lot!

docker build -t tgi_gaudi_llama3.1 .

(screenshot of the build error)
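(For reference, not from the thread: the build attempt above amounts to roughly the following; the github.com clone URL and the slash-containing branch name are inferred from the linked fork.)

# Hedged sketch of reproducing the build described above.
git clone https://github.com/endomorphosis/tgi-gaudi.git
cd tgi-gaudi
git checkout endomorphosis/llama3.1_tokenizer   # branch name taken from the linked URL
docker build -t tgi_gaudi_llama3.1 .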

@endomorphosis

> > I created a fork and was able to get llama3.1 8b instruct working. It reports that some of the token IDs are wrong, but inference appears to work correctly; see e.g.
> > #199 (comment) https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer
> > However, for the time being I have not yet gotten llama 3.1 405b fp8 working.
>
> Hi, I'm trying to run llama3.1_8b with your repo https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer, but I meet some issues when building the docker image; it seems the packages conflict. Could you please help take a look? Thanks a lot!
>
> docker build -t tgi_gaudi_llama3.1 .
>
> (screenshot of the build error)

This branch is for debugging. There was a push today (huggingface/optimum-habana#1163 (comment)); I will make a new docker container based on the new push.


XinyaoWa commented Aug 8, 2024

> > > I created a fork and was able to get llama3.1 8b instruct working. It reports that some of the token IDs are wrong, but inference appears to work correctly; see e.g.
> > > #199 (comment) https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer
> > > However, for the time being I have not yet gotten llama 3.1 405b fp8 working.
> >
> > Hi, I'm trying to run llama3.1_8b with your repo https://github.com/endomorphosis/tgi-gaudi/tree/endomorphosis/llama3.1_tokenizer, but I meet some issues when building the docker image; it seems the packages conflict. Could you please help take a look? Thanks a lot!
> >
> > docker build -t tgi_gaudi_llama3.1 .
> >
> > (screenshot of the build error)
>
> This branch is for debugging. There was a push today (huggingface/optimum-habana#1163 (comment)); I will make a new docker container based on the new push.

Thanks a lot! Can I ask when the new TGI docker container will be ready? I may want to try that one directly.


endomorphosis commented Aug 8, 2024 via email

@endomorphosis

I have fixed the dependencies and built the docker container.


XinyaoWa commented Aug 8, 2024

> I have fixed the dependencies and built the docker container.

Great!! Where can I find the ready docker container? Is there a link on Docker Hub? Thanks a lot!

@endomorphosis

I just pushed it to endomorphosis/tgi_gaudi as per your request

Note:
There is not yet a formal release of huggingface/optimum and huggingface/optimum-habana, so it uses git for the Python dependencies.
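(A minimal sketch, not from the thread, of what "git for the Python dependencies" typically looks like; the exact branches or commits the fork pins are not stated here and are an assumption.)

# Hedged sketch: install the not-yet-released packages straight from git.
pip install --upgrade git+https://github.com/huggingface/optimum.git
pip install --upgrade git+https://github.com/huggingface/optimum-habana.git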

I have not yet fixed the quantization bug in huggingface/optimum-habana (a JSON configuration key mismatch), and I have not yet validated whether I can quantize llama 3.1 405B on a single node using parameter offloading; nor do I have multiple Gaudi machines to quantize llama 405b for Habana. The llama 3.1 405B fp8 Hugging Face repository will load its weights as bf16 right now.

Please inquire with the OPEA team whether they can assist me with the quantization effort, so that I can then try to add speculative decoding with llama 3.1 8b as the draft model.


XinyaoWa commented Aug 8, 2024

> I just pushed it to endomorphosis/tgi_gaudi as per your request
>
> Note: There is not yet a formal release of huggingface/optimum and huggingface/optimum-habana, so it uses git for the Python dependencies.
>
> I have not yet fixed the quantization bug in huggingface/optimum-habana (a JSON configuration key mismatch), and I have not yet validated whether I can quantize llama 3.1 405B on a single node using parameter offloading; nor do I have multiple Gaudi machines to quantize llama 405b for Habana. The llama 3.1 405B fp8 Hugging Face repository will load its weights as bf16 right now.
>
> Please inquire with the OPEA team whether they can assist me with the quantization effort, so that I can then try to add speculative decoding with llama 3.1 8b as the draft model.

Thanks a lot for your docker container, I will download it and have a check.
For the quantization, the OPEA team members responsible for this part are @changwangss, @thuang6, and @kevinintel; maybe you can consult them.


Feelas commented Sep 17, 2024

@regisss do you think that having #222 integrated should also fix this one? I don't have time to test it now, but it would be good to close this if it should work now, thanks.

@endomorphosis

I haven't tested it in a while. I gave up on trying to get llama 405b on a single node because of the dependency problems that come along with using any method of quantization, but I assume that any half-precision models should work.


regisss commented Sep 20, 2024

> @regisss do you think that having #222 integrated should also fix this one? I don't have time to test it now, but it would be good to close this if it should work now, thanks.

I think it should work but I have not tried it yet. @tthakkal Have you already tried to run Llama 3.1?

@tthakkal

> > @regisss do you think that having #222 integrated should also fix this one? I don't have time to test it now, but it would be good to close this if it should work now, thanks.
>
> I think it should work but I have not tried it yet. @tthakkal Have you already tried to run Llama 3.1?

We tested Llama3.1-8B and Llama3.1-70B in bf16 and fp8.

Llama3.1-8B on 1 card:
https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#llama31-8b-on-1-card
https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#llama31-8b-on-1-card-1

Llama3.1-70B on 8 cards:
https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#llama31-70b-8-cards
https://github.com/huggingface/tgi-gaudi?tab=readme-ov-file#llama31-70b-on-8-cards
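(For convenience, a hedged approximation of the linked "Llama3.1-8B on 1 card" recipe plus a quick smoke test; the image tag, token handling, and sequence-length flags below are assumptions, so take the authoritative values from the README sections above.)

# Approximate 1-card serving command; consult the linked README for exact flags and image tag.
model=meta-llama/Meta-Llama-3.1-8B-Instruct
volume=$PWD/data   # model weights get cached here
docker run -p 8080:80 -v $volume:/data --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all -e HF_TOKEN=$HF_TOKEN \
  --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.4 \
  --model-id $model --max-input-tokens 1024 --max-total-tokens 2048

# Standard TGI smoke test against the running server.
curl 127.0.0.1:8080/generate -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 32}}'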

@endomorphosis

You shouldn't need 8 cards; two cards are sufficient.
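(A hedged sketch of the two-card variant: the same launcher as the 8-card recipe linked above, with the shard count lowered; the image tag is an assumption, and whether 70B bf16 actually fits in two HPUs' memory is not verified in this thread.)

# Hedged sketch: shard the model over 2 HPUs instead of 8 via the TGI launcher arguments.
docker run -p 8080:80 -v $PWD/data:/data --runtime=habana \
  -e HABANA_VISIBLE_DEVICES=all -e HF_TOKEN=$HF_TOKEN \
  -e PT_HPU_ENABLE_LAZY_COLLECTIVES=true \
  --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:2.0.4 \
  --model-id meta-llama/Meta-Llama-3.1-70B-Instruct \
  --sharded true --num-shard 2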

@tthakkal

> You shouldn't need 8 cards; two cards are sufficient.

That could work; we just haven't tested it.

regisss closed this as completed Sep 25, 2024