Support of Mac m1 #18

Merged
mpu merged 2 commits into meta-llama:main on Sep 21, 2023

davideuler

No description provided.

@facebook-github-bot

Hi @davideuler!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@facebook-github-bot

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

@zachschillaci27

Thanks! I just ran this on my M1 Pro without a problem.

@zachschillaci27

> It is not working on M2?

I didn't mean to imply that it won't work on M2; I was just specifying my hardware. Unfortunately, I don't have an M2 to test on, but naively I would expect it to work.

@killian-mannarelli

Working on M2

@edisonslamp

Hello! I can't figure out how to run the model on M1. Are there any manuals for this? When I follow Meta's instructions here, I run into the CUDA problem.

@zachschillaci27

Until this PR is merged, you'll need to run off of this feature branch. You can follow the steps below:

  1. Add the new remote repository:
     git remote add mps git@github.com:davideuler/codellama.git
  2. Fetch the changes from the new remote:
     git fetch mps
  3. Check out this branch from the new remote:
     git checkout mps/main

Now you should be all set to run!
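
Once the branch is checked out, it can help to confirm which backend PyTorch will pick before launching the examples. The sketch below is not taken from this PR's diff (which isn't shown in this thread); it only illustrates a common fallback order of CUDA, then MPS, then CPU. `pick_device` and `detect_device` are hypothetical helper names; `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the PyTorch calls it relies on.

```python
from typing import Optional


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return a device name, preferring CUDA, then Apple's MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> Optional[str]:
    """Ask PyTorch which backends are usable; return None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    cuda = torch.cuda.is_available()
    # Older PyTorch builds have no torch.backends.mps attribute, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda, mps)


if __name__ == "__main__":
    device = detect_device()
    print(device if device is not None else "torch is not installed")
```

On an Apple Silicon machine with a recent PyTorch build, one would expect this to report `mps`; if it reports `cpu`, the installed PyTorch build may not include MPS support.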

@lostmygithubaccount

Does anyone have an example of running the 13B or a larger model? I can only load the 7B; trying to load the larger versions either hangs or fails with various errors.
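
A rough back-of-the-envelope estimate of weight storage may help when judging whether a given model size fits in memory. This is only a sketch under stated assumptions: 2 bytes per parameter (16-bit weights) and nominal parameter counts of 7, 13, and 34 billion. It ignores activations, the KV cache, and any temporary copies made while loading, and it is not a diagnosis of the hangs described above. `weights_gib` is a hypothetical helper name.

```python
def weights_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the model weights in GiB (assumes 16-bit weights by default)."""
    return n_params * bytes_per_param / 2**30


# Nominal parameter counts (assumed for illustration; real checkpoints differ slightly).
for name, n_params in [("7B", 7e9), ("13B", 13e9), ("34B", 34e9)]:
    print(f"{name}: ~{weights_gib(n_params):.1f} GiB of weights")
```

Comparing these figures with the machine's unified memory gives a first sanity check before digging into other causes.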

@Domenico-Esposito

It doesn't work for me on Mac M1. I have the following error.

I have the same error for all available examples: example_completion.py, example_infilling.py, and example_instructions.py.

NOTE: Redirects are currently not supported in Windows or MacOs.
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 45430) of binary: /Users/dms/.pyenv/versions/3.10.5/bin/python3.10
Traceback (most recent call last):
  File "/Users/dms/.pyenv/versions/3.10.5/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/Users/dms/.pyenv/versions/3.10.5/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/Users/dms/.pyenv/versions/3.10.5/lib/python3.10/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/Users/dms/.pyenv/versions/3.10.5/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/Users/dms/.pyenv/versions/3.10.5/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/Users/dms/.pyenv/versions/3.10.5/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
======================================================
example_instructions.py FAILED
------------------------------------------------------
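
The `exitcode: -9` in the log above follows the usual Python convention for child processes: a negative exit code -N means the process was terminated by signal N, and signal 9 is SIGKILL on macOS and Linux. SIGKILL cannot be caught, which is consistent with the traceback coming only from the launcher and not from the example script itself. A frequent cause of SIGKILL is the operating system stopping a process that uses too much memory, but nothing in this log confirms that this is what happened here. The sketch below just decodes such exit codes; `describe_exit_code` is a hypothetical helper name.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a child-process exit code into a short human-readable description."""
    if code < 0:
        # Negative codes mean "terminated by signal -code".
        try:
            name = signal.Signals(-code).name
        except ValueError:
            name = f"unknown signal {-code}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(-9))
```

If memory pressure is suspected, watching memory usage in Activity Monitor while the example loads is one way to check.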

@Rehanchy

Rehanchy commented Sep 8, 2023

It doesn't work for me on a Mac M2 either: I get the same ChildFailedError (exitcode: -9) that @Domenico-Esposito reported above. Have I set something up wrong?

@davideuler
Author

It works on my Mac Studio with an M1 Ultra and Python 3.10.10. I checked my dependencies, listed below. I'm not sure whether different dependency versions would cause a different result.

pip freeze | grep -e torch -e transformer -e sentencepiece -e fairscale
ctransformers @ https://github.com/jllllll/ctransformers-cuBLAS-wheels/releases/download/AVX2/ctransformers-0.2.22+cu117-py3-none-any.whl#sha256=ac34fd73bf1c00bda40510c3bda689db420cfa37a2341e625fe6fd26355248b2
fairscale==0.4.13
pytorch-lightning==1.6.5
sentencepiece==0.1.99
torch==2.0.1
torchaudio==2.0.2
torcheval==0.0.6
torchmetrics==0.11.1
torchtnt==0.2.0
torchvision==0.16.0.dev20230816
transformers @ git+https://github.com/huggingface/transformers@baf1daa58eb2960248fd9f7c3af0ed245b8ce4af
transformers-stream-generator==0.0.4

@syhw requested a review from mpu on September 12, 2023, 08:20
@Domenico-Esposito

No luck: even with the same versions that @davideuler listed above, I get the same error.
PS: The version torchvision==0.16.0.dev20230816 doesn't seem to exist, so I used 0.15.2. Do you think the error is related to this?
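
For comparing environments like this, a small script can print the installed versions of the relevant packages so that two setups can be compared line by line. This is a minimal sketch using the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name, and the package list is just an example drawn from the `pip freeze` filter above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Example package list; adjust it to the packages you want to compare.
    for name, version in installed_versions(["torch", "fairscale", "sentencepiece"]).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Matching versions would rule out the dependencies as the cause, but would not by itself explain the failure.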

@mpu
Contributor

Thank you for your contribution. The new code is a net improvement over the existing code that simply crashes on M1, so I'll merge it.

@mpu merged commit 2f0f7bb into meta-llama:main on Sep 21, 2023
mpu added 2 commits that referenced this pull request on Sep 22, 2023
kt-cheng pushed 3 commits to kt-cheng/codellama-docker that referenced this pull request on May 31, 2024