Adding flash attention to one click installer #4015

Closed
trihardseven opened this issue Sep 21, 2023 · 10 comments
Labels
enhancement New feature or request stale

Comments

@trihardseven

Description

Add flash attention to the one-click installer, for use with ExLlamaV2.

Additional Context

I and other less tech-savvy people are having trouble installing it manually on Windows.

@trihardseven trihardseven added the enhancement New feature or request label Sep 21, 2023
@maddog7667

I agree, considering the instructions to get flash attention working are vague af and assume that users have years of tech school behind them and are computer gurus.

@Panchovix
Contributor

Flash-attention 2 doesn't work on Windows for now. I have tried building it, but no luck so far.

@CamiloMM

Hey! I recognize Panchovix. If he can't get it to build imma give up right now.

I hope someone with a PhD in Python bullshit saves the day.

@donQx

donQx commented Sep 30, 2023

2023-09-30 12:29:14 WARNING:You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be.
Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features

Do these instructions work on Windows?

@redyandsalted

redyandsalted commented Sep 30, 2023

2023-09-30 12:29:14 WARNING:You are running ExLlamaV2 without flash-attention. This will cause the VRAM usage to be a lot higher than it could be. Try installing flash-attention following the instructions here: https://github.com/Dao-AILab/flash-attention#installation-and-features

Do these instructions work on Windows?

I'm running Windows, and these are the instructions I was following to install Flash Attention 2. It first complained that my CUDA version did not match my PyTorch CUDA version, so I uninstalled CUDA 12 and installed the matching one (11.7 for me). After that I ran into a different error:

(C:\text-generation-webui\installer_files\env) C:\Users\no-one>pip install flash-attn --no-build-isolation
Collecting flash-attn
  Using cached flash_attn-2.3.0.tar.gz (2.3 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [22 lines of output]
      error: pathspec 'csrc/cutlass' did not match any file(s) known to git
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\no-one\AppData\Local\Temp\pip-install-lanm8n3l\flash-attn_18a22cfa604e4f58be0406b9a1517187\setup.py", line 115, in <module>
          _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
        File "C:\Users\no-one\AppData\Local\Temp\pip-install-lanm8n3l\flash-attn_18a22cfa604e4f58be0406b9a1517187\setup.py", line 66, in get_cuda_bare_metal_version
          raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 421, in check_output
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 503, in run
          with Popen(*popenargs, **kwargs) as process:
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 971, in __init__
          self._execute_child(args, executable, preexec_fn, close_fds,
        File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 1456, in _execute_child
          hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
      FileNotFoundError: [WinError 2] The system cannot find the file specified


      torch.__version__  = 2.0.1+cu117


      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

The alternative it gives, cloning the repo and running the setup file, results in a very similar error:

Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'csrc/cutlass'
Cloning into 'C:/text-generation-webui/flash-attention/csrc/cutlass'...
Submodule path 'csrc/cutlass': checked out 'e0aaa3c3b38db9a89c31f04fef91e92123ad5e2e'


torch.__version__  = 2.0.1+cu117


Traceback (most recent call last):
  File "C:\text-generation-webui\flash-attention\setup.py", line 115, in <module>
    _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
  File "C:\text-generation-webui\flash-attention\setup.py", line 66, in get_cuda_bare_metal_version
    raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\text-generation-webui\installer_files\env\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Right now it seems it just isn't ready for Windows.
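
For anyone hitting the same traceback: both runs fail at the point where setup.py launches `cuda_dir + "/bin/nvcc"`, and `FileNotFoundError` suggests that no `nvcc` executable could be started from that location. The check below is a minimal sketch for confirming that before attempting a build. The helper name `find_nvcc` is hypothetical, and reading the toolkit location from the `CUDA_HOME` or `CUDA_PATH` environment variables is an assumption; your setup may resolve it differently.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_nvcc(cuda_home: Optional[str]) -> Optional[Path]:
    """Return the nvcc path under cuda_home/bin, else whatever is on PATH, else None.

    Hypothetical helper for illustration; not part of flash-attention or PyTorch.
    """
    # On Windows the compiler is normally named nvcc.exe.
    names = ["nvcc.exe", "nvcc"] if os.name == "nt" else ["nvcc"]
    if cuda_home:
        for name in names:
            candidate = Path(cuda_home) / "bin" / name
            if candidate.is_file():
                return candidate
    # Fall back to searching PATH.
    on_path = shutil.which("nvcc")
    return Path(on_path) if on_path else None


if __name__ == "__main__":
    # Assumption: the toolkit location is exposed through one of these variables.
    home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    nvcc = find_nvcc(home)
    print(f"CUDA home: {home!r}")
    print(f"nvcc: {nvcc if nvcc else 'not found'}")
```

If this prints `not found`, the build will most likely stop with the same error, whatever the PyTorch version is.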

@CamiloMM

CamiloMM commented Oct 2, 2023

Maybe Oobabooga should either suppress this message, or add a clarification, at least on Windows?
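
A rough sketch of what such a clarification could look like. This is not the webui's actual code: the function `flash_attention_warning` and the wording of the Windows note are hypothetical, and only the base warning text and the URL are taken from the log quoted earlier in this thread.

```python
import platform

FLASH_ATTN_URL = "https://github.com/Dao-AILab/flash-attention#installation-and-features"


def flash_attention_warning(system: str) -> str:
    """Build the warning text, with a clarification on Windows (hypothetical helper)."""
    base = (
        "You are running ExLlamaV2 without flash-attention. "
        "This will cause the VRAM usage to be a lot higher than it could be."
    )
    if system == "Windows":
        # Hypothetical wording; the point is to avoid sending Windows users
        # to build instructions that may not work for them.
        return base + " Note: building flash-attention on Windows has been reported to fail."
    return base + " Try installing flash-attention following the instructions here: " + FLASH_ATTN_URL


if __name__ == "__main__":
    # platform.system() returns "Windows" on Windows.
    print(flash_attention_warning(platform.system()))
```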

@Nicoolodion2

Yeah, it doesn't work on Windows right now. Only on Linux (macOS), I think. We'll have to see which comes first: flash attention supporting Windows or Oobabooga making a statement.

@Panchovix
Contributor

I managed to build it on Windows, but you will need CUDA 12.1 and torch+cu121; otherwise it won't compile.

More info: Dao-AILab/flash-attention#595
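
A quick way to see whether an environment matches the torch+cu121 requirement is to look at the suffix of `torch.__version__`, which appears in the logs above as `2.0.1+cu117`. The snippet below is a minimal sketch that only parses the version string; the helper names `cuda_tag` and `matches_required` are hypothetical, and it does not check the installed CUDA toolkit itself.

```python
import re
from typing import Optional


def cuda_tag(torch_version: str) -> Optional[str]:
    """Extract the CUDA tag from a torch version string, e.g. 'cu117' from '2.0.1+cu117'."""
    match = re.search(r"\+(cu\d+)$", torch_version)
    return match.group(1) if match else None


def matches_required(torch_version: str, required: str = "cu121") -> bool:
    """True if the torch build carries the required CUDA tag (cu121 per the comment above)."""
    return cuda_tag(torch_version) == required


if __name__ == "__main__":
    # The version string from the logs in this thread.
    print(cuda_tag("2.0.1+cu117"))
    print(matches_required("2.0.1+cu117"))
    # In a real environment you would pass torch.__version__ instead:
    #   import torch
    #   matches_required(torch.__version__)
```

With the version from the traceback above, the tag is `cu117`, so the check fails, which is consistent with the build not working in that environment.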

@bdashore3
Contributor

See #4235

@github-actions github-actions bot added the stale label Nov 20, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

9 participants