
How can I run this with wgpu? #3

Open
majian4work opened this issue Aug 4, 2023 · 9 comments
@majian4work

I want to test this project on my laptop with Intel Iris Xe Graphics. How can I do that?
My system has 16 GB of RAM.

@Gadersd
Owner

Gadersd commented Aug 7, 2023

burn-wgpu currently doesn't use the full device memory available, so llama2 can't run with it just yet, but I am working on a solution. Hopefully within the next few days I'll have it working with wgpu.

@majian4work
Author

Thank you for your effort.
It may be necessary to implement quantization for clients with less than 16GB of memory.
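
For context, weight quantization typically stores each weight tensor as 8-bit integers plus a floating-point scale, cutting the memory footprint to roughly a quarter of f32. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only; these function names are not part of this repo):

```rust
// Symmetric per-tensor int8 quantization sketch (hypothetical helpers,
// not this project's API).
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Scale so that the largest-magnitude weight maps to 127.
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

fn dequantize_i8(quantized: &[i8], scale: f32) -> Vec<f32> {
    // Recover approximate f32 weights for use in matmuls.
    quantized.iter().map(|&v| v as f32 * scale).collect()
}
```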

@smallstepman

> burn-wgpu currently doesn't use the full device memory available

Could you please explain what exactly the current limitation is, and whether there are plans to solve it in burn-wgpu or wgpu?
Is there anything I could do to help?

@majian4work
Author

I tried some modifications:

```rust
type GraphicsApi = AutoGraphicsApi;
type Backend = WgpuBackend<GraphicsApi, Elem, i32>;
let device = WgpuDevice::default();
```

and found some problems:

1. `K::repeat`'s default implementation requires the dimension being repeated to have size 1.

After a quick fix for 1, I got another error:

```
In Device::create_bind_group
    Buffer binding 0 range 524288000 exceeds `max_*_buffer_binding_size` limit 134217728
```
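
The second error is wgpu's default 128 MiB (134217728-byte) storage-buffer binding limit. A hedged sketch of how one might request a larger limit when creating the wgpu device, assuming the device creation can be patched; field names vary across wgpu versions:

```rust
// Sketch: ask the adapter for its maximum buffer limits instead of the
// 128 MiB defaults. Whether this is enough depends on what the adapter
// actually supports and on how burn-wgpu creates its device.
async fn request_device_with_large_bindings(
    adapter: &wgpu::Adapter,
) -> Result<(wgpu::Device, wgpu::Queue), wgpu::RequestDeviceError> {
    let supported = adapter.limits();
    let limits = wgpu::Limits {
        max_storage_buffer_binding_size: supported.max_storage_buffer_binding_size,
        max_buffer_size: supported.max_buffer_size,
        ..wgpu::Limits::default()
    };
    adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: Some("llama2-burn"),
                features: wgpu::Features::empty(),
                limits,
            },
            None,
        )
        .await
}
```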

@majian4work
Author

By the way, I only loaded one transformer block because there wasn't enough memory available.

@Gadersd
Owner

Gadersd commented Aug 24, 2023

burn-wgpu has been updated to utilize the full GPU memory so it should now work as long as your GPU has enough memory.

@smallstepman

@Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

@majian4work
Author

> @Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

I attempted to modify the code directly, but I am unsure if it is correct. I just want to test whether or not it will run on my laptop, without caring about the result.

@hlhr202

hlhr202 commented Apr 20, 2024

> @Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

I have the same problem. I'm using stas/tiny-random-llama-2.
This is probably caused by RotaryEncodingConfig::init: when it repeats freq_cis, the shape of freq_cis is [256, 2, 2].

burn's jit backend has this repeat function:

```rust
pub(crate) fn repeat<R: Runtime, E: JitElement, const D1: usize>(
    input: JitTensor<R, E, D1>,
    dim: usize,
    times: usize,
) -> JitTensor<R, E, D1> {
    let mut shape = input.shape.clone();
    if shape.dims[dim] != 1 {
        panic!("Can only repeat dimension with dim=1");
    }
```

@Gadersd could you suggest any fix here? thx
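
One possible workaround, sketched below, is to sidestep the size-1 restriction by concatenating copies of the tensor along the target dimension, which gives the same result as the repeat. The exact Tensor API surface depends on the burn version in use, so treat this as a sketch rather than a drop-in fix:

```rust
use burn::tensor::{backend::Backend, Tensor};

// Emulate repeat(dim, times) for a dimension whose size is not 1 by
// concatenating `times` copies of the tensor along that dimension.
// This avoids the jit repeat kernel's "dim must be 1" panic.
fn repeat_any_dim<B: Backend, const D: usize>(
    tensor: Tensor<B, D>,
    dim: usize,
    times: usize,
) -> Tensor<B, D> {
    Tensor::cat(vec![tensor; times], dim)
}
```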
