
How can I run this with wgpu? #3

Open
majian4work opened this issue Aug 4, 2023 · 9 comments
@majian4work

I want to test this project on my laptop with Intel Iris Xe Graphics. How can I do that?
My system has 16 GB of RAM.

@Gadersd
Owner

Gadersd commented Aug 7, 2023

burn-wgpu currently doesn't use the full device memory available, so llama2 can't run with it just yet, but I am working on a solution. Hopefully within the next few days I'll have it working with wgpu.

@majian4work
Author

Thank you for your effort.
It may be necessary to implement quantization for clients with less than 16GB of memory.
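
For context, weight quantization typically stores each weight tensor as 8-bit integers plus a floating-point scale, cutting the memory footprint to roughly a quarter of f32. A minimal sketch of symmetric per-tensor int8 quantization (illustrative only; these function names are not part of this repo):

```rust
// Symmetric per-tensor int8 quantization sketch (hypothetical helpers,
// not this project's API).
fn quantize_i8(weights: &[f32]) -> (Vec<i8>, f32) {
    // Scale so that the largest-magnitude weight maps to 127.
    let max_abs = weights.iter().fold(0f32, |m, w| m.max(w.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

fn dequantize_i8(quantized: &[i8], scale: f32) -> Vec<f32> {
    // Recover approximate f32 weights for use in matmuls.
    quantized.iter().map(|&v| v as f32 * scale).collect()
}
```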

@smallstepman

> burn-wgpu currently doesn't use the full device memory available

Could you please explain what exactly the current limitation is, and whether there are plans to solve it in burn-wgpu or wgpu?
Is there anything I could do to help?

@majian4work
Author

I tried some modifications:

```rust
type GraphicsApi = AutoGraphicsApi;
type Backend = WgpuBackend<GraphicsApi, Elem, i32>;
let device = WgpuDevice::default();
```

and found some problems:

1. `K::repeat`'s default implementation requires the dimension being repeated to have size 1.

After a quick fix for 1, I got another error:

```
In Device::create_bind_group
    Buffer binding 0 range 524288000 exceeds `max_*_buffer_binding_size` limit 134217728
```
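
The second error is wgpu's default 128 MiB (134217728-byte) storage-buffer binding limit. A hedged sketch of how one might request a larger limit when creating the wgpu device, assuming the device creation can be patched; field names vary across wgpu versions:

```rust
// Sketch: ask the adapter for its maximum buffer limits instead of the
// 128 MiB defaults. Whether this is enough depends on what the adapter
// actually supports and on how burn-wgpu creates its device.
async fn request_device_with_large_bindings(
    adapter: &wgpu::Adapter,
) -> Result<(wgpu::Device, wgpu::Queue), wgpu::RequestDeviceError> {
    let supported = adapter.limits();
    let limits = wgpu::Limits {
        max_storage_buffer_binding_size: supported.max_storage_buffer_binding_size,
        max_buffer_size: supported.max_buffer_size,
        ..wgpu::Limits::default()
    };
    adapter
        .request_device(
            &wgpu::DeviceDescriptor {
                label: Some("llama2-burn"),
                features: wgpu::Features::empty(),
                limits,
            },
            None,
        )
        .await
}
```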

@majian4work
Author

By the way, I only loaded one transformer block because there wasn't enough memory available.

@Gadersd
Owner

Gadersd commented Aug 24, 2023

burn-wgpu has been updated to utilize the full GPU memory so it should now work as long as your GPU has enough memory.

@smallstepman

@Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

@majian4work
Author

> @Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

I attempted to modify the code directly, but I am unsure if it is correct. I just want to test whether or not it will run on my laptop, without caring about the result.

@hlhr202

hlhr202 commented Apr 20, 2024

> @Ma-Jian1 how did you fix issue No.1 ("Can only repeat dimension with dim=1")?

I have the same problem. I'm using stas/tiny-random-llama-2.
This is probably caused by RotaryEncodingConfig::init: when it repeats freq_cis, the shape of freq_cis is [256, 2, 2].

burn's jit backend has this repeat function:

```rust
pub(crate) fn repeat<R: Runtime, E: JitElement, const D1: usize>(
    input: JitTensor<R, E, D1>,
    dim: usize,
    times: usize,
) -> JitTensor<R, E, D1> {
    let mut shape = input.shape.clone();
    if shape.dims[dim] != 1 {
        panic!("Can only repeat dimension with dim=1");
    }
```

@Gadersd could you suggest any fix here? thx
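
One possible workaround, sketched below, is to sidestep the size-1 restriction by concatenating copies of the tensor along the target dimension, which gives the same result as the repeat. The exact Tensor API surface depends on the burn version in use, so treat this as a sketch rather than a drop-in fix:

```rust
use burn::tensor::{backend::Backend, Tensor};

// Emulate repeat(dim, times) for a dimension whose size is not 1 by
// concatenating `times` copies of the tensor along that dimension.
// This avoids the jit repeat kernel's "dim must be 1" panic.
fn repeat_any_dim<B: Backend, const D: usize>(
    tensor: Tensor<B, D>,
    dim: usize,
    times: usize,
) -> Tensor<B, D> {
    Tensor::cat(vec![tensor; times], dim)
}
```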
