Fall back to CPU in GPU kernels #4380
We implement some custom XLA calls that are currently only available on CPU. Until we have a full GPU implementation, we would like to support calling our custom calls from jitted GPU functions by copying the data to the host and invoking the CPU custom call targets. Is this possible somehow?

```python
def hybrid_func(x):
    # do math on GPU
    x = x + 1
    # use our custom call that is executed on CPU:
    # this copies x to the host, does the computation, and copies back to device
    x = my_custom_xla_op(x)
    # continue with GPU stuff
    # ...
    return x

hybrid_func = jax.jit(hybrid_func, device=jax.devices('gpu')[0])
hybrid_func(jnp.zeros(10))
```
Replies: 1 comment
I think the custom call targets you register on the GPU backend are really host functions (that get device pointers, and can call device kernels). So it would be fine to pull a value back to the host, perform computation there (e.g. by calling a CPU custom call target function), and push it back to the device, all within a function registered as a GPU custom call.
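As an aside, the same copy-to-host-and-back pattern can be expressed without registering a custom call at all, using `jax.pure_callback` to run an ordinary host function from inside a jitted computation. The sketch below is illustrative, not the asker's actual op: `cpu_op` is a hypothetical stand-in for the existing CPU implementation.

```python
import jax
import jax.numpy as jnp
import numpy as np

def cpu_op(x):
    # Runs on the host as plain NumPy; stands in for the CPU custom call target.
    return np.asarray(x) * 2.0

def hybrid_func(x):
    x = x + 1  # runs on the accelerator
    # pure_callback copies x to the host, runs cpu_op there, and copies the
    # result (whose shape/dtype must be declared up front) back to the device.
    x = jax.pure_callback(cpu_op, jax.ShapeDtypeStruct(x.shape, x.dtype), x)
    return x - 1  # back on the accelerator

out = jax.jit(hybrid_func)(jnp.zeros(3))  # (0 + 1) * 2 - 1 == 1 elementwise
```

The trade-off versus a hand-written GPU custom call is that the callback is opaque to XLA, so it blocks fusion across the call site, but it avoids writing and registering any backend-specific code.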