Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Policy for null pointers returned by drivers? #484

Closed
kvark opened this issue Oct 20, 2021 · 8 comments
Closed

Policy for null pointers returned by drivers? #484

kvark opened this issue Oct 20, 2021 · 8 comments

Comments

@kvark
Copy link
Collaborator

kvark commented Oct 20, 2021

Following up to gfx-rs/wgpu#2091 (comment)
TL;DR: ExtDebugUtilsFn::cmd_begin_debug_utils_label_ext is NULL

Something needs to check for NULL. It seems to me that Ash is in a better position to do this.
One way to proceed would be to return an Error from DebugUtils::new if any of the pointers are NULL. This is unfortunate because it's an API change, and in order to be consistent with the rest all initialization would have to return a result.
Another way is to add something like fn new_fallible specifically to DebugUtils to let clients work around bad drivers.

@Ralith
Copy link
Collaborator

Ralith commented Oct 20, 2021

What driver is this? Are you certain that it advertises support for the extension?

@kvark
Copy link
Collaborator Author

kvark commented Oct 20, 2021

AdapterInfo { name: "Intel(R) UHD Graphics 630", vendor: 32902, device: 16027, device_type: IntegratedGpu, backend: Vulkan }
Driver Version 24.20.100.628

I haven't confirmed it myself. cc @cwfitzgerald, who was investigating this on Discord.

Here is how the initialization in wgpu goes:

  1. we gather a list of extensions here - https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L122
  2. In debug, it includes ext::DebugUtils::name(): https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L159
  3. at the end of the function, we filter out ones not supported by the instance
  4. so the result of this function only contains extensions that are needed and supported - https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L476
  5. finally, we initialize DebugUtils extension here - https://github.com/gfx-rs/wgpu/blob/a59a627f124d03b34c826de64b77f5bf74d6af43/wgpu-hal/src/vulkan/instance.rs#L206

@Ralith
Copy link
Collaborator

Ralith commented Oct 20, 2021

Thanks, wgpu's behavior looks sane to me. Debug utils are often implemented by layers, so I wonder if there might be a janky layer involved.

Overall, I'm reluctant to support ash going out of its way to defend against things that are expressly forbidden by the spec. I think this should be reported upstream, provided a driver update doesn't fix the user's problem. A mitigation could be applied at a higher layer by allowing the user to explicitly disable the debug extension (perhaps this is already supported?) and/or by blacklisting it for the driver. If we add extra checking/workarounds at the ash API level every time we see a driver bug, we might end up with arbitrarily large amounts of cruft, with little ability to remove it.

@kvark
Copy link
Collaborator Author

kvark commented Oct 23, 2021

If we can work around this on our side, that's an acceptable outcome. The function we call - cmd_begin_debug_utils_label is a wrapper though, it doesn't tell us in any way if it's not possible to call.

@Ralith
Copy link
Collaborator

Ralith commented Oct 23, 2021

Reviewing the specific implementation of ExtDebugUtilsFn::load, it should be impossible for ExtDebugUtilsFn::cmd_begin_debug_utils_label_ext to become null, because ash already checks that and drops in a stub function if it is null.

https://github.com/MaikKlein/ash/blob/e10bbf3063d9b84b9d8c04e6e2baae7d4881cce4/ash/src/vk/extensions.rs#L9965-L9984

Are you certain that's what's happening? If so, can you trace where the pointer is becoming null?

@hayashi-stl
Copy link

hayashi-stl commented Feb 19, 2022

I've also run into this issue when running wgpu's boids example (using Ubuntu 18.04 with an AMD GPU), after running

sudo add-apt-repository ppa:oibaf/graphics-drivers
sudo apt update
sudo apt upgrade
sudo apt install libvulkan1 mesa-vulkan-drivers vulkan-utils

to install Vulkan drivers and then confirming that vulkan-smoketest works. The version of driver installed is mesa 20.0.8).
Using lldb, I found out that on a specific line in this code:

cmd_begin_debug_utils_label_ext: unsafe {
    unsafe extern "system" fn cmd_begin_debug_utils_label_ext(
        _command_buffer: CommandBuffer,
        _p_label_info: *const DebugUtilsLabelEXT,
    ) {
        panic!(concat!(
            "Unable to load ",
            stringify!(cmd_begin_debug_utils_label_ext)
        ))
    }
    let cname = ::std::ffi::CStr::from_bytes_with_nul_unchecked(
        b"vkCmdBeginDebugUtilsLabelEXT\0",
    );
    let val = _f(cname);
    // right here, specifically
    if val.is_null() {
        cmd_begin_debug_utils_label_ext
    } else {
        ::std::mem::transmute(val)
    }
}

val is not null. It instead points to a function whose disassembly is

libvulkan.so.1`___lldb_unnamed_symbol1120:
    0x7ffff4f1aaa0 <+0>: movq   (%rdi), %rax
    0x7ffff4f1aaa3 <+3>: jmpq   *0x640(%rax)
    0x7ffff4f1aaa9 <+9>: nopl   (%rax)

I found out that when the program reaches this code,

pub unsafe fn cmd_begin_debug_utils_label_ext(
    &self,
    command_buffer: CommandBuffer,
    p_label_info: *const DebugUtilsLabelEXT,
) {
    // right here
    (self.cmd_begin_debug_utils_label_ext)(command_buffer, p_label_info)
}

the disassembly (and next instruction, marked by an arrow) is:

boids`cmd_begin_debug_utils_label_ext:
    0x5555561bed90 <+0>:  subq   $0x28, %rsp
    0x5555561bed94 <+4>:  movq   %rdx, (%rsp)
    0x5555561bed98 <+8>:  movq   %rsi, %rax
    0x5555561bed9b <+11>: movq   (%rsp), %rsi
    0x5555561bed9f <+15>: movq   %rax, 0x8(%rsp)
    0x5555561beda4 <+20>: movq   %rdi, %rax
    0x5555561beda7 <+23>: movq   0x8(%rsp), %rdi
    0x5555561bedac <+28>: movq   %rax, 0x10(%rsp)
    0x5555561bedb1 <+33>: movq   %rdi, 0x18(%rsp)
    0x5555561bedb6 <+38>: movq   %rsi, 0x20(%rsp)
->  0x5555561bedbb <+43>: callq  *0x28(%rax)
    0x5555561bedbe <+46>: addq   $0x28, %rsp
    0x5555561bedc2 <+50>: retq  

and the registers are:

       rax = 0x0000555556ad2d20
       rbx = 0x00007fffffffda80
       rcx = 0x00007fffffff9830
       rdx = 0x00007fffffff9830
       rdi = 0x000055555a3c2eb0
       rsi = 0x00007fffffff9830
       rbp = 0x0000555556475de0  boids`__libc_csu_init
       rsp = 0x00007fffffff97a0
        r8 = 0x000000003b9cbe02
        r9 = 0x0000000000000000
       r10 = 0x0000555559eac070
       r11 = 0x0000000000000000
       r12 = 0x00005555556b8e50  boids`_start
       r13 = 0x00007fffffffdb90
       r14 = 0x00005555569e4b28
       r15 = 0x0000000000000000
       rip = 0x00005555561bedbb  boids`cmd_begin_debug_utils_label_ext + 43 at extensions.rs:9584:9
    rflags = 0x0000000000000206
        cs = 0x0000000000000033
        fs = 0x0000000000000000
        gs = 0x0000000000000000
        ss = 0x000000000000002b
        ds = 0x0000000000000000
        es = 0x0000000000000000

Reading from rax + 0x28 gives 0x00007ffff4f1aaa0, and disassembling from there gives the disassembly of libvulkan.so.1`___lldb_unnamed_symbol1120 again. Stepping at the instruction level, rax becomes 0x0000555559eb3c50, which is presumably the address to the command buffer. Dumping 0x650 bytes from there gives

0x555559eb3c50: a0 42 f2 f4 ff 7f 00 00 20 d2 78 f4 ff 7f 00 00  .B...... .x.....
0x555559eb3c60: 70 d4 78 f4 ff 7f 00 00 a0 d4 78 f4 ff 7f 00 00  p.x.......x.....
0x555559eb3c70: 10 d7 78 f4 ff 7f 00 00 a0 d7 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3c80: 80 db 78 f4 ff 7f 00 00 80 e1 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3c90: 90 e1 78 f4 ff 7f 00 00 10 e2 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3ca0: 40 e2 78 f4 ff 7f 00 00 50 e2 78 f4 ff 7f 00 00  @.x.....P.x.....
0x555559eb3cb0: 90 e4 78 f4 ff 7f 00 00 00 e5 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3cc0: 90 e5 78 f4 ff 7f 00 00 60 e2 78 f4 ff 7f 00 00  ..x.....`.x.....
0x555559eb3cd0: 60 e3 78 f4 ff 7f 00 00 f0 e3 78 f4 ff 7f 00 00  `.x.......x.....
0x555559eb3ce0: c0 e5 78 f4 ff 7f 00 00 f0 e7 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3cf0: d0 e9 78 f4 ff 7f 00 00 c0 ea 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3d00: 50 eb 78 f4 ff 7f 00 00 f0 eb 78 f4 ff 7f 00 00  P.x.......x.....
0x555559eb3d10: 70 f0 78 f4 ff 7f 00 00 d0 f2 78 f4 ff 7f 00 00  p.x.......x.....
0x555559eb3d20: 00 f6 78 f4 ff 7f 00 00 10 f7 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3d30: 70 f7 78 f4 ff 7f 00 00 90 f7 78 f4 ff 7f 00 00  p.x.......x.....
0x555559eb3d40: a0 f7 78 f4 ff 7f 00 00 50 38 81 f4 ff 7f 00 00  ..x.....P8......
0x555559eb3d50: 70 3a 81 f4 ff 7f 00 00 d0 3a 81 f4 ff 7f 00 00  p:.......:......
0x555559eb3d60: b0 f7 78 f4 ff 7f 00 00 40 f9 78 f4 ff 7f 00 00  [email protected].....
0x555559eb3d70: 80 9d 79 f4 ff 7f 00 00 30 9e 79 f4 ff 7f 00 00  ..y.....0.y.....
0x555559eb3d80: 10 98 79 f4 ff 7f 00 00 90 98 79 f4 ff 7f 00 00  ..y.......y.....
0x555559eb3d90: 10 99 79 f4 ff 7f 00 00 b0 9a 79 f4 ff 7f 00 00  ..y.......y.....
0x555559eb3da0: 60 9b 79 f4 ff 7f 00 00 00 ee 7f f4 ff 7f 00 00  `.y.............
0x555559eb3db0: d0 ee 7f f4 ff 7f 00 00 10 e9 7f f4 ff 7f 00 00  ................
0x555559eb3dc0: 20 ea 7f f4 ff 7f 00 00 70 ea 7f f4 ff 7f 00 00   .......p.......
0x555559eb3dd0: 00 ec 7f f4 ff 7f 00 00 80 d0 7f f4 ff 7f 00 00  ................
0x555559eb3de0: 10 d1 7f f4 ff 7f 00 00 80 7a 7f f4 ff 7f 00 00  .........z......
0x555559eb3df0: d0 25 79 f4 ff 7f 00 00 e0 27 79 f4 ff 7f 00 00  .%y......'y.....
0x555559eb3e00: 00 0c 79 f4 ff 7f 00 00 80 11 79 f4 ff 7f 00 00  ..y.......y.....
0x555559eb3e10: 40 1a 79 f4 ff 7f 00 00 10 22 79 f4 ff 7f 00 00  @.y......"y.....
0x555559eb3e20: 10 28 79 f4 ff 7f 00 00 70 2a 79 f4 ff 7f 00 00  .(y.....p*y.....
0x555559eb3e30: 20 2b 79 f4 ff 7f 00 00 10 2c 79 f4 ff 7f 00 00   +y......,y.....
0x555559eb3e40: a0 2b 79 f4 ff 7f 00 00 c0 37 79 f4 ff 7f 00 00  .+y......7y.....
0x555559eb3e50: 20 0a 79 f4 ff 7f 00 00 d0 0b 79 f4 ff 7f 00 00   .y.......y.....
0x555559eb3e60: a0 26 7f f4 ff 7f 00 00 b0 31 7f f4 ff 7f 00 00  .&.......1......
0x555559eb3e70: 20 32 7f f4 ff 7f 00 00 30 29 78 f4 ff 7f 00 00   2......0)x.....
0x555559eb3e80: 20 2a 78 f4 ff 7f 00 00 f0 2a 78 f4 ff 7f 00 00   *x......*x.....
0x555559eb3e90: 30 12 78 f4 ff 7f 00 00 a0 11 78 f4 ff 7f 00 00  0.x.......x.....
0x555559eb3ea0: b0 14 78 f4 ff 7f 00 00 20 1e 78 f4 ff 7f 00 00  ..x..... .x.....
0x555559eb3eb0: a0 14 78 f4 ff 7f 00 00 10 1f 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3ec0: 20 24 78 f4 ff 7f 00 00 80 24 78 f4 ff 7f 00 00   $x......$x.....
0x555559eb3ed0: e0 24 78 f4 ff 7f 00 00 00 25 78 f4 ff 7f 00 00  .$x......%x.....
0x555559eb3ee0: 50 25 78 f4 ff 7f 00 00 80 25 78 f4 ff 7f 00 00  P%x......%x.....
0x555559eb3ef0: b0 25 78 f4 ff 7f 00 00 00 26 78 f4 ff 7f 00 00  .%x......&x.....
0x555559eb3f00: 50 26 78 f4 ff 7f 00 00 b0 17 78 f4 ff 7f 00 00  P&x.......x.....
0x555559eb3f10: b0 16 78 f4 ff 7f 00 00 e0 15 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb3f20: a0 2b 78 f4 ff 7f 00 00 00 2c 78 f4 ff 7f 00 00  .+x......,x.....
0x555559eb3f30: 80 2c 78 f4 ff 7f 00 00 00 2d 78 f4 ff 7f 00 00  .,x......-x.....
0x555559eb3f40: 20 2f 78 f4 ff 7f 00 00 90 2f 78 f4 ff 7f 00 00   /x....../x.....
0x555559eb3f50: 10 f6 7a f4 ff 7f 00 00 b0 ba 7c f4 ff 7f 00 00  ..z.......|.....
0x555559eb3f60: 40 24 7a f4 ff 7f 00 00 90 ba 7c f4 ff 7f 00 00  @$z.......|.....
0x555559eb3f70: a0 ba 7c f4 ff 7f 00 00 90 f6 7a f4 ff 7f 00 00  ..|.......z.....
0x555559eb3f80: f0 f5 7a f4 ff 7f 00 00 00 a7 7c f4 ff 7f 00 00  ..z.......|.....
0x555559eb3f90: e0 a7 7c f4 ff 7f 00 00 90 a8 7c f4 ff 7f 00 00  ..|.......|.....
0x555559eb3fa0: a0 21 7d f4 ff 7f 00 00 c0 40 78 f4 ff 7f 00 00  .!}......@x.....
0x555559eb3fb0: d0 40 78 f4 ff 7f 00 00 e0 40 78 f4 ff 7f 00 00  .@x......@x.....
0x555559eb3fc0: 50 40 78 f4 ff 7f 00 00 90 44 81 f4 ff 7f 00 00  [email protected]......
0x555559eb3fd0: 70 45 81 f4 ff 7f 00 00 f0 42 81 f4 ff 7f 00 00  pE.......B......
0x555559eb3fe0: 80 45 81 f4 ff 7f 00 00 10 3f 81 f4 ff 7f 00 00  .E.......?......
0x555559eb3ff0: f0 1d 78 f4 ff 7f 00 00 80 39 78 f4 ff 7f 00 00  ..x......9x.....
0x555559eb4000: d0 3b 78 f4 ff 7f 00 00 30 3c 78 f4 ff 7f 00 00  .;x.....0<x.....
0x555559eb4010: 40 27 78 f4 ff 7f 00 00 a0 e4 78 f4 ff 7f 00 00  @'x.......x.....
0x555559eb4020: 30 e5 78 f4 ff 7f 00 00 80 15 79 f4 ff 7f 00 00  0.x.......y.....
0x555559eb4030: 50 41 78 f4 ff 7f 00 00 b0 2e 78 f4 ff 7f 00 00  PAx.......x.....
0x555559eb4040: 90 e3 78 f4 ff 7f 00 00 d0 e2 78 f4 ff 7f 00 00  ..x.......x.....
0x555559eb4050: 40 e4 78 f4 ff 7f 00 00 40 2b 78 f4 ff 7f 00 00  @.x.....@+x.....
0x555559eb4060: 40 d4 78 f4 ff 7f 00 00 40 3d 79 f4 ff 7f 00 00  @.x.....@=y.....
0x555559eb4070: 10 3e 79 f4 ff 7f 00 00 e0 37 79 f4 ff 7f 00 00  .>y......7y.....
0x555559eb4080: 10 3a 79 f4 ff 7f 00 00 30 3d 79 f4 ff 7f 00 00  .:y.....0=y.....
0x555559eb4090: 40 22 79 f4 ff 7f 00 00 90 6e f3 f4 ff 7f 00 00  @"y......n......
0x555559eb40a0: 70 4d 81 f4 ff 7f 00 00 80 4d 81 f4 ff 7f 00 00  pM.......M......
0x555559eb40b0: 60 4e 81 f4 ff 7f 00 00 d0 4e 81 f4 ff 7f 00 00  `N.......N......
0x555559eb40c0: f0 4e 81 f4 ff 7f 00 00 60 7d f3 f4 ff 7f 00 00  .N......`}......
0x555559eb40d0: 90 4d 81 f4 ff 7f 00 00 00 00 00 00 00 00 00 00  .M..............
0x555559eb40e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb40f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4120: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4130: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4140: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4150: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4160: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4170: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4190: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb41a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb41b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb41c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb41d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb41e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb41f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4210: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4220: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4230: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4240: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4250: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4260: 00 00 00 00 00 00 00 00 50 e0 f1 f4 ff 7f 00 00  ........P.......
0x555559eb4270: 90 e1 f1 f4 ff 7f 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x555559eb4290: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

The pointer at rax + 0x640 is null. So it seems like something is up with the command buffer.

Running the boids example in release mode works fine.
Running the boids example in debug mode on Ubuntu 20.04 with mesa 22.1 also works fine.

@MarijnS95
Copy link
Collaborator

@kvark has this "driver returns nullpointer" issue been resolved?

@josh65536 that seems to be a different issue, where the driver function is indeed non-NULL but the internal implementation/state is botched somehow; seems we've got a separate report of "something similar" in #700, but yet to be confirmed with more detailed dumps/stacktraces.

@MarijnS95
Copy link
Collaborator

Closing this after not receiving any followup, there appears to be nothing actionable for ash.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants