Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Could not queue a CRTC sequence: Operation not supported (os error 95) #281

Open
nabajour opened this issue Oct 2, 2024 · 13 comments
Open
Labels

Comments

@nabajour
Copy link

nabajour commented Oct 2, 2024

I'm trying to get jay to work on my setup. After fixing another issue ( #280 ), I was able to start jay and set the correct resolution.

But if I open a terminal (kitty or wezterm), the window doesn't seem to display fully and doesn't refresh (only refreshes on resize when I open and close another terminal) and then it seems to freeze.

I get this in the logs:

[2024-10-02T17:12:49.342Z WARN  jay::backends::metal::video] Cannot use existing connector configuration. Trying to perform modeset.
[2024-10-02T17:12:49.463Z ERROR jay::backends::metal::video] Could not queue a CRTC sequence: Could not queue a CRTC sequence: Operation not supported (os error 95)

the crtc message then repeats a lot.

My setup seems to be a bit hairy on the graphics side with wayland:

  • os: Debian unstable
  • graphics card: NVidia 2080 TI
  • drivers: upstream nvidia drivers 560.35.03 (ubuntu packages from nvidia)
  • 30" HDMI Monitor connected over an adapter to the USB-C output of the graphics card.

I was able to get sway and hyprland to work with this driver version, so it doesn't seem to be a fundamental issue with my setup.

I'm not sure if this is a jay issue or an issue with my configuration, so any pointer to get this to work would be welcome.

@mahkoh
Copy link
Owner

mahkoh commented Oct 2, 2024

Do you have modesetting enabled? https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting

@nabajour
Copy link
Author

nabajour commented Oct 2, 2024

Sorry, added some info, pressed one return to many...

Do you have modesetting enabled? https://wiki.archlinux.org/title/NVIDIA#DRM_kernel_mode_setting

yes, cat /sys/module/nvidia_drm/parameters/modeset shows Y.

@nabajour
Copy link
Author

nabajour commented Oct 2, 2024

I currently have modeset, but not nvidia_drm.fbdev=1

@mahkoh
Copy link
Owner

mahkoh commented Oct 2, 2024

I don't understand how this could be. EOPNOTSUPP should only be returned in the following two situations:

	if (!drm_core_check_feature(dev, DRIVER_MODESET))
		return -EOPNOTSUPP;

	if (!drm_dev_has_vblank(dev))
		return -EOPNOTSUPP;

where drm_dev_has_vblank checks that there are any CRTCs registered by the driver. The open source nvidia driver does this here:

#if !defined(NV_DRM_CRTC_STATE_HAS_NO_VBLANK)
    drm_vblank_init(dev, dev->mode_config.num_crtc);
#endif

It's possible that the proprietary driver does not call this function or that NV_DRM_CRTC_STATE_HAS_NO_VBLANK is defined in the build that you are using.

The ioctl also calls some driver-specific functions but it looks like the open source driver does not set these functions.

Is the package you are using the open source driver or the proprietary driver?


Support for this ioctl is unfortunately a hard requirement. Applications freezing is expected if it doesn't work.

@mahkoh
Copy link
Owner

mahkoh commented Oct 2, 2024

So, NV_DRM_CRTC_STATE_HAS_NO_VBLANK just checks that "fake vblank events" are available which is always true on non-ancient kernels. So the nvidia driver never calls this function.

@cubanismo is this something you want to support? There is no way to detect this lack of support from userspace AFAICT and nouveau does call drm_vblank_init.

Otherwise how is userspace to supposed to know about vblank times on your driver? Using a userspace timer diverges from the actual times very quickly IME.

@nabajour
Copy link
Author

nabajour commented Oct 2, 2024

Is the package you are using the open source driver or the proprietary driver?

It's the NVidia "Open" driver:

Oct 02 16:10:12 halo kernel: nvidia-modeset: Loading NVIDIA UNIX Open Kernel Mode Setting Driver for x86_64  560.35.03  Release Build  (dvs-builder@U16-I1-N07-12-3)  Fri Aug 16 21:22:33 UTC 2024

Actually, grepping through the logs, it looks like the driver also complains:

Oct 02 19:51:06 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
Oct 02 19:51:15 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
Oct 02 19:51:18 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2

@mahkoh
Copy link
Owner

mahkoh commented Oct 3, 2024

Please test #282. It emulates vblank events with flip events. This approach as some downsides:

  • Our understanding of the vblank clock will be less accurate so the felt input delay will be higher and it is more likely that we miss frames.
  • Higher CPU and GPU usage due to the need to render and submit new frames where we would otherwise only have to listen to vblank events.
  • It's outright broken with tearing presentation since the emulation will generate fake vblank events unthrottled.
  • It might or might not interfere with VRR.

Many of these downsides are probably shared by the current state of sway and hyprland.

You might want to switch to nouveau + nvk for better support.

@nabajour
Copy link
Author

nabajour commented Oct 3, 2024

Please test #282. It emulates vblank events with flip events.

I tested it, it seems to be working a bit better, in the sense that I can open a terminal, run vkcube-wayland for a while, open some applications. But it seems to freeze anywqay after some minutes, without any specific message in the logs. My shortcut to exit still works and jay quits cleanly.

Some observations on the side, dunno if they are relevant:

  • Vulkan backend starts up being smooth, until it freezes
  • OpenGL backend is stuttery at start up (update rate at 0.5 to 1Hz). If I start vkcube-wayland, it gets smooth, and then freezes.
  • I run my tests from a VT, with SDDM and an Xorg session in the background.
  • I wasn't able to start jay for a long time until I noticed that my VTs were using kmscon, which seems to block jay to grab the device with drmSetMaster. Uninstalling kmscon fixed this.

You might want to switch to nouveau + nvk for better support.

I'm also doing some CUDA development, so I'm stuck with NVidia and their driver.

Many of these downsides are probably shared by the current state of sway and hyprland.

I haven't done some long term testing, but they didn't freeze on me during my tests. I'm also surprised to be the first one to run into this. Am I the only one using such a setup? There could be something fishy in my setup. I'll see if I can set it up on another computer with also an nvidia card.

@mahkoh
Copy link
Owner

mahkoh commented Oct 3, 2024

Could you post the entire log file? Anything in the journal?

But it seems to freeze anywqay after some minutes, without any specific message in the logs. My shortcut to exit still works and jay quits cleanly.

This sounds like we're not receiving flip events. If we don't receive flip events we won't try to render a new frame. That would be consistent with the error messages you posted above:

Oct 02 19:51:18 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2

You can find many similar reports on google: https://www.google.com/search?q=%22Flip+event+timeout+on+head%22

OpenGL backend is stuttery at start up (update rate at 0.5 to 1Hz). If I start vkcube-wayland, it gets smooth, and then freezes.

No idea what's happening here.

I wasn't able to start jay for a long time until I noticed that my VTs were using kmscon, which seems to block jay to grab the device with drmSetMaster.

Indeed, that is not supported.

@nabajour
Copy link
Author

nabajour commented Oct 3, 2024

Could you post the entire log file? Anything in the journal?

jay-2024-10-03T14:13:39.087Z-0.txt

freeze happens somewhere around 14:19:20Z. I then waited 2 minutes to see if it unfroze and let it settle, before quitting with the quit shortcut.

System logs around that time:

Oct 03 16:17:01 halo CRON[15256]: pam_unix(cron:session): session closed for user root
Oct 03 16:19:30 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
Oct 03 16:22:05 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
Oct 03 16:22:08 halo kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 2
Oct 03 16:22:17 halo bluetoothd[1845]: Endpoint unregistered: sender=:1.296 path=/MediaEndpoint/A2DPSource/ldac

You can find many similar reports on google: https://www.google.com/search?q=%22Flip+event+timeout+on+head%22

OK, so there is an already known issue around this! Thanks for the info

@cubanismo
Copy link

So, NV_DRM_CRTC_STATE_HAS_NO_VBLANK just checks that "fake vblank events" are available which is always true on non-ancient kernels. So the nvidia driver never calls this function.

@cubanismo is this something you want to support? There is no way to detect this lack of support from userspace AFAICT and nouveau does call drm_vblank_init.

Otherwise how is userspace to supposed to know about vblank times on your driver? Using a userspace timer diverges from the actual times very quickly IME.

I don't work on this area of the code, but my understanding of that function call is it isn't needed on the kernels where it is excluded (As you say, all reasonably modern kernels), because the equivalent "fake" vblank events will be generated automatically without it. Is that not what you're seeing? Yes, it would be better if the driver reported accurate (I.e., real) vblank events, but it does not at the moment.

@mahkoh
Copy link
Owner

mahkoh commented Oct 4, 2024

Is that not what you're seeing?

Unfortunately not. All of the UAPIs to access vblank events are guarded by

	if (!drm_dev_has_vblank(dev))
		return -EOPNOTSUPP;

and if the driver doesn't call drm_vblank_init this always fails. I found this old thread that talks about this: https://forums.developer.nvidia.com/t/how-to-use-drmwaitvblank-for-nvidia-linux/218905

In particular, I use DRM_IOCTL_CRTC_GET_SEQUENCE to get a constant stream of vblank events.

@cubanismo
Copy link

Ack, and thanks for following up. I've asked to bump the priority of the work needed to enable the relevant support in the driver based on your feedback.

@mahkoh mahkoh added the nvidia label Oct 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants