AMD RADV driver discussion #252

Closed
doitsujin opened this issue Apr 7, 2018 · 171 comments

Comments

@doitsujin
Owner

doitsujin commented Apr 7, 2018

Some games may lock up the GPU when using the RADV Vulkan driver on AMD cards, which results in a frozen system. Unless this is caused by an obvious DXVK bug (i.e. there are Vulkan validation errors when VK_INSTANCE_LAYERS=VK_LAYER_LUNARG_standard_validation is set), please do not open a new issue if you encounter one of these hangs.

Instead, please comment on this thread and:

  • Important: Test the game with latest mesa-git and llvm-svn!
  • Provide information about the game, including the settings used, and your GPU
  • Provide information on how to reproduce the issue, including an apitrace if possible
  • If possible, provide a hang report, see instructions below.

Creating a hang report

In order to obtain a hang report from RADV for a specific game, set the RADV_TRACE_FILE and RADV_DEBUG environment variables and redirect stderr and stdout to a file as follows:

export RADV_TRACE_FILE=/***/radv-trace.txt
export RADV_DEBUG=allbos,syncshaders,vmfaults
export WINEDEBUG=-all
wine game.exe 2>&1 | tee hang_report.txt

For games which launch themselves through Steam, modifying the launch options may be necessary.

Important: Please make sure that you have spirv-tools installed and that the spirv-dis executable is in your PATH.
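
The steps above can be wrapped in a small script. This is only a sketch: the function names `require_tool` and `run_with_hang_report` and their argument order are hypothetical and not part of DXVK or RADV. Only the environment variables and the `spirv-dis` requirement come from the instructions above.

```shell
#!/bin/sh
# Sketch of a hang-report wrapper. The function names and argument order are
# hypothetical; the environment variables are the ones from the instructions.

# Succeeds if the named executable can be found in PATH.
require_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Usage: run_with_hang_report TRACE_FILE REPORT_FILE COMMAND [ARGS...]
run_with_hang_report() {
    trace_file=$1
    report_file=$2
    shift 2

    if ! require_tool spirv-dis; then
        echo "spirv-dis not found in PATH; install spirv-tools first" >&2
        return 1
    fi

    # Run in a subshell so the exported variables do not leak into the caller.
    (
        export RADV_TRACE_FILE="$trace_file"
        export RADV_DEBUG=allbos,syncshaders,vmfaults
        export WINEDEBUG=-all
        "$@" 2>&1 | tee "$report_file"
    )
}
```

With these definitions loaded, `run_with_hang_report <trace file path> hang_report.txt wine game.exe` would behave like the four commands above, with the trace file path being whatever location you choose.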

@doitsujin doitsujin changed the title from "AMD RADV: GPU hang / system freeze meta bug" to "AMD RADV: GPU hang / system freeze meta issue" Apr 7, 2018
@sierkov-bot

Witcher 3

GPU: RX 560 4GB (mesa-git , llvm-svn, dxvk-git 20180410.1fb22a6)

Settings: High presets, disabled vsync.

How to reproduce: At the beginning of the game, after Geralt wakes up from the dream and they are about to hit the road, they smell ghouls. When either Vesemir or I hit a ghoul, the GPU hangs, not at the exact moment of the hit but soon after it. I've been able to reproduce it several times; my GPU hangs every time I fight the ghouls.

Logs
hang report
d3d11_log
dxgi_log

My saves and in-game settings: The Witcher 3.tar.gz

@shmerl
Contributor

shmerl commented Apr 11, 2018

@rserkov : to avoid this in the log:

sh: umr: command not found

You can build umr debugger from here.

@sierkov-bot

@shmerl I just skimmed through the log since I don't understand much of what it says. Should I redo the hang report with umr installed?

@doitsujin
Owner Author

doitsujin commented Apr 11, 2018

@rserkov Not sure if umr provides all that much useful information, although it wouldn't hurt. Looks like you don't have spirv-tools installed though, getting the SPIR-V disassembly would be rather important to see if there is maybe something wrong with the shaders.

@sierkov-bot

sierkov-bot commented Apr 11, 2018

@doitsujin I redid it with spirv-tools and umr installed: hang report. Please let me know if there is anything else I can do to help.

@jarrard

jarrard commented Apr 11, 2018

Is it really a hang or just a screen freeze? Can you switch to a console (e.g. Ctrl+Alt+F1), blindly type your login and password, and then reboot?

I have encountered a system hang/freeze with my 1080 Ti, but discovered it was not a hang but an unrecoverable screen freeze, during which you can still type and reboot via a terminal console.

@sr-tream

@jarrard On AMD, this screen freeze turns into a complete system hang after a while.

@Kzimir

Kzimir commented Apr 12, 2018

Assassin's Creed III

GPU: RX 560 4GB (mesa-git , llvm-5.0.1, dxvk-r952.adb1789). Same issue with llvm-git.

Settings: Normal Settings, VSync disabled

How to reproduce: I'm in Boston, and if I enable Eagle Vision, the game crashes and the system hangs, requiring a hard reboot. The system can also hang after playing for a long time.

Logs
hang report
d3d11_log
dxgi_log

@Nerellus

Nerellus commented Apr 15, 2018

Star Trek Online

GPU: RX 570 8GB
mesa: git @ 6a519a157b5fe5d449444c04a0429e8a24546e9c
llvm: svn @ 330092 (commit 319534 reverted)
dxvk: git @ 31ed6e5

Settings: Defaults

How to reproduce:
cd /path/to/Star\ Trek\ Online_en/Star\ Trek\ Online/Live
wine x64/GameClient.exe -Locale English -server 208.95.186.11
GPU hangs while loading login screen

Logs:
hang_report.txt
radv-trace.txt
GameClient_d3d11.log
GameClient_dxgi.log

apitrace:
STO.dxvk.trace
STO.win7.trace

Unfortunately I can't get a trace with wined3d. This trace was made with dxvk+amdvlk (which does not hang here), when replayed with RADV it hangs as normal.
Added additional apitrace from Windows 7.

@mradermaxlol

Okay, straight outta #193, eh? :)

Here's the hang report: hang_report.txt
I ran as mentioned in the how-to (I ran TheCrew.exe from UPlay's game directory), with spirv-tools installed. I compiled {lib32-,}llvm-svn_r330096 with that amdgpu thing reverted & {lib32-,}mesa-git_101626.6a519a157b.
The only visual change was that with all that RADV debugging enabled I could see the chat thingie rendering, though everything else remained the same - there's a static image, background sounds and that's it.
Here's the output of running the game with only DXVK_DEBUG_LAYERS=1 set: consolelog.txt

DXVK version used: 98b8d41

@GloriousEggroll
Contributor

Overwatch hangs on LLVM 6.0.0; LLVM 5.0, 5.0.1, and 5.0.2 can be used instead.

@tdjb

tdjb commented Apr 19, 2018

Confirming Overwatch hangs on llvm 6.0.0 as well as on llvm 7.0.0-svn with mesa-git.

@stalkerg

Maybe it would be better if we also opened issues on the Mesa and LLVM bug trackers and put links to them here?
In my opinion, if the GPU hangs, it is a driver problem.

@exolyte

exolyte commented Apr 30, 2018

Event[0]

The game hangs in the first loading screen after the intro.

mesa: 18.1 (96ed371)
llvm: 7.0 (331148)
dxvk: 4c298d4
GPU: RX 570

event0_d3d11.log
event0_dxgi.log
event0-hang_report.txt
event0-radv-trace.txt
Apitrace

EDIT 7th of June: Event[0] still hangs with the Hellblade Mesa workaround. I've added an apitrace to reproduce the hang.

@asumagic

asumagic commented May 7, 2018

Overwatch
Seems easier to reproduce the hang with graphics set to absolute maximum when using RADV_DEBUG. Happens on low settings as well.

GPU: RX580

Hang report
Nothing worth mentioning in _d3d11 and _dxgi logs, but here they are anyway.

@jarrard

jarrard commented May 8, 2018

Those of you having hangs should monitor your GPU temperatures while playing, either with an overlay or by logging them to a text file. I believe some Radeon cards will start to crash above 85°C.
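
A "log to txt" monitor could look roughly like the sketch below. It assumes the amdgpu driver exposes its temperature sensor through hwmon under `/sys/class/drm/card0/device/hwmon/`, with `temp1_input` holding millidegrees Celsius; the `card0` index and the function names are assumptions and may need adjusting for your system.

```shell
#!/bin/sh
# Sketch of a temperature logger. The hwmon path below is an assumption
# (first card, amdgpu driver); the function names are hypothetical.

# Convert a millidegree Celsius reading (as stored in temp1_input) to whole degrees.
millideg_to_c() {
    echo $(( $1 / 1000 ))
}

# Print the current GPU temperature in degrees Celsius, or fail if no sensor is found.
read_gpu_temp() {
    for f in /sys/class/drm/card0/device/hwmon/hwmon*/temp1_input; do
        [ -r "$f" ] || continue
        millideg_to_c "$(cat "$f")"
        return 0
    done
    return 1
}

# Append a timestamped reading to LOG_FILE every INTERVAL seconds (default 5)
# until interrupted or until the sensor can no longer be read.
log_gpu_temp() {
    log_file=$1
    interval=${2:-5}
    while temp=$(read_gpu_temp); do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$temp" >> "$log_file"
        sleep "$interval"
    done
}
```

Running `log_gpu_temp gpu_temp.txt 2 &` before starting the game would record one reading every two seconds, so the log can be inspected after a hang.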

@shmerl
Contributor

shmerl commented May 8, 2018

Sapphire cards are cooled pretty well, they never reach such high temperature for me, even on 100% load (and I do monitor it, you can run something like Ksysguard in parallel, it has neat hardware monitor features where you can add any sensor to show a dynamic graph). But I didn't have GPU hangs either so far with dxvk.

Is there a way to test a hang with TW3? I can try some save and check if it's a temperature issue or not.

@shmerl
Contributor

shmerl commented May 8, 2018

Example (99% GPU load with dxvk / The Witcher 3, 1920x1200 Sapphire Pulse Vega 56):

ksysguard_tw3_dxvk

It maxes out at around 74°C for me.

@doitsujin
Owner Author

@Enverex can you try to record an apitrace that reproduces the hang in ESO and/or WoFF?

DRM-Next is known to occasionally cause regressions.

@Enverex

Enverex commented Nov 19, 2018

It looks like both of those are caused by DRM-Next, not just the one as I originally thought. Would you still like traces or is it just worth disregarding them for now?

@debiangamer

debiangamer commented Nov 21, 2018

Also Elder Scrolls Online.

The game did work fine some time ago when it was free: https://www.youtube.com/watch?v=Vq9jZqbitbY&t=296s

@Enverex

Enverex commented Nov 21, 2018

As mentioned, the issue only happens on DRM-Next, so unless you're running that, you won't have issues.

@macskay

macskay commented Nov 22, 2018

Hi,
First off, I want to apologize if this is not the right thread. I've tested multiple drivers including RADV, so this seemed the best thread to discuss my issue.

I'm currently running Ubuntu 18.04.1 and trying to get AMD RADV working with my R9 280X. I got it working with a couple of games; others, however, simply do not start and throw a page fault on read/write access. I've set up the games through Lutris, i.e. Origin with DXVK support and Uplay with DXVK support.

The games not working are:

The games that are working are:

  • Battlefield V
  • Fifa 19
  • The Lord of the Rings Online
  • AC: Black Flag

I tried the AMDVLK as well as the AMDGPU-PRO and the Mesa (AMD RADV) drivers. Getting the same error again and again. To my current knowledge this has to be an issue with my driver setup, since a friend using an NVIDIA can start at least one game ("A Way Out") without any problems using Lutris.

Also when issuing vulkaninfo for the Device Names it spits out two devices (although I only have one gpu).

max@guybrush:~ $ vulkaninfo | less | grep deviceName
WARNING: radv is not a conformant vulkan implementation, testing use only.
	deviceName     = AMD RADV TAHITI (LLVM 7.0.0)
	deviceName     = AMD Radeon HD 7900 Series

When enabling devinfo in my DXVK_HUD, it shows me that the latter of the two is used, so I tried filtering by device name to use the RADV one. But when setting DXVK_FILTER_DEVICE_NAME="AMD RADV TAHITI (LLVM 7.0.0)", it tells me that no adapter was found, and when an application then starts, no devinfo is given, so it does not seem to filter the devices correctly.
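
To compare the filter string against what the Vulkan loader actually reports, the device names can be extracted from the `vulkaninfo` output without the surrounding noise. The sketch below only reformats that text; the function name `extract_device_names` is hypothetical.

```shell
#!/bin/sh
# Sketch: list the device names reported by vulkaninfo, one per line.
# The function name is hypothetical; it only reformats vulkaninfo's text output.

# Reads vulkaninfo output on stdin and prints the value of every deviceName line.
extract_device_names() {
    sed -n 's/^[[:space:]]*deviceName[[:space:]]*=[[:space:]]*//p'
}

# Example (requires vulkaninfo to be installed):
#   vulkaninfo 2>/dev/null | extract_device_names
```

Two names for a single GPU, as in the output above, suggest that two Vulkan drivers are installed for the same card.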

Are these games not working because of the same DRM-Next error? But why do they work on other GPUs then? Shouldn't they be blocked too, if it's a DRM-related issue?

Any help is highly appreciated.

@doitsujin
Owner Author

doitsujin commented Nov 22, 2018

@macskay Those are probably not driver issues, AC:Origins is known not to work due to its DRM. Not sure about the other two, but A Way Out may require some tinkering with wine.

@macskay

macskay commented Nov 22, 2018

@doitsujin Well yeah, AC:Origins I figured in the meantime, ANNO 2205 however seems to work fine with Caching disabled as stated in #686 and the wine configuration for "A Way Out" is equal to the one my friend has in Lutris. I copied his settings.

// Edit:
OK, the strangest thing just happened. My friend and I decided to switch GPUs, as his seems to be working. After installing the NVIDIA card, nothing changed. I uninstalled all AMD drivers and installed the NVIDIA ones, but the problem still persisted. After switching back to the AMD card and reinstalling the AMD drivers, the game "A Way Out" started successfully and we could even play in a lobby together (with the drawback that the game has a yellowish tint, but oh well). So the game does start now. I haven't tried any of the others, but it seems very odd nevertheless. I haven't reinstalled the game, just the drivers (for the 20th time or so).

@aqxa1
Contributor

aqxa1 commented Nov 28, 2018

Another game with GPU hangs on Vega is Sunset Overdrive. They appear to be random, rather than at a particular location, and occur once an hour or so:

[16034.889009] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:56 vmid:2 pasid:32775, for process Sunset.exe pid 1152 thread Sunset.exe pid 1152
)
[16034.889011] amdgpu 0000:0d:00.0: at address 0x00008001a71ba000 from 27
[16034.889013] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x002C0070
[16034.889046] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:56 vmid:2 pasid:32775, for process Sunset.exe pid 1152 thread Sunset.exe pid 1152
)
[16034.889049] amdgpu 0000:0d:00.0: at address 0x00008001a71ba000 from 27
[16034.889050] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x002C0070
[16053.544256] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:24 vmid:2 pasid:32775, for process Sunset.exe pid 1152 thread Sunset.exe pid 1152
)
[16053.544260] amdgpu 0000:0d:00.0: at address 0x00008001a71ba000 from 27
[16053.544261] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00201030
[16063.554337] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=4097909, emitted seq=4097911
[16063.554339] [drm] GPU recovery disabled.
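
To check whether another hang left the same kind of traces, the relevant lines can be filtered out of the kernel log. This is a minimal sketch: the function name `filter_gpu_faults` is hypothetical, and the patterns are simply taken from the log excerpt above, so other kernels or GPUs may word these messages differently.

```shell
#!/bin/sh
# Sketch: pull the amdgpu fault and timeout lines out of a kernel log.
# The function name is hypothetical; the patterns come from the log above.

# Reads kernel log text on stdin and prints only the lines that indicate
# a VM page fault or a ring timeout.
filter_gpu_faults() {
    grep -E 'VMC page fault|VM_L2_PROTECTION_FAULT_STATUS|ring [a-z0-9_.]+ timeout'
}

# Example (may require root, depending on the kernel.dmesg_restrict setting):
#   dmesg | filter_gpu_faults
```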

AMDVLK doesn't seem to work with the game so I can't test there.

I should also note that the Yakuza 0 workaround isn't in mesa-git yet (because it could cause performance issues) so I'm not sure I'd consider it fixed yet, at least for Vega.

@gort818
Contributor

gort818 commented Nov 30, 2018

@Enverex try building the latest amd-staging-drm-next kernel, I had a lot of hangs with dxvk, if you are on arch check this out https://aur.archlinux.org/packages/linux-amd-staging-drm-next-git/

I haven't got any issues, I built the kernel a few days ago.

@Enverex

Enverex commented Nov 30, 2018

"linux-drm-next-git" was the one I originally tried (that had far, far more issues than the stock kernel). The stock kernel actually seems fine with DXVK from what I've seen (at least in everything I've tried so far or that had issues before), it was just DRM-Next that had issues.

@gort818
Contributor

gort818 commented Nov 30, 2018

I have not had any issues with the stock kernel and dxvk either, but I wanted drm-next for fixes, e.g. increasing the power limit. I also seem to get better performance. I had the exact same hangs a few weeks ago, but now it is running great.

@Enverex

Enverex commented Nov 30, 2018

In that case I'll compile and switch to that kernel then report back.

@aqxa1
Contributor

aqxa1 commented Dec 14, 2018

Yakuza 0 Vega hangs are now fixed in mesa git, so no need to patch now.

@hakzsam
Contributor

hakzsam commented Dec 14, 2018

Yes, and that also fixes The Evil Within.

@igo95862

igo95862 commented Jan 25, 2019

Anyone experiencing driver crash in Endless Space 2?
Game has a free weekend right now.
Using WINE3D11 will not cause the crash.
I don't have mesa-git or llvm-svn. Only Mesa 18.3.1 and LLVM 7.0.1. So I did not want to open the bug report since it might be fixed on newer Mesa or LLVM

RX 480 card

@zurohki

zurohki commented Mar 23, 2019

Space Engineers is causing GPU hangs. It's something about the terrain that does it - playing in space works fine for hours at a time, but starting a new game on a planet hangs in a minute or two.

The game's pretty unstable overall and crashes quite a bit, but it still shouldn't be able to hang the GPU.

Ryzen 2700X
Vega 64
llvm-9.0.0_356367
mesa-19.0_g493b3ada9b1
kernel 5.0.1
wine-staging 4.4 from https://github.com/lutris/wine
DXVK 1.0.1
Graphics settings: 3840x2160, medium detail

spaceengineers-crash.txt

spaceengineers-crash-2.txt with WINEDEBUG=-all

@urbenlegend

Ace Combat: Assault Horizon reliably crashes for me after loading the first mission. It plays 5 seconds' worth of the cutscene and then freezes the entire system.

Ryzen 2700X
Vega 64
LLVM 7.0
Mesa 19.0.0
Kernel 5.0.3
Proton 3.16-8

@aqxa1
Contributor

aqxa1 commented Mar 25, 2019

@urbenlegend

LLVM 7.0

That's a pretty old version of LLVM, and I seem to remember LLVM being partially responsible for some GPU hangs. You might want to try LLVM 8 or 9 (and use a version of Mesa compiled with it).

@dlove67

dlove67 commented Mar 25, 2019

@thirdeyefunction @urbenlegend

I've got a similar build:

Threadripper 1950X
Vega 64
LLVM 8.0/9.0
Mesa 19.0.0
Kernel 5.0
Proton 3.16-8

And I've gotten the same error. I haven't tried in a week or two so I can see if any recent LLVM git updates corrected it, but it's definitely not just LLVM7.0 that's affected here.

@aqxa1
Contributor

aqxa1 commented Mar 27, 2019

According to PCGW and the Steam Store the system requirements suggest that the game only supports D3D9. Is this correct or is there an optional D3D11 mode? If it's D3D9 only, then RADV (and DXVK) is unrelated to this issue.

Or are you referring to Ace Combat 7: Skies Unknown?

@urbenlegend

urbenlegend commented Mar 27, 2019

@thirdeyefunction Well, if I enable PROTON_USE_WINED3D, the game won't even launch so I am assuming it is using DXVK in some capacity. And no it is not Ace Combat 7, it is Assault Horizon.

@hakzsam
Contributor

hakzsam commented Mar 28, 2019

Can you please file bug reports directly at https://bugs.freedesktop.org (under Drivers/Vulkan/Radeon)?

@zurohki

zurohki commented Apr 28, 2019

I posted about mine over at https://bugs.freedesktop.org/show_bug.cgi?id=110291

@Sur3

Sur3 commented Aug 2, 2019

The release notes of 1.3 say that AMD RADV uses early-discards instead of discards via VK_EXT_shader_demote_to_helper_invocation, what's the difference, are early discards better? And also it says it only works with ACO instead of LLVM backend, is there a bug related to that and can I test it with LLVM somehow anyway?

@Oschowa
Contributor

Oschowa commented Aug 2, 2019

@Sur3 VK_EXT_shader_demote_to_helper_invocation is only implemented in ACO currently. Early discards are buggy (i.e. cause GPU hangs in certain games) on LLVM, but you can enable them anyway with
dxvk.useEarlyDiscard = True
in dxvk.conf
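
As a minimal sketch, the configuration file could be created as follows. The function name `write_early_discard_conf` and the file location are hypothetical; the option line is the one quoted above, and `DXVK_CONFIG_FILE` is the environment variable DXVK uses to locate a configuration file at a non-default path.

```shell
#!/bin/sh
# Sketch: write a dxvk.conf that enables early discards.
# The function name and the target path are hypothetical.

# Usage: write_early_discard_conf CONF_FILE
# Note: this overwrites CONF_FILE if it already exists.
write_early_discard_conf() {
    printf 'dxvk.useEarlyDiscard = True\n' > "$1"
}

# Example: create the file and point DXVK at it explicitly:
#   write_early_discard_conf "$HOME/dxvk.conf"
#   DXVK_CONFIG_FILE="$HOME/dxvk.conf" wine game.exe
```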

@Commaster

I'm trying to debug Warframe with this, but adding

RADV_TRACE_FILE=/***/radv-trace.txt
RADV_DEBUG=allbos,syncshaders,vmfaults

to the launch options causes the game to spawn sh -c dmesg child processes a million times over and basically never finish the loading process. The hang_report.txt is filled with

ERROR: ld.so: object '.../.steam/ubuntu12_32/gameoverlayrenderer.so' from LD_PRELOAD cannot be preloaded (wrong ELF class: ELFCLASS32): ignored.
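
If those preload messages bury the rest of the report, they can be filtered out before reading or attaching it. This is only a sketch; the function name `filter_preload_noise` is hypothetical, and the pattern is taken from the message quoted above.

```shell
#!/bin/sh
# Sketch: drop the LD_PRELOAD warnings from a hang report.
# The function name is hypothetical.

# Reads a report on stdin and prints every line except the preload warnings.
filter_preload_noise() {
    grep -v 'from LD_PRELOAD cannot be preloaded'
}

# Example:
#   filter_preload_noise < hang_report.txt > hang_report_clean.txt
```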

i7-5930K
AMD TAHITI 7950
LLVM 9.0 (oibaf)
Mesa 19.3 (oibaf)
Kernel 5.3 (ubuntu bionic-proposed)
Proton 4.17.2 (GloriousEggroll)
Lunarg vulkan sdk or 1.1.70 ubuntu libvulkan1

P.S. Happens also on older software (Mesa 19.0.8, Kernel 5.0, Proton 4.2.9, LLVM 8.0 etc..) as well as amdvlk instead of mesa-vulkan-drivers.

The lock ups are completely random, can happen several hours or a couple minutes in, be it on a pause screen or in the middle of an epilepsy-inducing fight. GPU temps are below 70 when the system locks up and cycles half a second of sound through the speakers, even Magic SysRq doesn't work.

Is there any other way to debug this issue?

@Fatmice

Fatmice commented Nov 15, 2019

Hi, this looks like a powerplay issue. I also experienced the same problem with random lockups on a Vega 64.

There seems to be a patch being submitted to fix this. See this thread:
https://bugs.freedesktop.org/show_bug.cgi?id=109955

The workaround is to limit the memory clock to states 1, 2, and 3:

If you want someone to apply your changes in bug report no. 110777 to the kernel for testing, I can do so, but I will not get to it until this weekend.
As a side note, I've had great success manually limiting the memory clock to level 1,2,3 on my Vega 64. I've played over 7 hours of Stellaris without a crash.

echo "manual" > /sys/class/drm/card0/device/power_dpm_force_performance_level
echo "1 2 3" > /sys/class/drm/card0/device/pp_dpm_mclk
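
The two commands above can be wrapped with some input checking and a way to undo the change. This is a sketch under assumptions: the function names are hypothetical, `card0` may be a different index on your system, and writing `auto` to `power_dpm_force_performance_level` is used here to hand clock control back to the driver.

```shell
#!/bin/sh
# Sketch of the memory clock workaround with basic input checking.
# The function names are hypothetical; card0 is an assumption and may differ.

CARD_DEV=/sys/class/drm/card0/device

# Succeeds only if the argument is a space-separated list of single digits,
# e.g. "1 2 3".
valid_state_list() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]( [0-9])*$'
}

# Restrict the memory clock to the given states. Must run as root.
limit_mclk() {
    valid_state_list "$1" || { echo "invalid state list: $1" >&2; return 1; }
    echo manual > "$CARD_DEV/power_dpm_force_performance_level" &&
    echo "$1" > "$CARD_DEV/pp_dpm_mclk"
}

# Hand clock control back to the driver. Must run as root.
restore_mclk() {
    echo auto > "$CARD_DEV/power_dpm_force_performance_level"
}
```

Note that `sudo echo ... > file` does not work, because the redirection is performed by the unprivileged shell; run the functions from a root shell instead.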

Repository owner locked as resolved and limited conversation to collaborators Mar 31, 2020