Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SDL_Delay() requires a &mut TimerSubsystem? #1268

Closed
jordanhalase opened this issue Nov 5, 2022 · 5 comments · Fixed by #1270
Closed

SDL_Delay() requires a &mut TimerSubsystem? #1268

jordanhalase opened this issue Nov 5, 2022 · 5 comments · Fixed by #1270

Comments

@jordanhalase
Copy link
Contributor

jordanhalase commented Nov 5, 2022

Hello,

I know that the typical solution to this is to use something like std::thread::sleep() instead of the SDL-provided delay function, but this is a weird quirk in the bindings nonetheless.

I have a short project where I have a running window and create a timer callback that runs on another thread to print ongoing FPS data via let _fps_timer = video_subsystem.add_timer() here on line 57. _fps_timer must live throughout the game loop in order to print FPS data.

At the end of the game loop I call timer_subsystem.delay(), and get a compiler error because _fps_timer is already holding an immutable reference to timer_subsystem. (Note that timer_subsystem would be mut on line 56.)

I found it strange that delay() expects a mutable reference to TimerSubsystem. Most other subsystem methods only require immutable &self references. In fact, in the SDL2 code, SDL_Delay() doesn't even initialize the Timer subsystem, and doesn't even need a call to SDL_Init()!

So I have two one questions:

  1. Couldn't delay() take an immutable reference to &self instead of &mut self?
  2. Furthermore, couldn't delay() just be a freestanding function like sdl2::timer::delay() or even sdl2::delay()?

Thank you.

@jordanhalase
Copy link
Contributor Author

jordanhalase commented Nov 6, 2022

    pub fn delay(&mut self, ms: u32) {
        // Google says this is probably not thread-safe (TODO: prove/disprove this).
        unsafe { sys::SDL_Delay(ms) }
    }

While I'm here, I'd like to address this TODO:

In Rust's standard library on Windows, std::thread::sleep() is literally just a simple call to

unsafe { c::Sleep(super::dur2timeout(dur)) }

SDL2 is more complicated.

Windows

In src/timer/windows/SDL_systimer.c, SDL_Delay() does different things depending on the version of the compiler. If Sleep() is available, it runs this:

    if (!ticks_started) {
        SDL_TicksInit();
    }
    Sleep(ms);

ticks_started is a static BOOL. There is a race condition between the if-check and SDL_TicksInit(), but only if SDL was not already initialized. This race condition is avoided by ensuring SDL is initialized first, which these bindings already do by tying it to TimerSubsystem. This variable only gets written during initialization and shutdown, so non-atomic reads should be safe during the lifetime of the SDL library. This answers my second question about being a freestanding function. As far as threading goes, this is safe as long as SDL_Init() is called first and not after SDL_Quit().

If Sleep() is unavailable (old versions of WinRT), it runs this:

    static HANDLE mutex = 0;
    if (!mutex) {
        mutex = CreateEventEx(0, 0, 0, EVENT_ALL_ACCESS);
    }
    WaitForSingleObjectEx(mutex, ms, FALSE);

This creates a dummy mutex and waits for it to timeout. Clever, but there is a race condition that can cause a memory leak if multiple threads create a mutex and save it to mutex. If, for example, an 8-core CPU calls this function simultaneously from 8 threads, there is an (extremely small) possibility that up to 7 mutexes will be leaked as they all write to the same mutex, but no more. Subsequent calls will not leak any more memory once mutex is set by at least one thread. There is no chance for deadlock because these are dummy mutexes meant to be timed out on. It doesn't matter which dummy mutex WaitForSingleObject waits on because any dummy will do. The mutex is stored in a HANDLE, which is a void* typedef, which is atomic (and aligned) on x86/64 in which WinRT runs, so WaitForSingleObjectEx will always receive a non-mangled mutex. I would argue this is thread safe, despite the chance of leaking a little memory.

However, WinRT also runs on ARM. Atomicity of simple loads and stores on ARM is not like on x86/64, so unfortunately I do not think this is thread safe, because WaitForSingleObject could potentially receive a mangled mutex value.

I could keep investigating more platforms but I think this is a nail in the coffin. SDL_Delay() cannot be proven to be thread safe for all platforms. While it can be thread safe on some platforms, SDL targets platforms that are dead, and the code base is not shy about using hacky tricks to get things working in an unsafe way.

For this reason, this comment

Could be replaced with one along the lines of

HOWEVER

I still don't see why TimerSubsystem::delay() requires a &mut self. Going through the code, SDL_Delay() looks to be in the timer subsystem codebase in name only. It actually relies on SDL_InitTicks() being set at startup, not SDL_InitTimer(). SDL_InitTicks() is always called on startup whether or not SDL_INIT_TIMER is set. I can't see any code where they interact in any way. The way I see it, SDL_Delay() could just be TimerSubsystem::delay(&self), or even Sdl::delay(&self).

In fact, SDL_AddTimer() and SDL_RemoveTimer() are fully thread-safe (so long as inited), so any thread can safely add or remove timer callbacks. (I'm not sure how one would actually use these functions from multiple Rust threads as they are, however. This could potentially be a can of worms that may have us split the timer subsystem into a separate ticks subsystem.)

Thoughts?

@Cobrand
Copy link
Member

Cobrand commented Nov 6, 2022

I don't have the answer to any of this, your best bet would be to blame the line, get who did this, try to find a related issue or series of commit that are related, and maybe you'll get answer there. It's possible that this is a mistake yes, but it's also possible that there is a hidden reason.

@jordanhalase
Copy link
Contributor Author

jordanhalase commented Nov 6, 2022

Hello @Cobrand, it's good to see your quick response.

When I blamed the line, I see it was added in 0a3453b by @nukep in 2015 and never touched again. Prior to that, it really was a freestanding sdl2::timer::delay() but requires SDL_Init() to be safe and so was tied to TimerSubsystem, which is surely "safe" but overzealous, and breaks usage of other methods in TimerSubsystem. I found a related issue where they say the following:

I've been nitting the crap out of this project recently... but the timer probably needs redesigned, too.

Couple that with the TODO: prove/disprove this tells me this was added in as a quick way to make the binding safe while assuming the worst by making it &mut self. We can do better.

I was exasperated going through SDL_Delay() last night, but it turns out that it's actually very simple on most platforms, so I inspected further.

Here's what I noticed on all platforms for SDL_Delay():

  • Haiku: Calls snooze()
  • 3DS: Calls svcSleepThread()
  • Ngage: Calls User::After(TTimeIntervalMicroSeconds32())
  • OS/2: Calls thread-safe Dos* functions
  • PS2: Calls nanosleep()
  • PSP: Calls sceKernelDelayThreadCB()
  • Vita: Calls sceKernelDelayThreadCB()
  • Unix: Calls either nanosleep() if compiled with it, or SDL_GetTicks64()
  • Windows: Calls Sleep() and may unsafely call SDL_TicksInit() if SDL_Init() wasn't called beforehand (x86); Unsafely creates and then waits on dummy mutex on old versions of WinRT.

Conclusion: SDL_Delay() is thread-safe on all platforms except on versions of WinRT for ARM prior to 2013, as long as SDL_TicksInit() was called beforehand. It seems obvious now that SDL_Delay() was always intended to be thread-safe, and the race condition I expressed earlier on WinRT+ARM is only theoretical (would be really hard to hit even if trying to, and may not be practically possible at all), on a very old version of an old and dead platform, and can be fixed upstream in the C project if truly need be. In fact, I'm not sure the "dummy mutex" codepath would even compile on a compiler new enough to compile with Rust at all. (SDL vehemently prefers to use Sleep().) So essentially, SDL_Delay() is fully thread-safe so long as inited.

SDL_Delay() on Unix may make calls to SDL_GetTicks64() if nanosleep is not available.

  • Haiku: Calls system_time()
  • 3DS: Calls svcGetSystemTick()
  • Ngage: Calls User::TickCount()
  • OS/2: Calls thread-safe Dos* functions
  • PS2: Calls GetTimerSystemTime()
  • PSP: Calls gettimeofday()
  • Vita: Calls sceKernelGetProcessTimeWide()
  • Unix: Calls clock_gettime(), mach_absolute_time(), or gettimeofday()
  • Windows: Calls QueryPerformanceCounter()

Conclusion: SDL_GetTicks64() (and SDL_GetTicks() which is just &0xffffffff) is thread-safe on all platforms, as long as SDL_TicksInit() was called beforehand.

In the C codebase, ticks and timer are technically separate systems, but the timer subsystem may make calls to SDL_GetTicks64(), so we're grouping them all into the one TimerSubsystem instead of having the ticks methods in some other TicksSubsystem to prevent SDL_GetTicks64() from invoking an unsafe lazy SDL_TicksInit() if TimerSubsystem were to outlive a dropped TicksSubsystem that calls SDL_TicksQuit(). The documentation in the C codebase treats them as one subsystem anyway.

Proposal

  • Change pub fn ticks(&self) -> u32 in TimerSubsystem into pub fn ticks(&self) -> u64 and call sys::SDL_GetTicks64(). SDL_GetTicks() is legacy and is just a single-line call to the 64-bit variant with a bitmask.
  • Change pub fn delay(&mut self, ms: u32) in TimerSubsystem into pub fn delay(&self, ms: u32).
  • Remove the TODO: prove/disprove this comments for ticks() and delay()

Code-wise this should be a very simple fix, it just took a lot of time for me to audit it. :)

Thoughts?

@Cobrand
Copy link
Member

Cobrand commented Nov 7, 2022

Thanks for looking it up! Your proposal sounds good to me, feel free to drop a PR and add the changes to the changelog as well.

@jordanhalase
Copy link
Contributor Author

While fixing I noticed a timer callback could cause UB if it panics, so I added a FIXME note, but that could be opened as a separate issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants