Enhance MAME HLSL To Do Sub-Refresh Processing On High-Hz Displays (CRT emulation, Reduces Blur, Lowers Lag, De-Stuttering) #6762

Closed
mdrejhon opened this issue May 30, 2020 · 12 comments
Labels
spike Discussion based issues with no clear "close" condition

Comments

@mdrejhon

mdrejhon commented May 30, 2020

Here's the short TL;DR summary:

TL;DR: Enhance existing MAME HLSL to include sub-refresh behaviors, including simulating CRT phosphor fade in sub-refresh increments + optional rolling-scan BFI in sub-refresh increments. All achieved via software, with the only hardware requirement being a high-Hz display, even 120Hz or 240Hz. Made possible simply by a display having a higher Hz than the original emulated Hz

Bonus optional side effect 1: Reduce input lag. Since it's a rolling scan emulation via subdividing emulator Hz over multiple destination Hz, there is a simple cross-platform beam-race opportunity to reduce MAME lag even closer to 1:1 symmetry to original machine. For example, 240Hz would use 4 hardware refresh cycles to emulate a 60Hz emulator refresh cycle in quarters at a time. No raster knowledge of the destination display is needed, you're simply using the finer granularity of the higher output Hz.

Bonus optional side effect 2: Real-time standards conversion. Smooth 50fps on 60Hz displays! The same algorithm I propose also happens to de-stutter frame rate conversions, so it can be used for standards conversion (e.g. ANIMATION: Software algorithm that makes 50fps look smooth without stutters on a 60Hz non-VRR display). So this algorithm is useful on today's 60Hz displays too. Output Hz can also be lower than emulator Hz, because the flexible Hz-agnostic phosphor fade algorithm can overlap and blend refresh cycles together, similar to what I successfully did in the link above. That allows smooth 60fps on 50Hz displays too!

Long Version Below

MAME HLSL revolutionized the spatial look of a CRT tube.

However, it doesn't yet look temporally like a CRT tube (zero blur, etc).

The vision is that with increasingly-higher monitor refresh rates, thanks to the arrival of 240Hz IPS (ASUS) and 360Hz IPS (DELL AW2521H) this summer, now it is time to start talking about emulating a CRT gun at the sub-refresh-cycle level.

I posted at GroovyMAME about the concept of "Temporal HLSL"
http://forum.arcadecontrols.com/index.php/topic,162926.0.html

The long-term futurist vision is that a 1000Hz display will be able to emulate a CRT electron gun, by displaying a "rolling bar" with a phosphor fade trail. Instead, we aim to do that in software, ala a "Temporal HLSL" algorithm of a rolling-bar software BFI.

This may be easier to imagine:
Much like playing back a 1000fps high-speed video of a CRT (with full dynamic range, not overexposed) in real time on a 1000Hz sample-and-hold display. You see the rolling bar. Now that experimental ultra-Hz displays are in the lab, early tests show it ends up looking temporally like the original CRT, with low persistence (1ms persistence). 360Hz is where the "wow" factor begins for software-based rolling-bar BFI algorithms. Finer-granularity Hz is even better, but 360Hz is where the temporal look-n-feel starts to impress.

ASUS has confirmed a long-term road map to 1000Hz displays (those unaware can read more in Blur Busters Law: The Amazing Journey To Future 1000Hz Displays, with scientific citations included). With the push of 120Hz becoming more mainstream (iPhone/Galaxy), high-Hz is expected to become an inexpensive inclusion in future panels in ever-increasing numbers, with refresh rates doubling approximately every 5-10 years.

These are crazy numbers, but the current impetus for ultra-Hz is esports and VR. Screens for VR and Holodecks need to emulate real life: real life doesn't flicker, which requires blurless sample-and-hold, and blurless sample-and-hold requires insane Hz. A side effect of the refresh rate race is an improved ability to emulate a CRT electron gun at sub-refresh timescales (zero-blur rolling scan), which is a boon for emulator originality.

Refresh rates are now getting high enough (360Hz) to make Temporal HLSL practical. At 360Hz, one can use a 1/6th screen-height "rolling phosphor bar emulation" (with a gamma-corrected alpha-blended phosphor trail), generating only about 2.8ms of motion blur (1/360sec persistence). Bar height and bar fade trails (as the bar scrolls downwards) can be programmable with existing options or new options in the configuration file. One would have to alphablend-overlap between adjacent refresh cycles to avoid artifacts, but done correctly, it looks seamless once we're in the refresh rate stratospheres.
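
To make the arithmetic in the paragraph above concrete, here is a minimal Python sketch. The function name `rolling_bar_params` is hypothetical (not a MAME option), and it assumes the simplest case: one bar per output refresh, with no fade-trail overlap.

```python
from fractions import Fraction


def rolling_bar_params(emu_hz: int, display_hz: int):
    """Return (bar height as a fraction of screen height, persistence in ms).

    Assumes one rolling bar per output refresh and no phosphor-trail overlap.
    """
    bar_height = Fraction(emu_hz, display_hz)   # e.g. 60/360 = 1/6 of the screen
    persistence_ms = 1000.0 / display_hz        # one output refresh of hold time
    return bar_height, persistence_ms


if __name__ == "__main__":
    for hz in (120, 240, 360, 1000):
        height, ms = rolling_bar_params(60, hz)
        print(f"{hz:>4} Hz: bar = {height} of screen height, persistence ~ {ms:.2f} ms")
```

Adding a fade trail or overlap would lengthen the effective persistence beyond these figures.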

In addition, I also posted a permanent universal solution for software-BFI burn-in artifacts in that thread. It is brighter, more colorful, no chessboard artifact, no banding or color depth loss, no burn in.

I'd like to see long-term discussion on incubating Temporal HLSL over what I envision is an approximately 5-year time window. I don't expect this to be added to MAME in the near term, but I would like to see the community do long-term planning for sub-refresh processing.

CRTs will not last forever -- the factories have stopped. We will need to emulate CRTs at sub-refresh timescales. It has to happen within 10 years, when arcade tubes become valuable antiques.

Thankfully the refresh rate race has made that feasible for originality, the dream of Year 2030s bringing 1000Hz OLEDs and MicroLEDs replicating the motion blur of a CRT rolling scan. But we can get prepared now, with this year's 240Hz IPS and 360Hz IPS panels.

P.S. I am open to moving this discussion to a different forum, or keeping this as an incubator github thread. If this is the wrong place, open discussion on how to begin a "Temporal HLSL" collaborative initiative somewhere else is welcome. Let's either (A) continue this issue to incubate this concept, or (B) figure out the appropriate venue before closing this issue. MAME HLSL is incredibly well known for its studious attention to spatial detail and its ultra-detailed configuration files, so it is the Gold Standard of CRT filters. A discussion on eventually making it capable of sub-refresh processing is desirable, to duplicate the original temporal behaviours of a CRT (including zero CRT motion blur). Which ones of you are responsible for HLSL?

@mdrejhon mdrejhon changed the title Path Towards Temporal HLSL on upcoming 360Hz and 1000Hz Monitors (not just Spatial HLSL) Path Towards Temporal HLSL (beyond just Spatial HLSL) May 30, 2020
@mdrejhon mdrejhon changed the title Path Towards Temporal HLSL (beyond just Spatial HLSL) Plan Towards Temporal HLSL (Processing HLSL at Sub-Refresh Time Scales) May 30, 2020
@mdrejhon
Author

mdrejhon commented Jun 1, 2020

BTW, this is a very cross-platform idea. This is much easier to conceptualize than beamraced VSYNC (while not needing to be mutually exclusive).

I have repackaged this idea into something easier to conceptually understand, if you have at least basic raster knowledge:

Can be beam raced without knowing platform-specific or display-specific behaviors

Basically sub-refresh latencies using ordinary VSYNC ON. You wouldn't have to care about how a display refreshes. For example, at 240Hz, you'd display 1/4th of an emulator refresh cycle (rasterplotted in real time), in 4 separate output frames.

For a 60Hz emulator module onto a 240Hz monitor

  • Emulate 1/240sec, output 1/4-height frame at top
  • Emulate 1/240sec, output 1/4-height frame just above center
  • Emulate 1/240sec, output 1/4-height frame just below center
  • Emulate 1/240sec, output 1/4-height frame at bottom

The rest of the frame would be black (except for any required alphablend overlaps to eliminate seams/artifacts)

Make top/bottom edges of bars fuzzy to reduce/eliminate tearing artifacts (avoid emulating the look of VSYNC OFF tearing). I have confirmed that you have to overlap the bars between output refresh cycles. Use alpha-blend-to-black slightly beyond the 1/4-height frames for the 240Hz situation example. This will prevent tearing artifacts. Make sure to gamma-correct the alphablend overlaps. Remember that RGB(128,128,128) is not exactly half the photons of RGB(255,255,255). Use a configurable gamma correction number in HLSL config file.
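
The gamma point above can be illustrated with a short Python sketch. It assumes a pure power-law gamma of 2.2 (real sRGB is piecewise, and an actual HLSL config value may differ); the function names are hypothetical.

```python
def to_linear(v8: int, gamma: float = 2.2) -> float:
    """Convert an 8-bit encoded value to relative linear light (0..1)."""
    return (v8 / 255.0) ** gamma


def to_encoded(lin: float, gamma: float = 2.2) -> int:
    """Convert relative linear light (0..1) back to an 8-bit encoded value."""
    return round(255.0 * lin ** (1.0 / gamma))


def blend_to_black(v8: int, alpha: float, gamma: float = 2.2) -> int:
    """Scale the emitted light by alpha, doing the multiply in linear light."""
    return to_encoded(to_linear(v8, gamma) * alpha, gamma)


if __name__ == "__main__":
    # Under this assumption RGB 128 emits roughly 22% of the light of RGB 255, not 50%.
    print(f"linear light of 128: {to_linear(128):.3f}")
    # Half the photons of full white needs an encoded value well above 128.
    print(f"encoded value for half of 255: {blend_to_black(255, 0.5)}")
```

Doing the overlap fade in linear light like this is what keeps the summed brightness of two overlapping bar edges constant across the seam.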

It can also accomplish beam raced latency without black frames too

The rolling scan BFI could in theory be configurable to full persistence. It would achieve beam raced latencies to the output refresh cycle granularity.

For a 60Hz emulator module onto a 240Hz monitor

  • Emulate 1/240sec, output frame with top 1/4 new refresh, bottom 3/4 old refresh
  • Emulate 1/240sec, output frame with top 1/2 new refresh, bottom 1/2 old refresh
  • Emulate 1/240sec, output frame with top 3/4 new refresh, bottom 1/4 old refresh
  • Emulate 1/240sec, output frame with whole screen new refresh

Make sure to alphablend the seams (blur the refresh overlap line) to prevent tearing artifacts, otherwise it looks like "a VSYNC OFF emulation in VSYNC ON". The alphablend fixes the tearing, especially at higher Hz (Seams at 360Hz will be less visible than seams on 240Hz, and will diminish to vanishing point, the higher the Hz you go)

This allows beam-raced latencies via sheer brute Hz. So, this module could be configurable to have full persistence or rolling low persistence (rolling black period).
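
A minimal Python sketch of the full-persistence variant described above, assuming frames are plain lists of rows and that the seam is a hard cut (the alphablend across the seam is deliberately left out to keep it short). The function name is hypothetical.

```python
def composite_full_persistence(old_frame, new_frame, subframe, subframes_per_emu):
    """Build one output frame: rows above the seam come from the new emulator
    refresh, rows below it from the previous one.

    subframe counts from 0; the last subframe shows the whole new refresh.
    """
    height = len(new_frame)
    seam = height * (subframe + 1) // subframes_per_emu
    return new_frame[:seam] + old_frame[seam:]


if __name__ == "__main__":
    old = ["old"] * 8
    new = ["new"] * 8
    for k in range(4):  # 60Hz emulator on a 240Hz display: 4 output frames
        print(k, composite_full_persistence(old, new, k, 4))
```

In a real renderer the rows near `seam` would be blended in linear light rather than cut, as the text recommends.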

This will scale very well to the future, in the refresh rate race to retina refresh rates too. A future 1000Hz display would emulate original machine latency to an error margin of 1ms, duplicating sub-refresh original-machine latencies, regardless of display technology (scan direction, display refresh pattern, etc). Most high-Hz gaming displays have a sub-refresh processing delay, so the higher Hz you go, the more it converges into original-machine latency.

Very little crossplatform dependencies, the only thing needed is VSYNC ON and the ability to do framerate=Hz.

Emulator Hz and destination Hz don't need to be evenly divisible.

Situation Example of 60Hz CRT emulation onto a 200Hz LCD ... This formula is very Hz-agnostic.

Emulator Refresh Cycle 1:
....Real Refresh 1: full 60/200th height bar (30% screen height), at 0%-30% vertical position
....Real Refresh 2: full 60/200th height bar (30% screen height), at 30%-60% vertical position
....Real Refresh 3: full 60/200th height bar (30% screen height), at 60%-90% vertical position
....Real Refresh 4: 1/3 of 60/200th height bar (10% screen height), at 90%-100% vertical position

Emulator Refresh Cycle 2:
....Real Refresh 5: 2/3 of 60/200th height bar (20% screen height), at 0%-20% vertical position
....Real Refresh 6: full 60/200th height bar (30% screen height), at 20%-50% vertical position
....Real Refresh 7: full 60/200th height bar (30% screen height), at 50%-80% vertical position
....Real Refresh 8: 2/3 of 60/200th height bar (20% screen height), at 80%-100% vertical position

Emulator Refresh Cycle 3:
....Real Refresh 9: 1/3 of 60/200th height bar (10% screen height), at 0%-10% vertical position
....Real Refresh 10: full 60/200th height bar (30% screen height), at 10%-40% vertical position
....Real Refresh 11: full 60/200th height bar (30% screen height), at 40%-70% vertical position
....Real Refresh 12: full 60/200th height bar (30% screen height), at 70%-100% vertical position

This is just a conceptual example of temporal compensation. Algorithmically, this can be used for black frame insertion (rolling bar, some bars with image data, other bars black, with alphablended bleed overlap) or for "beam racing via brute Hz" (all bars with image data, new emu Hz overwriting old emu Hz, alphablend/blur the seams) or both simultaneously (rolling BFI + beam racing simultaneously). You could adjust the alphablend factor up/down.
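
The bar positions in the 60Hz-onto-200Hz example can be generated with a short Python sketch. The function name is hypothetical. One assumption to flag: here a bar that runs off the bottom of the screen wraps to the top within the same real refresh, so three emulator cycles take 10 real refreshes (200/60 x 3) and the refresh numbering differs from the listing above, while the vertical ranges are the same.

```python
from fractions import Fraction


def scan_segments(emu_hz, display_hz, refreshes):
    """For each real refresh, return a list of (emu_cycle, start, end) segments.

    start/end are fractions of screen height; emu_cycle counts from 1.
    A bar crossing the bottom of the screen is split into two segments.
    """
    step = Fraction(emu_hz, display_hz)  # emulator cycles scanned per real refresh
    pos = Fraction(0)                    # scan position, in emulator cycles
    out = []
    for _ in range(refreshes):
        end = pos + step
        segments = []
        while pos < end:
            cycle = pos // 1                             # whole cycles completed
            seg_end = min(end, Fraction(cycle + 1))      # stop at bottom of screen
            segments.append((cycle + 1, pos - cycle, seg_end - cycle))
            pos = seg_end
        out.append(segments)
    return out


if __name__ == "__main__":
    for n, segs in enumerate(scan_segments(60, 200, 10), start=1):
        text = ", ".join(
            f"cycle {c}: {float(a) * 100:.0f}%-{float(b) * 100:.0f}%" for c, a, b in segs
        )
        print(f"Real Refresh {n}: {text}")
```

Exact fractions are used so that positions never drift over long runs; a shader would simply receive the start and end values as floats.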

Theoretically can also accomplish stutterless standards conversion (50Hz onto 60Hz)

Using a formula SIMILAR to the one above, combined with a huge alphablend overlap adjustment (e.g. 50% or 75% of screen height, perhaps). Basically a scanout-enhanced version of a common alphablend standards-conversion algorithm. It would thus eliminate stutters. The taller the alphablend overlap, the less standards-conversion stutter. The bar overlap should be a configurable parameter in the configuration file.

This would conceptually be a more advanced version of the software-based variable refresh animation: www.testufo.com/vrr ....where I'm able to play any framerate stutterlessly onto any refreshrate, with a very simple alphablend algorithm not too dissimilar from the common 50/60 standards conversion alphablend algorithm.

So we're just "abusing" the alphablend overlap feature of this theoretical "Temporal HLSL" as the method of destutter in much the same stutter-eliminating way.
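
For reference, the common whole-frame alphablend conversion mentioned above can be sketched in Python as follows. This is the plain version, not the scanout-enhanced one, and it is not the TestUFO implementation; the function name is hypothetical. Each output refresh blends the source frames it overlaps in time, weighted by overlap.

```python
from fractions import Fraction


def blend_weights(src_hz, out_hz, out_index):
    """Return {source_frame_index: weight} for one output refresh.

    Weight is the share of the output refresh interval covered by that
    source frame, so the weights always sum to 1.
    """
    start = Fraction(out_index, out_hz)
    end = Fraction(out_index + 1, out_hz)
    weights = {}
    frame = int(start * src_hz)  # first source frame overlapping this refresh
    while Fraction(frame, src_hz) < end:
        f_start = Fraction(frame, src_hz)
        f_end = Fraction(frame + 1, src_hz)
        overlap = min(end, f_end) - max(start, f_start)
        weights[frame] = overlap * out_hz
        frame += 1
    return weights


if __name__ == "__main__":
    for k in range(6):  # 50fps source on a 60Hz display
        print(k, {f: str(w) for f, w in blend_weights(50, 60, k).items()})
```

The same function also covers output Hz below source Hz (e.g. 60 onto 50). The weights should be applied in linear light, for the same gamma reason given earlier.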

Apparently, @TomHarte already tested something similar (some kind of scanout-alphablend algorithm) in his experiments with CLK.

Also, simulated scanout direction can be different from real display's scanout direction or refresh algorithm, i.e. high-Hz DLP. The software-based scanning direction can be made configurable too!

Temporal HLSL concept is a universal CRT emulator with amazing crossplatform flexibility

It scales up and down universally

  • Can temporally emulate a CRT electron gun at the granularity of destination Hz
  • Map any Hz to any Hz without stutter
  • Benefits for stutterless Hz-conversion (50Hz onto 60Hz)
  • Benefits for high Hz (i.e. 60Hz onto 360Hz)
  • Multiple ways to be beam raceable (Temporal HLSL = a virtualized CRT tube)
  • Yet beamrace is optional (i.e. software BFI scan sequence run only after full emu frame ready)
  • Flickerless option (full persistence option)
  • Flicker option (simulate CRT phosphor, including decay, subject to output-Hz granularity)
  • Can emulate original scanout direction different from actual display scanout direction
  • No platform dependencies: it is independent of display refreshing behavior, so we simply worry about Hz granularity (more Hz = better)

OPTIONAL: Theoretically compatible with hardware beamraced VSYNC

  • OLEDs and LCDs already refresh raster based, as seen in high speed videos, www.blurbusters.com/scanout
  • Temporal HLSL could emulate a higher-Hz display internally independently of output Hz
  • A flywheel sync algorithm can optionally kick in whenever "emuHz close to outputHz" (within configurable margin). Allowing a separate module to frameslice-beamrace this virtualized internal display to a low-Hz output display.

Thus, it would produce latencies identical to the Lagless VSYNC / Beamraced VSYNC approaches, like the one already implemented in WinUAE tonioni/WinUAE#133

Certainly, actual-hardware beam racing would be 100% optional (since it requires minor platform-dependent and display-scanout-direction knowledge)

I add this section only to say that Temporal HLSL is 100% compatible with hardware-based beam racing, simply by virtualizing a higher-Hz display internally: e.g. rendering 600 frame-fragments per second of Temporal HLSL into RAM, and then doing 600-frameslice beam racing onto a real 60Hz display. This is subject to jitter safety margin algorithms (keeping the emulator buffer raster spatially ahead of the real display raster by an increased distance) to hide emuraster-realraster artifacts that can come from CRT-curvature algorithms, scaling algorithms, and computer performance jitter, so that it's curvature-independent, scaling-independent, etc. Or just ignore the complexity and keep things simple (piggyback on brute Hz for beam racing).

Software-based beam racing: Output Hz massively above Emulator Hz. The art of emulating a CRT electron gun via brute Hz.
Hardware-based beam racing: Output Hz same as Emulator Hz. Beam raced synchronization of rasterplotting emulator's raster ahead of real display's raster. Using beam raced VSYNC OFF frameslices, that would work off a Temporal HLSL framebuffer too. (An approach successfully already implemented in a few emulators)

Basically the Temporal HLSL concept is compatible with both software-based beam racing and hardware-based beam racing.

Incubation Venue Needed

Discussion is welcome on how to long-term incubate a CRT electron gun emulator in the current refresh rate race to retina refresh rates.

@mdrejhon
Author

mdrejhon commented Jul 18, 2020

Some information: @TomHarte appears to have already done an implementation of a rolling-scan emulation (full-persistence version AFAIK) in the CRT emulator of CLK, though I am not sure of its current suitability for handling MAME Temporal HLSL.

https://github.com/TomHarte/CLK

On the Mac platform, he tested some optional hardware-based beam racing code (not yet checked in). It has a flywheel algorithm that kicks in when emulator Hz is close to real Hz (to sync the emu-vs-real rasters), and it is only implemented on the Mac platform (using time offsets between CVDisplayLink callbacks as the VSYNC heartbeat, to estimate real raster position). It is not implemented on the Linux port, although that port has much of the other CRT emulation. That part is purely optional, and doesn't need to be implemented. Nonetheless, it's a rather neat concept to "optionally hardware-beamrace" an already "software-based CRT beam emulator".

Also -- being founder of Blur Busters / TestUFO, I also have spare 240Hz monitors I may be able to loan/give out (North America only -- shipping is expensive) for a reputable developer fully committed to adding a high-Hz-aware CRT electron gun emulator. 240Hz is the entry level minimum where software based beam scanning becomes practical.

Mind you, it may be at least five to ten years before 240Hz becomes mainstreamed like 120Hz gradually slowly is (as it has one-quarter the web browser scrolling motion blur on sample-and-hold OLEDs/LCDs), but from a CRT-look preservation perspective, we might as well get started with current 240Hz LCDs while waiting for amazing 240Hz+ MicroLED/OLEDs that are even more capable of emulating CRT-look temporally.

@mdrejhon
Author

mdrejhon commented Sep 29, 2020

Crossposting a potential "MAME Temporal HLSL" pre-requisite here from a RetroArch pull request, but this is such a simple universal algorithm applicable to all emulators wanting to futureproof for future "Better than 60Hz" technologies. The original text at libretro/RetroArch#11342 but it is a generic algorithm applicable to MAME.

This might or might not be potentially considered a pre-requisite for MAME Temporal HLSL. Although it is only a full-framebuffer workflow, it's very ideally adaptable to all sub-refresh workflows too such as Temporal HLSL.

An alternative workflow is a precision sync signalling thread, keeping frame presentation in its original thread (which would listen/wait on this thread).

This is a universal generic crossplatform algorithm that should eventually become a best-practice for emulators in the next ten years.

Goals

  • Reduce existing stutters (VRR, triple buffer, VSYNC OFF, DWM, non-60Hz displays)
  • Reduce existing flicker and eliminate artifacts (improved software-based BFI)
  • Improve user-friendliness (make things work more automatically on non-60Hz displays with fewer settings-fiddling)
  • Futureproofing to future display refreshing algorithms

Problem: Existing frame pacing algorithm not future proof enough

Currently, G-SYNC uses a software-based frame pacing but it is not currently optimized in a future-proof way yet:

  • Problem 1: Present imprecision create visible effects (stutters during VRR, flicker during BFI) especially in CPU heavy scenarios (slower systems, intensive emulators, big RunAhead settings)
  • Problem 2: Present imprecision makes it harder to add support for future enhancements (better VRR, >120Hz BFI, combined VRR+BFI, beam racing, rolling-scan BFI CRT emulators)

The upgrade to existing frame pacing algorithm

I propose a separate thread responsible for frame presenting (e.g. Present() or glxxSwapBuffers() or whichever API) that does the following:

  • Allows frame presents to continue independently of rendering
    Presents will have no jitter even if emulator modules use darn near 100% CPU
  • Allows frame presents to be optionally ultra precise
    This can be done via busywait instead of (or in addition to) timer events, because some algorithms (beamrace or BFI+VRR) have visible artifacts with sub-millisecond errors. One can also use a timer event to wake about 0.5ms early, then busywait on a high-precision clock the rest of the way.
  • Allows same frame pacing algorithms to work with all sync technologies
    (VSYNC ON, VSYNC OFF, DWM, triple buffer, AMD Enhanced Sync, NVIDIA Fast Sync, FreeSync, G-SYNC, VESA Adaptive-Sync, BFI, etc), making it much easier to combine them (e.g. BFI during VRR)
  • Allows easier future addition of new algorithms
    (e.g. rolling BFI CRT emulators, or lagless VSYNC beam racing), with little or no modification to emulator rendering
  • More future proof
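
The "timer event to 0.5ms prior, then busywait" idea from the list above can be sketched in Python as follows. This is only an illustration of the timing pattern: `wait_until` and `spin_margin` are hypothetical names, and a real emulator would do this in native code around its actual present call.

```python
import time


def wait_until(deadline: float, spin_margin: float = 0.0005) -> float:
    """Wait until time.perf_counter() reaches deadline.

    Sleeps coarsely until spin_margin seconds before the deadline, then
    busy-waits on the high-resolution clock. Returns the overshoot in seconds.
    """
    remaining = deadline - time.perf_counter()
    if remaining > spin_margin:
        time.sleep(remaining - spin_margin)   # cheap, but imprecise
    while time.perf_counter() < deadline:     # precise, but burns CPU briefly
        pass
    return time.perf_counter() - deadline


if __name__ == "__main__":
    target = time.perf_counter() + 1 / 240    # one 240Hz refresh from now
    overshoot = wait_until(target)
    print(f"overshoot: {overshoot * 1e6:.1f} microseconds")
```

The spin margin trades CPU time for precision; 0.5ms is the figure from the text, and the right value depends on how late the OS sleep tends to wake.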

Largely a Streamlining of Existing Workflow

The existing present call would be replaced by a wrapper that passes the frame to a frame presentation thread. The presentation thread will time the presentation itself.

Some of the workflow already exists, it just needs to be re-jigged into an official unified workflow with capability of improved precision.

  • "Rendering Thread" = Thread that runs the emulator and generates the emulator frames;
  • "Presenting Thread" = Thread that is now permanently responsible for frame presentation;
  • "Present Wrapper" = This replaces the existing frame present method (e.g. glxxSwapBuffers() or Present() or whatever platform API is used to pass frame to the graphics drivers). So that Rendering Thread can transfer (or copy) frame to the Presenting Thread

Suggested Stage 1 Workflow

  1. Presenting Thread only purpose is presenting frames (no rendering)
  2. Presenting Thread is always higher priority than rendering thread, for purpose of timing precision. Most of the time, presenting thread uses 0% CPU since it's just timing pre-rendered frames, so high precision becomes harmless to Rendering Thread
  3. Presenting Thread can optionally be forced to present immediately so it is backwards compatible with the existing present workflow (this can ease iterative development) or for platforms not stable with separate-thread presenting (fortunately, I don't think there are any left).
  4. Rendering Thread should do all rendering, including CRT filters
  5. Present Wrapper inside the Rendering Thread can be a good way to hide/centralize all the final processing. Such as adding CRT filters, or rendering a whole sequence of BFI framebuffers (low emulator Hz on high real-display Hz). This hides the implementation details of many refreshing algorithms, and makes cross-platform easier.
  6. Presenting Thread will make sure that the time intervals between consecutive presents are as exact as possible. When this is achieved, the algorithm suddenly become universal (works with all sync technologies).
  7. Present Wrapper can still emulate the behavior of a 60Hz VSYNC ON waitable swapchain (regardless of whether underlying hardware is doing VSYNC ON or VSYNC OFF or VRR or BFI or whatever) by waiting for a heartbeat from the Present Thread
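
The Stage 1 workflow above can be sketched structurally in Python. Everything here is hypothetical (the `Presenter` class, `present_fn` standing in for the platform present call), and Python cannot set thread priority or match native timing, so treat it as a picture of the hand-off between threads rather than a usable implementation.

```python
import queue
import threading
import time


class Presenter:
    """Hypothetical presenting thread that paces frames from the rendering thread."""

    def __init__(self, present_fn, interval_s, spin_margin_s=0.0005):
        self._present = present_fn        # stand-in for the platform present/swap call
        self._interval = interval_s       # target time between consecutive presents
        self._spin_margin = spin_margin_s
        self._frames = queue.Queue(maxsize=1)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, frame):
        """Present wrapper: blocks while an earlier frame is still queued,
        which paces the rendering thread much like a VSYNC ON swap would."""
        self._frames.put(frame)

    def close(self):
        self._frames.put(None)            # sentinel: stop after pending frames
        self._thread.join()

    def _run(self):
        deadline = time.perf_counter()
        while True:
            frame = self._frames.get()
            if frame is None:
                return
            remaining = deadline - time.perf_counter()
            if remaining > self._spin_margin:
                time.sleep(remaining - self._spin_margin)   # coarse wait
            while time.perf_counter() < deadline:           # fine wait
                pass
            self._present(frame)
            # Stay on the timing grid unless more than one interval behind.
            deadline = max(deadline + self._interval, time.perf_counter())


if __name__ == "__main__":
    shown = []
    presenter = Presenter(lambda f: shown.append((f, time.perf_counter())), 1 / 60)
    for i in range(6):
        presenter.submit(f"frame {i}")    # the rendering thread would render here
    presenter.close()
    gaps = [b[1] - a[1] for a, b in zip(shown, shown[1:])]
    print([f"{g * 1000:.2f} ms" for g in gaps])
```

The single-slot queue is one design choice among several; a deeper queue would tolerate more render-time jitter at the cost of added latency.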

Metaphorically, this workflow is a software-based VSYNC ON emulator, hiding the quirks of GPU drivers or destination displays away from emulator rendering, while simultaneously improving user-friendliness (things just work automatically upon startup), making things less buggy (no VRR stutters, no BFI flicker) and future proofing (even BFI made VRR compatible, hardware-based beamrace, software-based beamrace, CRT beam emulators, not-yet-invented display algorithms).

In a 60Hz VSYNC ON scenario, this is just defacto passthrough behavior (Present Thread will immediately present), while allowing one framepacing algorithm to work with ALL sync technologies more reliably. And it adds no extra workflow lag.

Don't worry about BFI for now (libretro/RetroArch#10757 and/or libretro/RetroArch#10754), don't worry about beamracing for now (libretro/RetroArch#6984); those are solvable in future (e.g. wrappers for PresentScanLine() can be added later to pass one pixel row between Rendering Thread to the Present Thread, as an example). For now, just focus on generic crossplatform full-frame workflow.

Easy Debugging Tip for 60Hz-Only Developers: VSYNC OFF

Testing without VRR can be done via 60Hz VSYNC OFF while using CPU-heavy emulation/emulation settings. Use 60Hz VSYNC OFF, and use tearline jitter as a timing-precision debugger. If the tearline erratically moves or jitters/vibrates massively, your present timing is not "best-effort microsecond-accurate". If the tearline is stationary or rolls slowly up/down, your present timing is nearly microsecond-accurate.

1080p 60Hz is a horizontal scanrate of 67.5 kilohertz (approx 67500 pixel rows per second, including VBI). So a 1/67500th second delay moves a VSYNC OFF tearline downwards by 1 pixel. Modern displays still scan from top-to-bottom (high speed videos) and VSYNC OFF tearlines are a raster artifact.

So if your tearline is vibrating by 50 pixels up/down, that means you've got a 50/67500th second imprecision in your Present() or glxxSwapBuffers() timing. Thusly, VSYNC OFF 60Hz is an excellent timing debugger, since VSYNC OFF tearline is a real-display raster where the new real GPU framebuffer splices into the destination display's scanout position. Run a horizontal-panning videogame (such as a platformer) to find the tearline.

If you have a high-Hz display, you can also test 60fps at VSYNC OFF 120Hz or VSYNC OFF 240Hz for more sensitive timing-precision debugging (1/135000th second for a 120Hz tearline moving downwards by 1 pixel, for example).
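
The tearline arithmetic above is easy to wrap in a helper. This Python sketch assumes the standard 1080p timing of 1125 total lines per refresh (1080 visible plus blanking); other modes have different totals, and the function names are hypothetical.

```python
def scanrate_hz(refresh_hz: float, total_lines: int = 1125) -> float:
    """Pixel rows scanned out per second, including the blanking interval."""
    return refresh_hz * total_lines


def jitter_seconds(tearline_pixels: float, refresh_hz: float, total_lines: int = 1125) -> float:
    """Convert observed tearline movement (in pixel rows) to present-timing error."""
    return tearline_pixels / scanrate_hz(refresh_hz, total_lines)


if __name__ == "__main__":
    print(f"60Hz scanrate: {scanrate_hz(60):.0f} rows/s")
    print(f"120Hz scanrate: {scanrate_hz(120):.0f} rows/s")
    print(f"50-pixel jitter at 60Hz: {jitter_seconds(50, 60) * 1000:.3f} ms")
```

So a tearline vibrating by 50 pixels at 1080p 60Hz corresponds to roughly three quarters of a millisecond of present-timing error.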

When you succeed in generating a stable VSYNC OFF tearline, it automatically translates to VRR users getting amazing framepacing, and BFI users getting artifactless flicker-free operation (even if you never test VRR or BFI). Thus, use 60Hz VSYNC OFF as a clever, easy debugger for frame-present timing precision if you don't have 144Hz or VRR or BFI!

@mdrejhon
Author

mdrejhon commented Dec 1, 2020

Equivalence: High Speed Video of a CRT Tube Played Back On High-Hz Display

(Display Hz Matching the framerate of High Speed Video)

This is one conceptual way to more easily understand this github item:

Consider a high speed video of a CRT tube. You see a rolling bar in those, with blurry edges. This is seen in many YouTube videos -- you see phosphor trailing behind. Now, adjust the dynamic range so that the video is not overexposed, and you see "frame slices" of the CRT appearing in specific frames of the high speed video. Now, guess what:

Did you know that playback of a 1000fps high-speed video of a CRT tube -- in real time onto a true 1000Hz display -- makes that 1000Hz display to perceptually emulate the temporals of the original CRT?

(zero motion blur effect, rolling scan effect, phosphor decay effect, etc)

The "Temporal HLSL" concept, aims to emulate that behavior in software. For best emulation perfection requires retina resolution (spatial HLSL) + retina refresh rate (temporal HLSL) + retina HDR (to keep it beam-emulation bright).

A display that combines all of this will take some time to arrive, but -- this encompasses the venn diagram of capturing the look-feel of a CRT tube (at flat tubes).

(plus a slight amount of edge-alphablending for overlaps to prevent tearing artifacts from appearing, especially for the sharp bottom edge of the rolling-scan bar as seen in high speed videos.)

Now you understand what MAME Temporal HLSL aims to achieve. It's a software simulation of this concept.

@mdrejhon
Author

mdrejhon commented Jan 20, 2021

Followup to my last message about educational DIY:

Re: Equivalence: High Speed Video of a CRT Tube Played Back On High-Hz Display

How to DIY Test Out This Concept

I have received word that successful tests were done at lower frame rates (240fps studio-quality camera videos of a CRT played back on a true-240Hz LCD) and it scales remarkably well! This test was done using a 360-degree shutter (1/240sec) at 240fps, with the ISO adjusted so that the CRT's electron beam dynamic range fit into the individual frames (very hard since the CRT beam is really bright). Once the video file was saved, the instructions at the UltraHFR FAQ were followed to allow the formerly slo-mo video file to play back in real time at 240fps on a 240Hz LCD.

I'll see if I can get the video file, but anybody with a good manually-adjustable 240fps-capable camera can do this. Some are cheap; the GoPro HERO 8 works for this test. You need to dial in your ISO correctly in GoPro's ProTune, while fixing the shutter speed to the same as the frame rate (to mimic a 360-degree shutter) so that all photons emitted by the CRT are integrated into the frames of the digital high speed video file.

Make sure ISO is dialed to the point where per-frame overexposure disappears. If it's still overexposed even at the minimum ISO setting (CRT bar is overexposed white in the high speed video), lower your CRT's brightness setting until it is no longer overexposed. Do NOT test using a shutter speed faster than one frame. (The shutter needs to be as continually open as possible.)

Film some fast-panning retro game such as SEGA's Sonic the Hedgehog (Genesis), running on an actual CRT tube, for best effect.

Be warned, most smartphone cameras aren't good enough, but the latest 2020 camera sensors might work if you download a manual-adjustment app that can force fixed exposure and fixed ISO per high speed video camera frames -- currently most smartphone APIs don't let you do this, so it's harder with a smartphone than with dedicated cameras (at the moment)

You may need a little Adobe Premiere or a Brightness/Contrast/Saturation adjustment in a player (such as VLC) to compensate for any underexposure of dark colors, if using a cheaper 240fps camera and trying to fit the dynamic range of a CRT into a high speed video (so that the beam isn't overexposed white lines in the high speed video)

Admittedly this is getting slightly offtopic; but this science/research is being added here to show that the Temporal HLSL concept works and is a valid algorithm.

General Rules of Thumb

The higher the frame rate (and refresh rate) above the emulator refresh rate, the more accurately the digital flat panel temporally emulates a CRT, proving the vision science/physics works (confirming that Temporal HLSL is a viable future temporal upgrade to the existing spatial HLSL).

One caveat is that the black periods do diminish brightness. A 400-nit 240Hz IPS LCD falls to about 100 nits with 4-segment 60Hz CRT emulation. That is still brighter than many CRTs at non-overdriven settings.
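
The brightness figure above follows from the duty cycle of the rolling bar. A minimal Python sketch, assuming each pixel is lit for exactly one output refresh per emulator refresh (no fade trail or overlap, which would add some light back); the function names are hypothetical.

```python
def average_nits(peak_nits: float, emu_hz: float, display_hz: float) -> float:
    """Time-averaged brightness when each pixel is lit for one output
    refresh out of every emulator refresh."""
    return peak_nits * emu_hz / display_hz


def required_peak_nits(target_nits: float, emu_hz: float, display_hz: float) -> float:
    """Peak brightness the panel needs to reach a target average brightness."""
    return target_nits * display_hz / emu_hz


if __name__ == "__main__":
    print(average_nits(400, 60, 240))        # the 400-nit 240Hz example from the text
    print(required_peak_nits(100, 60, 360))  # peak needed for 100 nits average at 360Hz
```

This is also why HDR headroom matters: the higher the output Hz, the higher the peak brightness needed to hold the same average.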

Try to do tests on 240 Hz IPS, don't test a 240Hz TN LCD; it looks like crap (chessboard texture inversion artifacts). Recently, IPS screens are now getting more popular than TN screens in high refresh rates (the highest Hz screens are now IPS -- the current 360Hz LCDs are IPS)

A HDR 240Hz display is coming out later this year (or early 2022), from what I heard. The marriage of high-Hz and HDR should be very helpful for a temporally-accurate flat panel retro simulation of a CRT tube.

That said, future HDR high-Hz displays can be a workaround to emulate the bright CRT electron beam. Maximum HDR peaking is usable only for a small portion of the frame (e.g. 10%), but the rolling bar is only a small portion of a refresh cycle's frame, so that is actually workable. So HDR engineering (future HDR high-Hz displays) can help bring the necessary brightness surge to compensate for the blackness.

(Note: This DIY is simply an educational process, to prove validity of Temporal HLSL)

@angelosa
Member

angelosa commented Feb 7, 2021

P.S. I am open to moving this discussion to a different forum, or keeping this as an incubator github thread.

First off: there's no such thing as an incubator github thread. This isn't a forum but a github issue where you [the user] want something out of MAME so that it benefit [to yourself and the MAME community].

Then I have a question about all this roadmap/mailing-list-like TL;DR: how much of all this technical mumbo jumbo applies to MAME at feature request level? I read about stuff that isn't even released atm (Asus 1000Hz monitors) or stuff that isn't emulated or interfaced to (any flavor of GoPro). Is there anything that can be pinned to your original first message as an index of brief feature requests in order to achieve this goal?

For now I'm gonna spike this out; as things currently stand I'm not even sure it belongs to bgfx or ui/ux or is just wontfix/invalid, because it belongs to a very specific HLSL configuration that definitely won't fit the default scenario for the vast majority of users until 2060 or thereabouts.

@angelosa angelosa added the spike Discussion based issues with no clear "close" condition label Feb 7, 2021
@mdrejhon
Author

mdrejhon commented Mar 2, 2021

P.S. I am open to moving this discussion to a different forum, or keeping this as an incubator github thread.

First off: there's no such thing as an incubator github thread. This isn't a forum but a github issue where you [the user] want something out of MAME so that it benefit [to yourself and the MAME community].

OK. Fair spike. Let me know if the below removes the spike.

Here's the short TL;DR summary:

TL;DR: Enhance existing MAME HLSL to include sub-refresh behaviors, including simulating CRT phosphor fade in sub-refresh increments + optional rolling-scan BFI in sub-refresh increments. All achieved via software, with the only hardware requirement being a high-Hz display, even 120Hz or 240Hz. Made possible simply by a display having a higher Hz than the original emulated Hz.

Bonus optional side effect 1: Reduce input lag. Since it's a rolling scan emulation via subdividing emulator Hz over multiple destination Hz, there is a simple cross-platform beam-race opportunity to reduce MAME lag even closer to 1:1 symmetry to original machine. For example, 240Hz would use 4 hardware refresh cycles to emulate a 60Hz emulator refresh cycle in quarters at a time. No raster knowledge of the destination display is needed, you're simply using the finer granularity of the higher output Hz.

Bonus optional side effect 2: Real-time standards conversion. Smooth 50fps on 60Hz displays! The same algorithm I propose accidentally de-stutters frame rate conversions too, so it can be used for standards conversion (e.g. ANIMATION: Software algorithm that makes 50fps look smooth without stutters on a 60Hz non-VRR display). So this algorithm is accidentally useful on today's 60Hz displays too. Output Hz can actually also be lower than emulator Hz, as the algorithm can also blend refresh cycles together as part of the flexible Hz-agnostic phosphor fade algorithm capable of overlapping refresh cycles, similar to what I successfully did in the link above. Allowing smooth 60fps on 50Hz displays too!

I edited the first post with the TL;DR

Understood. I'll edit the top with a TL;DR a bit later -- I understand that this is a bit "complex" reading -- keep this issue around because the idea is under active discussion in other communities, since it likely will be incubated elsewhere before it arrives here.

I talk to many people elsewhere about this actually, as a hobby passion. Such as one of the co-authors of the 8088mph demoscene demo with 1024 colors on a 1981 CGA graphics card -- discussions of a software-based CRT electron beam emulator are in the comments section of his blog. Also, there's a BountySource of approx 500 dollars at RetroArch on a software-based CRT electron-beam-drawing emulator (using brute Hz to temporally emulate a CRT electron beam in real time).

This Algorithm Begins To Be Practical at 240Hz+, and potentially 120Hz+

BTW, 1000Hz is just an end goal. The reality is that this MAME HLSL algorithm begins to be useful at ~240Hz, and those displays are getting more widespread. And there is news that Apple is coming out with a 240Hz iPhone, since 240Hz reduces scrolling motion blur.

120Hz is quickly becoming semi-mainstream (almost all new 4K HDTVs support 120Hz, all new gaming consoles now support 120Hz, new smartphones will soon come 120Hz standard, and DELL/HP is considering later this decade doubling Hz for office monitors for scrolling-ergonomic reasons). 120Hz then becomes near-freebie like retina screens.

After that, 240Hz is the next freebie mainstream step. This will definitely happen far before 2060.

Even though 240Hz is very coarse granularity for emulating a moving CRT electron beam, it marks the entry point for a successful, realistic-looking rolling scan: each 1/240sec frame can (via GPU shader) photon-stack a quarter of a refresh cycle's worth of 60Hz CRT electron beam movement and sub-refresh phosphor decay.
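As a rough illustration of the quarter-at-a-time idea, here is a minimal Python sketch (not MAME or HLSL code) of per-scanline brightness for one output refresh. Everything here is an assumption for illustration: the function name `subframe_row_gains`, the equal-height bands, and the simple exponential `decay` factor per output refresh standing in for phosphor fade. A real shader would do this per pixel in linear light.

```python
def subframe_row_gains(rows, subframes, k, decay):
    """Brightness multiplier for each scanline during output refresh k.

    rows      -- number of emulated scanlines
    subframes -- output refreshes per emulated refresh (e.g. 240 / 60 = 4)
    k         -- index of the current output refresh within the cycle (0-based)
    decay     -- assumed fade factor per output refresh (0.0 = pure rolling BFI)
    """
    gains = []
    for row in range(rows):
        band = row * subframes // rows  # which quarter (band) this row belongs to
        age = (k - band) % subframes    # output refreshes since the beam hit this band
        gains.append(decay ** age)      # freshly scanned band has age 0, so gain 1.0
    return gains


# 8 scanlines, 4 subframes, second quarter being scanned, 50% fade per refresh
print(subframe_row_gains(8, 4, 1, 0.5))
# [0.5, 0.5, 1.0, 1.0, 0.125, 0.125, 0.25, 0.25]
```

In this sketch the bands below the beam still carry the faded light of the previous emulated refresh, and setting `decay` to 0.0 reduces it to a plain rolling black bar.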

Also, I am creating a new TestUFO software-based CRT emulator for 240Hz monitors (simulate rolling scan in web browser), so there will be a concept show-and-tell visible one click away later in 2021.

If the algorithm is also written carefully, even 120Hz+ will derive some useful benefit from this Temporal HLSL suggestion, though the temporal emulation becomes more accurate the higher the refresh rate.

This Algorithm also has /some/ use for below-240Hz

In a case of "one enhancement hits three or four birds with one stone", the same algorithm designed to map "anyEmuHz"-to-"anyRealHz" can do standards-conversion destuttering too. The algorithm that makes this possible can also do a de-stuttered framerate conversion (60fps looking smooth at 50Hz, and 50fps looking smooth at 60Hz).

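To make the "anyEmuHz"-to-"anyRealHz" mapping concrete, here is a simplified Python sketch. It is not the phosphor-fade algorithm itself: it swaps in a plain time-overlap (box filter) blend, and the name `blend_weights` is hypothetical. For each output refresh it computes how much of that refresh's time window is covered by each emulator frame; those fractions would then be used as blend weights (ideally applied in linear light rather than gamma space).

```python
import math
from fractions import Fraction


def blend_weights(emu_hz, out_hz, out_index):
    """Map one output refresh to {emulator_frame_index: blend_weight}.

    Assumption: weights are the fraction of the output refresh interval
    that overlaps each emulator frame interval (simple box filter).
    """
    start = Fraction(out_index, out_hz)
    end = Fraction(out_index + 1, out_hz)
    duration = end - start
    weights = {}
    frame = math.floor(start * emu_hz)  # first emulator frame touching this refresh
    while Fraction(frame, emu_hz) < end:
        frame_start = Fraction(frame, emu_hz)
        frame_end = Fraction(frame + 1, emu_hz)
        overlap = min(end, frame_end) - max(start, frame_start)
        if overlap > 0:
            weights[frame] = overlap / duration
        frame += 1
    return weights


# 50fps emulation on a 60Hz display: the second refresh mixes two emulator frames
print(blend_weights(50, 60, 1))   # frame 0 weighted 1/5, frame 1 weighted 4/5
# 60fps emulation on a 240Hz display: each refresh maps to exactly one frame
print(blend_weights(60, 240, 5))  # frame 1 weighted 1
```

The same function covers output Hz below emulator Hz (for example 60fps on a 50Hz display), where each output refresh blends parts of two adjacent emulator frames.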
@mdrejhon mdrejhon changed the title from "Plan Towards Temporal HLSL (Processing HLSL at Sub-Refresh Time Scales)" to "Enhance MAME HLSL To Do Sub-Refresh Processing On High-Hz Displays (Enhances One or More Of: CRT emulation, Reduces Blur, Lowers Lag, De-Stuttering)" Mar 2, 2021
@mdrejhon mdrejhon changed the title from "Enhance MAME HLSL To Do Sub-Refresh Processing On High-Hz Displays (Enhances One or More Of: CRT emulation, Reduces Blur, Lowers Lag, De-Stuttering)" to "Enhance MAME HLSL To Do Sub-Refresh Processing On High-Hz Displays (CRT emulation, Reduces Blur, Lowers Lag, De-Stuttering)" Mar 2, 2021
@mdrejhon
Author

mdrejhon commented Mar 2, 2021

Update:
In addition to adding a TL;DR at the top, I have renamed this GitHub issue to the more self-explanatory title:

"Enhance MAME HLSL To Do Sub-Refresh Processing On High-Hz Displays (CRT emulation, Reduces Blur, Lowers Lag, De-Stuttering)"

Glossary: "Sub-Refresh Processing" = ability to process one emulator Hz over multiple output Hz (i.e. 60Hz emulator on 120Hz or 240Hz display).

Note that, depending on how fully the algorithm is implemented (and what output refresh rate you're using), only 2 or 3 of the 4 benefits may occur. However, even just 1 benefit merits the existence of this enhancement, even for non-high-Hz displays. Getting the same algorithm to deliver all 4 benefits can be done through incremental HLSL engine improvements over time.

The bottom line is that all four benefits appear when HLSL gains support for output-Hz-granularity processing rather than emulator-Hz-granularity processing, even for non-divisible Hz. But even just 1 or 2 of the benefits make this item worthwhile, both for mainstream Hz displays (50fps PAL on 60Hz displays) and for the in-progress commoditization of 120Hz.

Technically I could post several separate GitHub issues for all the pros, to indicate the potentially incremental nature of this feature, but I decided to submit only one massive GitHub item about this.

Let me know if the new title and the new TL;DR are sufficient to remove the spike (yet).

@stilett0
Contributor

stilett0 commented Apr 7, 2021

This isn't a Pull Request. You used ten thousand words when ten would do. If it's a Pull Request, then there needs to be Code. Where's your code, @mdrejhon ? You don't have code, you at best have an algorithm. But really all you have is ten thousand words running off at the mouth. Start writing code, stop with the theorizing. It all sounds quasi-scientific, sure, but it's all a smokescreen to try to instigate someone else to do the actual work. It's all conceptual, nothing practical. And the fact that you haven't actually contributed written code here in this PR makes me suspect that you can't.

Between THAT, and the fact that some folks on Twitter conflated this "Pull Request" as something that MAMEdev as a team themselves had "officially" added as a project roadmap bulletpoint... (January: https://twitter.com/_daemons/status/1351450119112048640 ) I'm closing this. You don't get to slide your personal Feature Request agenda items in through the back door that way. Frankly, it's unwanted attention-seeking behavior.

This isn't a Pull Request, this is a Feature Request. At best, it belongs in Issues, not Pull Requests.

And I don't look kindly upon a Feature Request to try to pre-emptively add support for hardware, that practically doesn't exist yet, and that won't be mainstream absorption for quite some years to come. This does nothing for us. You're just being Master of the Obvious here. We know there will be gamechanging high-refresh displays coming out or out already, we don't need reminders.

@stilett0 stilett0 closed this as completed Apr 7, 2021
@happppp
Member

happppp commented Apr 8, 2021

@stilett0 wrongly claims this is a Pull Request. Maybe he can clarify later.

IMO this issue can remain closed for now. ultra-Hz displays don't exist yet (for the consumer)
Likewise, we don't have an Issue about quantum computers enhancing accuracy for emulated netlist games.

@mdrejhon
Author

mdrejhon commented Apr 15, 2021

In all friendliness,

I am mentioned in more than 20 peer reviewed research papers, so I have quite a fair bit of credentials.

  1. This is a feature request, not a pull request.

  2. True 120Hz and 240Hz displays exist already, and there is already a 360 Hz display. I simply extrapolated this to far future.

I would like to re-post the TL;DR to make sure there is no misunderstanding:

Here's the short TL;DR summary:
TL;DR: Enhance existing MAME HLSL to include sub-refresh behaviors, including simulating CRT phosphor fade in sub-refresh increments + optional rolling-scan BFI in sub-refresh increments. All achieved via software, with the only hardware requirement being a high-Hz display, even 120Hz or 240Hz. Made possible simply by a display having a higher Hz than the original emulated Hz.

Bonus optional side effect 1: Reduce input lag. Since it's a rolling scan emulation via subdividing emulator Hz over multiple destination Hz, there is a simple cross-platform beam-race opportunity to reduce MAME lag even closer to 1:1 symmetry to original machine. For example, 240Hz would use 4 hardware refresh cycles to emulate a 60Hz emulator refresh cycle in quarters at a time. No raster knowledge of the destination display is needed, you're simply using the finer granularity of the higher output Hz.

Bonus optional side effect 2: Real-time standards conversion. Smooth 50fps on 60Hz displays! The same algorithm I propose accidentally de-stutters frame rate conversions too, so it can be used for standards conversion (e.g. ANIMATION: Software algorithm that makes 50fps look smooth without stutters on a 60Hz non-VRR display). So this algorithm is accidentally useful on today's 60Hz displays too. Output Hz can actually also be lower than emulator Hz, as the algorithm can also blend refresh cycles together as part of the flexible Hz-agnostic phosphor fade algorithm capable of overlapping refresh cycles, similar to what I successfully did in the link above. Allowing smooth 60fps on 50Hz displays too!

All of the benefits listed above occur on displays that are in use by millions today (120Hz and 240Hz displays).

@cuavas
Member

cuavas commented Apr 15, 2021

This is a feature request, not a pull request.

It’s a pie-in-the-sky dream.

@mamedev mamedev locked as off-topic and limited conversation to collaborators Apr 15, 2021