v2.3.0 - Community Builds Thread #88
Replies: 9 comments 67 replies
-
If anyone wants to experiment with speeding up encoding with -ffast-math: it sets -fno-honor-nans and generates a NaN warning in enc_dec_process.c. This can be circumvented by using -Xclang -funsafe-math-optimizations, which still enables a lot of FP optimizations but not finite-math-only. I'm using Visual Studio with LLVM 19.1.3 (https://releases.llvm.org/19.1.0/tools/clang/docs/UsersManual.html). I still have no idea whether this speed bump produces visible artifacts or what other unintended consequences there could be. It has worked just fine for me so far, though.
-
I made this from my smartphone while controlling my PC remotely, so if I got something wrong or anything doesn't work, please comment below. The zip includes the generic builds:
The zip is a Matryoshka because GitHub doesn't support 7z. A PGO-optimized binary for znver4 is below. Everyone is especially invited to contribute on PGO and BOLT on Windows! (Several threads and comments exist in the Community Builds Thread for v2.2.1-A linked above.)
-
I made this from my smartphone while controlling my PC remotely, so if I got something wrong or anything doesn't work, please comment below. Znver4 optimized with PGO using the Stefan Edberg clip
SvtAv1EncApp znver4 with PGO.zip On my R9 7900 it gives ~5% speed gains in presets that are not too low like p-1 (I didn't test p0 or p1; p2 and p4 are ~5% faster). If it crashes for you, please comment below about it and use the non-PGO binary.
-
Full LTO build for Termux
-
I'm not very experienced with building/tuning stuff, but I finally managed to make a clang 17 znver2 -O2 LTO+PGO build with Dolby Vision support that was "trained" on 8 clips of various resolutions (from 540p to 2160p), including Dolby Vision, regular HDR, and general 8-bit-to-10-bit converted clips, with
-
I reused mpv-winbuild to provide a fully featured (with libdovi and libhdr10plus) statically linked Windows daily build of svtav1-psy. This build is cross-compiled on GHA using trunk clang (20.0.0git) and mingw-w64, with ThinLTO+IRPGO+Polly enabled and mimalloc integrated. Currently available in x86-64-v3 and cortex-a76 (arm64). PGO training is done via ffmpeg and is fairly arbitrary, so it may not provide performance comparable to other PGO builds.
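For readers who have not tried PGO yet, the three stages of a clang IR PGO build can be sketched as below. This is only an illustration and not the recipe used for any binary in this thread: the helper name pgo_stages, the pgo-data directory, and the -O3/-march defaults are assumptions, and a real SVT-AV1 build would pass the compiler flags through its build system instead.

```python
from typing import Dict, List


def pgo_stages(march: str = "znver4", profile_dir: str = "pgo-data") -> Dict[str, List[str]]:
    """Return the extra arguments for each stage of a clang IR PGO build.

    The function name, default march and profile directory are hypothetical;
    the flags themselves are standard clang / llvm-profdata options.
    """
    merged = f"{profile_dir}/merged.profdata"
    common = ["-O3", f"-march={march}", "-flto=thin"]
    return {
        # Stage 1: build an instrumented encoder that writes .profraw files.
        "instrument": common + [f"-fprofile-generate={profile_dir}"],
        # Stage 2: after running training encodes, merge the raw profiles.
        "merge": ["llvm-profdata", "merge", f"-output={merged}", profile_dir],
        # Stage 3: rebuild the encoder using the merged profile.
        "optimize": common + [f"-fprofile-use={merged}"],
    }


if __name__ == "__main__":
    for stage, args in pgo_stages().items():
        print(stage, " ".join(args))
```

The quality of the result depends heavily on the training clips and presets used between stages 1 and 3, which is why builds trained on different material are hard to compare.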
-
Just for fun and giggles, as an additional benchmark: on my 6-core Zen 4 system, my own znver4 build (VS w/ LLVM 19, PGO-CS) encodes a test clip at 4.7 fps, while your latest mpv-winbuild x86_64-v3 build reaches 3.4 fps. For a more accurate comparison we'd have to use the same clip for PGO profiling and benchmarking. However, optimizing svt-av1 for a specific system seems to be worth it, though I'd rather have a generic msys2 auto-build to save me the compile time.
-
The results are IN :-) I've benchmarked all IR/CS PGO'ed exes with presets 2-9; I'm not using slower presets on my system. To somewhat account for jitter on my thermally managed laptop, the fps values are averages of several encodes (scripted to grep "Average Speed:" and divide by the number of cycles). I'm sure there's still plenty of jitter left; I can't help it. It's using a git build between 2.3.0 and 2.3.0-A, before the commits that introduced frequent crashes even on 10-bit. All builds use mimalloc. For reference, I've added the cross-compiled -psy exe from the GitHub mpv-winbuild. My findings:
SvtAv1EncApp-PSY-znver4-tigerlake.zip ... preset 9 ... preset 8 ... preset 7 ... preset 6 ... preset 5 ... preset 4 ... preset 3 ... preset 2
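The averaging step described above (grep "Average Speed:" and divide by the number of runs) could look roughly like this in Python. This is a sketch and not the script that was actually used: the function name average_fps is hypothetical, and the exact shape of the encoder's summary line (a number followed by "fps") is an assumption.

```python
import re
from statistics import mean
from typing import Iterable

# Assumed shape of the encoder summary line, e.g. "Average Speed:    4.700 fps".
_SPEED = re.compile(r"Average Speed:\s*([0-9]+(?:\.[0-9]+)?)\s*fps")


def average_fps(logs: Iterable[str]) -> float:
    """Average the reported fps over the captured output of several runs."""
    speeds = []
    for log in logs:
        match = _SPEED.search(log)
        if match:  # runs without a summary line are skipped
            speeds.append(float(match.group(1)))
    if not speeds:
        raise ValueError("no 'Average Speed:' line found in any log")
    return mean(speeds)


if __name__ == "__main__":
    runs = ["Average Speed:\t\t4.000 fps", "Average Speed:\t\t5.000 fps"]
    print(average_fps(runs))
```

Averaging only reduces random jitter; thermal throttling that builds up over a long session still biases later runs, so alternating the order of the binaries between cycles may help.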
-
Unapologetically stole changes from mainline for film grain generation speed improvements: https://gitlab.com/AOMediaCodec/SVT-AV1/-/commit/dcd08a9eaa0f1248c1d6abb4535932e7e9ba76d0
Multiple marches, PGO, LTO, -O3, mimalloc, DoVi, HDR10+:
Also, included znver1-5 builds with
Upd: Just tested
This makes me think that film grain generation is now a viable option for the whole range of presets, since faster presets are no longer held back by a massive overhead.
Upd 2: Tested
-
Community Builds Thread
This is a place for the community to share unofficial tools not affiliated with the project — mainly consisting of binaries compiled by community members.
Trust
Architecture
When downloading pre-compiled binaries, you might see AVX, AVX2, AVX-512, x86-64-v3, etc. If you don't know exactly what ISA extensions your CPU supports, here is a chart to help you quickly understand your hardware's support:
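If you would rather check than guess, the following sketch maps a set of CPU feature flags to the generic x86-64 level (v1 to v4) that a build targets. It assumes the flag names Linux prints in /proc/cpuinfo (for example pni for SSE3 and abm for LZCNT); the function names are hypothetical, and on Windows you would need another source for the flag list.

```python
from typing import Iterable, List

# Flags (Linux /proc/cpuinfo spelling) that each level adds on top of the previous one.
LEVEL_FLAGS = [
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags: Iterable[str]) -> int:
    """Return the highest x86-64 level whose required flags are all present."""
    have = set(flags)
    level = 1  # baseline x86-64 is assumed for any 64-bit x86 CPU
    for candidate, required in LEVEL_FLAGS:
        if not required <= have:
            break  # levels are cumulative, so stop at the first gap
        level = candidate
    return level


def read_cpu_flags(path: str = "/proc/cpuinfo") -> List[str]:
    """Read the flag list of the first CPU entry (Linux only)."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []


if __name__ == "__main__":
    print(f"x86-64-v{x86_64_level(read_cpu_flags())}")
```

A result of 3 means x86-64-v3 (AVX2) builds should run; a CPU-specific build such as znver4 still requires that exact microarchitecture or a newer compatible one.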
Not Vendor-Specific
ISA extensions like AVX, AVX2, & AVX512 are for vector processing which SVT-AV1-PSY relies heavily on for its fast multithreaded performance. Below are some helpful charts to help you narrow down the best options if you have popular x64 hardware:
AMD
Intel (Desktop)
Included below is even more information about what is available when specifying -march & -mtune on x64 CPUs:
Known valid x64 arguments for -march=:
Known valid x64 arguments for -mtune=:
-march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning.
Antivirus
🛑 Be wary of antivirus software on Windows detecting EXEs distributed here as malicious software. While they may not always legitimately be malicious, it is important to maintain a healthy level of skepticism when running code that someone else has compiled.