v2.3.0 - Community Builds Thread #88

gianni-rosato (Maintainer) · 2024-10-30T04:15:33Z

Community Builds Thread

This is a place for the community to share unofficial tools not affiliated with the project — mainly consisting of binaries compiled by community members.

Trust

⚠️ Community-built binaries are not affiliated with the SVT-AV1-PSY project & are to be used at your own risk. If you encounter bugs while using a community-built binary, ensure that you double-check with the person responsible for providing you with the binary before submitting an issue report.

Architecture

When downloading pre-compiled binaries, you might see AVX, AVX2, AVX-512, x86-64-v3, etc. If you don't know exactly what ISA extensions your CPU supports, here is a chart to help you quickly understand your hardware's support:

Not Vendor-Specific

x86-64    : Processors lacking SSE4.2 and newer features
x86-64-v2 : Processors with features up to and including SSE4.2
x86-64-v3 : Processors with features up to and including AVX2 (AVX alone is not enough)
x86-64-v4 : Processors with features up to and including AVX512
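
For anyone unsure which generic build applies, here is a minimal Python sketch that maps a set of CPU feature flags to the levels listed above. It assumes the flag spellings Linux uses in /proc/cpuinfo (for example pni for SSE3 and abm for LZCNT). The function names are made up for illustration, and the feature lists are the commonly cited requirements for each level, so treat the result as a hint rather than a guarantee.

```python
# Required features per level, spelled as in Linux /proc/cpuinfo
# (assumption of this sketch: "pni" = SSE3, "abm" = LZCNT).
LEVELS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags):
    """Return the highest level whose features (and all lower levels') are present."""
    flags = set(flags)
    level = "x86-64"
    for name, required in LEVELS:
        if not required <= flags:
            break
        level = name
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the first 'flags' line; returns an empty set if the file is unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(x86_64_level(read_cpu_flags()))
```

On Windows there is no /proc/cpuinfo, so the flag set would have to come from another tool; the level logic itself stays the same.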

ISA extensions like AVX, AVX2, & AVX512 provide vector (SIMD) processing, which SVT-AV1-PSY relies on heavily for its speed. Below are some charts to help you narrow down the best option if you have popular x64 hardware:

AMD

Generation                                             Vectors  Build
Zen   (1xxx)                                           AVX2     x86-64-v3 / znver1
Zen+  (1xxxAF-2xxx-3xxxG/H/U/C)                        AVX2     x86-64-v3 / znver1
Zen2  (3xxx-4xxxS/G/H/U-5700U-5500U-5300U-7520U-7320U) AVX2     x86-64-v3 / znver2
Zen3  (5xxx-7330U-7530U-7730U)                         AVX2     x86-64-v3 / znver3
Zen3+ (6xxx-7xxxH/U)                                   AVX2     x86-64-v3 / znver3
Zen4  (7xxx-8xxxG)                                     AVX512   x86-64-v4 / znver4

Intel (Desktop)

Generation                Vectors  Build
Haswell     (4th gen)     AVX2     x86-64-v3 / haswell
Broadwell   (5th gen)     AVX2     x86-64-v3 / broadwell
Skylake     (6th gen)     AVX2     x86-64-v3 / skylake
Kaby Lake   (7th gen)     AVX2     x86-64-v3 / skylake
Coffee Lake (8-9th gen)   AVX2     x86-64-v3 / skylake
Comet Lake  (10th gen)    AVX2     x86-64-v3 / skylake
Rocket Lake (11th gen)    AVX512   x86-64-v4 / rocketlake
Alder Lake  (12th gen)    AVX2     x86-64-v3 / alderlake
Raptor Lake (13-14th gen) AVX2     x86-64-v3 / raptorlake

Included below is even more information about what is available when specifying -march & -mtune on x64 CPUs:

Known valid x64 arguments for -march=:

i386 i486 i586 pentium lakemont pentium-mmx winchip-c6 winchip2 c3 samuel-2 c3-2 nehemiah c7 esther i686 pentiumpro pentium2 pentium3 pentium3m pentium-m pentium4 pentium4m prescott nocona core2 nehalem corei7 westmere sandybridge corei7-avx ivybridge core-avx-i haswell core-avx2 broadwell skylake skylake-avx512 cannonlake icelake-client rocketlake icelake-server cascadelake tigerlake cooperlake sapphirerapids emeraldrapids alderlake raptorlake meteorlake graniterapids graniterapids-d arrowlake arrowlake-s lunarlake pantherlake bonnell atom silvermont slm goldmont goldmont-plus tremont gracemont sierraforest grandridge clearwaterforest knl knm intel geode k6 k6-2 k6-3 athlon athlon-tbird athlon-4 athlon-xp athlon-mp x86-64 x86-64-v2 x86-64-v3 x86-64-v4 eden-x2 nano nano-1000 nano-2000 nano-3000 nano-x2 eden-x4 nano-x4 lujiazui yongfeng k8 k8-sse3 opteron opteron-sse3 athlon64 athlon64-sse3 athlon-fx amdfam10 barcelona bdver1 bdver2 bdver3 bdver4 znver1 znver2 znver3 znver4 znver5 btver1 btver2 generic native

Known valid x64 arguments for -mtune=:

generic i386 i486 pentium lakemont pentiumpro pentium4 nocona core2 nehalem sandybridge haswell bonnell silvermont goldmont goldmont-plus tremont sierraforest grandridge clearwaterforest knl knm skylake skylake-avx512 cannonlake icelake-client icelake-server cascadelake tigerlake cooperlake sapphirerapids alderlake rocketlake graniterapids graniterapids-d arrowlake arrowlake-s pantherlake intel lujiazui yongfeng geode k6 athlon k8 amdfam10 bdver1 bdver2 bdver3 bdver4 btver1 btver2 znver1 znver2 znver3 znver4 znver5

Note: -march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning.

Antivirus

🛑 Be wary of antivirus software on Windows detecting EXEs distributed here as malicious software. While they may not always legitimately be malicious, it is important to maintain a healthy level of skepticism when running code that someone else has compiled.

gitoss · 2024-10-30T18:00:32Z

If anyone wants to experiment with speeding up encoding with -ffast-math ... it sets -fno-honor-nans and generates a nan warning in enc_dec_process.c.

This can be circumvented by using -Xclang -funsafe-math-optimizations, which still enables a lot of FP optimizations, but not finite-math-only. I'm using Visual Studio with LLVM 19.1.3 https://releases.llvm.org/19.1.0/tools/clang/docs/UsersManual.html

I still have no idea whether this speed bump produces visible artifacts, or what other unintended consequences there could be. It works just fine for me so far, though.

ItachiUchiha-IU · 2024-10-30T22:00:18Z

I made this from my smartphone while controlling my PC remotely, so if I got something wrong or anything doesn't work, pls comment below.
Mainline changed a default CMake flag, so we had to turn it off (-DSVT_AV1_LTO=OFF) because it would've overridden our default LTO setting (which is fat) and built with thin LTO, which is slower.

The zip includes the generic builds: x86-64, x86-64-v2, x86-64-v3, x86-64-v4
and the march specific builds: znver1, znver2, znver4, haswell, broadwell, skylake, rocketlake, alderlake, raptorlake

cmake --fresh -B svt_build -T ClangCL -DBUILD_SHARED_LIBS=OFF -DSVT_AV1_LTO=OFF -DENABLE_AVX512=$ONOFF -DCMAKE_CXX_FLAGS_RELEASE="/DNDEBUG /clang:-O2 -flto -march=$allmarch" -DCMAKE_C_FLAGS_RELEASE="/DNDEBUG /clang:-O2 -flto -march=$allmarch"

The zip is a Matryoshka because GitHub doesn't support 7z
SvtAv1EncApp-windows-all-march-but-znver3.zip
For more info, findings and discussions about the builds (the znver3 case, the parameters used and tested, W10-11, clang 17-19, ninja, PGO, BOLT), check the Community Builds Thread for v2.2.1-A

A PGO-optimized binary for znver4 is below.
Quick info: on my R9 7900 it gives ~5% speed gains in presets that are not too low like p-1 (didn't test p0 nor p1; p2 and p4 are ~5% faster); if it crashes for you, pls comment below about it and use the non-PGO binary

Everyone is especially invited to contribute on PGO and BOLT on Windows!! (several threads and comments exist in the Community Builds Thread for v2.2.1-A linked above)

22 replies

ItachiUchiha-IU Dec 26, 2024

@ItachiUchiha-IU Do you plan for v2.3.0-A build?

I was planning to release them in the community build for v2.3.0-A but, being that it's not up yet, ~~I'll start putting them here.~~

They are now released here with all infos: #118 (comment)

But some information needs to be shared first, because this new version results in unstable binaries. (Quick info: it's strongly advised to use av1an to keep the encode running even if a chunk crashes.)
Coincidentally I'm away from home for this new version too, so I'm posting them remotely from my phone again, and later (this afternoon/evening EU) I'll update this message (and reply to gitoss too).

~~OK here's the infos we've gathered so far:~~
1) v2.3.0-A is slower than the previous version, especially when not using --psy-rd (or using very low values). As confirmed with the "PSY guys" (the devs that work on this fork), this is expected behavior because the --psy-rd code runs and is called quite often; thus, with it off or at such low values, tests on different machines on both Linux and Windows show it can be 10-20% slower
- "solution": use --psy-rd 0.1 or higher, and the speed difference shrinks a lot (in some of my tests on Windows it went down to a 2-3% speed diff, while on Linux it seems to work a little better in some cases)

2) the binaries for v2.3.0-A built on Windows crash; the cross-compiled binary doesn't seem to crash, but it's quite a bit slower than the Windows builds: some saw an 8% speed diff, some way more (I can't test myself whether it crashes or what speed diff I get, for the reasons mentioned above)
~~- "solution" : use av1an. av1an splits the encode per scene before encoding them, and if a chunk crashes, it simply re-encodes that single chunk and the encoding continues without any major problem~~
- the PSY guys already know about the binaries crashing on Windows, because the only ones reporting these crashes are those of us on Windows. We have already started looking into it together with them, but it could still turn out to be hard to fix, so it may get worked on together with mainline after v3 is released.

~~3) I don't remember if I said everything I wanted to say, so if something else comes to mind later, or something else comes out, I'll edit it and/or add points.~~

svt-av1-psy_2024-12-25_20.29.49_MULTI-MARCH_clang19.1.5_2.zip
svt-av1-psy_2024-12-25_20.42.02_MULTI-MARCH_av1an-pgo_clang19.1.5.zip
svt-av1-psy_2024-12-25_21.08.08_MULTI-MARCH_dovi+hdr10plus_clang19.1.5_2.zip
svt-av1-psy_2024-12-25_23.13.06_MULTI-MARCH_dovi+hdr10plus_av1an-pgo_clang19.1.5_2.zip

igorbaryshev Dec 26, 2024

some speed regression in w11's 24H2 update

This one is probably due to Memory integrity feature being on by default. I don't recall it being enabled in prior versions, and disabling it netted almost 2% performance gain.

ItachiUchiha-IU Dec 26, 2024

I did find a way to boost p-1 too

Just for completeness' sake, could you please say how you boosted preset 1, too? Thanks.

For p-1/2/3 we mean the negative presets, thus for p-1 I meant --preset -1.
Basically I tried something random and it worked: I simply did PGO training with a 6 min clip (through av1an) and I got the previously mentioned speed gains on most content, but the binary kept crashing. So I ended up trying several tricks, and in the end I fixed it by merging the p-1 profile with a p10 profile.
The real problem tho is that training is a lot slower than encoding, which makes it basically useless considering the nearly negligible gain (using W11 and/or -ffast-math has a bigger impact). I even wanted to test on a 24 min clip, but the training would've taken days and I just gave up.

gitoss Dec 28, 2024

the binaries for v2.3.0-A built on windows crash; the cross-compiled binary doesn't seem to crash but it's quite slower than the windows builds: some got an 8% speed diff, some way more (I can't test myself if it crashes or what speed diff I get cause of the reasons mentioned above)

I am now able to verify this: your native build crashes, Andarwinux's cross-compiled build doesn't, and it's exactly the same with my own native and cross builds.

The crashes cease if --lp is lowered; using a mimalloc-injected exe doesn't help, and setting --psy-rd 0 doesn't help either.

This is annoying because up to now compiling native PGO builds was just fine, and I have no idea which commit between the 2.3.0 and 2.3.0-A releases is the culprit @gianni-rosato

Patman86 Dec 28, 2024

Have a look here:

#114

The commit that causes the crash is mentioned there.

ItachiUchiha-IU · 2024-10-30T22:12:43Z

I made this from my smartphone while controlling my PC remotely, so if I got something wrong or anything doesn't work, pls comment below.
More info in the comment above.

Znver4 optimized with PGO with Stefan Edberg clip

cmake --fresh -B svt_build -T ClangCL -DBUILD_SHARED_LIBS=OFF -DSVT_AV1_LTO=OFF -DENABLE_AVX512=ON -DCMAKE_CXX_FLAGS_RELEASE="/DNDEBUG /clang:-O2 -flto -march=znver4" -DCMAKE_C_FLAGS_RELEASE="/DNDEBUG /clang:-O2 -flto -march=znver4"

SvtAv1EncApp znver4 with PGO.zip

On my R9 7900 it gives ~5% speed gains in presets that are not too low like p-1 (didn't test p0 nor p1; p2 and p4 are ~5% faster); if it crashes for you, pls comment below about it and use the non-PGO binary

1 reply

gitoss Nov 5, 2024

For Intel CPUs, using their 2025.0 compiler could be beneficial: https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-dpc-c-compiler-release-notes.html

"Hardware Profile Guided Optimization (HWPGO): Key improvements include enhanced profile propagation for better accuracy, additional profile-driven optimizations to further boost performance, and early support for "pseudo probes" on Windows as an alternative to DWARF for profiling. Additionally, HWPGO has introduced selective function outlining, allowing for specific functions to be optimized based on profiling data, further enhancing runtime efficiency."

Uranite · 2024-10-31T02:18:28Z

Full LTO build for Termux
Contains arm and aarch64
Pick one according to your arch and install it via dpkg -i; afterwards, you can encode via libsvtav1 in FFmpeg

termux-svt-av1-psy-v2.3.0.zip

igorbaryshev · 2024-12-16T11:42:30Z

I'm not very experienced with building/tuning stuff, but I finally managed to make a clang 17 znver2 -O2 LTO+PGO build with Dolby Vision support. It was "trained" on 8 clips of various resolutions (from 540p to 2160p), including Dolby Vision, regular HDR, and general 8bit-to-10bit converted clips, using preset 6 with added film grain, --dolby-vision-rpu and the HDR and other related settings StaxRip provides automatically where applicable, plus some other custom settings that I usually use for encoding. The resulting LTO+PGO build is ~4%+ faster than just LTO.
Unfortunately, it didn't want to compile with HDR10+ support on clang, so I just dropped it 😔 as I don't have anything with HDR10+ that needs converting yet.
To measure performance, I used chunked encoding (2 chunks) in StaxRip for full saturation:
LTO-only time 00:06:07 + 00:06:16
LTO+PGO time 00:05:53 + 00:06:00
The resulting build is surprisingly slim, just 5.45 MiB for LTO+PGO, +353 KiB for DoVi.
Test for yourself!
svt-av1-psy_znver2_LTO[+PGO].zip
If there're any tips, suggestions, requests, please chime in 🤗
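
As a quick sanity check of the "~4%+" figure, the two chunk times quoted above can be converted to seconds and compared. This is only a sketch of the arithmetic; summing the two chunks is my assumption about how to combine them, and the helper name is made up.

```python
def to_seconds(hms):
    """Convert an HH:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Chunk times as reported in the post.
lto_only = [to_seconds(t) for t in ("00:06:07", "00:06:16")]
lto_pgo = [to_seconds(t) for t in ("00:05:53", "00:06:00")]

total_lto, total_pgo = sum(lto_only), sum(lto_pgo)
time_saved_pct = 100 * (total_lto - total_pgo) / total_lto
speedup_pct = 100 * (total_lto / total_pgo - 1)

print(f"{total_lto} s -> {total_pgo} s: "
      f"{time_saved_pct:.1f}% less time, {speedup_pct:.1f}% higher throughput")
# prints: 743 s -> 713 s: 4.0% less time, 4.2% higher throughput
```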

Andarwinux · 2024-12-26T13:37:33Z

I reused mpv-winbuild to provide a fully featured (with libdovi and libhdr10plus) statically linked Windows daily build of svtav1-psy.
https://github.com/Andarwinux/mpv-winbuild/releases

This build is cross-compiled on GHA using trunk clang (20.0.0git) and mingw-w64, with ThinLTO+IRPGO+Polly enabled and mimalloc integrated. Currently available in x86-64-v3 and cortex-a76 (arm64).

PGO training is done via ffmpeg and is fairly arbitrary, so it may not provide performance comparable to other PGO builds.

35 replies

Andarwinux Jan 13, 2025

I use presets 2-6 for grain and 2-8 for no grain profiling, and for each have tunes 1-3. The clip I use for profiling is a 55 frames scene from iron man downscaled to 768x320. So even on preset 2 the profiling encode takes maybe 20 seconds at most.

For benchmarking I use preset 4 on a 5-minute 1080p movie clip.

thanks!

@Andarwinux v2.3.0 znver2_mno-sse4a.zip

tigerlake.zip

The znver2 version is 20-30% faster with all presets than my mingw build, it's so crazy. But tigerlake is just slower than my mingw builds.

gitoss Jan 13, 2025

But tigerlake is just slower than my mingw builds.

If you dare to use Visual Studio, you can try Intel's llvm-based compiler that should-could be optimized for Intel targets. https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?packages=cpp-essentials&cpp-essentials-os=windows&cpp-essentials-win=offline

Edit: The Intel compiler includes hardware profiling - I'll have to try that on my tigerlake laptop sometime. https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-dpc-c-compiler-release-notes.html

gitoss Jan 13, 2025

The clip I use for profiling is a 55 frames scene from iron man downscaled to 768x320.

Did you compare yet if these make any difference for encoding performance...

profiling longer clips?
profiling the same resolution (for example full hd) as encoding?

I'm currently using the full hd intro from "Vox Machina" for profiling, as I'm mostly encoding anime later on.

igorbaryshev Jan 13, 2025

@gitoss I refer you to my earlier reply here #118 (reply in thread)
resolution doesn't matter to some extent, and I've tried higher frame count, which gave the same result

Andarwinux Feb 3, 2025

I found some problems in my toolchain, and after fixing them and using a new profdata, the performance gap between my build and @igorbaryshev's znver2 build has been reduced to 9%. The gap is probably caused by the lack of CSIR (MinGW doesn't support CSIR). Today's latest build already includes some fixes, but the profdata hasn't been updated yet; I'll update it in a few days.

gitoss · 2025-01-13T12:21:39Z

Today's build uses a new profdata and performance should be significantly improved, can you try again?
Unfortunately, your build is 14.96% slower than my own PGO build

Just for fun and giggles, as an additional benchmark: on my 6-core Zen4 system, my own znver4 build (VS w/ LLVM 19, pgo-cs) encodes a test clip at 4.7 fps, your latest mpv-winbuild x86_64-v3 build at 3.4 fps.

For a more accurate comparison we'd have to use the same clip for pgo profiling and benchmarking.

However, optimizing svt-av1 for a specific system seems to be worth it - though I'd rather like a generic msys2 auto-build to save me the time for compiling.

3 replies

Andarwinux Jan 13, 2025

Single digit fps sounds like some kind of lower preset, I didn't train for that, it's too time consuming.

GitHub Action's cache limit is only 10GB, so I can't add more microarchitectures. Otherwise GitHub would start dropping the LLVM toolchain cache and paralyze the entire workflow.

gitoss Jan 13, 2025

Single digit fps sounds like some kind of lower preset, I didn't train for that, it's too time consuming.

I was using --preset 3; for archiving, lower presets make a big difference because more encoding tools are enabled. Profiling a short clip even with CS-IR isn't that time-consuming, but I'm not using GitHub Actions.

I'll try benchmarking a PGO-optimized .exe profiled with a fast --preset while encoding with a slow --preset, and vice versa; I'm interested in how preset-specific profiling is. Right now, I'm profiling a separate exe for each preset I'm encoding with.

igorbaryshev Jan 13, 2025

My finding is that using the whole 2-6(even up to 8) preset range for profiling for a single build is pretty safe.
I only noticed a small slowdown on my preset 4 benchmarking when I started adding presets 1 and down in addition to 2-6.

gitoss · 2025-01-13T22:46:04Z

The results are IN :-)

I've benchmarked all presets 2-9 against every CS-IR PGO'ed exe profiled with presets 2-9 (I'm not using slower presets on my system).

To somewhat account for jitter on my thermal managed laptop, the fps are averages of several encodings (by scripting to grep "Average Speed:" and divide by # of cycles). I'm sure there's still plenty of jitter left, I can't help it.
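
The averaging step could look roughly like the Python sketch below. It assumes each run's log contains a line of the form "Average Speed: <number> fps"; the exact spacing in real encoder output may differ, and the sample values here are made up purely to exercise the parser.

```python
import re
from statistics import mean

# Assumed log format: "Average Speed:" followed by a number and "fps".
SPEED_RE = re.compile(r"Average Speed:\s*([0-9]+(?:\.[0-9]+)?)\s*fps")


def average_fps(log_text):
    """Average every 'Average Speed' value found in the combined log text."""
    speeds = [float(m.group(1)) for m in SPEED_RE.finditer(log_text)]
    if not speeds:
        raise ValueError("no 'Average Speed:' lines found")
    return mean(speeds)


# Made-up sample of three runs, not real benchmark output.
sample = """\
Average Speed:\t\t8.000 fps
Average Speed:\t\t8.200 fps
Average Speed:\t\t8.100 fps
"""

print(f"{average_fps(sample):.2f} fps")
```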

It's using a git build between 2.3.0 and 2.3.0-A, before the commits that introduced frequent crashes even on 10-bit. All builds use mimalloc. For reference, I've added the cross-compiled -psy exe from the GitHub mpv-winbuild. My findings:

Matching the same preset for profiling and encoding is fastest (as should be expected), but for most presets only marginally. Still, if always using the same preset...
The significant "gap" is between presets 9-5 and 4-2. Based on that, if profiling a single preset per exe, it's reasonable to use two exes: one for "fast" (profile preset ~7) and one for "slow" (profile preset ~3).
For an all-in-one targeting all presets, a dual-profiled exe using presets 7+3 seems to be fine. I'm attaching this for the systems I'm using (znver4 and tigerlake).

SvtAv1EncApp-PSY-znver4-tigerlake.zip

... preset 9
(pgo .exe -psy => 53.9 fps)
pgo .exe 7+3 => 62.9 fps
pgo .exe 9 => 63.5 fps

... preset 8
(pgo .exe -psy => 38.6 fps)
pgo .exe 7+3 => 41.6 fps
pgo .exe 8 => 44.9 fps

... preset 7
(pgo .exe -psy => 28.1 fps)
pgo .exe 7+3 => 31.5 fps
pgo .exe 7 => 32.0 fps

... preset 6
(pgo .exe -psy => 20.7 fps)
pgo .exe 7+3 => 24.4 fps
pgo .exe 6 => 24.6 fps

... preset 5
(pgo .exe -psy => 13.7 fps)
pgo .exe 7+3 => 17.0 fps
pgo .exe 5 => 17.2 fps

... preset 4
(pgo .exe -psy => 6.6 fps)
pgo .exe 7+3 => 8.1 fps
pgo .exe 4 => 8.0 fps

... preset 3
(pgo .exe -psy => 3.7 fps)
pgo .exe 7+3 => 4.7 fps
pgo .exe 3 => 4.7 fps

... preset 2
(pgo .exe -psy => 2.2 fps)
pgo .exe 7+3 => 2.4 fps
pgo .exe 2 => 2.5 fps
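
To make the numbers above easier to compare, here is a small Python sketch that turns them into relative gains. The fps values are copied from the list; the helper name is made up. Keep in mind that the reference exe is a different build (cross-compiled), so the first column mixes PGO profile and toolchain differences.

```python
# preset: (mpv-winbuild -psy exe, dual-profiled 7+3 exe, exe profiled on that preset)
RESULTS = {
    9: (53.9, 62.9, 63.5),
    8: (38.6, 41.6, 44.9),
    7: (28.1, 31.5, 32.0),
    6: (20.7, 24.4, 24.6),
    5: (13.7, 17.0, 17.2),
    4: (6.6, 8.1, 8.0),
    3: (3.7, 4.7, 4.7),
    2: (2.2, 2.4, 2.5),
}


def gain_pct(new, base):
    """Relative fps gain of `new` over `base`, in percent."""
    return 100 * (new / base - 1)


for preset, (ref, dual, matched) in RESULTS.items():
    print(f"preset {preset}: 7+3 {gain_pct(dual, ref):+5.1f}% vs reference, "
          f"matched profile {gain_pct(matched, dual):+5.1f}% vs 7+3")
```

With fps reported to one decimal, differences of 0.1 fps at the slow presets are within rounding, so the small gains there should not be over-read.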

igorbaryshev · 2025-01-21T21:16:12Z

Unapologetically stole changes from mainline for film grain generation speed improvements: https://gitlab.com/AOMediaCodec/SVT-AV1/-/commit/dcd08a9eaa0f1248c1d6abb4535932e7e9ba76d0
This gives ~10% improvement for encodes with film grain generation in my benchmarking with preset 4 at 1080p.

Multiple marches, PGO, LTO, -O3, mimalloc, DoVi, HDR10+:
msvc.v2.3.0+DoVi+HDR10PLUS+mimalloc+O3+flto+PGO.Clang.19.1.7.7z.zip

Also, included znver1-5 builds with -mno-sse4a for those who want to try them on Intel.

Upd: Just tested preset 5 with this change and got a whopping ~40% improvement!
I imagine that faster presets will see even more improvement.

This makes me think that film grain generation is now a viable option for the whole range of presets, since faster presets are no longer held back by a massive overhead.

Upd 2: Tested preset 10, got a ~75% improvement

6 replies

gitoss Feb 1, 2025

I was surprised by how much faster the vanilla version is.

That shouldn't come as a surprise, because the -psy stuff takes computation time? Most of the -psy pull requests to mainline haven't been merged yet. In the mainline git log, except for neon I don't see anything that would justify a big performance enhancement vs. ceteris paribus -psy.

Alas, because -psy seems to be semi-abandoned now as a fork and is more of a testing ground, it makes sense to switch back to mainline sooner or later. Esp. considering the crashes with recent -psy and the statement that Windows isn't supported.

Uranite Feb 1, 2025

In pursuit of performance, I now switched to building mainline, also added DoVi + HDR10+ integration to it as well, so not missing out on those features there anymore. I was surprised by how much faster the vanilla version is. If anyone cares, I can share, won't be just posting it here if no one needs it, since that would be off topic. Also changed the version name to include SVT-AV1-PSY next to SVT-AV1 to make StaxRip expose DoVi and HDR10+ options.

How did you test svt-av1-psy? svt-av1 and svt-av1-psy have different default GOP sizes, which could cause a big performance difference; set --keyint to 5s or 10s when testing. Also, try to use svt-av1-psy defaults when testing svt-av1, and disable features not present in svt-av1. I'm not saying that svt-av1-psy is faster, but the speed difference should be small enough that the efficiency and fidelity gains from svt-av1-psy are worth it over svt-av1.

Andarwinux Feb 1, 2025

Is there a diff list of default options for svtav1-psy?

silverbacknet Feb 1, 2025

Is there a diff list of default options for svtav1-psy?

This is what I see now; the second set of options doesn't exist in mainline, but are there in the patches, and there has been discussion of adjusting the defaults in the patches.

-    config_ptr->encoder_bit_depth        = 8;
+    config_ptr->encoder_bit_depth        = 10;
-    config_ptr->tune            = 1;
+    config_ptr->tune            = 2;
-    config_ptr->enable_qm    = 0;
+    config_ptr->enable_qm    = 1;
-    config_ptr->min_qm_level = 8;
+    config_ptr->min_qm_level = 0;
-    config_ptr->enable_variance_boost             = FALSE;
+    config_ptr->enable_variance_boost             = TRUE;

+    config_ptr->min_chroma_qm_level = 8;
+    config_ptr->max_chroma_qm_level = 15;
+    config_ptr->enable_alt_curve                  = FALSE;
+    config_ptr->sharpness                         = 1;
+    config_ptr->extended_crf_qindex_offset        = 0;
+    config_ptr->qp_scale_compress_strength        = 1;
+    config_ptr->frame_luma_bias                   = 0;
+    config_ptr->max_32_tx_size                    = FALSE;
+    config_ptr->adaptive_film_grain               = TRUE;
+    config_ptr->tf_strength                       = 1;
+    config_ptr->kf_tf_strength                    = 1;
+    config_ptr->noise_norm_strength               = 0;
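
If someone wants to regenerate such a list from a patch, a rough Python sketch is below. It only understands lines shaped like the ones above (a leading + or -, then config_ptr->field = value;), and the function name is made up. The embedded sample is a two-field excerpt of the diff above, not a full patch.

```python
import re

# Matches e.g. "+    config_ptr->tune            = 2;"
LINE_RE = re.compile(r"^([+-])\s*config_ptr->(\w+)\s*=\s*(\w+);")


def changed_defaults(diff_text):
    """Map field -> (old value or None, new value) for every '+' line."""
    old, new = {}, {}
    for line in diff_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            sign, field, value = m.groups()
            (old if sign == "-" else new)[field] = value
    return {field: (old.get(field), value) for field, value in new.items()}


# Excerpt of the diff above: one changed default, one -psy-only option.
diff = """\
-    config_ptr->tune            = 1;
+    config_ptr->tune            = 2;
+    config_ptr->sharpness                         = 1;
"""

for field, (before, after) in changed_defaults(diff).items():
    shown = before if before is not None else "(not in mainline)"
    print(f"{field}: {shown} -> {after}")
```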

igorbaryshev Feb 3, 2025

How did you test svt-av1-psy? svt-av1 and svt-av1-psy have different default GOP size, which could cause a big performance difference, set --keyint to 5s or 10s when testing. Also, try to use svt-av1-psy defaults when testing svt-av1. Disable features not present in svt-av1.

I indeed did not test disabling features not present in svt-av1, the performance impact was just a bit surprising. I mainly just set keyint to 321 in svt-av1, the same value -psy has by default.

Also, as a note, the percentage improvements I posted were relative to svt-av1 itself.
