perf: add experimental support for using mimalloc allocator #404
Open: wincent wants to merge 7 commits into main from wincent/mimalloc
Fixes:

```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:199: attempt to call field 'nvim_buf_is_valid' (a nil value)
```

Fixes:

```
luajit: ...and-t/bin/benchmarks/../../lua/wincent/commandt/init.lua:244: attempt to index field 'scanners' (a nil value)
```
Vendoring from:

- https://github.com/microsoft/mimalloc

and specifically:

- https://github.com/microsoft/mimalloc/releases/tag/v2.0.6

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules).

Space on disk for this set of files (some of which are obviously redundant in our context) is:

    du -sh lua/wincent/commandt/lib/vendor/github/microsoft
    4.8M    lua/wincent/commandt/lib/vendor/github/microsoft

As it is not clear whether this is going to be a great idea or not, it only takes effect if you call `make` with `USE_MIMALLOC` set. You can verify that it actually _is_ overriding the standard `malloc()` etc calls by running a command with `MIMALLOC_VERBOSE`, which will cause it to print some extra info out:

    env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. Note that numbers shouldn't be compared across machines because they were produced at different times (for example, the M3 numbers are from a different version of the OS, and the branch was rebased, compared with the other machines).

On mid-2015 MacBook Pro
=======================

These numbers are all over the map due to thermal throttling.

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer             0.04094   0.04178  0.00278   [-0.6%]          (0.04100)  (0.04186)  (0.00287)   [-0.6%]
    file               0.30707   0.31436  0.02486   [-1.0%]  0.05    (0.30735)  (0.31473)  (0.02499)   [-1.0%]  0.05
    find               0.05827   0.06678  0.01162   [+1.5%]  0.05    (0.92013)  (0.93752)  (0.04453)   [-1.0%]  0.025
    git                0.05163   0.06000  0.01115   [+3.3%]  0.0005  (1.00993)  (1.02469)  (0.04072)   [-0.7%]  0.025
    rg                 0.06419   0.07229  0.01203   [+3.8%]  0.005   (1.61018)  (1.66326)  (0.08803)   [+0.3%]
    watchman           0.01095   0.01121  0.00068   [+0.2%]          (1.16830)  (1.17605)  (0.01835)   [+0.6%]  0.005
    total              0.54387   0.56643  0.04391   [+0.4%]          (5.09873)  (5.15811)  (0.15328)   [-0.1%]

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological       0.44648   0.48275  0.19826  [-10.0%]  0.01    (0.44705)  (0.48350)  (0.19793)  [-10.0%]  0.01
    command-t          0.41205   0.44292  0.21658   [+3.8%]  0.005   (0.41255)  (0.44364)  (0.21681)   [+3.8%]  0.005
    chromium (subset)  2.75724   2.99017  0.47925   [-1.3%]          (0.51232)  (0.55960)  (0.17228)   [-1.5%]
    chromium (whole)   3.18933   3.63241  0.64392   [-0.7%]          (0.41821)  (0.49571)  (0.14853)   [-0.3%]  0.05
    big (400k)         4.90155   5.51271  1.20748   [-1.0%]          (0.65297)  (0.74723)  (0.23045)   [-4.5%]  0.05
    total             11.74815  13.06097  2.16866   [-1.2%]          (2.47007)  (2.72968)  (0.54795)   [-2.8%]  0.025

M1 MacBook Pro
==============

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer             0.04407   0.05368  0.01123   [-1.4%]  0.025   (0.04433)  (0.05413)  (0.01150)   [-1.6%]  0.025
    file               0.20902   0.21428  0.01060   [+1.0%]  0.01    (0.20902)  (0.21511)  (0.01219)   [+1.1%]  0.005
    find               0.02687   0.03006  0.01015   [+3.9%]  0.05    (0.63141)  (0.64156)  (0.03483)   [+0.7%]  0.05
    git                0.02693   0.02995  0.00980   [+2.2%]          (0.71734)  (0.72825)  (0.04266)   [-0.4%]
    rg                 0.02916   0.03318  0.01136   [+2.9%]          (0.90193)  (0.91710)  (0.07157)   [+1.4%]  0.005
    watchman           0.01100   0.01156  0.00165   [-0.7%]          (1.18802)  (1.21274)  (0.13422)   [+1.5%]  0.005
    total              0.36119   0.37272  0.03632   [+1.1%]          (3.71713)  (3.76889)  (0.18577)   [+0.9%]  0.005

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological       0.28526   0.29636  0.08356   [-4.0%]  0.025   (0.28527)  (0.29647)  (0.08343)   [-4.0%]  0.025
    command-t          0.23759   0.24616  0.07356   [+1.6%]          (0.23760)  (0.24618)  (0.07354)   [+1.6%]
    chromium (subset)  1.56761   1.58469  0.03655   [-0.3%]          (0.41376)  (0.42040)  (0.02032)   [-0.4%]
    chromium (whole)   1.87180   1.88726  0.06174   [-0.4%]  0.025   (0.31695)  (0.32809)  (0.03497)   [+0.4%]
    big (400k)         2.90455   2.92204  0.07185   [-0.2%]          (0.48384)  (0.50533)  (0.07608)   [-0.0%]
    total              6.88851   6.93650  0.15002   [-0.4%]  0.025   (1.74550)  (1.79647)  (0.14517)   [-0.5%]

M3 MacBook Pro
==============

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer             0.01255   0.01400  0.00409   [+2.0%]          (0.01260)  (0.01447)  (0.00635)   [-3.3%]
    file               0.14749   0.15026  0.00629  [+38.1%]  0.0005  (0.14843)  (0.15115)  (0.00626)  [+37.9%]  0.0005
    find               0.20783   0.27306  0.12796  [+15.8%]  0.0005  (1.13360)  (1.38588)  (0.55490)  [+15.3%]  0.0005
    git                0.21748   0.25155  0.10398  [+13.0%]  0.0005  (1.17693)  (1.40937)  (0.54965)   [+9.1%]  0.0005
    rg                 0.20640   0.26983  0.12977  [+12.2%]  0.0005  (1.55310)  (1.78037)  (0.55921)   [+6.9%]  0.0005
    watchman           0.01813   0.01980  0.00287   [+6.1%]  0.0005  (1.19740)  (1.21007)  (0.02198)   [-0.2%]
    total              0.81542   0.97850  0.33560  [+17.1%]  0.0005  (5.23262)  (5.95132)  (1.66475)   [+8.7%]  0.0005

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological       0.21079   0.22604  0.10943   [+4.8%]  0.025   (0.21107)  (0.22640)  (0.10972)   [+4.7%]  0.025
    command-t          0.16694   0.17164  0.04923   [-0.6%]          (0.16716)  (0.17228)  (0.05253)   [-0.5%]
    chromium (subset)  1.35310   1.36239  0.02010   [+0.1%]          (0.28797)  (0.29255)  (0.01108)   [+0.3%]
    chromium (whole)   1.11148   1.11599  0.01258   [+0.3%]  0.01    (0.12167)  (0.12478)  (0.00828)   [-0.2%]
    big (400k)         1.67454   1.68249  0.05630   [+0.6%]  0.0005  (0.18195)  (0.18487)  (0.00876)   [+0.0%]
    total              4.52863   4.55855  0.15573   [+0.5%]  0.01    (0.97644)  (1.00087)  (0.12712)   [+1.0%]

Ryzen 5950X Arch Linux
======================

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    buffer             0.02465   0.02544  0.01098   [-0.4%]          (0.02467)  (0.02546)  (0.01099)   [-0.5%]
    file               0.09906   0.09948  0.00124   [-0.1%]          (0.09943)  (0.09995)  (0.00130)   [-0.2%]
    find               0.01852   0.01885  0.00084   [+0.5%]          (0.25137)  (0.25430)  (0.00762)   [+0.1%]
    git                0.01718   0.01811  0.00210   [+0.6%]          (0.22095)  (0.22468)  (0.01156)   [-0.6%]
    rg                 0.01748   0.01792  0.00105   [+0.5%]          (0.60575)  (0.61077)  (0.01562)   [-0.1%]
    watchman           0.00178   0.00186  0.00033   [-5.6%]          (0.02282)  (0.02717)  (0.02826)  [-11.5%]
    total              0.17975   0.18165  0.01018   [-0.0%]          (1.23025)  (1.24233)  (0.04061)   [-0.4%]  0.05

                          best       avg       sd       +/-  p          (best)      (avg)       (sd)       +/-  p
    pathological       0.26186   0.27703  0.10940   [-4.4%]  0.0005  (0.26196)  (0.27715)  (0.10946)   [-4.4%]  0.0005
    command-t          0.19271   0.20058  0.05044   [-3.0%]  0.0005  (0.19279)  (0.20065)  (0.05047)   [-3.0%]  0.0005
    chromium (subset)  1.83627   1.89158  0.25631   [-3.8%]  0.01    (0.45977)  (0.49985)  (0.21028)  [-15.7%]  0.005
    chromium (whole)   1.36877   1.38916  0.06031   [+2.6%]  0.0005  (0.12129)  (0.12530)  (0.01659)   [-0.4%]
    big (400k)         2.39053   2.43636  0.11813   [+1.8%]  0.0005  (0.19600)  (0.20396)  (0.02644)   [-0.1%]
    total              6.09256   6.19472  0.33431   [-0.2%]          (1.24139)  (1.30690)  (0.25114)   [-7.5%]  0.005
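As a reading aid for the `best`, `avg`, `sd` and `+/-` columns, here is a minimal C sketch of how such per-row figures could be derived from raw timings. It is not the benchmark harness's code: `summary_t`, `summarize`, `percent_change` and `sqrt_newton` are hypothetical names, the sample (n - 1) standard deviation is an assumption, and so is the sign convention (positive meaning the current run took longer than the baseline). The `p` column is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one benchmark row, in seconds. */
typedef struct {
    double best; /* smallest timing observed */
    double avg;  /* arithmetic mean */
    double sd;   /* standard deviation (sample, n - 1; an assumption) */
} summary_t;

/* Newton's method square root, used only so that this sketch compiles
 * without linking libm; sqrt() from <math.h> is the usual choice. */
static double sqrt_newton(double x) {
    if (x <= 0.0) return 0.0;
    double r = x > 1.0 ? x : 1.0;
    for (int i = 0; i < 60; i++) r = 0.5 * (r + x / r);
    return r;
}

/* Summarize `n` timings (n must be at least 1). */
static summary_t summarize(const double *times, size_t n) {
    assert(n > 0);
    summary_t s = { times[0], 0.0, 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (times[i] < s.best) s.best = times[i];
        sum += times[i];
    }
    s.avg = sum / (double)n;

    double squares = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = times[i] - s.avg;
        squares += d * d;
    }
    s.sd = n > 1 ? sqrt_newton(squares / (double)(n - 1)) : 0.0;
    return s;
}

/* Relative change of `current` against `baseline`, as a percentage.
 * Assumed convention: positive means `current` is slower. */
static double percent_change(double baseline, double current) {
    return (current - baseline) / baseline * 100.0;
}
```

Under these assumptions, `percent_change(0.30, 0.297)` comes out at roughly -1.0, which is the shape of a `[-1.0%]` entry in the tables above.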
The .prettierignore change is because there are a couple of things in the Markdown files that Prettier doesn't like. The clang-format thing comes from a tip here:

- https://stackoverflow.com/a/57272592/2103996

Should prevent CI failures like this one:

- https://github.com/wincent/command-t/actions/runs/2979207632

Wasn't needed on clang, but is needed with gcc:

    /usr/bin/ld: mimalloc-override.o: relocation R_X86_64_TPOFF32 against `recurse' can not be used when making a shared object; recompile with -fPIC

I can't see a changelog or release notes in the repo, so here is the diff:

- microsoft/mimalloc@v2.0.6...v2.1.7
wincent force-pushed the wincent/mimalloc branch from 0938103 to 836698d (August 13, 2024 17:40)
Quick test of Hoard, for comparison:
Results (relative to
Vendoring from microsoft/mimalloc and specifically the v2.1.7 tag (updated from the v2.0.6 tag).

mimalloc is a simple allocator focused on performance and it is easy to drop in as a replacement for `malloc()` and friends as described in its README. So as not to bring in a dependency on CMake, we just build the `static.c` version.

Sadly, the performance delta (see numbers below) is not a clear win; the numbers are a bit all over the place. This probably isn't that surprising because most of the heavy memory allocation in Command-T is already micro-managed internally (but simply, with little overhead) using big slabs allocated with `mmap()`. Nevertheless, parking this here as a possible idea.

I added a script to pull down the release archive and dump it into a directory, because I don't want to use a submodule for this (people installing a Vim plugin from a Git repo shouldn't have to know/worry about whether it needs or uses submodules). Space on disk for this set of files (some of which are obviously redundant in our context) is:

    du -sh lua/wincent/commandt/lib/vendor/github/microsoft
    4.8M    lua/wincent/commandt/lib/vendor/github/microsoft

As it is not clear whether this is going to be a great idea or not, it only takes effect if you call `make` with `USE_MIMALLOC` set. You can verify that it actually is overriding the standard `malloc()` etc calls by running a command with `MIMALLOC_VERBOSE`, which will cause it to print some extra info out:

    env MIMALLOC_VERBOSE=1 TIMES=1 bin/benchmarks/scanner.lua

Impact (unfortunately, a bit inconclusive) on scanner and matcher benchmarks follows. Note that numbers shouldn't be compared across machines because they were produced at different times (for example, the M3 numbers are from a different version of the OS, and the branch was rebased, compared with the other machines).
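To illustrate why a faster `malloc()` may matter little here, the following is a minimal sketch of the kind of slab scheme the description alludes to: one large `mmap()` region handed out by bumping an offset. It is not Command-T's actual code; `slab_t`, `slab_create`, `slab_alloc` and `slab_destroy` are hypothetical names, and the layout is an assumption chosen for brevity.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical slab: one large anonymous mapping handed out in pieces. */
typedef struct {
    uint8_t *base;   /* start of the mapping, or NULL if mmap() failed */
    size_t capacity; /* size of the mapping in bytes */
    size_t used;     /* bytes handed out so far */
} slab_t;

/* Reserve `capacity` bytes in a single mmap() call. */
static slab_t slab_create(size_t capacity) {
    slab_t slab = { NULL, capacity, 0 };
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) slab.base = p;
    return slab;
}

/* Bump-allocate `size` bytes aligned to `align` (a power of two).
 * Returns NULL when the slab is unmapped or has too little room left. */
static void *slab_alloc(slab_t *slab, size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    if (slab->base == NULL) return NULL;
    size_t start = (slab->used + align - 1) & ~(align - 1);
    if (start > slab->capacity || size > slab->capacity - start) return NULL;
    slab->used = start + size;
    return slab->base + start;
}

/* Release the whole slab at once; individual allocations are never freed. */
static void slab_destroy(slab_t *slab) {
    if (slab->base != NULL) munmap(slab->base, slab->capacity);
    slab->base = NULL;
    slab->used = 0;
}
```

In a scheme like this each allocation costs a few arithmetic operations and never goes through `malloc()`, so swapping the system allocator for mimalloc would only affect whatever allocations remain outside the slabs.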
On mid-2015 MacBook Pro
These numbers are all over the map due to thermal throttling.
M1 MacBook Pro
M3 MacBook Pro
Ryzen 5950X Arch Linux