Add neat #487
Conversation
jinyus commented on Dec 27, 2023 (edited)
Hey @FeepingCreature, hope you have the time to take a look to make sure everything was set up correctly. I just copied the code from your gist. It doesn't seem to scale too well, but the language is very young so that isn't surprising... it might be worthwhile to do some profiling.
I'll check more tomorrow, but note:
Oh damn I'm an idiot, of course it's a flag, I specifically added support for checking Bleh. At any rate, try
Disabling bounds checks is against the rules, but it does give a decent speed-up.
Huh, now I'm wondering how D avoids the bounds check. Probably because the array size is known statically.
Yea, languages without fixed-size arrays are at a disadvantage.
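For illustration only, here is a C stand-in for the point being made (the names and the constant are made up, this is not code from the benchmark): when the array length is a compile-time constant, the bounds guard compares against an immediate and can often be proven dead, while a dynamically sized array needs its length loaded and checked at every access unless the surrounding code proves the index is in range.

```c
#include <stddef.h>
#include <stdlib.h>

#define N 5u                       /* length known statically */

static int counts_fixed[N];

int bump_fixed(unsigned idx) {
    if (idx >= N) abort();         /* compares against a constant; removable
                                      wherever idx is provably in range */
    return ++counts_fixed[idx];
}

int bump_dynamic(int *counts, size_t len, size_t idx) {
    if (idx >= len) abort();       /* len must be available and checked unless
                                      the caller's context proves idx < len */
    return ++counts[idx];
}
```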
Hm. Well, at any rate, I see some good places I can optimize the refcounter with this; there are some bad things going on with array appends spamming reference inc/decs even when the array is single-owner. I'll go look into it.
Okay, the array append issue (though fixed) wasn't actually on the critical path. But there is something fishy going on with the range checks; LLVM should easily be able to erase those, it's all inlined anyways. For some reason it assigns a number to a field and then doesn't realize that the number stays constant throughout. I'm on it.
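The actual cause in Neat's codegen may well be different, but as a rough C sketch of one common way a length stored in a field defeats the optimizer: a store in the loop body that might alias the field forces the field to be reloaded every iteration, so the range check never becomes provably redundant; reading the field into a local once restores the invariance.

```c
#include <stddef.h>

struct arr { int *ptr; size_t length; };

long sum_reloading(struct arr *a, size_t *totals) {
    long s = 0;
    for (size_t i = 0; i < a->length; i++) {
        s += a->ptr[i];
        totals[i] = (size_t)s;   /* this store may alias a->length, so the field
                                    is reloaded and the bound rechecked each pass */
    }
    return s;
}

long sum_hoisted(struct arr *a, size_t *totals) {
    size_t len = a->length;      /* read the field once; now provably invariant */
    long s = 0;
    for (size_t i = 0; i < len; i++) {
        s += a->ptr[i];
        totals[i] = (size_t)s;
    }
    return s;
}
```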
Rules clarification question: we can change the language to remove specific internal bounds checks that are demonstrably redundant, right?
Yes, that's fine... as long as it's a general improvement and not only useful in this benchmark.
Great, cause I just noticed that I was doing bounds checks for element loads in array loops. Which are, uh, impossible to ever be violated. I.e. I'm pretty sure that's the main reason Neat was slow. (This optimization is safe because if you append to the variable inside the loop, the loop still only runs to the original length; similarly if you truncate.) Could you retry with 0.5.1 please?
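A minimal C sketch of that argument (hypothetical `vec` type, not the benchmark code): the loop bound is the length captured at entry, and appending only grows the array, so every element access inside the loop stays within the original length and the per-element check can be dropped.

```c
#include <stddef.h>
#include <stdlib.h>

struct vec { int *ptr; size_t len; size_t cap; };

void vec_push(struct vec *v, int x) {
    if (v->len == v->cap) {
        size_t ncap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->ptr, ncap * sizeof *p);
        if (!p) abort();
        v->ptr = p;
        v->cap = ncap;
    }
    v->ptr[v->len++] = x;
}

/* Sum the elements present at loop entry while appending a copy of each.
 * The loop bound is the captured 'len', so every v->ptr[i] access is at an
 * index that existed before the loop and cannot go out of bounds. */
long sum_and_duplicate(struct vec *v) {
    size_t len = v->len;                 /* length captured once at loop entry */
    long s = 0;
    for (size_t i = 0; i < len; i++) {
        s += v->ptr[i];
        vec_push(v, v->ptr[i]);          /* growing v never invalidates i < len */
    }
    return s;
}
```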
I'm seeing a 2x speed-up.
0.5.0:
0.5.1:
Hooray! \o/ I think I'd need to look at D's assembly to see where to get more, but it's a start.
The llvm backend yielded a greater improvement, though installation is a bit cumbersome as it expects LLVM to be in a specific location. I think a
Oh, you were using the gcc backend? Yeah, the LLVM one is the one I use and optimize the most; the gcc one is mostly a fallback for bootstrapping. The locations are hardcoded cause I need a specific LLVM version on multi-LLVM systems. How to resolve this is completely unstandardized, sadly. Heh, looking at the diff, this was pretty much the biggest possible speedup I could have got without advancing in the ranking, lol.
Hey, um. You may want to try 0.5.2 as well :) I think the speedup should be impressive. Turns out it wasn't the bounds checking at all; I was just doing stupid things with argument parsing.
Nice! Right up there with Go and Java. Neat:
Processing time (w/o IO): 18.870001ms
total: 0.11s memory: 59616k
Processing time (w/o IO): 18.753000ms
total: 0.15s memory: 59616k
Processing time (w/o IO): 18.900000ms
total: 0.11s memory: 59616k
Processing time (w/o IO): 18.914000ms
total: 0.11s memory: 59488k
Processing time (w/o IO): 19.087000ms
total: 0.11s memory: 59616k
Processing time (w/o IO): 18.981001ms
total: 0.11s memory: 59488k
Processing time (w/o IO): 18.892000ms
total: 0.11s memory: 59572k
Processing time (w/o IO): 19.103001ms
total: 0.11s memory: 59484k
Processing time (w/o IO): 19.546000ms
total: 0.11s memory: 59364k
Processing time (w/o IO): 19.062000ms
total: 0.11s memory: 59616k
Neat:
Processing time (w/o IO): 251.744995ms
total: 0.68s memory: 227720k
Processing time (w/o IO): 250.667007ms
total: 0.81s memory: 227724k
Processing time (w/o IO): 252.091995ms
total: 0.75s memory: 227592k
Neat:
Processing time (w/o IO): 2141.670898ms
total: 3.81s memory: 534540k
Processing time (w/o IO): 2144.925049ms
total: 3.69s memory: 534412k
Processing time (w/o IO): 2140.822998ms
total: 3.84s memory: 534288k
That's what I like to see :) Turns out, if you have three-pointer (24-byte) arrays, you really don't want to pass them as structs on AMD64 - as opposed to D's 16-byte arrays, 24-byte arrays force an alloca, and having a few dozen allocas rolling around being assembled and disassembled really messes up LLVM's ability to optimize register moves something fierce. Luckily, since it's not a C type, I can just decide to pass them differently. And suddenly we're twice as fast. :) There are some other perf things in 0.5.2 - also technically I could switch to static arrays now - but "just pass arrays as individual arguments rather than stack pointers" was the big one.
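As a C approximation of the calling-convention point (the field names and the exact Neat layout are assumptions): under the SysV AMD64 ABI an aggregate larger than 16 bytes is classified MEMORY, so every call spills the 24-byte array descriptor to the stack, while passing the same three fields as separate scalars keeps them in registers.

```c
#include <stddef.h>

/* Three-pointer "fat" array descriptor: 24 bytes on a 64-bit target. */
struct slice3 {
    int   *ptr;      /* element pointer */
    size_t length;   /* element count */
    void  *base;     /* e.g. allocation base used by the refcounter */
};

/* Larger than 16 bytes, so SysV AMD64 passes it in memory: every call site
 * builds the struct on the stack and the callee reads it back. */
long sum_by_struct(struct slice3 s) {
    long acc = 0;
    for (size_t i = 0; i < s.length; i++) acc += s.ptr[i];
    return acc;
}

/* Same data as three scalar arguments: passed in registers, no stack traffic. */
long sum_by_fields(int *ptr, size_t length, void *base) {
    (void)base;
    long acc = 0;
    for (size_t i = 0; i < length; i++) acc += ptr[i];
    return acc;
}
```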
To clarify, there's basically no point to static arrays, it gives no appreciable speedup. With the extraneous bounds checks being removed in the last version, it now only has to check the Also,
Just for fun, I would also like to draw your attention to this commit. Note the disparity between the size of the debugging effort and the size of the fix. I had 0.5.2 ready this morning if not for that. :) (Was it worth it? Absolutely. Moving up in benchmark placement makes the brain meats secrete the endorphins something fierce. - I think I'll leave it there though, D v2 is a bit silly.)
Haha, I love to see it, thanks for the effort. I realized that this benchmark has helped several language designers find inefficiencies in their implementations: dart, lobster, inko. The D and Rust guys had a little tit for tat going on, but D ultimately came out on top. As for