Add neat #487

jinyus · 2023-12-27T02:23:04Z

Neat:

        Processing time (w/o IO): 177.988000ms
        total: 0.77s memory: 108904k
        Processing time (w/o IO): 197.040000ms
        total: 0.79s memory: 108904k
        Processing time (w/o IO): 177.947000ms
        total: 0.77s memory: 108900k
        Processing time (w/o IO): 196.555000ms
        total: 0.79s memory: 108904k
        Processing time (w/o IO): 180.425000ms
        total: 0.77s memory: 108900k
        Processing time (w/o IO): 178.002000ms
        total: 0.77s memory: 108772k
        Processing time (w/o IO): 196.600000ms
        total: 0.79s memory: 108776k
        Processing time (w/o IO): 178.117000ms
        total: 0.77s memory: 108900k
        Processing time (w/o IO): 196.533000ms
        total: 0.79s memory: 108772k
        Processing time (w/o IO): 178.200000ms
        total: 0.77s memory: 108900k

Neat:

        Processing time (w/o IO): 2656.659000ms
        total: 5.18s memory: 424464k
        Processing time (w/o IO): 2657.508000ms
        total: 5.17s memory: 424464k
        Processing time (w/o IO): 2953.203000ms
        total: 5.35s memory: 424336k

Neat:

        Processing time (w/o IO): 25958.542000ms
        total: 33.93s memory: 1660584k
        Processing time (w/o IO): 25959.446000ms
        total: 34.16s memory: 1660736k
        Processing time (w/o IO): 25995.728000ms
        total: 33.79s memory: 1660860k

jinyus · 2023-12-27T02:34:34Z

Hey @FeepingCreature, hope you have the time to take a look and make sure everything was set up correctly. I just copied the code from your gist. It doesn't seem to scale too well, but the language is very young so that isn't surprising. It might be worthwhile to do some profiling, though.

FeepingCreature · 2023-12-27T04:58:43Z

I'll check more tomorrow, but note: -optimize is not actually a flag, I have no idea why it works. You want -O -release, like D.

FeepingCreature · 2023-12-27T19:02:40Z

Oh damn I'm an idiot, of course it's a flag, I specifically added support for checking -flag against longflags before shortflags.
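For readers wondering why -optimize is accepted, here is a minimal sketch in C of the lookup order described above: a single-dash argument is tried against the long flag names first, and only treated as short flags if nothing matches. The flag table, enum, and function names are hypothetical and are not taken from the Neat compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical long flag table, for illustration only. */
static const char *const LONG_FLAGS[] = { "optimize", "release" };

typedef enum { ARG_POSITIONAL, ARG_LONG_FLAG, ARG_SHORT_FLAGS } ArgKind;

/* Returns 1 if name (without leading dashes) is a known long flag. */
static int is_long_flag(const char *name) {
    for (size_t i = 0; i < sizeof LONG_FLAGS / sizeof LONG_FLAGS[0]; i++) {
        if (strcmp(name, LONG_FLAGS[i]) == 0) return 1;
    }
    return 0;
}

/* Classify one command-line argument. */
ArgKind classify_arg(const char *arg) {
    if (arg[0] != '-' || arg[1] == '\0') return ARG_POSITIONAL;
    if (arg[1] == '-') return ARG_LONG_FLAG;          /* --name */
    if (is_long_flag(arg + 1)) return ARG_LONG_FLAG;  /* -name matches a long flag first */
    return ARG_SHORT_FLAGS;                           /* otherwise short flags, e.g. -O */
}
```

With this ordering, classify_arg("-optimize") is treated as a long flag, while classify_arg("-O") still falls through to the short flags.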

Bleh.

At any rate, try -release. Array index checks are really hitting the example hard.

jinyus · 2023-12-27T19:24:06Z

Disabling bounds checks is against the rules, but it does give a decent speedup: 160ms -> 105ms.

FeepingCreature · 2023-12-27T21:22:52Z

Huh, now I'm wondering how D avoids the bounds check. Probably because the array size is known statically.

jinyus · 2023-12-27T21:28:51Z

Yeah, languages without fixed-size arrays are at a disadvantage.

FeepingCreature · 2023-12-27T21:45:11Z

Hm. Well at any rate, I see some good places where I can optimize the refcounter with this. There are some bad things going on with array appends spamming reference inc/decs even if the array is single-owner. I'll go look into it.

FeepingCreature · 2024-01-01T13:27:01Z

Okay, the array append issue (though fixed) wasn't actually on the critical path. But there is something fishy going on with the range checks; LLVM should easily be able to erase those, it's all inlined anyways. For some reason it assigns a number to a field and then doesn't realize that the number stays constant throughout. I'm on it.

FeepingCreature · 2024-01-01T21:05:39Z

Rules clarification question: We can change the language to remove specific internal bounds checks that are demonstrably redundant, right?

jinyus · 2024-01-01T21:08:27Z

Yes, that's fine... as long as it's a general improvement and not only useful in this benchmark.

FeepingCreature · 2024-01-01T22:02:19Z

Great, cause I just noticed that I was doing bounds checks on element loads in array loops. Which are, uh, impossible to ever violate. I.e., for (key, value in array) was doing a bounds check on every iteration.

I'm pretty sure that's the main reason Neat was slow.

(This optimization is safe because if you're doing a loop and you append to the variable, the loop still only runs to the original length; similarly if you truncate)
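To illustrate the reasoning, here is a rough sketch in C, not Neat's actual codegen. The IntSlice type and the function names are made up for the example. The second loop reads the length once up front, so every index it produces is already known to be in range and the per-element check adds nothing.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical slice type for illustration; not Neat's real array layout. */
typedef struct {
    const int *ptr;
    size_t len;
} IntSlice;

/* Indexed load with a bounds check, as an arbitrary array[i] needs. */
static int checked_load(IntSlice a, size_t i) {
    if (i >= a.len) abort();
    return a.ptr[i];
}

/* Loop that re-checks the bound on every element load. */
int sum_checked(IntSlice a) {
    int total = 0;
    for (size_t i = 0; i < a.len; i++) {
        total += checked_load(a, i);
    }
    return total;
}

/* For-each style loop: the length is snapshotted once, so i < len
 * holds by construction and the load needs no separate check. */
int sum_foreach(IntSlice a) {
    const size_t len = a.len; /* snapshot taken before the loop starts */
    int total = 0;
    for (size_t i = 0; i < len; i++) {
        total += a.ptr[i];
    }
    return total;
}
```

Both functions return the same result for any valid slice; the difference is only the redundant comparison inside the loop body.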

Could you retry with 0.5.1 please?

jinyus · 2024-01-01T22:24:06Z

I'm seeing a 2x speed up.

0.5.0:

Neat | 185.74 ms | 2.76 s | 25.97 s

0.5.1:

Neat:

        Processing time (w/o IO): 84.985234ms
        total: 0.37s memory: 59620k
        Processing time (w/o IO): 85.004227ms
        total: 0.37s memory: 59752k
        Processing time (w/o IO): 85.150147ms
        total: 0.39s memory: 59748k
        Processing time (w/o IO): 85.087656ms
        total: 0.37s memory: 59752k
        Processing time (w/o IO): 85.107234ms
        total: 0.37s memory: 59880k
        Processing time (w/o IO): 85.150078ms
        total: 0.37s memory: 59620k
        Processing time (w/o IO): 85.060516ms
        total: 0.37s memory: 59880k
        Processing time (w/o IO): 84.853219ms
        total: 0.37s memory: 59748k
        Processing time (w/o IO): 84.757633ms
        total: 0.37s memory: 59880k
        Processing time (w/o IO): 84.843406ms
        total: 0.37s memory: 59748k

Neat:

        Processing time (w/o IO): 1154.516875ms
        total: 2.30s memory: 227980k
        Processing time (w/o IO): 1156.726875ms
        total: 2.32s memory: 227988k
        Processing time (w/o IO): 1154.512250ms
        total: 2.43s memory: 227988k

Neat:

        Processing time (w/o IO): 9915.724000ms
        total: 13.54s memory: 539836k
        Processing time (w/o IO): 9916.659000ms
        total: 13.59s memory: 540088k
        Processing time (w/o IO): 11525.339000ms
        total: 15.33s memory: 539964k

FeepingCreature · 2024-01-01T22:24:31Z

Hooray! \o/ I think I'd need to look at D's assembly to see where to get more, but it's a start.

jinyus · 2024-01-01T22:58:38Z

The LLVM backend yielded a greater improvement, though installation is a bit cumbersome because it expects LLVM to be in a specific location. I think using which llvm-config would be better in this case.

Neat:

        Processing time (w/o IO): 58.419998ms
        total: 0.30s memory: 59756k
        Processing time (w/o IO): 58.604000ms
        total: 0.33s memory: 59624k
        Processing time (w/o IO): 58.535999ms
        total: 0.30s memory: 59752k
        Processing time (w/o IO): 50.978001ms
        total: 0.29s memory: 59884k
        Processing time (w/o IO): 50.511002ms
        total: 0.29s memory: 59628k
        Processing time (w/o IO): 58.518002ms
        total: 0.30s memory: 59756k
        Processing time (w/o IO): 58.595001ms
        total: 0.30s memory: 59632k
        Processing time (w/o IO): 58.584999ms
        total: 0.30s memory: 59752k
        Processing time (w/o IO): 50.693001ms
        total: 0.29s memory: 59752k
        Processing time (w/o IO): 50.667000ms
        total: 0.29s memory: 59624k

Neat:

        Processing time (w/o IO): 832.085022ms
        total: 1.86s memory: 227876k
        Processing time (w/o IO): 833.153015ms
        total: 1.93s memory: 227880k
        Processing time (w/o IO): 833.741028ms
        total: 1.91s memory: 227876k

Neat:

        Processing time (w/o IO): 7259.328125ms
        total: 10.39s memory: 529236k
        Processing time (w/o IO): 6195.353027ms
        total: 9.35s memory: 529368k
        Processing time (w/o IO): 7255.119141ms
        total: 10.11s memory: 529236k

FeepingCreature · 2024-01-02T04:24:28Z

Oh, you were using the gcc backend? Yeah, the LLVM one is the one I use and optimize the most; the gcc one is mostly a fallback for bootstrapping.

The locations are hardcoded cause I need a specific LLVM version on multi-LLVM systems. How to resolve this is completely unstandardized, sadly.

Heh, looking at the diff this was pretty much the biggest possible speedup I could have got without advancing in the ranking, lol.

FeepingCreature · 2024-01-05T01:04:15Z

Hey, um. You may want to try 0.5.2 as well :) I think the speedup should be impressive.

Turns out it wasn't the bounds checking at all; I was just doing stupid things with argument parsing.

jinyus · 2024-01-05T01:17:05Z

Nice! Right up there with go and java.

Neat:

        Processing time (w/o IO): 18.870001ms
        total: 0.11s memory: 59616k
        Processing time (w/o IO): 18.753000ms
        total: 0.15s memory: 59616k
        Processing time (w/o IO): 18.900000ms
        total: 0.11s memory: 59616k
        Processing time (w/o IO): 18.914000ms
        total: 0.11s memory: 59488k
        Processing time (w/o IO): 19.087000ms
        total: 0.11s memory: 59616k
        Processing time (w/o IO): 18.981001ms
        total: 0.11s memory: 59488k
        Processing time (w/o IO): 18.892000ms
        total: 0.11s memory: 59572k
        Processing time (w/o IO): 19.103001ms
        total: 0.11s memory: 59484k
        Processing time (w/o IO): 19.546000ms
        total: 0.11s memory: 59364k
        Processing time (w/o IO): 19.062000ms
        total: 0.11s memory: 59616k

Neat:

        Processing time (w/o IO): 251.744995ms
        total: 0.68s memory: 227720k
        Processing time (w/o IO): 250.667007ms
        total: 0.81s memory: 227724k
        Processing time (w/o IO): 252.091995ms
        total: 0.75s memory: 227592k

Neat:

        Processing time (w/o IO): 2141.670898ms
        total: 3.81s memory: 534540k
        Processing time (w/o IO): 2144.925049ms
        total: 3.69s memory: 534412k
        Processing time (w/o IO): 2140.822998ms
        total: 3.84s memory: 534288k

FeepingCreature · 2024-01-05T01:35:37Z

That's what I like to see :)

Turns out, if you have three-pointer (24-byte) arrays, you really don't want to pass them as structs on AMD64. As opposed to D's 16-byte arrays, 24-byte arrays force an alloca, and having a few dozen allocas rolling around being assembled and disassembled really messes up LLVM's ability to optimize register moves something fierce. Luckily, since it's not a C type, I can just decide to pass them differently. And suddenly we're twice as fast. :)
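As a rough illustration in C of the two calling styles (the Array3 layout and its field names are assumptions, not Neat's actual representation): under the System V AMD64 ABI, an aggregate larger than 16 bytes is passed in memory, while the same three values passed as separate scalar arguments each fit in an integer register.

```c
#include <stddef.h>

/* Hypothetical three-word array value (24 bytes on AMD64).
 * Field names are assumptions made for this sketch. */
typedef struct {
    int *ptr;       /* first element */
    size_t length;  /* number of elements */
    void *base;     /* assumed: allocation base, e.g. for refcounting */
} Array3;

/* Passed as a struct by value: larger than 16 bytes, so the
 * System V AMD64 ABI passes it in memory rather than registers. */
long sum_struct(Array3 a) {
    long total = 0;
    for (size_t i = 0; i < a.length; i++) {
        total += a.ptr[i];
    }
    return total;
}

/* Same data passed as three scalar arguments, each of which
 * can travel in its own integer register. */
long sum_split(int *ptr, size_t length, void *base) {
    (void)base; /* unused here; kept to mirror the struct's fields */
    long total = 0;
    for (size_t i = 0; i < length; i++) {
        total += ptr[i];
    }
    return total;
}
```

A caller holding an Array3 value a would call sum_split(a.ptr, a.length, a.base) instead of sum_struct(a); the results are identical, only the argument passing differs.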

There's some other perf things in 0.5.2 - also technically I could switch to static arrays now - but "just pass arrays as individual arguments rather than stack pointers" was the big one.

FeepingCreature · 2024-01-05T01:37:50Z

To clarify, there's basically no point to static arrays, it gives no appreciable speedup. With the extraneous bounds checks being removed in the last version, it now only has to check the posts array anyways - LLVM can already tell that the 5-entry array has five entries and elide the bounds.

Also, -release has basically no effect anymore.

FeepingCreature · 2024-01-05T01:51:54Z

Just for fun, I would also like to draw your attention to this commit. Note the disparity between the size of the debugging effort and the size of the fix. I had 0.5.2 ready this morning if not for that. :)

(Was it worth it? Absolutely. Moving up in benchmark placement makes the brain meats secrete the endorphins something fierce. - I think I'll leave it there though, D v2 is a bit silly.)

jinyus · 2024-01-05T02:04:50Z

Haha, I love to see it, thanks for the effort. I've realized that this benchmark has helped several language designers find inefficiencies in their implementations: Dart, Lobster, Inko.

The D and Rust guys had a little tit for tat going on, but D ultimately came out on top. As for Julia HO, it puzzles me how it's so fast, but I plan on doing a deep dive on whatever wizardry is going on under the hood.

jinyus added 2 commits December 26, 2023 21:22

add neat

be85b4c

update dockerfile

b889f7c

jinyus merged commit 42e50a7 into main Dec 27, 2023

jinyus deleted the neat branch February 9, 2024 14:03
