Some compiler optimizations #8529
Conversation
This increases the performance of `Set#dup`.
You might know this or not, but macros have access to local variables of the scope where they are called. For example:

```crystal
macro id(obj)
  {{obj}}
end

a = 1
puts id(a)
```

The above macro expands to `a`. If that's seen as a program by itself, `a` parses to a call. However, because the macro is invoked in a context where `a` is a local variable, `a` is parsed as a local var.

For this to work when expanding macros in the top-level phase, the compiler keeps track of existing local variables: clearing them when entering a new `def`, etc. Similar logic should apply to call blocks: after a block is analyzed, all local vars declared inside it should no longer be visible outside it.

Except that this wasn't the case: the compiler only cleared block args after a block, not all variables. That means a macro evaluated at one point in the program had access to each and every local variable defined before it, even if it wasn't actually visible. This isn't that terrible by itself, but in some cases it leads to a lot of extra work. Specifically, a Set with the local vars is created on each macro expansion, so a macro loop that keeps creating local vars inside blocks can end up building huge sets.
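A minimal sketch of the scoping fix described above (not the actual compiler code; an insertion-ordered Python dict stands in for the compiler's table of visible local vars): snapshot the number of visible vars before analyzing a block, then drop everything the block introduced afterwards, not just the block args.

```python
def analyze_block(vars, block_args, body_vars):
    """Analyze a block, then restore the outer scope's variables."""
    old_size = len(vars)
    for name in block_args:
        vars[name] = True        # block args are visible inside the block
    for name in body_vars:
        vars[name] = True        # vars declared while analyzing the body
    # The fix: remove *every* var the block introduced, not only the
    # block args, so later macro expansions can't see leaked temps.
    while len(vars) > old_size:
        vars.popitem()           # Python dicts keep insertion order
    return vars

outer = {"a": True}
analyze_block(outer, ["x"], ["__temp1", "__temp2"])
print(list(outer))  # only "a" remains visible
```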
Well, actually, it seems power_assert is still slow to compile, but that's because it generates a ton of code. I think the
Do you have any benchmarks for this?
Maybe this simpler generated assert code proposed by RX14 might help.
I suppose there's a mistake in the example code from the third commit? Or else I just don't get it.
Ah, yes, there's a mistake. It should be

The benchmark is the snippet with that assert. Compile it with the existing compiler and with a compiler that has this PR and you'll see a difference (about 12 seconds less on my machine).
```crystal
@vars.delete(arg.name) unless old_vars_keys.includes?(arg.name)
# After the block we should have the same number of local vars
# (blocks can't inject local vars into the outer scope)
while @vars.size > old_vars_size
```
Would there be any difference if this was `(@vars.size - old_vars_size).times do`? I doubt it's noticeable, but the method is called a lot of times, so who knows?
I wish we had an ordered map type, then we could have just nuked the last n entries in a more efficient manner.
EDIT: Hmm. Isn't the insertion order kept track of? Perhaps it is possible to do bulk operations. 🤔
/probably not worth the trouble though
All excellent observations. Yeah, Hash is ordered and so is Set, so we could nuke the last entries. Though, because it's a Hash, we still need to compute each key's hash to find which bucket it's in, clear it, and so on, so it's probably the same performance in the end.
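For what it's worth, an insertion-ordered map does make "nuke the last n entries" a one-liner at the API level. A tiny illustration in Python, whose dicts are insertion-ordered like Crystal's Hash (though, as noted above, each removal may still involve per-entry bucket work internally):

```python
# Remove the last n inserted entries from an insertion-ordered map.
d = {"a": 1, "b": 2, "c": 3, "d": 4}
n = 2
for _ in range(n):
    d.popitem()   # pops the most recently inserted entry (LIFO)
print(d)          # {'a': 1, 'b': 2}
```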
❤️
This is amazing, thank you!!
I think there's likely a lot of low-hanging performance fruit in the compiler which could push out the scaling problem until it's a company with more resources' problem :P
I'd merge but there's no newer "milestone", so I guess we can wait until that exists... or we create one now? Can we merge before 0.32.0 is released?
Merging it for 0.32 is fine on my end. As long as there are no breaking changes :-)
Cool, let's do it!
Some optimizations from musings in #8527 (comment)
The first two commits are self-explanatory. The third commit has a long commit message describing the problem. After reading that, you can understand how it applies to this particular example:
In the above code the `assert` macro will define the local variables `%obj` and `%call`, which will actually expand to variables named something like `__temp1` and so on, and they'll all be different. Because of the above compiler bug, all these local variables will be seen in each `assert` macro invocation. So the first `assert` expansion can only see the `a` local variable. The second one will see that plus the two local variables introduced by the previous macro invocation (which shouldn't be seen, but that's the bug the third commit is fixing). On each invocation a `Set` is created with these local vars. That has something like quadratic complexity, which explains why the above code was taking 12 seconds. Now it takes 0.4s (well, at least the "main" semantic phase).

I always thought macro expansion was slow, but apparently it's not. Or at least not that slow.
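A toy model of the cost just described (a Python sketch with made-up per-expansion numbers, not compiler code): if every expansion builds a Set of all vars leaked so far, total work is the sum 0 + k + 2k + ..., i.e. quadratic in the number of expansions, while proper scoping keeps it linear.

```python
def cost_with_bug(expansions, temps_per_expansion=2):
    """Each expansion copies all previously leaked temps into a new set."""
    visible = 0
    total = 0
    for _ in range(expansions):
        total += visible              # building the Set of visible vars
        visible += temps_per_expansion  # this expansion's temps leak
    return total

def cost_with_fix(expansions, outer_vars=1):
    """With the fix, each expansion only sees the outer scope's vars."""
    return expansions * outer_vars

print(cost_with_bug(1000), cost_with_fix(1000))  # quadratic vs. linear
```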
Once we merge this PR we can check whether #3095 is still slow for the snippet I found to be slow.