
Some compiler optimizations #8529

Merged: 3 commits into master, Dec 4, 2019

Conversation

asterite
Member

Some optimizations from musings in #8527 (comment)

The first two commits are self-explanatory. The third commit has a long commit message describing the problem; after reading it you can see how it applies to this particular example:

```crystal
require "spec"

macro assert(call, file = __FILE__, line = __LINE__)
  {% obj = call.receiver %}
  {% name = call.name %}
  %obj = {{obj}}
  %call = %obj.{{name}}
  unless %call
    fail "Expected #{%obj} to be {{name[0...-1]}}"
  end
end

describe "Foo" do
  {% for i in 0..12_000 %}
    it "bar {{i}}" do
      a = [1, 2, 3]
      assert a.empty?
    end
  {% end %}
end
```

In the above code the `assert` macro defines the local variables `%obj` and `%call`, which expand to fresh variables named something like `__temp1` and so on, all distinct. Because of the compiler bug described below, all of these local variables remain visible to every later `assert` macro invocation. So the first `assert` expansion can only see the `a` local variable; the second sees that plus the two locals introduced by the previous macro invocation (which shouldn't be visible, but that's the bug the third commit fixes), and so on.

On each invocation a Set is created with these local vars. That gives roughly quadratic complexity, which explains why the above code took 12 seconds to compile. Now it takes 0.4s (well, at least the "main" semantic phase).
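Concretely, each `assert a.empty?` call above expands to something like the following sketch (the `__temp` names are illustrative; the compiler picks its own fresh names):

```crystal
a = [1, 2, 3]
__temp_1 = a               # from `%obj = {{obj}}`
__temp_2 = __temp_1.empty? # from `%call = %obj.{{name}}`
unless __temp_2
  fail "Expected #{__temp_1} to be empty"
end
```

Every expansion introduces two fresh temporaries, so with the bug the k-th expansion built its Set over roughly 2k leaked variables; summing over all 12,000 expansions gives on the order of n² work, which is consistent with the 12s to 0.4s improvement.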

I always thought macro expansion was slow, but apparently it's not. Or at least not that slow.

Once we merge this PR we can check whether #3095 is still slow for the snippet I found to be slow.

This increases the performance of Set#dup
You might know this or not, but macros have access to local variables of
the scope where they are called. For example:

```crystal
macro id(obj)
  {{obj}}
end

a = 1
puts id(a)
```

The above macro expands to `a`. If that's seen as a program by itself,
`a` parses to a call. However, because the macro is invoked in a context
where `a` is a local variable, `a` is parsed as a local var.
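To make the call-vs-variable distinction concrete, here is a small sketch (names are illustrative):

```crystal
macro id(x)
  {{x}}
end

a = 1
puts id(a) # expands to `puts a`; `a` is a local var here, so this prints 1

def no_local
  # id(a) # would expand to `a`, which parses as a call to a method `a`
  #       # and fails to compile: undefined local variable or method 'a'
end
```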

For this to work when expanding macros in the top-level phase, the
compiler keeps track of existing local variables: clearing them when
entering a new `def`, etc. Similar logic should apply to call blocks:
after a block is analyzed, all local vars declared inside it should no
longer be visible outside it. Except that this wasn't the case: the
compiler only cleared block args after a block, not all variables. That
means a macro evaluated at some point in the program had access to each
and every local variable defined before it, even ones not actually in
scope. This isn't terrible by itself, but in some cases it leads to a
lot of extra work. Specifically, a Set with the local vars is created on
each macro expansion, and in a macro loop that keeps creating local vars
inside blocks this can lead to huge sets.
@asterite asterite added kind:bug A bug in the code. Does not apply to documentation, specs, etc. performance topic:lang:macros labels Nov 28, 2019
@asterite
Member Author

Well, actually, it seems power_assert is still slow to compile but because it generates a ton of code. I think the assert could be made simpler, I'll see.

@jkthorne
Contributor

Do you have any benchmarks for this?

@vlazar
Contributor

vlazar commented Nov 28, 2019

Well, actually, it seems power_assert is still slow to compile but because it generates a ton of code. I think the assert could be made simpler, I'll see.

Maybe the simpler generated assert code proposed by RX14 would help.

@straight-shoota
Member

I suppose there's a mistake in the example code from the third commit? Or else I just don't get it.

@asterite
Member Author

Ah, yes, there's a mistake. It should be `puts id(a)`. I'll fix it later.

The benchmark is the snippet with that assert. Compile it with the existing compiler and with a compiler including this PR and you'll see the difference (about 12 seconds faster on my machine).

```crystal
@vars.delete(arg.name) unless old_vars_keys.includes?(arg.name)
# After the block we should have the same number of local vars
# (blocks can't inject local vars into the outer scope)
while @vars.size > old_vars_size
```
Contributor


Would there be any difference if this was `(@vars.size - old_vars_size).times do`? I doubt it's noticeable, but the method is called a lot of times, so who knows?

I wish we had an ordered map type, then we could have just nuked the last n entries in a more efficient manner.

EDIT: Hmm. Isn't the insertion order kept track of? Perhaps it is possible to do bulk operations. 🤔

/probably not worth the trouble though
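A hedged sketch of the two loop shapes under discussion; `vars` and `old_size` stand in for the compiler's `@vars` and `old_vars_size`, and the deletion strategy is illustrative, not the actual compiler code:

```crystal
vars = {"a" => 1, "__temp_1" => 2, "__temp_2" => 3}
old_size = 1

# Shape used in the PR: shrink until the old size is restored.
while vars.size > old_size
  vars.delete(vars.keys.last)
end

# Suggested alternative: a fixed number of iterations.
# (vars.size - old_size).times { vars.delete(vars.keys.last) }

vars.keys # => ["a"]
```

Both forms do the same number of deletions; the `times` variant only saves re-reading `vars.size` on each iteration.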

Member Author


All excellent observations. Yeah, Hash is ordered and so is Set, so we could nuke the last entries; but because it's a Hash we'd still need to compute each key's hash to find its bucket, clear it and so on, so probably the same performance in the end.
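For reference, Crystal's `Hash` preserves insertion order (and `Set` is backed by a `Hash`), which is what makes the "nuke the last n entries" idea thinkable, even if each deletion still pays the hashing cost:

```crystal
require "set"

h = {} of Symbol => Int32
h[:c] = 1
h[:a] = 2
h[:b] = 3
h.keys # => [:c, :a, :b] (insertion order, not sorted)

s = Set{3, 1, 2}
s.to_a # => [3, 1, 2]
```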

Member

@RX14 RX14 left a comment


❤️

This is amazing, thank you!!

I think there's likely a lot of low-hanging performance fruit in the compiler, which could push the scaling problem out until it's the problem of a company with more resources :P

@asterite
Member Author

asterite commented Dec 4, 2019

I'd merge but there's no newer "milestone", so I guess we can wait until that exists... or we create one now? Can we merge before 0.32.0 is released?

@bcardiff
Member

bcardiff commented Dec 4, 2019

Merging it for 0.32 is fine on my end. As long as there are no breaking changes :-)

@asterite asterite added this to the 0.32.0 milestone Dec 4, 2019
@asterite
Member Author

asterite commented Dec 4, 2019

Cool, let's do it!

@asterite asterite merged commit ef53f4c into master Dec 4, 2019
@asterite asterite deleted the macro-optimizations branch December 4, 2019 23:25