Some compiler optimizations #8529
Conversation
This increases the performance of `Set#dup`.
You might know this or not, but macros have access to local variables of the scope where they are called. For example:

```crystal
macro id(obj)
  {{obj}}
end

a = 1
puts id(a)
```

The above macro expands to `a`. If that's seen as a program by itself, `a` parses to a call. However, because the macro is invoked in a context where `a` is a local variable, `a` is parsed as a local var.

For this to work when expanding macros in the top-level phase, the compiler keeps track of existing local variables: clearing them when entering a new `def`, etc. Similar logic should apply to call blocks: after a block is analyzed, all local vars declared inside it should no longer be visible outside it.

Except that this wasn't the case: the compiler only cleared block args after a block, not all variables. That means a macro evaluated at one point in the program had access to each and every local variable defined before it, even if it wasn't actually visible. This isn't that terrible by itself, but in some cases it leads to a lot of extra work. Specifically, a Set with the local vars is created on each macro expansion, so a macro loop that keeps creating local vars inside blocks can end up building huge sets.
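A minimal sketch of the scoping fix described above (not the actual compiler code; an insertion-ordered Python dict stands in for the compiler's table of visible local vars): snapshot the number of visible vars before analyzing a block, then drop everything the block introduced afterwards, not just the block args.

```python
def analyze_block(vars, block_args, body_vars):
    """Analyze a block, then restore the outer scope's variables."""
    old_size = len(vars)
    for name in block_args:
        vars[name] = True        # block args are visible inside the block
    for name in body_vars:
        vars[name] = True        # vars declared while analyzing the body
    # The fix: remove *every* var the block introduced, not only the
    # block args, so later macro expansions can't see leaked temps.
    while len(vars) > old_size:
        vars.popitem()           # Python dicts keep insertion order
    return vars

outer = {"a": True}
analyze_block(outer, ["x"], ["__temp1", "__temp2"])
print(list(outer))  # only "a" remains visible
```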
Well, actually, it seems power_assert is still slow to compile, but that's because it generates a ton of code. I think the
Do you have any benchmarks for this?
Maybe this simpler generated assert code proposed by RX14 might help.
I suppose there's a mistake in the example code from the third commit? Or else I just don't get it.
Ah, yes, there's a mistake. It should be

The benchmark is the snippet with that assert. Compile it with the existing compiler and with a compiler that has this PR and you'll see a difference (about 12 seconds less on my machine).
```crystal
@vars.delete(arg.name) unless old_vars_keys.includes?(arg.name)
# After the block we should have the same number of local vars
# (blocks can't inject local vars into the outer scope)
while @vars.size > old_vars_size
```
Would there be any difference if this was `(@vars.size - old_vars_size).times do`? I doubt it's noticeable, but the method is called a lot of times, so who knows?
I wish we had an ordered map type, then we could have just nuked the last n entries in a more efficient manner.
EDIT: Hmm. Isn't the insertion order kept track of? Perhaps it is possible to do bulk operations. 🤔
/probably not worth the trouble though
All excellent observations. Yeah, Hash is ordered and so is Set, so we could nuke the last entries. Though, because it's a Hash, we still need to compute each key's hash to find which bucket it's in, clear it, and so on, so it's probably the same performance in the end.
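For what it's worth, an insertion-ordered map does make "nuke the last n entries" a one-liner at the API level. A tiny illustration in Python, whose dicts are insertion-ordered like Crystal's Hash (though, as noted above, each removal may still involve per-entry bucket work internally):

```python
# Remove the last n inserted entries from an insertion-ordered map.
d = {"a": 1, "b": 2, "c": 3, "d": 4}
n = 2
for _ in range(n):
    d.popitem()   # pops the most recently inserted entry (LIFO)
print(d)          # {'a': 1, 'b': 2}
```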
❤️
This is amazing, thank you!!
I think there's likely a lot of low-hanging performance fruit in the compiler which could push out the scaling problem until it's a company with more resources' problem :P
I'd merge but there's no newer "milestone", so I guess we can wait until that exists... or we create one now? Can we merge before 0.32.0 is released?
Merging it for 0.32 is fine on my end. As long as there are no breaking changes :-)
Cool, let's do it!
Some optimizations from musings in #8527 (comment)
The first two commits are self-explanatory. The third commit has a long commit message describing the problem. After reading that, you can understand how it applies to this particular example:
In the above code the `assert` macro will define the local variables `%obj` and `%call`, which will actually expand to variables named something like `__temp1` and so on, and they'll all be different. Because of the above compiler bug, all these local variables will be seen in each `assert` macro invocation. So the first `assert` expansion can only see the `a` local variable. The second one will see that plus the two local variables introduced by the previous macro invocation (which shouldn't be seen, but that's the bug the third commit is fixing). On each invocation a `Set` is created with these local vars. That has something like quadratic complexity, which explains why the above code was taking 12 seconds. Now it takes 0.4s (well, at least the "main" semantic phase).

I always thought macro expansion was slow, but apparently it's not. Or at least not that slow.
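A toy model of the cost just described (a Python sketch with made-up per-expansion numbers, not compiler code): if every expansion builds a Set of all vars leaked so far, total work is the sum 0 + k + 2k + ..., i.e. quadratic in the number of expansions, while proper scoping keeps it linear.

```python
def cost_with_bug(expansions, temps_per_expansion=2):
    """Each expansion copies all previously leaked temps into a new set."""
    visible = 0
    total = 0
    for _ in range(expansions):
        total += visible              # building the Set of visible vars
        visible += temps_per_expansion  # this expansion's temps leak
    return total

def cost_with_fix(expansions, outer_vars=1):
    """With the fix, each expansion only sees the outer scope's vars."""
    return expansions * outer_vars

print(cost_with_bug(1000), cost_with_fix(1000))  # quadratic vs. linear
```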
Once we merge this PR we can check whether #3095 is still slow for the snippet I found to be slow.