POC of Proposal: Refactor values.Value into a struct #3437
Comments
I don't think that's quite true. I think that issue was pointing out that the "cost" (as calculated by the compiler) of a …

I'd also include …

BTW, I think the proposal should also suggest what static methods might be defined on the …

I took another look at my prototype and I suspect the reason why this wasn't getting inlined was because of an issue fixed in #3436. I'll give this another try with those changes, but I suspect that it will inline fine since the current cost is 83 without the change to …
Let's do a time-boxed POC for this proposal and compare it directly to the vectorization proposal for the compiler package. The hypothesis is that the large gains we saw from the vectorization POC came either from fixing/reducing unnecessary allocations or from taking advantage of CPU cache locality. This value struct proposal should fix the unnecessary allocations in a different way and will help us understand where the gains are coming from. DOD for the POC:
A few basic changes were made to the compiler in order to determine if there were other optimizations that would help improve it. The following changes were attempted:
For comparison, the benchmark for vectorization produced a 95% speedup over the base case. This comparison was only done on filter, but the optimizations should be fairly similar for map expressions. In particular, map expressions benefit from optimizing the creation of object expressions, which is similar to the second optimization discussed below.

The memory optimizations for value produced a 75% speedup over the base case. They reduced total memory allocations from 9,010 allocs/op to 6 allocs/op, and the amount of memory used went from 273,564 B/op to 1,412 B/op. In comparison, vectorization produced 101 allocs/op and 31,568 B/op. While the value optimizations produced a larger drop in allocs/op, I believe that with proper attention the remaining difference in allocations would have a minimal effect on the overall performance of the system. Vectorization is also not mutually exclusive with these memory optimizations. This does signify a pretty significant gain from paying more attention to memory allocations.

The second optimization was attempted after noticing in flamegraphs that a substantial amount of time was spent in identifier and member expression access. It takes advantage of type inference to retrieve the exact index of a member of a record or the exact index of a value in the scope. For these lookups, we can store the index at compile time and retrieve the value directly from it instead of resolving the name dynamically at runtime (see the sketch at the end of this comment). With these optimizations in place, an improvement of 78% over the base case was observed.

A third optimization improved how values were set in arguments. The current version was accessing the array repeatedly and then accessing the value from the column reader. If you access the array once, wrap it in an interface, and then set the values using that interface, the speedup improves to 82%.

The fourth optimization involved adding typed evaluators for the primitives. The binary expression was also explicitly created rather than relying on an indirect function call to implement the operation. This last optimization kept the same memory improvements but improved the speed further, to an 87% speedup over the base version. These typed evaluators were part of the original design, but they were removed because type inference could not support them at the time, which made the system difficult to debug.

Some areas where I think further work would be beneficial and could indicate other areas of speedup:
I have specific ideas on how to do each of these. I am going to prepare some thoughts on where I think we should go with this information for next steps. In summary, we can get an estimated 87% speedup through optimizing Go code related to the compiler.
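As an illustration of the second optimization, here is a minimal sketch, with hypothetical names (not the POC's actual code), of resolving a record member's index through type inference at compile time so that runtime access becomes a direct slice index rather than a name lookup:

```go
package main

import "fmt"

// A record whose members can be found by name at runtime.
type record struct {
	labels []string
	values []int64
}

// Dynamic lookup: scan the labels on every evaluation.
func (r *record) getByName(name string) (int64, bool) {
	for i, l := range r.labels {
		if l == name {
			return r.values[i], true
		}
	}
	return 0, false
}

// Index-based lookup: type inference pinned the member's position at
// compile time, so evaluation is a single slice index.
type memberExpr struct {
	index int // resolved once, during compilation
}

func (m *memberExpr) eval(r *record) int64 {
	return r.values[m.index]
}

func main() {
	r := &record{labels: []string{"a", "b"}, values: []int64{1, 2}}

	v, _ := r.getByName("b") // dynamic: scans labels on each call
	fmt.Println(v)

	m := &memberExpr{index: 1} // "b" resolved at compile time
	fmt.Println(m.eval(r))     // direct index, no scan
}
```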
I'm still going to write out the thoughts for next steps, but I have pushed up the proof of concept that I created, which demonstrates the performance improvements mentioned above, to the …
Not having unions/ADTs makes me sad, but I think we can still do a bit better here by storing:

```go
type Value struct {
	v    uint64
	data HasNature
}

type HasNature interface {
	Nature() semantic.Nature
}
```

Since values stored in …
The problem with using an interface for the singleton access is that I fear it will likely end up killing the performance: there's still a virtual indirect call to go through the interface. It might be worth trying, but I think we want the direct check of the enum type for the nature.
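For contrast, a small sketch, with hypothetical names, of the two approaches being weighed: a nature enum checked directly versus a nature retrieved through an interface:

```go
package main

import "fmt"

type Nature int

const (
	Int Nature = iota
	Float
)

// Enum-tagged value: the nature check is a direct integer comparison
// that the compiler can inline freely.
type TaggedValue struct {
	t Nature
	v uint64
}

func (v TaggedValue) IsInt() bool { return v.t == Int }

// Interface-backed nature: the check must go through a virtual call on
// whatever implementation is stored in data.
type HasNature interface {
	Nature() Nature
}

type intData struct{}

func (intData) Nature() Nature { return Int }

type IfaceValue struct {
	v    uint64
	data HasNature
}

func (v IfaceValue) IsInt() bool { return v.data.Nature() == Int }

func main() {
	fmt.Println(TaggedValue{t: Int, v: 7}.IsInt())         // direct compare
	fmt.Println(IfaceValue{v: 7, data: intData{}}.IsInt()) // virtual call
}
```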
Problem

`Value` is used in some of the most performance sensitive areas of flux. It is used in any place where a user writes code, and it is also used as an intermediary for many transformations that manipulate data. For that reason, it is very important that this struct be as fast as possible.

While the current implementation of `Value` isn't slow, it is unfortunately not performance-critical fast, and it falls short in a few ways.

As a separate problem, `Value` is presently a very large interface. It has 16 methods and will get a new method each time a new type is added. Adding a method, such as `Dict()`, involves finding every implementation of the interface and adding a panic with an "unexpected kind" message to most of them. This can be very difficult and is a maintainability problem (see the sketch below). In addition to the large number of methods, it is very infrequent that any of these new `Value` implementations implement anything new. Most of the alternative implementations end up being very similar to other implementations, and they usually involve more complex data types like functions rather than something simple like integers.
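A simplified sketch of that maintenance problem; the method set and names here are illustrative, not the actual 16-method flux interface:

```go
package main

import "fmt"

// Stand-in for the values.Value interface: every new language type
// grows the interface, and every existing implementation must stub
// the new method.
type Value interface {
	Int() int64
	Float() float64
	Dict() map[string]Value // newly added; all implementations must now handle it
}

// An integer value only meaningfully implements Int(); everything else
// becomes one of the "unexpected kind" panic stubs described above.
type intValue int64

func (v intValue) Int() int64             { return int64(v) }
func (v intValue) Float() float64         { panic("unexpected kind: int is not a float") }
func (v intValue) Dict() map[string]Value { panic("unexpected kind: int is not a dict") }

func main() {
	var v Value = intValue(42)
	fmt.Println(v.Int())
}
```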
Proposal

I think we should refactor the `values.Value` interface into a struct. Instead of an interface, I think we can create a struct with the following signature:
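A minimal sketch of such a struct, inferred from the `uint64` payload and `data interface{}` fields discussed below; the `t` nature tag and the exact field names are assumptions, not the issue's verbatim definition:

```go
package values

// Nature enumerates the kinds a Value can hold; it stands in for
// semantic.Nature from the flux codebase.
type Nature int

const (
	Int Nature = iota
	UInt
	Float
	String
	// ... one per language type
)

// Value as a small struct instead of an interface: a nature tag for
// direct type checks, a uint64 payload for the primitive kinds, and an
// interface field for complex values that require an allocation anyway.
type Value struct {
	t    Nature
	v    uint64
	data interface{}
}
```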
I think this simple structure will increase the speed and maintainability of the values package in some very performance sensitive areas. For one, the `uint64` can be used to store the value for `float64`, `int64`, and `uint64`. These three would benefit from being inlined and would have faster data access. As a comparison, #3436 got the call to `Int()` down to 3.36 ns/op for integers; an implementation using the above is 0.469 ns/op.

As an aside, we would not experience this benefit for floats until a future version of Go, possibly Go 1.17. `math.Float64frombits()` was previously scored as too expensive by the inliner and would not be inlined. That has been modified on the master branch: golang/go#42788 (comment).

For value types that are more complex, where an allocation would be expected anyway, the `data interface{}` field can be used to hold the value like the interface implementations presently do. These accessor functions would still benefit from inlining.
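To make the inlining argument concrete, here is a hedged sketch of the primitive accessors such a struct could expose (names and layout are assumptions). All three primitive kinds share the same `uint64` payload, so each accessor is a trivial reinterpretation:

```go
package main

import (
	"fmt"
	"math"
)

// Assumed struct-based Value with a shared uint64 payload.
type Value struct {
	v    uint64
	data interface{}
}

// Each accessor reinterprets the same bits, so the function bodies are
// small enough for the Go inliner.
func (v Value) Int() int64   { return int64(v.v) }
func (v Value) UInt() uint64 { return v.v }

// Before golang/go#42788, the inliner scored math.Float64frombits as
// too expensive, so this accessor would not inline until that fix
// lands in a released Go version.
func (v Value) Float() float64 { return math.Float64frombits(v.v) }

func main() {
	f := Value{v: math.Float64bits(3.5)}
	i := Value{v: 42}
	fmt.Println(f.Float(), i.Int())
}
```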