Optimise line number iteration #276
Conversation
Before:

```
test bench_executing_line_number_programs      ... bench: 379,536 ns/iter (+/- 12,044)
test bench_parsing_line_number_program_opcodes ... bench: 271,593 ns/iter (+/- 9,285)
```

After:

```
test bench_executing_line_number_programs      ... bench: 246,375 ns/iter (+/- 11,192)
test bench_parsing_line_number_program_opcodes ... bench:  90,493 ns/iter (+/- 2,722)
```

Before:

```
test bench_executing_line_number_programs ... bench: 246,375 ns/iter (+/- 11,192)
```

After:

```
test bench_executing_line_number_programs ... bench: 173,505 ns/iter (+/- 4,188)
```

Before:

```
test bench_executing_line_number_programs ... bench: 173,505 ns/iter (+/- 4,188)
```

After:

```
test bench_executing_line_number_programs ... bench: 161,251 ns/iter (+/- 4,392)
```

Otherwise we get a lot of extra baggage even when it isn't used.

Before:

```
test bench_executing_line_number_programs ... bench: 161,251 ns/iter (+/- 4,392)
```

After:

```
test bench_executing_line_number_programs ... bench: 116,779 ns/iter (+/- 4,008)
```
Great stuff :)
```rust
        (op_index_with_advance / maximum_operations_per_instruction);
self.row.registers.op_index = op_index_with_advance % maximum_operations_per_instruction;
if maximum_operations_per_instruction == 1 {
```
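The special case in the snippet above can be sketched as a standalone function. This is not gimli's actual code; the variable names follow the DWARF line number program registers, and the function signature is illustrative only:

```rust
// Advance the `address` and `op_index` registers by `operation_advance`
// operations, as in the DWARF line number program state machine.
// When `max_ops` (maximum_operations_per_instruction) is 1 -- the common
// case on non-VLIW targets -- op_index is always 0 and the division and
// modulo can be skipped entirely.
fn advance_address(
    address: u64,
    op_index: u64,
    operation_advance: u64,
    max_ops: u8,
    min_inst_len: u8,
) -> (u64, u64) {
    if max_ops == 1 {
        (address + min_inst_len as u64 * operation_advance, 0)
    } else {
        let op_index_with_advance = op_index + operation_advance;
        let address = address
            + min_inst_len as u64 * (op_index_with_advance / max_ops as u64);
        let op_index = op_index_with_advance % max_ops as u64;
        (address, op_index)
    }
}
```

Both branches compute the same result when `max_ops == 1`; the fast path just avoids the integer division in the hot loop.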
Wow, a little surprised this is such a large speed up.
Agreed. It definitely helped in addr2line too though, so it's not just a synthetic benchmark.
```diff
@@ -503,7 +503,7 @@ pub enum Opcode<R: Reader> {
     UnknownStandard1(constants::DwLns, u64),

     /// An unknown standard opcode with multiple operands.
-    UnknownStandardN(constants::DwLns, Vec<u64>),
+    UnknownStandardN(constants::DwLns, R),
```
To be clear, the issue is the size of `Vec` bloating the decoded instruction size, not that we are hitting a lot of unknown standard opcodes that also have more than one operand, right?
If so, we could save another word (assuming this is still the widest variant) by boxing `R`, since `R` is usually two words with `EndianBuf`.
No, it's not the size of the `Vec`. The largest variant is currently `DefineFile` at 48 bytes, and `UnknownStandardN` was already only 32 before this change.
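The size argument is easy to check with `std::mem::size_of`: an enum is as wide as its widest variant plus the tag. A minimal sketch, assuming a 64-bit target and using illustrative stand-in types rather than gimli's real definitions:

```rust
use std::mem::size_of;

// Stand-in for gimli's constants::DwLns (a one-byte opcode constant).
#[allow(dead_code)]
struct DwLns(u8);

// Payload is a Vec<u64>: 3 words (ptr, capacity, length).
#[allow(dead_code)]
enum WithVec {
    UnknownStandard1(DwLns, u64),
    UnknownStandardN(DwLns, Vec<u64>),
}

// Payload is a borrowed slice, standing in for an EndianBuf-style
// reader: 2 words (ptr, length).
#[allow(dead_code)]
enum WithReader<'a> {
    UnknownStandard1(DwLns, u64),
    UnknownStandardN(DwLns, &'a [u8]),
}

fn main() {
    // On a 64-bit target the Vec-carrying enum is strictly larger,
    // since Vec<u64> is 24 bytes and &[u8] is 16.
    println!("with Vec:    {} bytes", size_of::<WithVec>());
    println!("with reader: {} bytes", size_of::<WithReader>());
}
```

The exact totals depend on how rustc lays out the tag and padding, but the ordering between the two enums holds on any 64-bit target.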
We're not hitting unknown standard opcodes.
I haven't investigated the assembly in great depth (there's too much), but there are a few inlined `Vec` functions that are no longer needed, and I guess this helps the optimiser. It also means there's no code to run to drop the `Opcode`.
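The "no code to run on drop" point can be checked directly with `std::mem::needs_drop`, using plain `Vec` and slice types as stand-ins for the old and new payloads:

```rust
use std::mem::needs_drop;

fn main() {
    // A Vec payload gives the enum drop glue: the heap buffer must be freed.
    assert!(needs_drop::<Vec<u64>>());
    // A borrowed-slice payload (like an EndianBuf-style reader) is plain
    // data, so nothing runs when the value goes out of scope.
    assert!(!needs_drop::<&[u8]>());
    println!("ok");
}
```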
Useful for addr2line. Adds another `inline(always)`, unfortunately.