-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rearrange inlining of ensureroom
and speed it up
#27874
Conversation
base/iobuffer.jl
Outdated
|
||
@inline ensureroom(io::GenericIOBuffer, nshort::Int) = ensureroom(io, UInt(nshort)) | ||
@inline function ensureroom(io::GenericIOBuffer, nshort::UInt) | ||
if !io.writable || !io.seekable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this !io.writable || (!io.seekable && io.ptr > 1)
? One extra comparison and branch here shouldn't be much extra cost, but that would restore the expected fastpath for !seekable
.
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
String join is probably a test of this, and shows a large regression, but within the variance sometimes observed. So may need to check that locally |
@nanosoldier |
Your benchmark job has completed - no performance regressions were detected. A full report can be found here. cc @ararslan |
Hmm, this now seems to have an unusually large effect on sysimg build time. Will need to investigate; it might turn up something generally interesting. |
Indeed, including Base goes from 27 to 74 seconds with this... |
rebase this and see if it's better now? |
|
FWIW, I found that almost all slowdown is from including |
In fact, evaluating the following method definition seems to account for the full slowdown: Lines 1567 to 1586 in c162219
Hopefully that is enough to figure out what is going on. |
In fact, replacing that function definition with g() = @assert 1==1 is enough to trigger the slowdown so it has something to do with the string stuff in the Note that g() = @assert true does not lead to a slowdown. |
defining |
Liked @vtjnash said, just putting a |
We have string(xs...) = print_to_string(xs...) and julia> @which string(:(1==1))
string(xs...) in Base at strings/io.jl:155 but evaluating print_to_string(:(1==1)) does not give a slowdown while string(:(1==1)) does. I think I am reaching my limit here :) |
Quite strange. You could try enabling |
Commenting out the https://github.com/JuliaLang/julia/pull/27874/files#diff-1cac367ddf750f99a052d3119eb8381fR288 makes go from 56 seconds to 0.4 seconds. Also inserting a precompile(Tuple{typeof(Base.compact), Base.GenericIOBuffer{Array{UInt8,1}}}) before the call to precompile(Tuple{typeof(Base.string), Expr}) makes it fast. Forcing interrupt (for the long case) gives
over and over. I tried to do some profiling but didn't get anything reasonable |
Commenting out the
in |
2e0dd24
to
9c406a6
Compare
I pushed a commit that works around the inference problem. Good ol' outlining to the rescue... @nanosoldier runbenchmarks(ALL, vs = ":master") |
9c406a6
to
198e452
Compare
Nanosoldier needs code quoting around the command for whatever reason. @nanosoldier |
@vtjnash noted that this PR didn't follow the styleguide proposed in #29727 and therefore gave the compiler a bit of extra headache (fixing it didn't remove the need of the first workaround, however):
Fixed in the last commit. |
Im curious if that changed the benchmarks anything @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
On 1.0.1 I don't see master: julia> @btime buf();
3.334 ms (20 allocations: 514.55 KiB) 1.0.1: julia> @btime buf();
3.311 ms (19 allocations: 514.58 KiB) PR: julia> @btime buf();
3.313 ms (19 allocations: 514.53 KiB) But less code bloat is good anyway? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. do we need a nanosoldier run or any such other test?
There hasn't been any change here since the last nanosoldier run so presumably not. |
If you look at the IR for
write(::IOBuffer, ::UInt8)
, it is absurdly large since bothensureroom
andcompact
get inlined, plus there are spurious error checks for cases that don't happen. This separates the rare part ofensureroom
into its own noinline function, and tightens up some of the rest of the code. This benchmark:gets almost 2x faster plus code size is significantly reduced.