Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

3x slow down in string benchmarks due to Fix annotated join with non-concrete eltype iters (#54919) #55279

Closed
Zentrik opened this issue Jul 28, 2024 · 8 comments · Fixed by #55424
Assignees
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request performance Must go faster strings "Strings!"
Milestone

Comments

@Zentrik
Copy link
Member

Zentrik commented Jul 28, 2024

The ["string", "readuntil", "target length 1"] and ["string", "readuntil", "target length 2"] benchmarks got 3x slower recently. I bisected to 462d7f4, @tecosaur.

See https://github.com/JuliaCI/NanosoldierReports/blob/master/benchmark/by_date/2024-07/26/report.md or https://tealquaternion.camdvr.org/compare.html?start=6b08e80bd1217c5c2eb6c89604c21faa5b2a156f&end=42a8efff5a6b96394a36af541cf8a7a4725632fb&stat=min-wall-time&name=string for the benchmark results

@Zentrik Zentrik changed the title String regression due to Fix annotated join with non-concrete eltype iters (#54919) 3x slow down in string benchmarks due to Fix annotated join with non-concrete eltype iters (#54919) Jul 28, 2024
@tecosaur tecosaur self-assigned this Jul 28, 2024
@tecosaur
Copy link
Contributor

Thanks for investigating this @Zentrik.

Regarding the eyebrow-raising regressions you highlighted (string/readuntil/target length {1,2}), I can reproduce them comparing 1.10 to nightly:

# 1.10
julia> using Chairmarks

julia> buffer = IOBuffer("A" ^ 50000)
IOBuffer(data=UInt8[...], readable=true, writable=false, seekable=true, append=false, size=50000, maxsize=Inf, ptr=1, mark=-1)

julia> @b readuntil(seekstart($buffer), "AA")
13.762 ns (1 allocs: 64 bytes)

# Nightly
julia> @b readuntil(seekstart($buffer), "AA")
92.853 ns (4 allocs: 208 bytes)

From a quick look though, it's hard for me to see how the commit you linked (changing some of the join code) affects them. It also seems like most of the additional time is from the seekstart(...) call, which I find strange? I tested using BenchmarkTools's @btime as well, and that also gave ~90ns for that expression, and it also recorded ~70ns for seek($buffer, 0).

I'll dig further into this, but if anybody else would like to take a look at this, that would be appreciated.

@nsajko nsajko added performance Must go faster strings "Strings!" labels Jul 28, 2024
@nsajko
Copy link
Contributor

nsajko commented Jul 28, 2024

julia/base/strings/io.jl

Lines 357 to 371 in 197295c

if isconcretetype(eltype(iterator)) && !_isannotated(eltype(iterator)) && !any(_isannotated, args)
sprint(join, iterator, args...)
else
io = AnnotatedIOBuffer()
join(io, iterator, args...)
# If we know (from compile time information, or dynamically in the case
# of iterators with a non-concrete eltype), that the result is annotated
# in nature, we extract an `AnnotatedString`, otherwise we just extract
# a plain `String` from `io`.
if isconcretetype(eltype(iterator)) || !isempty(io.annotations)
read(seekstart(io), AnnotatedString{String})
else
String(take!(io.io))
end
end

Probably not relevant for performance, but it'd be nice to deduplicate the code here, e.g., eltype(iterator) is in triplicate:

@tecosaur
Copy link
Contributor

Mmm, the to me the crux of it is that the logic I added in the code you link @nsajko should be fully resolved using compile-time information, except in the case of type-unstable iterators.

It may be the case that @code_warntype lied to me about what would be resolved at compile-time and what wouldn't (that's happened a few times now), but regardless I have a hard time seeing how _join_preserve_annotations is called in the path of the ["string", "readuntil", "target length 2"] benchmark, which seems to just be:

buffer = IOBuffer("A" ^ 50000) # setup
readuntil(seekstart($buffer), "AA") # readuntil 2 benchmark

@KristofferC
Copy link
Member

I have a hard time seeing

The easiest might be to just revert locally and check if it fixes things. In that case, you just get the correct answer, no matter if it is hard to see or not.

@vtjnash
Copy link
Member

vtjnash commented Jul 29, 2024

Don't know if significant here, but usually all(!f, c) would be preferred to !any(f, c), I think, given that any and all are shortcircuiting.

Since they are both short circuiting, neither is preferred, and so it isn't something that 'should' be changed (absent other considerations such as profiling)

@KristofferC
Copy link
Member

My guess is that this has something to do with #35800 and that inference of seekstart(::IOBuffer) somehow gets poisoned by the call at

read(seekstart(io), AnnotatedString{String})

@KristofferC KristofferC added this to the 1.11 milestone Aug 2, 2024
@tecosaur tecosaur added the help wanted Indicates that a maintainer wants help on an issue or pull request label Aug 6, 2024
@simeonschaub
Copy link
Member

simeonschaub commented Aug 6, 2024

I tracked down the performance regression in seekstart to #54070, if I revert that commit the performance difference of @b seekstart($buffer) goes away. Since that's a bugfix though, I'm not sure we can just revert it.

It also doesn't fully fix the issue here since readuntil is still slower:

# master with #54070 reverted
julia> @b readuntil(seekstart($buffer), "AA")
44.819 ns (3 allocs: 192 bytes)

# 1.10
julia> @b readuntil(seekstart($buffer), "AA")
23.820 ns (1 allocs: 64 bytes)

@vtjnash
Copy link
Member

vtjnash commented Aug 8, 2024

The remaining delta turns out to be because of an artifact of the benchmark definition itself: because the return value is a zero length string, the runtime could cheat and skip much of the work. However, the newly added copyuntil now preallocates a buffer of 70 bytes (previously it did not) and adds an extra IOBuffer wrapper to simplify the interface. So essentially the benchmark is measuring the ratio of the penalty of the hardcoded size on the guess for the output (0 or 70 bytes) against the actual observed bytes. That is entirely synthetic though. I measure that they have the same performance on this benchmark if I remove that pre-allocation, but that there is little real-world advantage to removing it.

KristofferC added a commit that referenced this issue Aug 9, 2024
The `clamp` function was defined in Base.Math, but required to be in
Base now, so move it to intfuncs with other similar functions

Fixes #55279
lazarusA pushed a commit to lazarusA/julia that referenced this issue Aug 17, 2024
This is a more apt description, since it is not floating point related,
and used earlier (such as in IOBuffer).

Fixes JuliaLang#55279
KristofferC added a commit that referenced this issue Aug 19, 2024
The `clamp` function was defined in Base.Math, but required to be in
Base now, so move it to intfuncs with other similar functions

Fixes #55279

(cherry picked from commit 3db1d19)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Indicates that a maintainer wants help on an issue or pull request performance Must go faster strings "Strings!"
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants