-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use Stateful in String code #25736
Use Stateful in String code #25736
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, this is certainly a showcase for the expressive power of this iterator. Damn! The combination of zip
with all
to write generic string matching code is really impressive.
base/iterators.jl
Outdated
7 | ||
```` | ||
""" | ||
struct Stateful{VS,T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make more sense for T
to come before VS
? Or are there cases where you want to dispatch on the iterator state type but not the iterator type itself?
base/iterators.jl
Outdated
Stateful{VS, T}(itr, Ref{Union{VS, Nothing}}(vs), Ref{Int}(0)) | ||
end | ||
|
||
function Stateful{VS}(itr::T) where {VS,T} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I guess this is where having VS
first is handy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, as I mentioned in the Stateful PR, the iterator assumes that the type of the val/state tuple won't change from the first one. You might want to use Stateful{Any}
if you can't guarantee that.
c, i = next(a, i) | ||
d, j = next(b, j) | ||
a, b = Iterators.Stateful(a), Iterators.Stateful(b) | ||
for (c, d) in zip(a, b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooo, this is really nice.
base/strings/basic.jl
Outdated
c ≠ d && return ifelse(c < d, -1, 1) | ||
end | ||
return ifelse(done(b, j), 0, -1) | ||
isempty(a) && return ifelse(isempty(b), 0, 1) | ||
return -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe write this as isempty(b) - isempty(a)
?
if c in esc | ||
print(io, '\\', c) | ||
elseif isascii(c) | ||
c == '\0' ? print(io, escape_nul(s,j)) : | ||
c == '\0' ? print(io, escape_nul(peek(a))) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beware: implementation of peek
is very spotty across different IO types. We should probably fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this only calls, peek on Stateful
though, not on the IO object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. That's fine then.
base/strings/search.jl
Outdated
return i | ||
a = Iterators.Stateful(trest) | ||
matched = all(zip(SubString(s, ii), a)) do (c,d) | ||
c == d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😻 I would define eq((a,b)) = a == b
and then write this:
matched = all(eq, zip(SubString(s, ii), a))
base/strings/util.jl
Outdated
c, i = next(a,i) | ||
d, j = next(b,j) | ||
a, b = Iterators.Stateful(a), Iterators.Stateful(b) | ||
for (c, d) in zip(a, b) | ||
(c != d) && (return false) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then
all(eq, zip(a, b)) && isempty(b)
base/strings/util.jl
Outdated
(c != d) && (return false) | ||
i = prevind(a,i) | ||
j = prevind(b,j) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And
all(eq, zip(a, b)) && isempty(b)
d1f48aa
to
ed5c124
Compare
The codegen issues were #25752. This passes tests for me locally now. Need to finish up the Stateful PR and make sure performance doesn't regress here, but then I think this is good to go. |
@nanosoldier |
base/strings/search.jl
Outdated
@@ -106,12 +106,10 @@ function findnext(testf::Function, s::AbstractString, i::Integer) | |||
z = ncodeunits(s) + 1 | |||
1 ≤ i ≤ z || throw(BoundsError(s, i)) | |||
@inbounds i == z || isvalid(s, i) || string_index_err(s, i) | |||
while !done(s,i) | |||
d, j = next(s,i) | |||
for (j, d) in pairs(s[i:end]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enumerate(SubString(s, i))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the same thing. This needs string indices not ordinals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably use a SubString
though to avoid copying, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was under the impression that it did (because it does for AbstractString), but turns out I'm wrong for String
, so yeah, needs to be SubString
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks useful. A few minor fixes. Remove WIP from title?
base/strings/search.jl
Outdated
end | ||
ii = nextind(s, i) | ||
a = Iterators.Stateful(trest) | ||
matched = all(splat(==), zip(s[ii:end], a)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SubString(z, ii)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing, indexing into a string gives a SubString
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind, I'm wrong about this.
base/strings/search.jl
Outdated
end | ||
a = Iterators.Stateful(trest) | ||
ii = prevind(s, i) | ||
matched = all(zip(Iterators.reverse(pairs(s[1:ii])), a)) do ((idx, c), d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SubString
base/strings/search.jl
Outdated
a = Iterators.Stateful(trest) | ||
ii = prevind(s, i) | ||
matched = all(zip(Iterators.reverse(pairs(s[1:ii])), a)) do ((idx, c), d) | ||
i = idx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The compiler can't optimize this (a do block closure). Also hard to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fair enough, I'll change it to explicitly take the next iteration and compute nextind.
base/strings/util.jl
Outdated
j = prevind(b,j) | ||
end | ||
j < b1 | ||
a, b = Iterators.Stateful.(Iterators.reverse.((a,b))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal isn't to write exceptionally clever code - this doesn't need to use broadcast
, it can just be explicit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal isn't to write exceptionally clever code
but if it were, I feel like I'd be winning with this PR. I'll change it back.
base/essentials.jl
Outdated
@@ -765,3 +765,6 @@ Indicate whether `x` is [`missing`](@ref). | |||
""" | |||
ismissing(::Any) = false | |||
ismissing(::Missing) = true | |||
|
|||
function take! end | |||
function peek end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
editor is forgetting to put trailing newlines?
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
ed5c124
to
cd78ebd
Compare
cd78ebd
to
1cfe250
Compare
Rebased to the latest version of the |
@nanosoldier |
base/iterators.jl
Outdated
fixpoint_iter_type(itrT, | ||
fieldtype(nextvalstate, 1), | ||
fieldtype(nextvalstate, 2))) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😮
@@ -441,6 +442,31 @@ julia> collect(Iterators.rest([1,2,3,4], 2)) | |||
""" | |||
rest(itr,state) = Rest(itr,state) | |||
|
|||
""" | |||
peel(iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cute name, but I wonder if this shouldn't be called headtail
or something more explicit like that. Is "peel" a traditional name for this kind of iterator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://en.wikipedia.org/wiki/Loop_splitting#Loop_peeling standard among compiler people I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe call it henchmen
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, but that term has always meant because of type stability for me, while this is just a generic iteration peel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was joking, of course. But I still think headtail
might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's not exported though so it doesn't matter that much at this point.
1cfe250
to
5cc5041
Compare
@nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
Many cases can now be handled by the new `Stateful` iteration wrapper, other can handled by `pairs(SubString)`. The latter could use some specialized code to avoid decoding UTF-8 twice, but that's not done in this PR.
5cc5041
to
43ce899
Compare
I've rebased this on top of the tuning I did on master. Hopefully that should limit the performance regressions here. I'm planning to merge this even if there are minor regressions, since it's a prerequisite for the iteration stuff, which is on the 1.0 critical path. The only remaining issue with the stateful iterator I know of is that is sometimes doesn't get allocation elided. That's a decently straightforward optimization that we should be able to do at any time. @nanosoldier |
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan |
The 15% mem alloc regression is real and due to the fact that the Stateful iterator doesn't get elided properly in that case. Either fixing the getfield elim pass or stack allocating values with references would be able to fix that. I'll work on that after 0.7-alpha, but for now I'm gonna merge this to make progress on iterate. |
As we've shifted to inferring less aggressively, julia> code_typed(Iterators.Stateful, (AbstractString,))
1-element Vector{Any}:
CodeInfo(
1 ─ %1 = invoke Base.Iterators.approx_iter_type($(Expr(:static_parameter, 1))::Type)::Type
│ %2 = Core.apply_type(Base.Iterators.Stateful, $(Expr(:static_parameter, 1)), %1)::Type{Base.Iterators.Stateful{_A,_B}} where _B where _A
│ %3 = Base.convert($(Expr(:static_parameter, 1)), itr)::Any
│ %4 = Core.fieldtype(%2, 2)::Type{var"#s439"} where var"#s439"<:(Union{Nothing, _B} where _B)
│ %5 = Base.Iterators.iterate(itr)::Any
│ Core.typeassert(%5, %1)::Any
│ %7 = Base.convert(%4, %5)::Any
│ %8 = %new(%2, %3, %7, 0)::Base.Iterators.Stateful{_A,_B} where _B where _A
└── return %8
) => Base.Iterators.Stateful{_A,_B} where _B where _A xref #36454. Given the changing landscape, should we reconsider the balance here? |
Builds on top of #25731 to remove a couple of explicit uses of the iteration protocol from the string code. Currently upsets codegen, but I figured I'd put it up early anyway to show usage examples for #25731.