Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Stateful in String code #25736

Merged
merged 1 commit into from
Jan 30, 2018
Merged

Use Stateful in String code #25736

merged 1 commit into from
Jan 30, 2018

Conversation

Keno
Copy link
Member

@Keno Keno commented Jan 24, 2018

Builds on top of #25731 to remove a couple of explicit uses of the iteration protocol from the string code. Currently upsets codegen, but I figured I'd put it up early anyway to show usage examples for #25731.

@Keno Keno requested a review from StefanKarpinski January 24, 2018 21:08
@Keno Keno added iteration Involves iteration or the iteration protocol strings "Strings!" labels Jan 24, 2018
Copy link
Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, this is certainly a showcase for the expressive power of this iterator. Damn! The combination of zip with all to write generic string matching code is really impressive.

7
````
"""
struct Stateful{VS,T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make more sense for T to come before VS? Or are there cases where you want to dispatch on the iterator state type but not the iterator type itself?

Stateful{VS, T}(itr, Ref{Union{VS, Nothing}}(vs), Ref{Int}(0))
end

function Stateful{VS}(itr::T) where {VS,T}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I guess this is where having VS first is handy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as I mentioned in the Stateful PR, the iterator assumes that the type of the val/state tuple won't change from the first one. You might want to use Stateful{Any} if you can't guarantee that.

c, i = next(a, i)
d, j = next(b, j)
a, b = Iterators.Stateful(a), Iterators.Stateful(b)
for (c, d) in zip(a, b)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooo, this is really nice.

c ≠ d && return ifelse(c < d, -1, 1)
end
return ifelse(done(b, j), 0, -1)
isempty(a) && return ifelse(isempty(b), 0, 1)
return -1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe write this as isempty(b) - isempty(a)?

if c in esc
print(io, '\\', c)
elseif isascii(c)
c == '\0' ? print(io, escape_nul(s,j)) :
c == '\0' ? print(io, escape_nul(peek(a))) :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beware: implementation of peek is very spotty across different IO types. We should probably fix that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this only calls, peek on Stateful though, not on the IO object.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. That's fine then.

return i
a = Iterators.Stateful(trest)
matched = all(zip(SubString(s, ii), a)) do (c,d)
c == d
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😻 I would define eq((a,b)) = a == b and then write this:

matched = all(eq, zip(SubString(s, ii), a))

c, i = next(a,i)
d, j = next(b,j)
a, b = Iterators.Stateful(a), Iterators.Stateful(b)
for (c, d) in zip(a, b)
(c != d) && (return false)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And then

all(eq, zip(a, b)) && isempty(b)

(c != d) && (return false)
i = prevind(a,i)
j = prevind(b,j)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And

all(eq, zip(a, b)) && isempty(b)

@Keno Keno force-pushed the kf/statefulstring branch 3 times, most recently from d1f48aa to ed5c124 Compare January 26, 2018 20:26
@Keno
Copy link
Member Author

Keno commented Jan 26, 2018

The codegen issues were #25752. This passes tests for me locally now. Need to finish up the Stateful PR and make sure performance doesn't regress here, but then I think this is good to go.

@Keno
Copy link
Member Author

Keno commented Jan 26, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@@ -106,12 +106,10 @@ function findnext(testf::Function, s::AbstractString, i::Integer)
z = ncodeunits(s) + 1
1 ≤ i ≤ z || throw(BoundsError(s, i))
@inbounds i == z || isvalid(s, i) || string_index_err(s, i)
while !done(s,i)
d, j = next(s,i)
for (j, d) in pairs(s[i:end])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enumerate(SubString(s, i))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the same thing. This needs string indices not ordinals.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably use a SubString though to avoid copying, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was under the impression that it did (because it does for AbstractString), but turns out I'm wrong for String, so yeah, needs to be SubString.

Copy link
Member

@vtjnash vtjnash left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks useful. A few minor fixes. Remove WIP from title?

end
ii = nextind(s, i)
a = Iterators.Stateful(trest)
matched = all(splat(==), zip(s[ii:end], a))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SubString(z, ii)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same thing, indexing into a string gives a SubString

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I'm wrong about this.

end
a = Iterators.Stateful(trest)
ii = prevind(s, i)
matched = all(zip(Iterators.reverse(pairs(s[1:ii])), a)) do ((idx, c), d)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SubString

a = Iterators.Stateful(trest)
ii = prevind(s, i)
matched = all(zip(Iterators.reverse(pairs(s[1:ii])), a)) do ((idx, c), d)
i = idx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler can't optimize this (a do block closure). Also hard to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough, I'll change it to explicitly take the next iteration and compute nextind.

j = prevind(b,j)
end
j < b1
a, b = Iterators.Stateful.(Iterators.reverse.((a,b)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal isn't to write exceptionally clever code - this doesn't need to use broadcast, it can just be explicit

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal isn't to write exceptionally clever code

but if it were, I feel like I'd be winning with this PR. I'll change it back.

@@ -765,3 +765,6 @@ Indicate whether `x` is [`missing`](@ref).
"""
ismissing(::Any) = false
ismissing(::Missing) = true

function take! end
function peek end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

editor is forgetting to put trailing newlines?

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno Keno force-pushed the kf/statefulstring branch from ed5c124 to cd78ebd Compare January 27, 2018 20:25
@Keno Keno changed the title WIP: Use Stateful in String code Use Stateful in String code Jan 27, 2018
@Keno Keno force-pushed the kf/statefulstring branch from cd78ebd to 1cfe250 Compare January 27, 2018 20:59
@Keno
Copy link
Member Author

Keno commented Jan 27, 2018

Rebased to the latest version of the Stateful PR. Let's see

@Keno
Copy link
Member Author

Keno commented Jan 27, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

fixpoint_iter_type(itrT,
fieldtype(nextvalstate, 1),
fieldtype(nextvalstate, 2)))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😮

@@ -441,6 +442,31 @@ julia> collect(Iterators.rest([1,2,3,4], 2))
"""
rest(itr,state) = Rest(itr,state)

"""
peel(iter)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cute name, but I wonder if this shouldn't be called headtail or something more explicit like that. Is "peel" a traditional name for this kind of iterator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/Loop_splitting#Loop_peeling standard among compiler people I guess?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe call it henchmen then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but that term has always meant because of type stability for me, while this is just a generic iteration peel.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was joking, of course. But I still think headtail might be better.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's not exported though so it doesn't matter that much at this point.

@Keno Keno force-pushed the kf/statefulstring branch from 1cfe250 to 5cc5041 Compare January 27, 2018 21:52
@Keno
Copy link
Member Author

Keno commented Jan 27, 2018

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno Keno mentioned this pull request Jan 29, 2018
Many cases can now be handled by the new `Stateful` iteration wrapper,
other can handled by `pairs(SubString)`. The latter could use some
specialized code to avoid decoding UTF-8 twice, but that's not done
in this PR.
@Keno Keno force-pushed the kf/statefulstring branch from 5cc5041 to 43ce899 Compare January 30, 2018 15:49
@Keno
Copy link
Member Author

Keno commented Jan 30, 2018

I've rebased this on top of the tuning I did on master. Hopefully that should limit the performance regressions here. I'm planning to merge this even if there are minor regressions, since it's a prerequisite for the iteration stuff, which is on the 1.0 critical path. The only remaining issue with the stateful iterator I know of is that is sometimes doesn't get allocation elided. That's a decently straightforward optimization that we should be able to do at any time.

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @ararslan

@Keno
Copy link
Member Author

Keno commented Jan 30, 2018

The 15% mem alloc regression is real and due to the fact that the Stateful iterator doesn't get elided properly in that case. Either fixing the getfield elim pass or stack allocating values with references would be able to fix that. I'll work on that after 0.7-alpha, but for now I'm gonna merge this to make progress on iterate.

@Keno Keno merged commit f11d820 into master Jan 30, 2018
@ararslan ararslan deleted the kf/statefulstring branch January 30, 2018 21:26
@timholy
Copy link
Member

timholy commented Aug 17, 2020

As we've shifted to inferring less aggressively, Stateful is starting to turn up as a frequent trouble spot due to

julia> code_typed(Iterators.Stateful, (AbstractString,))
1-element Vector{Any}:
 CodeInfo(
1%1 = invoke Base.Iterators.approx_iter_type($(Expr(:static_parameter, 1))::Type)::Type%2 = Core.apply_type(Base.Iterators.Stateful, $(Expr(:static_parameter, 1)), %1)::Type{Base.Iterators.Stateful{_A,_B}} where _B where _A
│   %3 = Base.convert($(Expr(:static_parameter, 1)), itr)::Any%4 = Core.fieldtype(%2, 2)::Type{var"#s439"} where var"#s439"<:(Union{Nothing, _B} where _B)%5 = Base.Iterators.iterate(itr)::Any
│        Core.typeassert(%5, %1)::Any%7 = Base.convert(%4, %5)::Any%8 = %new(%2, %3, %7, 0)::Base.Iterators.Stateful{_A,_B} where _B where _A
└──      return %8
) => Base.Iterators.Stateful{_A,_B} where _B where _A

xref #36454. Given the changing landscape, should we reconsider the balance here? cmp seems to be the biggest trouble-spot, so if we do decide to make a change just going back to the raw iteration protocol there might be all that's strictly necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
iteration Involves iteration or the iteration protocol strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants