Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose findfirst findnext for UInt8 vector #37283

Merged
merged 27 commits into from
Oct 26, 2020

Conversation

Moelf
Copy link
Contributor

@Moelf Moelf commented Aug 30, 2020

fix #37280

@Moelf Moelf force-pushed the findfirst_searchindex branch from 3cd123d to 847d58b Compare August 30, 2020 02:18
@KristofferC
Copy link
Member

Should these be moved away from the string file now?

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This makes sense. I have a few remarks:

  • It sounds like this method should eventually work for any AbstractVector. The fact that only UInt8/Int8 are supported is due to an implementation detail. It's probably OK to start with a restricted method though, and generalize it later.
  • We should measure the consequences of this addition for the API if we want to generalize this. With this PR, findfirst(pattern, collection) will mean "look for the sequence pattern in collection". Currently, we have findfirst(pred::Function, A), findfirst(ch::AbstractChar, string::AbstractString) and findfirst(pattern::AbstractString, string::AbstractString). The methods added by this PR are consistent with the third method, i.e. the two arguments collections, and we are looking for a sequence. Do note that it is at odds with the second one, where the first argument is a single element. This is OK for strings as there's no ambiguity, but in general with findfirst(x::AbstractArray, A::AbstractArray) you cannot know whether x is supposed to be a sequence or an element since A could be an AbstractArray{<:AbstractArray}. So if we go with this PR, we will renounce to the possibility to treat x as a single element, e.g. we'll have to keep writing findfirst(==(1), [1, 2]). I think that's OK since the syntax is relatively short, but let's be aware of this decision.
  • Finally, findlast and findprev should probably be implemented in this PR for consistency (I think _rsearch allows doing this).

base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
test/strings/search.jl Outdated Show resolved Hide resolved
test/strings/search.jl Outdated Show resolved Hide resolved
test/strings/search.jl Outdated Show resolved Hide resolved
@nalimilan nalimilan added the search & find The find* family of functions label Aug 30, 2020
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
@stevengj
Copy link
Member

So if we go with this PR, we will renounce to the possibility to treat x as a single element

Not as long as this is restricted to a small set of element types. I agree that this will come up in a hypothetical future PR where we generalize this to find(pattern::AbstractArray, a::AbstractArray), however, but we could discuss that question in a future issue.

@Moelf Moelf force-pushed the findfirst_searchindex branch 2 times, most recently from 9fe84a2 to 6439107 Compare August 30, 2020 15:39
@Moelf
Copy link
Contributor Author

Moelf commented Aug 30, 2020

As suggested by @stevengj , shrinking original _nthbyte and then defined a general version for AbstractVector to minimize the changes for _searchindex. Added tests for OffsetArray. @KristofferC would you suggest a proper place for these code? Since it's no longer just String, I'm thinking base/array.jl maybe

base/strings/search.jl Outdated Show resolved Hide resolved
@Moelf
Copy link
Contributor Author

Moelf commented Aug 30, 2020

@nalimilan I have added findlast and findprev and their tests now, thanks for the comment. Yes, I was surprised when I first encountered that, for example, findfirst(3, [1,2,3]) doesn't work. But I think it is fine since ==(x) is a nice and clear way of saying "treat x as a single element".

@Moelf Moelf force-pushed the findfirst_searchindex branch from 300c4df to f073830 Compare August 30, 2020 20:06
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
@Moelf
Copy link
Contributor Author

Moelf commented Aug 30, 2020

Took the suggested approach by @stevengj : shifting _i by i = Int(_i) - (firstindex(s) - 1) inside _searchindex and _rsearchindex to keep most of the logic unchanged. In any case please let me know if more edge case tests are needed.

@Moelf Moelf force-pushed the findfirst_searchindex branch from 032b93c to 2d4b684 Compare August 30, 2020 22:49
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
base/strings/search.jl Outdated Show resolved Hide resolved
@Moelf Moelf force-pushed the findfirst_searchindex branch from d303612 to ff679b0 Compare September 13, 2020 22:54
@Moelf
Copy link
Contributor Author

Moelf commented Sep 27, 2020

I believe this can be merged now (before another news merge conflict occur)

@Moelf Moelf requested a review from stevengj October 21, 2020 20:14
shorten NEWS
@stevengj stevengj merged commit 5d8225a into JuliaLang:master Oct 26, 2020
@stevengj
Copy link
Member

Thanks!

@vtjnash
Copy link
Member

vtjnash commented Oct 27, 2020

When squashing and merging these commits, please try to fix up the commit message to contain only the relevant information. This one just had a rather long list of repeated:

    * Update base/strings/search.jl
    
    Co-authored-by: Milan Bouchet-Valat <[email protected]>

and it makes git log a slight bit harder to scroll past.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
search & find The find* family of functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

expose findfirst(pattern::AbstractVector{T}, A::AbstractVector{T}) where {T<:Union{Int8,UInt8}}
5 participants