Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Deprecate ismatch(r, s) in favor of contains(s, r)" #26211

Closed
wants to merge 1 commit into from

Conversation

fredrikekre
Copy link
Member

This reverts commit 6ca43fc.

Fixes(?) #25584

Every time I have changed code from ismatch(r, s) to contains(s, r) it feels a bit odd.

@KristofferC KristofferC added the triage This should be discussed on a triage call label Feb 26, 2018
@nalimilan
Copy link
Member

I'm not strongly opposed, but I think this needs more arguments than "I don't like it". ;-)

In particular, why do we need a special function for regular expressions here, but in find* functions we allow passing either a regex, a character or a string? Maybe contains should be renamed instead?

@StefanKarpinski
Copy link
Member

It's kind of semantically weird to say that a string "contains" a pattern. It contains a substring which matches that pattern, but it doesn't contain the pattern. I agree that this feels really awkward every time I use it, both for the linguistics of it and for the argument order which is annoying too.

@quinnj
Copy link
Member

quinnj commented Feb 26, 2018

a smattering of what other languages do for the contains and ismatch functions

contains

c++
s1.find(s2)

c#
s1.contains(s2)

clojure
(. s1 contains s2)

coffeescript
matchAt s1, s2, i

D
s1.find(s2)

Erlang
contains(s1, s2)

F#
s1.contains(s2)

java
s1.contains(s2)

javascript
s1.indexOf(s2)

kotlin
s1.indexOf(s2)

lua
string.find(s1, s2)

python
s2 in s1

ruby
s1.include?(s2)
s1[s2]
s1.index(s2)

rust
s1.find(s2)

scala
s1.contains(s2)

swift
s1.containsString(s2)

ismatch

c++
regex_search(str, re)

C#
re.IsMatch(str)

clojure
re-find re str

D
str.match(re)

Erlang
re:run(str, re)

go
regexp.MatchString(re, str)

java
str.matches(re)

javascript
re.test(str)

lua
str:match(re)

nim
str.find(re)

perl 6
str ~~ /string$/

python
re.search(re, str)

swift
str.rangeOfString(re)

@quinnj
Copy link
Member

quinnj commented Feb 26, 2018

and having looked over all those, what if we had:

find(str1::String, str2::String)
find(str::String, re::Regex)

?

@nalimilan
Copy link
Member

We already have findfirst(re::Regex, str::String) and a similar findnext method. contains/ismatch is really just a convenience wrapper around that.

@ararslan
Copy link
Member

Our find function doesn't return a Bool, so we'd have to change calls to something like findfirst(re, str) !== nothing, which is a nontrivial loss of convenience. I think we should just reinstate ismatch.

@quinnj
Copy link
Member

quinnj commented Feb 26, 2018

if I understand all the find* business correctly, we actually have no methods of find anymore, so they're actually available for this. But I understand that it may be confusing given all the find* revamping.

@JeffBezanson
Copy link
Member

Whaa? Y'all are fickle.

I still think contains is a reasonable name for this. I also still think, as discussed when this was changed, that ismatch sounds more like the whole string has to match.

Another possible name is hasmatch, which clarifies that we're checking whether a match exists, not whether the string is a match for the pattern.

@oscardssmith
Copy link
Member

what about matches?

@Sacha0
Copy link
Member

Sacha0 commented Feb 26, 2018

I still think contains is a reasonable name for this.

Likewise. I read the contains incantation as "this string contains instances of this pattern", which seems sound. Best!

@ararslan
Copy link
Member

I like hasmatch.

@JeffBezanson
Copy link
Member

I also think there is a genericness argument here. Should you have to change function name just because you've expanded what you're searching for from a single string to a regex?

@StefanKarpinski
Copy link
Member

Nobody's arguing for taking away the ability to write contains(string, regex) in generic code. The desire is for a verb that's less awkward for code that is specifically matching regular expressions.

@Keno
Copy link
Member

Keno commented Mar 1, 2018

Suggestion from triage. Since findfirst already works for the regex, we could have a generic:

found(needle, haystack) = findfirst(needle, haystack) !== nothing

predicate, that would then work for regexes.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 1, 2018

That would work for this case nicely: if found(needle, haystack) .... Reads well, makes sense, generic, matches the new find nomenclature.

@ararslan
Copy link
Member

ararslan commented Mar 1, 2018

Should probably be isfound for consistency with other predicates, which begin with is*.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 1, 2018

It's occurred to me that contains is kind of problematic. We currently only officially define it for strings. This implies that it does contiguous subsequence matching. However, we don't define it for arrays and their subsequences. Moreover, the word "contains" suggests that contains(Set([1,3,2,4]), Set([2,3])) should be true since the first set contains the second set. So there's an ambiguity baked into the very term "contains": does it mean subset containment or subsequence containment? So I propose that we nip this in the bud and use the existing term issubset for unordered containment and introduce issubseq for ordered containment. In this scheme, this would become issubseq(needle, haystack).

Slight wrinkle: a mathematical subsequence is generally not required to be contiguous. Do we care?

@ararslan
Copy link
Member

ararslan commented Mar 1, 2018

Slight wrinkle: a mathematical subsequence is generally not required to be contiguous. Do we care?

I'd argue no; while more often than not we're precise in our mathematical details and terminology, I think in this case it's not really problematic, as this isn't really a mathematical function. We can simply document that the given subsequence should be contiguous.

@Keno
Copy link
Member

Keno commented Mar 1, 2018

Writing it down here, because it came up during triage, but probably not directly related. If we wanted to be clearer in separating out searching for subsequences, we could have a sequences function that turns an indexable container into a dictionary-like view of all of its subsequences, such that:

julia> findfirst([1,2], sequences([2,1,2]))
2:3

would be consistent.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Mar 1, 2018

General triage consensus is that we should go with isfound(needle, haystack) as a general predicate for findfirst(needle, haystack) == nothing with performance optimized implementation for a regex needle. It was also noticed that Regex is callable which should be deprecated. The contains verb should be deprecated in favor of isfound.

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Mar 1, 2018
@fredrikekre fredrikekre deleted the fe/ismatch branch May 5, 2018 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants