-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace function should support multiple replacements for strings #29849
Comments
If
I think that's preferable to starting to support various data structures to describe replacements. |
ref #25732 |
Ah, I had forgotten that #25732 never got merged. |
That's simply work : Base.replace(s::String, oldnews::Pair...) = foldl(replace, oldnews, init=s)
using Test
@test Base.replace("a & b > c", "&"=>"&",">"=>">") == "a & b > c" PS: edited to use |
There seems to be a couple of closed PR (#25732, #35414) about this, the main objection being that it's tricky to implement in a performant way. This is very frustrating for people like me who really don't care about performance in strings but like easy convenience functions to do their scripting, and for whom this really feels like an API oversight. Is it really that bad to have an unoptimal implementation in Base? People serious about string performance can use a profiler and figure out that's the bottleneck, and work around it if needed. |
I agree that it would be good to have this as an API even if the implementation isn't the fastest possible and then once that exists, people can use it and if we improve the performance their code gets faster automatically. Seems like a fine situation here. The main question is just the semantics. Presumably |
A simple use cause is DNA to RNA Transcription where codes are replaced RNAT3(s)=replace(replace(replace(replace(s,"G" => "C"),"C" => "G"),"T" => "A"),"A" => "U") would become RNAT4=replace(s, "G"=>"C", "C"=>"G", "T"=>"A", "A"=>"U") |
Is this implemented yet? It doesn't quite seem to work with my use case. function _latex_escape_string(str::AbstractString)
subs = Dict{AbstractString, AbstractString}(
"&" => "\\&",
" - " => "---",
"_" => "\\_",
"#" => "\\#"
)
return replace(lstrip(str), subs...)
end ERROR: LoadError: MethodError: no method matching similar(::SubString{String}, ::Type{Any}) |
It is not — the issue would be closed if it was implemented. |
@StefanKarpinski good point, my bad. FWIW, +1 regarding @antoine-levitt's comment. |
This topic came up in the Julia slack recently, and I just wanted to add my crack at an implementation in case it helps someone think about the problem: function replace(x::AbstractString, old_new::Pair...; count::Union{Int, Nothing} = nothing)
count isa Nothing && (count = Inf)
# find all matches of "old" within x; Array of "new" => replacement range
locs = vcat([new .=> findall(old, x) for (old, new) = old_new]...)
# sort replacement locations by start and length
locs = sort(locs, by = i -> (first(i.second), -last(i.second)))
# reduce replacement locations, only applies replacement if the match
# - is found in the original string "x"
# - starts before any other overlapping matches
# - is the longest of any other overlapping matches
# - is matched by the first positional argument of all identical matches
str, n = reduce(locs; init = ("", 1)) do (str, n), (new, loc)
if count > 0 && first(loc) >= n
count -= 1
(str * x[n:first(loc)-1] * new, last(loc) + 1)
else
(str, n)
end
end
return convert(typeof(x), str * x[n:end])
end What I like about this implementation is that it only applies replacements found in the original string, not intermediate strings as replacements are sequentially applied. In my opinion, applying replacements to these intermediate results is best done via reduce, where it's clear that they're being applied sequentially. Restricting the behavior to applying only to the original string serves a unique purpose that isn't just a shorthand for a reduce call. |
Since this has come up a few times already on both Slack and Discourse, I want to add that I think both behaviors (only replacing on the original string and replacing on intermediates) make intuitive sense and seem equal in usefulness to me. Favoring one over the other seems like a weird choice to me. Another point that should be decided is what to do when two replacements match at the same position in the input string - which should prevail? E.g. However, if the API were restricted to |
Looking back at one of the issues that started this, I would like to point out that we have already defined an API here for non-strings: |
The existing API for arrays makes sense though, since each replacement only modifies one index, not multiple as is the case for strings. There, replacements of the form As mentioned above, limiting it to |
I think that the limited version of |
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. #25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes #35327 fixes #39061 fixes #35414 fixes #29849 fixes #30457 fixes #25396
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
Hello,
It will be nice if
could be done instead of
unfortunately
replace
function currently doesn't supportAbstractDict
Kind regards
The text was updated successfully, but these errors were encountered: