replace function should support multiple replacements for strings #29849

s-celles · 2018-10-30T10:20:57Z

Hello,

It will be nice if

replace("The quick fox is beautiful.", Dict("quick" => "slow", "fox" => "turtle"))

could be done instead of

replace(replace("The quick fox is beautiful.", "quick" => "slow"), "fox" => "turtle")

unfortunately replace function currently doesn't support AbstractDict

Kind regards

The text was updated successfully, but these errors were encountered:

StefanKarpinski · 2018-10-30T14:24:55Z

If replace supported multiple replacement pairs (I thought it already did, but it doesn't seem to; it only supports a single replacement pair), then this would just work:

replace("The quick fox is beautiful.", Dict("quick" => "slow", "fox" => "turtle")...)

I think that's preferable to starting to support various data structures to describe replacements.

fredrikekre · 2018-10-30T14:40:43Z

If replace supported multiple replacement pairs (I thought it already did, but it doesn't seem to; it only supports a single replacement pair),

ref #25732

StefanKarpinski · 2018-10-30T14:58:58Z

Ah, I had forgotten that #25732 never got merged.

o314 · 2018-12-22T01:38:36Z

That's simply work :

Base.replace(s::String, oldnews::Pair...) = foldl(replace, oldnews, init=s)

using Test
@test Base.replace("a & b > c", "&"=>"&amp;",">"=>"&gt;") == "a &amp; b &gt; c"

PS: edited to use foldl instead of reduce

antoine-levitt · 2020-05-26T11:51:03Z

There seems to be a couple of closed PR (#25732, #35414) about this, the main objection being that it's tricky to implement in a performant way. This is very frustrating for people like me who really don't care about performance in strings but like easy convenience functions to do their scripting, and for whom this really feels like an API oversight. Is it really that bad to have an unoptimal implementation in Base? People serious about string performance can use a profiler and figure out that's the bottleneck, and work around it if needed.

StefanKarpinski · 2020-05-26T18:41:58Z

I agree that it would be good to have this as an API even if the implementation isn't the fastest possible and then once that exists, people can use it and if we improve the performance their code gets faster automatically. Seems like a fine situation here. The main question is just the semantics. Presumably replace(thing, a =>b, c => d) should be equivalent to doing replace(replace(thing, a => b), c => d) but ideally only doing a single pass.

vcanaran · 2020-08-02T04:36:04Z

A simple use cause is DNA to RNA Transcription where codes are replaced

RNAT3(s)=replace(replace(replace(replace(s,"G" => "C"),"C" => "G"),"T" => "A"),"A" => "U")

would become

RNAT4=replace(s, "G"=>"C", "C"=>"G", "T"=>"A", "A"=>"U")

jakewilliami · 2020-12-01T11:05:06Z

Is this implemented yet? It doesn't quite seem to work with my use case.

function _latex_escape_string(str::AbstractString)
	subs = Dict{AbstractString, AbstractString}(
		"&" => "\\&",
		" - " => "---",
		"_" => "\\_",
		"#" => "\\#"
	)
	
        return  replace(lstrip(str), subs...)
end

ERROR: LoadError: MethodError: no method matching similar(::SubString{String}, ::Type{Any})

StefanKarpinski · 2020-12-03T17:09:53Z

It is not — the issue would be closed if it was implemented.

jakewilliami · 2020-12-05T15:38:10Z

@StefanKarpinski good point, my bad. FWIW, +1 regarding @antoine-levitt's comment.

dgkf · 2020-12-19T20:54:10Z

This topic came up in the Julia slack recently, and I just wanted to add my crack at an implementation in case it helps someone think about the problem:

function replace(x::AbstractString, old_new::Pair...; count::Union{Int, Nothing} = nothing)
    count isa Nothing && (count = Inf)

    # find all matches of "old" within x; Array of "new" => replacement range
    locs = vcat([new .=> findall(old, x) for (old, new) = old_new]...)

    # sort replacement locations by start and length
    locs = sort(locs, by = i -> (first(i.second), -last(i.second)))
    
    # reduce replacement locations, only applies replacement if the match
    #   - is found in the original string "x"
    #   - starts before any other overlapping matches
    #   - is the longest of any other overlapping matches
    #   - is matched by the first positional argument of all identical matches
    str, n = reduce(locs; init = ("", 1)) do (str, n), (new, loc)
        if count > 0 && first(loc) >= n
            count -= 1
            (str * x[n:first(loc)-1] * new, last(loc) + 1)
        else
            (str, n)
        end
    end

    return convert(typeof(x), str * x[n:end])
end

What I like about this implementation is that it only applies replacements found in the original string, not intermediate strings as replacements are sequentially applied. In my opinion, applying replacements to these intermediate results is best done via reduce, where it's clear that they're being applied sequentially. Restricting the behavior to applying only to the original string serves a unique purpose that isn't just a shorthand for a reduce call.

Seelengrab · 2020-12-21T18:24:07Z

Since this has come up a few times already on both Slack and Discourse, I want to add that I think both behaviors (only replacing on the original string and replacing on intermediates) make intuitive sense and seem equal in usefulness to me. Favoring one over the other seems like a weird choice to me.

Another point that should be decided is what to do when two replacements match at the same position in the input string - which should prevail? E.g. replace("aabbaa", "aabb" => "cc", "aa" => "zz") could result in either ccaa or zzbbaa. The result is implementation dependent, which immediately leaks the abstraction of the loop and would break with a potential docstring of "replace each occurence of the first element of each pair with the second element of the same pair".

However, if the API were restricted to replace(s::AbstractString, Vararg{Pair{Char,Char}}), no such conflict could occur, as Char can't span multiple indices of Chars (just as it works in the multi-arg replace on arrays, which also only ever works on one index at a time).

oscardssmith · 2020-12-31T23:42:13Z

Looking back at one of the issues that started this, I would like to point out that we have already defined an API here for non-strings: replace([1, 2, 1, 3], 1=>0, 0=>4)===[0, 2, 0, 3]. As such, I think there is only 1 plausible api and that we should implement
it since we are getting about 1 new issue per month on this topic.

Seelengrab · 2021-01-01T00:18:59Z

The existing API for arrays makes sense though, since each replacement only modifies one index, not multiple as is the case for strings. There, replacements of the form "aa" => 'B' would span over multiple indices.

As mentioned above, limiting it to Char => Char replacements would keep the same behaviour, though it'd probably lead to "why can't I replace substrings"-style questions 🤷‍♂️

oscardssmith · 2021-01-01T02:32:21Z

I think that the limited version of Char => Char is unambiguous so we should do that now. We'd still get questions, but the questions we'd get would be about the non-obvious case which is an improvement.

This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396

This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. #25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes #35327 fixes #39061 fixes #35414 fixes #29849 fixes #30457 fixes #25396

This has been attempted before, sometimes fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regexes patterns that can cause this to exhibit O(n^2) behavior, but only if the one of the earlier patterns is a partial match for a later pattern Regex and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396

s-celles changed the title ~~replace function for strings should also support AbstractDict as replacement parameter~~ replace function should support multiple replacement for strings Oct 30, 2018

s-celles changed the title ~~replace function should support multiple replacement for strings~~ replace function should support multiple replacements for strings Oct 30, 2018

diegozea mentioned this issue Nov 27, 2018

Inconsistent behaviour for replace #28967

Closed

o314 mentioned this issue Oct 7, 2019

Allow multiple pair arguments in replace on strings. #30457

Closed

mattBrzezinski mentioned this issue Jul 31, 2020

Handle hyphens in URI parameters JuliaCloud/AWS.jl#142

Merged

oscardssmith added the strings "Strings!" label Dec 31, 2020

JeffBezanson closed this as completed in 70771b2 Jun 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

replace function should support multiple replacements for strings #29849

replace function should support multiple replacements for strings #29849

s-celles commented Oct 30, 2018

StefanKarpinski commented Oct 30, 2018 •

edited

Loading

fredrikekre commented Oct 30, 2018

StefanKarpinski commented Oct 30, 2018

o314 commented Dec 22, 2018 •

edited

Loading

antoine-levitt commented May 26, 2020

StefanKarpinski commented May 26, 2020 •

edited

Loading

vcanaran commented Aug 2, 2020

jakewilliami commented Dec 1, 2020 •

edited

Loading

StefanKarpinski commented Dec 3, 2020

jakewilliami commented Dec 5, 2020

dgkf commented Dec 19, 2020

Seelengrab commented Dec 21, 2020 •

edited

Loading

oscardssmith commented Dec 31, 2020

Seelengrab commented Jan 1, 2021 •

edited

Loading

oscardssmith commented Jan 1, 2021

replace function should support multiple replacements for strings #29849

replace function should support multiple replacements for strings #29849

Comments

s-celles commented Oct 30, 2018

StefanKarpinski commented Oct 30, 2018 • edited Loading

fredrikekre commented Oct 30, 2018

StefanKarpinski commented Oct 30, 2018

o314 commented Dec 22, 2018 • edited Loading

antoine-levitt commented May 26, 2020

StefanKarpinski commented May 26, 2020 • edited Loading

vcanaran commented Aug 2, 2020

jakewilliami commented Dec 1, 2020 • edited Loading

StefanKarpinski commented Dec 3, 2020

jakewilliami commented Dec 5, 2020

dgkf commented Dec 19, 2020

Seelengrab commented Dec 21, 2020 • edited Loading

oscardssmith commented Dec 31, 2020

Seelengrab commented Jan 1, 2021 • edited Loading

oscardssmith commented Jan 1, 2021

StefanKarpinski commented Oct 30, 2018 •

edited

Loading

o314 commented Dec 22, 2018 •

edited

Loading

StefanKarpinski commented May 26, 2020 •

edited

Loading

jakewilliami commented Dec 1, 2020 •

edited

Loading

Seelengrab commented Dec 21, 2020 •

edited

Loading

Seelengrab commented Jan 1, 2021 •

edited

Loading