Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replacer impl for FnMut impossible lifetime requirements #601

Closed
kornelski opened this issue Jul 19, 2019 · 10 comments
Closed

Replacer impl for FnMut impossible lifetime requirements #601

kornelski opened this issue Jul 19, 2019 · 10 comments
Labels

Comments

@kornelski
Copy link

Continuing the thread from the forum. Rust gives a warning that this is potentially UB:

let r = Regex::new("aaa")?;
r.replace_all("aaa", |c: &Captures<'_> | {
    c.get(0).unwrap().as_str()
});

In the Replacer trait, there's no place to define that the returned T: AsRef<str> lives at least as long as '_ in Captures<'_>.

This becomes more evident if you try to use explicit &str as the return type. There's just no way to compile it with 'a: 't or similar requirement:

impl<'a, F> Replacer for F where F: for<'t> FnMut(&Captures<'t>) -> &'a str {
    fn replace_append<'t>(&mut self, caps: &Captures<'t>, dst: &mut String) {
        dst.push_str((*self)(caps).as_ref());
    }
}

As far as I understand, the difficulty here is that the lifetime in question is inside of function arguments in the trait, but needs to be named outside of it in the closure type. There's just no syntax for such thing.

Ability to return some of the captures for replacement seems quite useful, but having a scary warning for this, and potentially UB or fatal error in the future, is not good.

  • Is there some clever syntax/generics/lifetimes solution that would make it work?

  • Can closures be made to work some other way, via wrapper type, or another function call?

  • Should Rust add some syntax, or make the borrow checker more clever, to make the existing solution acceptable?

@RustyYato
Copy link

I don't think this is UB in replace_all, but with using the closure. I think lifetime inference was wrong in previous versions of the compiler, leading it to infer the wrong lifetimes for the closure. But using normal functions this code fails to compile regardless of how the lifetimes are plugged together.

This is because of the bound FnMut(&Captures) -> T on the blanket impl for Replacer. This blanket impl states that T cannot be derived from Captures in any way. But previous versions of Rust erroneously allows this to happen, now with nll it is able to detect this bug an emit a warning.

So all in all, this is not a bug in regex, rather it is a bug in your usage of replace_all.

Note how both of the following do not compile.
playground

use regex::{Regex, Captures};

fn main() {
    fn foo<'a>(c: &Captures<'a>) -> &'a str {
        c.get(0).unwrap().as_str()
    }

    fn bar<'a>(c: &'a Captures<'_>) -> &'a str {
        c.get(0).unwrap().as_str()
    }
    let r = Regex::new("aaa").unwrap();
    
    // r.replace_all("aaa", foo); // does not compile, 
    // r.replace_all("aaa", bar); // does not compile, 
}

@kornelski
Copy link
Author

I accept that with the current definition I can't use strings from Captures.

However, I think using strings from Captures is useful and something that could be reasonably expected to work. These strings are taken straight from the input, and outlive actual usage of the closure's return value. The problem is that the code doesn't tell that to the compiler, so usage is needlessly limited.

If Replacer closure can't work with Captures, could regex add, for example, something like replace_all_callback with less abstract arguments that could define the lifetimes in a way that allows use of Captures's values?

@BurntSushi
Copy link
Member

This is indeed unfortunate. It seems to me like there ought to be a way to express this, since from my point of view, it is totally safe to write this particular closure. Or at least, I don't understand why it isn't. So this might be one of those rare cases where NLL winds up rejecting what was a previously valid---and still valid---program. But this is not my area of expertise, so I am not super confident here. cc @nikomatsakis Do you have any insight into this issue?

If Replacer closure can't work with Captures, could regex add, for example, something like replace_all_callback with less abstract arguments that could define the lifetimes in a way that allows use of Captures's values?

I'm not necessarily opposed to this, but I'm not quite sure how it would work. Certainly, the existing replace_append method is sufficiently generic to handle any use case here---it just lacks ergonomics. So in order to get ergonomics back, you'd have to provide another FnMut impl, no? And that would likely overlap with the existing one, so I'm not sure how it would work.

It's worth noting that up until a year ago, the Replacer closure could only return a String. It wasn't until #509 was merged that it got generalized into its current form. So this is a somewhat niche concern, and there are of course work-arounds by implementing Replacer directly if someone absolutely needs this for performance. But still, I do of course agree it would be nice to be able to do this from the closure directly.

@kornelski
Copy link
Author

I can't think of another implementation that would solve this, and Rust is unlikely to have a feature for this anytime soon, so I'm closing the issue.

@jochenleidner
Copy link

jochenleidner commented Feb 13, 2021

Does anyone have an example of a working (Rust 2018) regex replacement code snippet that avoids the above error, with and without use of variable bindings (Captures)? Thanks (the code fragment of the top-5 Google results are all broken and mostly suffer from the error above)!

@kornelski
Copy link
Author

@jochenleidner you need to implement the trait on a struct, not a closure.

struct ExampleReplacer;
impl regex::Replacer for ExampleReplacer {
    fn replace_append(&mut self, capts: &Captures<'_>, dst: &mut String) {
        dst.push_str(some_replacement.unwrap_or_else(|| &capts[0]));
    }
}

https://github.com/kornelski/gh-emoji/blob/1366c9a2b4c659565a17f434e25aab294a0a0a7d/src/lib.rs#L55-L60

@jochenleidner
Copy link

Thanks a lot for this rapid reply, that works.

BTW, this is the first case where I've seen Rust code that is less readable and considerably more complex than its C++ counterpart:

string s("there is a subsequence in the string\n");
regex e("\\b(sub)([^ ]*)");
cout << regex_replace(s,e,"sub-$2"); // no need to implement a function

It may be merited as the Rust solution may buy one more flexibility, but perhaps it's worth adding a "convenience function" implementation like the one you quote above in the Rust std library.

@kornelski
Copy link
Author

kornelski commented Feb 13, 2021

Code identical to C++ works too — replacer is implemented for strings. The struct/closure alternative is for cases when you want to have something with higher performance or custom replacement logic.

@BurntSushi
Copy link
Member

BTW, this is the first case where I've seen Rust code that is less readable and considerably more complex than its C++ counterpart

And in addition to what @kornelski said, simple replacements like you're after are one of the first examples in the crate docs: https://docs.rs/regex/1.4.3/regex/#example-replacement-with-named-capture-groups

And there are even more in the docs of the replace method itself: https://docs.rs/regex/1.4.3/regex/struct.Regex.html#examples-2

@jochenleidner
Copy link

Thanks, @kornelski and @BurntSushi - I agree those examples are very useful, and along the lines of what I had in mind.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants