named capture #16

Closed
wants to merge 2 commits into from

tdhock

@tdhock tdhock commented Dec 25, 2011

Hi, I saw that you recently added perl functionality to stringr. I just hacked a little change to str_match() that switches to regexpr(,perl=TRUE) instead of regexec() for perl regexps. That way we can take advantage of named capture regular expressions in stringr, and so I added 1 test to make sure the capture group names get copied to the colnames() of the resulting match matrix.

@hadley
Member

hadley commented Jan 2, 2012

Do named captures only work with perl regular expressions?

@tdhock
Author

tdhock commented Jan 2, 2012

Yes, named captures only work with Perl regexps, because they use a
special named-group syntax, (?<name>pattern), that is only valid
with the PCRE library. Those named groups give syntax errors under
regular regexps.
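
For illustration only (this is not code from the patch), a minimal base R sketch of the point above. The pattern and input string are made up for the example; the default engine is expected to reject the named-group syntax, so that call is wrapped in `tryCatch()`.

```r
# Hypothetical pattern and input, chosen only for this example.
pattern <- "(?<year>[0-9]{4})-(?<month>[0-9]{2})"
x <- "released 2011-12"

# With perl = TRUE the pattern is compiled by PCRE, which accepts (?<name>...).
m <- regexpr(pattern, x, perl = TRUE)
matched <- regmatches(x, m)

# Without perl = TRUE the default engine is expected to reject the pattern;
# capture the error message (if any) instead of stopping the script.
default_result <- tryCatch(
  suppressWarnings(regexpr(pattern, x)),
  error = function(e) conditionMessage(e)
)

print(matched)
print(default_result)
```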

@hadley
Member

hadley commented Jan 2, 2012

And regexec doesn't automatically name the results? Annoying. Maybe it would be a good idea to send that as a request to r-devel?

@tdhock
Author

tdhock commented Jan 2, 2012

Well, regexec doesn't support perl regexps, and perl is required for
named capture, so it is normal that regexec doesn't name the results.

The g?regexpr(,perl=TRUE) functions DO return the name specified in
the (?<name>pattern), but the match and group start and end locations
are in a different format than what regexec returns.
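
A rough sketch of what that different format looks like in base R, and how the group names could be copied onto the columns of a match matrix. This is an illustration under assumptions, not the code from this pull request: the pattern and inputs are invented, and non-matching strings (where the capture start is -1) are not handled.

```r
# Hypothetical inputs for the example; every string matches the pattern.
x <- c("a=1", "bb=22")
pattern <- "(?<key>[a-z]+)=(?<value>[0-9]+)"

m <- regexpr(pattern, x, perl = TRUE)

# With perl = TRUE, capture positions come back as attributes of the result:
# matrices with one row per input string and one column per group.
starts      <- attr(m, "capture.start")
lengths     <- attr(m, "capture.length")
group_names <- attr(m, "capture.names")

# substring() drops the matrix shape, so rebuild it and attach the group
# names as column names.
groups <- matrix(
  substring(x, starts, starts + lengths - 1L),
  nrow = length(x),
  dimnames = list(NULL, group_names)
)

print(groups)
```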

@rubenarslan

Maybe it isn't completely clear from this focus on named captures, but currently the reliance on regexec means "perl()" is broken for str_match (and maybe other functions, I didn't test).
That affects not just named captures but also lookahead etc. It had me stumped, and maybe it isn't intended, since it's not documented? tdhock's commit fixes this for me, i.e. a regexp with lookahead works as expected now.

Perl regexps not supported by regexec
Error in regexec("(([a-zA-Z0-9_]+)(?=q[1-4])|([a-zA-Z0-9]+))(_q([1-4]))?y([0-9][0-9])?(m([0-1][0-9]))?", :
regcomp error: 'Invalid regexp'
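
A reduced sketch of the same failure in base R, using a much shorter, made-up pattern rather than the one in the error above. It assumes only that lookahead is PCRE-specific, so the `regexec()` call without Perl mode is wrapped in `tryCatch()` and is expected to fail in the same way.

```r
# Hypothetical, simplified pattern: letters that are followed by q1..q4.
pattern <- "[a-z]+(?=q[1-4])"
x <- "incomeq3"

# PCRE supports lookahead, so this matches the letters before "q3".
m <- regexpr(pattern, x, perl = TRUE)
prefix <- regmatches(x, m)

# regexec() without Perl mode is expected to reject the (?= ...) construct;
# record the error message rather than aborting.
regexec_result <- tryCatch(
  suppressWarnings(regexec(pattern, x)),
  error = function(e) conditionMessage(e)
)

print(prefix)
print(regexec_result)
```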

@gagolews
Contributor

gagolews commented Jan 9, 2013

Confirming, this is a good fix - could you please merge that?

@hadley
Member

hadley commented Nov 26, 2014

Closing since ICU doesn't appear to support this syntax. @gagolews can you please confirm?

@hadley hadley closed this Nov 26, 2014
@gagolews
Contributor

@hadley Unfortunately: YES. There is no support for capture group names in the ICU regex engine yet.
