Regex match issue when string contains <CR> or <LF> #1632

Closed
arnaldop opened this issue Jun 9, 2021 · 3 comments

arnaldop commented Jun 9, 2021

There are backslashes in the failing cases, but this is not the same issue as #988.

When there are '\r' or '\n' characters in the string being matched, the regex match fails. This is probably because Pattern is not being compiled with the MULTILINE flag. See https://github.com/intuit/karate/blob/ae199b270b9f566847d275760d779456ff7d5289/karate-robot/src/main/java/com/intuit/karate/robot/StringMatcher.java#L45

Without testing, it seems that passing the MULTILINE flag should fix this issue.
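
For anyone who wants to check the flag theory outside Karate, here is a minimal standalone sketch using plain `java.util.regex`. It assumes the `#regex` validator does a full-string match via `Matcher.matches()`, which is an assumption on my part and is not confirmed anywhere in this thread. The class and method names (`RegexFlagsDemo`, `fullMatch`) are made up for illustration. It prints how the same pattern behaves against the four sample bodies under the default flags, `MULTILINE`, and `DOTALL`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Karate.
class RegexFlagsDemo {

    // Full-string match: Matcher.matches() must consume the entire input.
    // ASSUMPTION: this mirrors how #regex is evaluated.
    public static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "^.*o.*$";
        // Same bodies as the scenarios in the feature file.
        String[] inputs = { "Word", "Word\\\\", "Word\r", "Word\n" };

        for (String input : inputs) {
            // Make CR and LF visible in the console output.
            String shown = input.replace("\r", "<CR>").replace("\n", "<LF>");
            System.out.printf("%-10s default=%b MULTILINE=%b DOTALL=%b%n",
                    shown,
                    fullMatch(regex, 0, input),
                    fullMatch(regex, Pattern.MULTILINE, input),
                    fullMatch(regex, Pattern.DOTALL, input));
        }
    }
}
```

Under that assumption, `MULTILINE` on its own does not make the `<CR>` and `<LF>` cases pass, because `.` still refuses to match a line terminator and `$` is zero-width, so the trailing character is never consumed. `DOTALL` does make them pass, since it lets `.` match line terminators. So if a flag is the fix, `DOTALL` looks like the one worth testing first.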

A possible workaround might be to trim the string before the problem characters. See below.

Here is a simple feature file showcasing 2 cases that pass, 2 cases that fail, and a sample workaround.
(Could not attach .feature file.)

Feature: regex bug

  Scenario: pass 1 - no special characters

    * def body = 'Word'
    * match body == '#regex ^.*o.*$'

  Scenario: pass 2 - no new line characters

    * def body = 'Word\\\\'
    * match body == '#regex ^.*o.*$'

  Scenario: fail 1 - carriage return

    * def body = 'Word\r'
    * match body == '#regex ^.*o.*$'

  Scenario: fail 2 - line feed

    * def body = 'Word\n'
    * match body == '#regex ^.*o.*$'

  Scenario: pass 3 - workaround - trim before trouble character

    * def originalBody = 'Word\n'
    * def indexOfTroubleCharacter = originalBody.indexOf('\n')
    * def body = originalBody.substring(0, indexOfTroubleCharacter)
    * match body == '#regex ^.*o.*$'

Thanks for the great tool.
If I am able, I'll try to submit a PR.

@ptrthomas ptrthomas added this to the 1.1.0 milestone Jun 10, 2021
ptrthomas (Member) commented

@arnaldop yes please submit a PR - I'm also wondering if we should add a karate.trim() helper function, let me know and I can take care of that.

@ptrthomas ptrthomas removed this from the 1.1.0 milestone Jun 12, 2021
ptrthomas added a commit that referenced this issue Jun 12, 2021
ptrthomas (Member) commented

@arnaldop on second thoughts I'm closing as won't fix as I think this is a very rare use-case, this is the first time this has come up in 4 years. I've made the change to add karate.trim() as a helper function.

you can attempt a PR and request to re-open but I need to be convinced this doesn't impact any existing use of #regex in the wild

arnaldop (Author) commented

@ptrthomas, I missed your last message until now.

Here is an additional workaround, for anyone looking. I think this one is actually a better workaround. This works with every failing case I mentioned above.

Scenario: Better Workaround

    # initial string - can contain \r, \n or \r\n
    * def body = 'Word\r'
    # split text by individual lines
    * def splitBody = body.split(/\r\n|\n|\r/)
    # see if the array of split strings contains the line with the regular expression
    * match splitBody contains [ '#regex ^.*o.*$' ]
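
For comparison, the same split-then-match idea can be sketched in plain Java. This is only an illustration of the technique, not Karate's implementation; `LineSplitDemo` and `anyLineMatches` are hypothetical names, and full-string `matches()` semantics per line is an assumption.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Karate.
class LineSplitDemo {

    // Splits the body on \r\n, \n or \r and reports whether at least one
    // resulting line fully matches the regex.
    public static boolean anyLineMatches(String regex, String body) {
        Pattern pattern = Pattern.compile(regex);
        return Arrays.stream(body.split("\\r\\n|\\n|\\r"))
                .anyMatch(line -> pattern.matcher(line).matches());
    }

    public static void main(String[] args) {
        String regex = "^.*o.*$";
        System.out.println(anyLineMatches(regex, "Word\r"));
        System.out.println(anyLineMatches(regex, "Word\r\n"));
        System.out.println(anyLineMatches(regex, "abc\n"));
    }
}
```

Note that this check is weaker than matching the whole body: it passes as soon as any single line matches, so it suits cases where only one line of a multi-line string is of interest. Java's `String.split` also drops trailing empty strings while JavaScript's keeps them, which makes no difference to this particular check.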

ptrthomas added a commit that referenced this issue Oct 30, 2021
because if you have a string, you can call trim() on it any-time