Regex match issue when string contains <CR> or <LF> #1632

arnaldop · 2021-06-09T21:25:14Z

There are backslashes in the failing cases, but this is not the same issue as #988.

When there are '\r' or '\n' characters in the string being matched, the regex match fails. This is probably because Pattern is not being compiled with the MULTILINE flag. See https://github.com/intuit/karate/blob/ae199b270b9f566847d275760d779456ff7d5289/karate-robot/src/main/java/com/intuit/karate/robot/StringMatcher.java#L45

Without testing, it seems that passing the MULTILINE flag should fix this issue.
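
The flag hypothesis can be checked against `java.util.regex` directly. The sketch below assumes the `#regex` check behaves like a full-string `Matcher.matches()` call; that is an assumption, since the linked `StringMatcher` code is not reproduced here, and the class and method names are illustrative only. Under that assumption `MULTILINE` alone does not appear to help: `.` still refuses to consume a line terminator, and `$` matches before the terminator without consuming it, so the whole input is never matched. `DOTALL`, which lets `.` match `\r` and `\n`, is the flag that changes the outcome.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Karate.
class RegexLineTerminatorDemo {

    // Full-string match: Matcher.matches() succeeds only if the entire input is consumed.
    static boolean fullMatch(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String regex = "^.*o.*$";
        // Same four bodies as the scenarios in this report.
        String[] bodies = {"Word", "Word\\\\", "Word\r", "Word\n"};
        for (String body : bodies) {
            // Make the control characters visible in the output.
            String shown = body.replace("\r", "<CR>").replace("\n", "<LF>");
            System.out.println(shown
                    + " default=" + fullMatch(regex, 0, body)
                    + " MULTILINE=" + fullMatch(regex, Pattern.MULTILINE, body)
                    + " DOTALL=" + fullMatch(regex, Pattern.DOTALL, body));
        }
    }
}
```

On a standard JDK the two bodies ending in `<CR>` or `<LF>` should report `false` for both the default and `MULTILINE` columns and `true` only for `DOTALL`; the first two bodies match in every column.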

A possible workaround is to truncate the string just before the problem character. See below.

Here is a simple feature file showcasing 2 cases that pass, 2 cases that fail, and a sample workaround.
(Could not attach .feature file.)

Feature: regex bug

  Scenario: pass 1 - no special characters

    * def body = 'Word'
    * match body == '#regex ^.*o.*$'

  Scenario: pass 2 - no new line characters

    * def body = 'Word\\\\'
    * match body == '#regex ^.*o.*$'

  Scenario: fail 1 - carriage return

    * def body = 'Word\r'
    * match body == '#regex ^.*o.*$'

  Scenario: fail 2 - line feed

    * def body = 'Word\n'
    * match body == '#regex ^.*o.*$'

  Scenario: pass 3 - workaround - trim before trouble character

    * def originalBody = 'Word\n'
    * def indexOfTroubleCharacter = originalBody.indexOf('\n')
    * def body = originalBody.substring(0, indexOfTroubleCharacter)
    * match body == '#regex ^.*o.*$'

Thanks for the great tool.
If I am able, I'll try to submit a PR.

ptrthomas · 2021-06-10T02:35:48Z

@arnaldop yes, please submit a PR. I'm also wondering if we should add a karate.trim() helper function; let me know and I can take care of that.

ptrthomas · 2021-06-12T09:05:08Z

@arnaldop on second thought, I'm closing this as won't fix: I think this is a very rare use case, and this is the first time it has come up in 4 years. I've made the change to add karate.trim() as a helper function.

You can attempt a PR and request to re-open, but I need to be convinced that this doesn't impact any existing use of #regex in the wild.

arnaldop · 2021-07-13T18:08:02Z

@ptrthomas, I missed your last message until now.

Here is an additional workaround, for anyone looking. I think this one is actually a better workaround. This works with every failing case I mentioned above.

Scenario: Better Workaround

    # initial string - can contain \r, \n or \r\n
    * def body = 'Word\r'
    # split text by individual lines
    * def splitBody = body.split(/\r\n|\n|\r/)
    # see if the array of split strings contains the line with the regular expression
    * match splitBody contains [ '#regex ^.*o.*$' ]
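
For readers who want to see the same idea outside a feature file, here is a minimal Java sketch of the split-then-match approach. It is not Karate code: `AnyLineMatchDemo` and `anyLineMatches` are hypothetical names, and it assumes each line is checked with a full-string match. Note that, like the `contains` check above, it asks whether any single line matches, which is a looser condition than matching the whole string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Karate.
class AnyLineMatchDemo {

    // Same separators as the workaround; \r\n is listed first so a CRLF pair counts as one break.
    private static final Pattern LINE_BREAK = Pattern.compile("\\r\\n|\\n|\\r");

    // True if at least one line of the body fully matches the regex.
    static boolean anyLineMatches(String body, String regex) {
        Pattern linePattern = Pattern.compile(regex);
        return Arrays.stream(LINE_BREAK.split(body))
                .anyMatch(line -> linePattern.matcher(line).matches());
    }

    public static void main(String[] args) {
        String regex = "^.*o.*$";
        System.out.println(anyLineMatches("Word\r", regex));
        System.out.println(anyLineMatches("Word\r\n", regex));
        System.out.println(anyLineMatches("first\nWord\nlast", regex));
        System.out.println(anyLineMatches("first\nlast", regex));
    }
}
```

Unlike trimming, splitting also covers a line break in the middle of the string, at the cost of no longer asserting anything about the other lines.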

ptrthomas assigned arnaldop Jun 10, 2021

ptrthomas added this to the 1.1.0 milestone Jun 10, 2021

ptrthomas unassigned arnaldop Jun 12, 2021

ptrthomas removed this from the 1.1.0 milestone Jun 12, 2021

ptrthomas added the wontfix label Jun 12, 2021

ptrthomas added a commit that referenced this issue Jun 12, 2021

add karate.trim() helper ref #1632

4131875

ptrthomas closed this as completed Jun 12, 2021

ptrthomas added a commit that referenced this issue Oct 30, 2021

reduce bloat and remove karate.trim() ref #1632

b4efb7f

because if you have a string, you can call trim() on it any-time
