Applying strip_html filter to escaped html will unescape the string #306

ugenl · 2024-07-18T19:24:45Z

Encountered this on liqp 0.9, though it could go further back.

Example:
{{ "test" | escape }} --> test

{{ "test" | escape | strip_html }} --> test


msangel · 2024-07-27T08:36:17Z

This happens because we use the Jsoup library to strip HTML, while the reference Ruby implementation (Liquid's strip_html filter) is simple and naive:

    STRIP_HTML_BLOCKS = Regexp.union(
      /<script.*?<\/script>/m,
      /<!--.*?-->/m,
      /<style.*?<\/style>/m
    )
    STRIP_HTML_TAGS = /<.*?>/m

    def strip_html(input)
      empty  = ''
      result = input.to_s.gsub(STRIP_HTML_BLOCKS, empty)
      result.gsub!(STRIP_HTML_TAGS, empty)
      result
    end

We should probably switch to the naive implementation too.
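A quick way to see why the naive version avoids the reported behavior: the filter quoted above (lightly condensed here, using gsub instead of gsub!) only removes literal `<...>` spans, and an already-escaped string contains none, so entities such as `&lt;` pass through unchanged instead of being decoded. The `<b>test</b>` input is an illustrative choice, not the exact input from the original report.

```ruby
# Liquid's strip_html regexes, as quoted above.
STRIP_HTML_BLOCKS = Regexp.union(
  /<script.*?<\/script>/m,
  /<!--.*?-->/m,
  /<style.*?<\/style>/m
)
STRIP_HTML_TAGS = /<.*?>/m

def strip_html(input)
  # Drop whole script/comment/style blocks first, then any remaining <...> tag.
  result = input.to_s.gsub(STRIP_HTML_BLOCKS, '')
  result.gsub(STRIP_HTML_TAGS, '')
end

# What `{{ "<b>test</b>" | escape }}` renders to (illustrative input).
escaped = '&lt;b&gt;test&lt;/b&gt;'

# No literal '<' characters, so nothing matches and the entities survive.
puts strip_html(escaped)       # prints &lt;b&gt;test&lt;/b&gt;
puts strip_html('<b>test</b>') # prints test
```

Because nothing here parses HTML, there is no step at which entities could be decoded, which is exactly the compatibility property the fix is after.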

msangel · 2024-07-27T08:59:21Z

Will it be safer?
No.
Will it be more compatible?
Yes.
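The "not safer" answer can be illustrated with the same regexes. `STRIP_HTML_TAGS = /<.*?>/m` ends a match at the first `>` and only removes a `<` that has some later `>`, so a `>` inside an attribute value leaves a fragment behind, and an unterminated tag passes through untouched. The inputs below are made-up examples; the point is only that the output of this filter is not guaranteed to be free of markup.

```ruby
# Same regexes as Liquid's strip_html quoted above.
STRIP_HTML_BLOCKS = Regexp.union(
  /<script.*?<\/script>/m,
  /<!--.*?-->/m,
  /<style.*?<\/style>/m
)
STRIP_HTML_TAGS = /<.*?>/m

def strip_html(input)
  result = input.to_s.gsub(STRIP_HTML_BLOCKS, '')
  result.gsub(STRIP_HTML_TAGS, '')
end

# The non-greedy match stops at the '>' inside the attribute value,
# so the tail of the tag is left in the output.
puts strip_html('<img alt="a>b">hi')            # prints b">hi

# No closing '>' at all: the tag regex never matches and the input is kept as-is.
puts strip_html('<img src=x onerror=alert(1) ') # prints the input unchanged
```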

msangel · 2024-07-27T10:53:54Z

Fixed in 0.9.1.0. Side effect: the jsoup dependency has been removed, since it is no longer used. If you relied on it as a transitive dependency in your own project, you must now add it back explicitly:

    <dependency>
      <groupId>org.jsoup</groupId>
      <artifactId>jsoup</artifactId>
      <version>1.15.3</version>
    </dependency>

Within this library, Jsoup was used only for this one filter.

ugenl · 2024-09-04T20:36:44Z

Very much appreciate the fix here! Now that it is in place, though, is there any other way to unescape strings?

msangel · 2024-09-04T21:29:44Z

@ugenl Probably not, since unescaping is a destructive operation: once a string has been unescaped, you can no longer tell which characters were originally written as escape sequences and which were not.
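A minimal sketch of why unescaping is lossy. `unescape_html` and `ENTITIES` are hypothetical helpers (not part of liqp or Liquid) that decode only the five entities Ruby's `CGI.escapeHTML` emits; real decoders handle many more. Two different inputs, one escaped and one raw, decode to the same string, so nothing in the result records which characters were originally escape sequences.

```ruby
# Hypothetical minimal decoder: only the five entities produced by CGI.escapeHTML.
ENTITIES = {
  '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&#39;' => "'", '&amp;' => '&'
}.freeze

def unescape_html(s)
  # A single left-to-right pass: "&amp;lt;" becomes "&lt;", not "<".
  s.gsub(/&(?:lt|gt|quot|#39|amp);/, ENTITIES)
end

escaped_input = '&lt;b&gt;' # was escaped before reaching us
raw_input     = '<b>'       # already contained raw markup

# Both collapse to "<b>", so the mapping cannot be inverted.
puts unescape_html(escaped_input) == unescape_html(raw_input) # prints true
```

Since distinct inputs share one output, any "re-escape" step applied afterwards has to guess, which is why targeted substitutions are the more predictable option.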

ugenl · 2024-09-04T21:33:06Z

Yeah, fair enough, and people can substitute directly with the replace filter if necessary anyway. Sounds good.

msangel closed this as completed Jul 27, 2024
