Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test to reproduce hang in sanitize_styles #61

Merged
merged 2 commits into from
Jun 1, 2012
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 9 additions & 4 deletions bleach/sanitizer.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,10 +103,15 @@ def sanitize_css(self, style):
style = re.compile('url\s*\(\s*[^\s)]+?\s*\)\s*').sub(' ', style)

# gauntlet
if not re.match("""^([-:,;#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+"""
"""'|"[\s\w]+"|\([\d,\s]+\))*$""",
style):
return ''
# TODO: Make sure this does what it's meant to - I *think* it wants to
# validate style attribute contents.
parts = style.split(';')
gauntlet = re.compile("""^([-/:,#%.\sa-zA-Z0-9!]|\w-\w|'[\s\w]+"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unwinding this one a bit, let me see if I've got this right:

The style value must be (since it contains ^ and $) some combination of units that can be:

  • One of the characters in set(-:,;#%! + any character (. is unescaped) + any letter + any number). - OR -
  • Any 3 characters in the form \w-\w. - OR -
  • Any single OR double quoted string of [\s\w]+. - OR -
  • Any number of digits, commas, and whitespace within parentheses. - OR -
  • Empty.

... What? What is this testing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, my bad, inside [], . is literal, so the first line is any single letter, number, white space character, or character from -:,;.#%!.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way this is unwinded is good.

Short term break the regex into many regex at |.

Long term fix is to read the CSS spec and figure out many regex that will be able to parse the components. Each regex should fail fast for the wrong input, even if it is a really long string.

"""'|"[\s\w]+"|\([\d,\s]+\))*$""")
for part in parts:
if not gauntlet.match(part):
return ''

if not re.match("^\s*([-\w]+\s*:[^:;]*(;\s*|$))*$", style):
return ''

Expand Down
23 changes: 22 additions & 1 deletion bleach/tests/test_css.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
from functools import partial

from nose.tools import eq_
from nose.tools import eq_, ok_

from bleach import clean

Expand Down Expand Up @@ -38,3 +38,24 @@ def test_valid_css():
clean('<p style="float: left; color: ">foo</p>', styles=styles))
eq_('<p style="">foo</p>',
clean('<p style="color: float: left;">foo</p>', styles=styles))


def test_style_hang():
"""The sanitizer should not hang on any inline styles"""
# TODO: Neaten this up. It's copypasta from MDN/Kuma to repro the bug
style = """margin-top: 0px; margin-right: 0px; margin-bottom: 1.286em; margin-left: 0px; padding-top: 15px; padding-right: 15px; padding-bottom: 15px; padding-left: 15px; border-top-width: 1px; border-right-width: 1px; border-bottom-width: 1px; border-left-width: 1px; border-top-style: dotted; border-right-style: dotted; border-bottom-style: dotted; border-left-style: dotted; border-top-color: rgb(203, 200, 185); border-right-color: rgb(203, 200, 185); border-bottom-color: rgb(203, 200, 185); border-left-color: rgb(203, 200, 185); background-image: initial; background-attachment: initial; background-origin: initial; background-clip: initial; background-color: rgb(246, 246, 242); overflow-x: auto; overflow-y: auto; font: normal normal normal 100%/normal 'Courier New', 'Andale Mono', monospace; background-position: initial initial; background-repeat: initial initial;"""
html = '<p style="%s">Hello world</p>' % style
styles = [
'border', 'float', 'overflow', 'min-height', 'vertical-align',
'white-space',
'margin', 'margin-left', 'margin-top', 'margin-bottom', 'margin-right',
'padding', 'padding-left', 'padding-top', 'padding-bottom', 'padding-right',
'background',
'background-color',
'font', 'font-size', 'font-weight', 'text-align', 'text-transform',
]

expected = """<p style="margin-top: 0px; margin-right: 0px; margin-bottom: 1.286em; margin-left: 0px; padding-top: 15px; padding-right: 15px; padding-bottom: 15px; padding-left: 15px; background-color: rgb(246, 246, 242); font: normal normal normal 100%/normal 'Courier New', 'Andale Mono', monospace;">Hello world</p>"""

result = clean(html, styles=styles)
eq_(expected, result)