Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make METHRE RFC-7230 compliant #3235

Merged
merged 1 commit into from
Sep 8, 2018
Merged

Conversation

kxepal
Copy link
Member

@kxepal kxepal commented Sep 2, 2018

Definition of method is:

    method = token
    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
            "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
    token = 1*tchar

So he had two issues:

  1. Not all the characters were allowed.
  2. Actually, we did allowed too much characters since $-_ parsed as:
    "all characters in range between code 36 ($) and code 95 (_)" instead of
    "characters with codes 36, 45 and 95". So we did match methods like
    [GET] which are malformed according the spec.

@kxepal kxepal requested a review from asvetlov September 2, 2018 04:54
@kxepal
Copy link
Member Author

kxepal commented Sep 2, 2018

After review of #3233 I found this inconsistentcy between our regex and the spec. If fix is ok, will finish the rest bits.

@kxepal kxepal force-pushed the fix-methodre branch 2 times, most recently from 24a60db to 881a7a0 Compare September 2, 2018 05:17
@asvetlov
Copy link
Member

asvetlov commented Sep 2, 2018

@kxepal could you look on failed tests?

@kxepal
Copy link
Member Author

kxepal commented Sep 2, 2018

Oh, I didn't run full test suite, just test_http_parser. Now should be fixed.

@kxepal kxepal force-pushed the fix-methodre branch 2 times, most recently from 928df82 to 9272c20 Compare September 2, 2018 09:58
@codecov-io
Copy link

codecov-io commented Sep 2, 2018

Codecov Report

Merging #3235 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3235   +/-   ##
=======================================
  Coverage   98.07%   98.07%           
=======================================
  Files          43       43           
  Lines        7856     7856           
  Branches     1354     1354           
=======================================
  Hits         7705     7705           
  Misses         59       59           
  Partials       92       92
Impacted Files Coverage Δ
aiohttp/http_parser.py 98.06% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b71302...7351a1f. Read the comment docs.

@@ -29,7 +29,7 @@
'RawRequestMessage', 'RawResponseMessage')

ASCIISET = set(string.printable)
METHRE = re.compile('[A-Z0-9$-_.]+')
METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think It might make sense to put a comment with that rule somewhere nearby:

#     method = token
#     tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
#             "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
#     token = 1*tchar

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is git blame with that information, but having a copy in source code wouldn't hurt anyone. Added that one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I know about blame. But it's visually more convenient to see it next to the code. Like it's done in yarl, for example.

# tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
# "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
# token = 1*tchar
METHRE = re.compile("[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use a raw-string literal r''

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, + and * might also need escaping and | needs escaping (otherwise it divides groups in [])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and . might stand for any single character.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably they have restricted meanings within [], but IMHO better safe than sorry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All true, but not for [] sets.

>>> import re
>>> re.match('[.]', 't')
>>> re.match('[.]', 't') is None
True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

digits can be replaced with \d (to match style below)

Copy link
Member Author

@kxepal kxepal Sep 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they cannot. \d means all the unicode numbers, not just ascii ones:

>>> from hypothesis.strategies import from_regex
>>> thing = from_regex('^\d$').example()
>>> re.match('\d', thing)
<_sre.SRE_Match object; span=(0, 1), match='𝟙'>
>>> thing
'𝟙'
>>> unicodedata.category(thing)
'Nd'
>>> unicodedata.name(thing)
'MATHEMATICAL DOUBLE-STRUCK DIGIT ONE'

@@ -540,7 +540,7 @@ def test_http_request_parser_two_slashes(parser):

def test_http_request_parser_bad_method(parser):
with pytest.raises(http_exceptions.BadStatusLine):
parser.feed_data(b'!12%()+=~$ /get HTTP/1.1\r\n\r\n')
parser.feed_data(b'=":<G>(e),[T];?" /get HTTP/1.1\r\n\r\n')

This comment was marked as outdated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? These characters are valid.

Copy link
Member

@webknjaz webknjaz Sep 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's a negative test. This reads confusing. Let's change it to parametrized test, smth like:

@pytest.mark.parametrize(
    'invalid_byte',
    list(b'=":<>(),[];?"'),
)
@pytest.raises(http_exceptions.BadStatusLine)
def test_http_request_parser_bad_method(parser, invalid_byte):
    valid_method = b'Get'
    invalid_method = valid_method + invalid_byte
    invalid_request_line = invalid_method + b' /get HTTP/1.1\r\n\r\n'
    parser.feed_data(invalid_request_line)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with pytest.raises at least.
Personally I dont insist on pytest parametrization

@kxepal kxepal force-pushed the fix-methodre branch 2 times, most recently from 69b5734 to 7351a1f Compare September 2, 2018 11:23
Definition of method is:
```
    method = token
    tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
            "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
    token = 1*tchar
```

So we had two issues:
1. Not all the characters were allowed.
2. Actually, we did allowed too much characters since `$-_` parsed as:
"all characters in range between code 36 ($) and code 95 (_)" instead of
"characters with codes 36, 45 and 95". So we did match methods like
`[GET]` which are malformed according the spec.
@asvetlov
Copy link
Member

asvetlov commented Sep 2, 2018

@kxepal please merge when ready

@asvetlov asvetlov merged commit de4daf3 into aio-libs:master Sep 8, 2018
@lock
Copy link

lock bot commented Oct 28, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

@lock lock bot added the outdated label Oct 28, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019
@psf-chronographer psf-chronographer bot added the bot:chronographer:provided There is a change note present in this PR label Oct 28, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bot:chronographer:provided There is a change note present in this PR outdated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants