This repository has been archived by the owner on Apr 4, 2018. It is now read-only.

Url regex #14

Closed
0x0ece opened this issue Nov 21, 2012 · 0 comments


0x0ece commented Nov 21, 2012

I think there are a couple of problems with the URL regex.
I've found issues especially with "strange chars" at the end of URLs, as in this one: http://t.co/iTyBIiBB)
where the trailing ) should not be part of the URL.

I've checked the JS lib, but it seems Twitter has made significant updates there, so I couldn't easily work out which changes to apply.

As for the Python version, the previous example can be fixed with:

- REGEXEN['valid_url_path_ending_chars'] = re.compile(ur'[a-z0-9\)=#\/]', re.IGNORECASE)
+ REGEXEN['valid_url_path_ending_chars'] = re.compile(ur'[a-z0-9=#\/]', re.IGNORECASE)

...

REGEXEN['valid_url'] = re.compile(u'''
    (%s)
    (
        (https?:\/\/|www\.)
        (%s)
-        (/%s*%s?)?
+        (/%s*%s)?
        (\?%s*%s)?
    )
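The effect of the two changes above can be checked with a small, self-contained sketch. It is not the library's actual regex: `PATH_GENERAL`, the domain pattern, and `build_url_regex` are simplified stand-ins assumed here for illustration, and only the two ending-character classes are taken from the diff. It is written for Python 3, so the `ur''` prefixes are dropped.

```python
import re

# Simplified stand-in for the library's general path characters (an assumption,
# not the real definition).
PATH_GENERAL = r"[a-z0-9!*';:=+,.$/%#\[\]\-_~|&]"

# Ending-character classes before and after the change proposed in the diff.
ENDING_OLD = r"[a-z0-9\)=#/]"
ENDING_NEW = r"[a-z0-9=#/]"


def build_url_regex(ending_chars, ending_optional):
    """Build a reduced URL pattern: scheme, a toy domain, and an optional path.

    Hypothetical helper for this sketch only; it mirrors the shape of the
    `(/%s*%s?)?` group from the diff, nothing more.
    """
    ending = ending_chars + ("?" if ending_optional else "")
    return re.compile(
        rf"https?://[a-z0-9.\-]+(?:/{PATH_GENERAL}*{ending})?",
        re.IGNORECASE,
    )


text = "see (http://t.co/iTyBIiBB) for details"

# Old behaviour: ')' is a valid ending char and the ending char is optional.
old_match = build_url_regex(ENDING_OLD, ending_optional=True).search(text).group(0)

# Proposed behaviour: ')' is no longer a valid ending char, and the path must
# end on a valid ending char.
new_match = build_url_regex(ENDING_NEW, ending_optional=False).search(text).group(0)

print(old_match)
print(new_match)
```

In this reduced setting the old pattern keeps the trailing `)` in the match and the patched one stops at the last `B`. It says nothing about how the full `valid_url` expression behaves on other inputs.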

But of course this does not ensure that other bugs are fixed...
Best, E.

@dryan dryan mentioned this issue May 16, 2013
@dryan dryan closed this as completed May 16, 2013