Youtube link regex fails for youtu.be links. #55

Closed
edelooff opened this issue Jan 10, 2011 · 2 comments
@edelooff

Given the example link "http://youtu.be/watch?v=COiIC3A0ROM"
and the _VALID_URL regex present in the Jan 7 checkout, the groups() results are as follows:
('http://youtu.be/', 'watch')

The cause is a closing parenthesis that is placed too late. There should be an extra one after the ".com/" part, closing the group that matches the domain alternatives (youtu.be | \w+.youtube.com), so that the path part that follows applies to both domains.
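
The effect of a parenthesis that closes too late can be reproduced with a deliberately simplified pair of patterns. This is only a sketch: BROKEN_URL and FIXED_URL are hypothetical names, neither is the actual _VALID_URL from the Jan 7 checkout, and both handle nothing beyond a 'watch?v=' path.

```python
import re

# Simplified, hypothetical patterns (NOT the real _VALID_URL).
# BROKEN_URL: the domain group is closed too late, so the 'watch?v=' path
# belongs only to the youtube.com alternative.
BROKEN_URL = (r'^((?:https?://)?(?:youtu\.be/|youtube\.com/(?:watch\?v=)))?'
              r'([0-9A-Za-z_-]+)(?(1).+)?$')

# FIXED_URL: the domain group is closed right after '.com/', so the path
# applies to both domains.
FIXED_URL = (r'^((?:https?://)?(?:youtu\.be/|youtube\.com/)(?:watch\?v=))?'
             r'([0-9A-Za-z_-]+)(?(1).+)?$')

link = 'http://youtu.be/watch?v=COiIC3A0ROM'

broken_groups = re.match(BROKEN_URL, link).groups()
fixed_groups = re.match(FIXED_URL, link).groups()

print(broken_groups)  # ('http://youtu.be/', 'watch')
print(fixed_groups)   # ('http://youtu.be/watch?v=', 'COiIC3A0ROM')
```

With the late parenthesis, the ID group picks up 'watch' and the trailing '.+' swallows the real video ID, which is the same symptom as in the groups() output reported above.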

The regular expression that "works for me" is as follows:
_VALID_URL = r'^((?:https?://)?(?:youtu\.be/|(?:\w+\.)?youtube(?:-nocookie)?\.com/)(?:(?:v/)|(?:(?:watch(?:_popup)?(?:\.php)?)?(?:\?|#!?)(?:.+&)?v=)))?([0-9A-Za-z_-]+)(?(1).+)?$'

@edelooff
Author

An annotated version of the above, which is not as verbose as it could be, but attempts some explanation of parts:

import re

_VALID_URL = re.compile(r"""
    ^(                              # Start matching an optional URL (group #1).
      (?:https?://)?                # The scheme may be 'http', 'https' or missing.
      (?:youtu\.be/|                # Domain can be youtu.be ...
         (?:\w+\.)?youtube          # ... or any youtube subdomain ...
         (?:-nocookie)?\.com/)      # ... with or without -nocookie in the domain name.
      (?:(?:v/)|                    # The path might start with 'v/', ...
         (?:(?:watch(?:_popup)?     # ... 'watch', with or without '_popup' ...
               (?:\.php)?)?         # ... and might have a trailing '.php'.
            (?:\?|\#!?)             # Info may be in the query string or in the fragment ...
            (?:.+&)?                # ... and may contain leading query arguments,
            v=)))?                  # but it WILL contain a 'v=' for the video ID!
    ([0-9A-Za-z_-]+)                # Capture the video_id (group #2).
    (?(1).+)?$                      # If the video ID was in a URL, match the rest.
    """, re.VERBOSE)

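One detail worth keeping in mind when moving a pattern into re.VERBOSE form: an unescaped '#' outside a character class starts a comment, so the '#!?' fragment alternative has to be written as '\#!?' (or '[#]!?'). The sketch below shows the difference on a small fragment of the pattern; compiles is a hypothetical helper and the three pattern names are made up for this example.

```python
import re


def compiles(pattern):
    """Return True if the pattern compiles under re.VERBOSE."""
    try:
        re.compile(pattern, re.VERBOSE)
        return True
    except re.error:
        return False


# Unescaped '#': everything from it to the end of the line is dropped,
# leaving the unterminated group '(?:\?|' followed by 'v='.
UNESCAPED = r"""
    (?:\?|#!?)    # '?' or '#', optionally followed by '!'
    v=
"""

# Escaped '#': treated as a literal character.
ESCAPED = r"""
    (?:\?|\#!?)   # '?' or '#', optionally followed by '!'
    v=
"""

# A character class works as well.
CHAR_CLASS = r"""
    (?:\?|[#]!?)  # '?' or '#', optionally followed by '!'
    v=
"""

print(compiles(UNESCAPED))   # False
print(compiles(ESCAPED))     # True
print(compiles(CHAR_CLASS))  # True
print(bool(re.search(ESCAPED, 'watch#!v=COiIC3A0ROM', re.VERBOSE)))  # True
```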
@rg3
Collaborator

rg3 commented Jan 10, 2011

Fixed. Thanks.

This issue was closed.