southpark.de extractor broke due to the site structure changing, using the ComedyCentral extractor should work. #26763

Closed
5 tasks done
okh-mzny opened this issue Oct 1, 2020 · 13 comments

Comments

@okh-mzny

okh-mzny commented Oct 1, 2020

Checklist

  • I'm reporting a broken site support
  • I've verified that I'm running youtube-dl version 2020.09.20
  • I've checked that all provided URLs are alive and playable in a browser
  • I've checked that all URLs and arguments with special characters are properly quoted or escaped
  • I've searched the bugtracker for similar issues including closed ones

Verbose log

A:\>youtube-dl -v https://www.southpark.de/alle-episoden/s20e02-skankhunt
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.southpark.de/alle-episoden/s20e02-skankhunt']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2020.09.20
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.14393
[debug] exe versions: ffmpeg git-2020-07-13-7772666, ffprobe git-2020-07-13-7772666
[debug] Proxy map: {}
[southpark.de] s20e02-skankhunt: Downloading webpage
Traceback (most recent call last):
  File "__main__.py", line 19, in <module>
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\__init__.py", line 474, in main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\__init__.py", line 464, in _real_main
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\YoutubeDL.py", line 2019, in download
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\YoutubeDL.py", line 797, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\extractor\common.py", line 532, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\extractor\mtv.py", line 287, in _real_extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\extractor\mtv.py", line 213, in _get_videos_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\extractor\mtv.py", line 39, in _id_from_uri
AttributeError: 'NoneType' object has no attribute 'split'

A:\>youtube-dl -v https://www.southpark.de/folgen/fi4nmu/south-park-mexikanischer-joker-staffel-23-ep-1
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-v', 'https://www.southpark.de/folgen/fi4nmu/south-park-mexikanischer-joker-staffel-23-ep-1']
[debug] Encodings: locale cp1252, fs mbcs, out cp850, pref cp1252
[debug] youtube-dl version 2020.09.20
[debug] Python version 3.4.4 (CPython) - Windows-10-10.0.14393
[debug] exe versions: ffmpeg git-2020-07-13-7772666, ffprobe git-2020-07-13-7772666
[debug] Proxy map: {}
[generic] south-park-mexikanischer-joker-staffel-23-ep-1: Requesting header
WARNING: Falling back on generic information extractor.
[generic] south-park-mexikanischer-joker-staffel-23-ep-1: Downloading webpage
[generic] south-park-mexikanischer-joker-staffel-23-ep-1: Extracting information
ERROR: Unsupported URL: https://www.southpark.de/folgen/fi4nmu/south-park-mexikanischer-joker-staffel-23-ep-1
Traceback (most recent call last):
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\YoutubeDL.py", line 797, in extract_info
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\extractor\common.py", line 532, in extract
  File "C:\Users\dst\AppData\Roaming\Build archive\youtube-dl\ytdl-org\tmpksi3o1r1\build\youtube_dl\extractor\generic.py", line 3382, in _real_extract
youtube_dl.utils.UnsupportedError: Unsupported URL: https://www.southpark.de/folgen/fi4nmu/south-park-mexikanischer-joker-staffel-23-ep-1

Description

southpark.de recently changed its site structure, which broke the southpark.de extractor. The first command above was run with the old URL format, which the southpark.de extractor recognizes; it fails because the old URL now redirects to the new one. The second command uses the new southpark.de URL format and fails because no extractor is implemented for it.

The new southpark.de website looks and works much like cc.com, so the ComedyCentral extractor should probably work for southpark.de. I have not been able to test this because there is no way to force a specific extractor for an unknown URL.

Hope this will be fixed.

Thank you.

@jtetrault

I am seeing the same thing when trying to grab episodes from southpark.cc.com. Seems like the MTV extractor is being used instead of the ComedyCentral one.

youtube-dl --proxy socks://localhost:8080 https://southpark.cc.com/episodes/yy0vjs/south-park-the-pandemic-special-season-24-ep-1
[southpark.cc.com] south-park-the-pandemic-special-season-24-ep-1: Downloading webpage
Traceback (most recent call last):
  File "/usr/local/bin/youtube-dl", line 8, in <module>
    sys.exit(main())
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/__init__.py", line 474, in main
    _real_main(argv)
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/__init__.py", line 464, in _real_main
    retcode = ydl.download(all_urls)
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 2018, in download
    res = self.extract_info(
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/YoutubeDL.py", line 797, in extract_info
    ie_result = ie.extract(url)
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/extractor/common.py", line 532, in extract
    ie_result = self._real_extract(url)
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/extractor/mtv.py", line 287, in _real_extract
    videos_info = self._get_videos_info(mgid)
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/extractor/mtv.py", line 213, in _get_videos_info
    video_id = self._id_from_uri(uri)
  File "/home/joel/.local/lib/python3.8/site-packages/youtube_dl/extractor/mtv.py", line 39, in _id_from_uri
    return uri.split(':')[-1]
AttributeError: 'NoneType' object has no attribute 'split'
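
The last frame of the traceback shows the crash comes from calling `uri.split(':')[-1]` when no URI was found on the page. A minimal sketch of that failure and of a more descriptive guard follows. The function names here are hypothetical stand-ins and not the actual youtube-dl code; the sample mgid is the one quoted later in this thread.

```python
def id_from_uri(uri):
    # Mirrors the one-liner in the traceback: keep the last colon-separated field.
    return uri.split(':')[-1]


def safe_id_from_uri(uri):
    # Hypothetical guard: fail with a readable message instead of an
    # AttributeError when the page yielded no URI at all.
    if uri is None:
        raise ValueError('no mgid/URI found on the page; the site layout may have changed')
    return id_from_uri(uri)


if __name__ == '__main__':
    sample = 'mgid:arc:episode:southparkstudios.com:230a4f02-f583-11ea-834d-70df2f866ace'
    print(safe_id_from_uri(sample))
    try:
        id_from_uri(None)
    except AttributeError as exc:
        print('unguarded call fails:', exc)
```

The guard does not fix extraction; it only turns the opaque `NoneType` error into a message that points at the real cause.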

@SkiTheSlicer

southpark.cc.com changed their whole URL structure about 2 weeks ago.

For example:
https://southpark.cc.com/full-episodes/s23e10-christmas-snow
became:
https://southpark.cc.com/episodes/z4ipl3/south-park-christmas-snow-season-23-ep-10

@SkiTheSlicer

It's not as simple as changing the ComedyCentralFullEpisodesIE() _VALID_URL in comedycentral.py from (?:full-episodes|shows(?=/[^/]+/full-episodes)) to (?:(?:full-)?episodes|shows(?=/[^/]+/full-episodes)). With that change the URL matches, but the extractor then calls into mtv.py, which I don't think understands the new page structure either.
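
To make the regex change concrete, here is a small sketch that tests both fragments against the old and new example URLs from this thread. The surrounding pattern built by `build()` is an assumption for illustration only; it is not the real `_VALID_URL` from comedycentral.py.

```python
import re

OLD_FRAGMENT = r'(?:full-episodes|shows(?=/[^/]+/full-episodes))'
NEW_FRAGMENT = r'(?:(?:full-)?episodes|shows(?=/[^/]+/full-episodes))'

OLD_URL = 'https://southpark.cc.com/full-episodes/s23e10-christmas-snow'
NEW_URL = 'https://southpark.cc.com/episodes/z4ipl3/south-park-christmas-snow-season-23-ep-10'


def build(fragment):
    # Hypothetical wrapper pattern: host, then the fragment under test,
    # then everything up to a query string or anchor as the id.
    return re.compile(r'https?://southpark\.cc\.com/' + fragment + r'/(?P<id>[^?#]+)')


if __name__ == '__main__':
    for name, fragment in (('old', OLD_FRAGMENT), ('new', NEW_FRAGMENT)):
        pattern = build(fragment)
        for url in (OLD_URL, NEW_URL):
            print(name, 'fragment matches', url, ':', bool(pattern.match(url)))
```

Under these assumptions the widened fragment accepts both URL styles, which is consistent with the point above: matching the URL is only the first step, and the page still has to be parsed afterwards.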

@okh-mzny
Author

okh-mzny commented Oct 5, 2020

became:
https://southpark.cc.com/episodes/z4ipl3/south-park-christmas-snow-season-23-ep-10

for me it's:
https://www.southparkstudios.com/episodes/z4ipl3/south-park-christmas-snow-season-23-ep-10

Their websites are strictly divided by geolocation. I get redirected to southpark.de because I am in Germany. It even detects my location through Tor (geolocation through JavaScript time?).
So I wouldn't worry about the URLs being different; they should all work with the same extractor.
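
As a rough illustration of treating the regional sites uniformly, the sketch below pulls the short episode ID out of the new-style URLs regardless of host. The host list, the path prefixes (`episodes`, `folgen`) and the function name are assumptions drawn only from the URLs quoted in this thread, not from youtube-dl itself.

```python
from urllib.parse import urlparse

# Assumed from the URLs quoted in this thread; other regional hosts may exist.
KNOWN_HOSTS = {'southpark.de', 'southpark.cc.com', 'southparkstudios.com'}
EPISODE_PREFIXES = {'episodes', 'folgen'}


def parse_episode_url(url):
    """Return (host, short_id, slug) for a new-style episode URL, else None."""
    parsed = urlparse(url)
    host = (parsed.hostname or '').lower()
    if host.startswith('www.'):
        host = host[len('www.'):]
    parts = [p for p in parsed.path.split('/') if p]
    if host not in KNOWN_HOSTS or len(parts) != 3 or parts[0] not in EPISODE_PREFIXES:
        return None
    return host, parts[1], parts[2]


if __name__ == '__main__':
    for url in (
        'https://www.southpark.de/folgen/fi4nmu/south-park-mexikanischer-joker-staffel-23-ep-1',
        'https://www.southparkstudios.com/episodes/z4ipl3/south-park-christmas-snow-season-23-ep-10',
        'https://www.southpark.de/alle-episoden/s20e02-skankhunt',
    ):
        print(url, '->', parse_episode_url(url))
```

Old-style URLs such as the `alle-episoden` one return `None` here, since they have no short ID segment.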

@prtac

prtac commented Oct 8, 2020

Similar issue with MTV.com. They must have changed the sites across the board.

@jtetrault

I fudged the ComedyCentralFullEpisodesIE extractor a little bit by hardcoding an mgid and it works fine. So it's just a question of fixing up which extractor gets used, and how the mgid is parsed out of the webpage.

diff --git a/youtube_dl/extractor/comedycentral.py b/youtube_dl/extractor/comedycentral.py
index d08b909a6..b1d175759 100644
--- a/youtube_dl/extractor/comedycentral.py
+++ b/youtube_dl/extractor/comedycentral.py
@@ -45,14 +45,15 @@ class ComedyCentralFullEpisodesIE(MTVServicesInfoExtractor):
         'only_matching': True,
     }]
 
+    @classmethod
+    def suitable(cls, url):
+        return True
+
     def _real_extract(self, url):
-        playlist_id = self._match_id(url)
-        webpage = self._download_webpage(url, playlist_id)
-        mgid = self._extract_triforce_mgid(webpage, data_zone='t2_lc_promo1')
+        mgid = 'mgid:arc:episode:southparkstudios.com:230a4f02-f583-11ea-834d-70df2f866ace'
         videos_info = self._get_videos_info(mgid)
         return videos_info
 
-
 class ToshIE(MTVServicesInfoExtractor):
     IE_DESC = 'Tosh.0'
     _VALID_URL = r'^https?://tosh\.cc\.com/video-(?:clips|collections)/[^/]+/(?P<videotitle>[^/?#]+)'
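
The open question in the comment above is how to parse the mgid out of the webpage instead of hardcoding it. One possible approach, sketched below, is to search the page source for a string shaped like the hardcoded value. The regex, the function name and the sample markup are all assumptions; the real pages may embed the mgid differently.

```python
import re

# Shape inferred from the hardcoded value in the diff above:
# mgid:arc:episode:<domain>:<uuid>
MGID_RE = re.compile(
    r'mgid:arc:episode:[\w.-]+:'
    r'[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}'
)


def find_mgid(webpage):
    # Return the first mgid-looking string in the page, or None if absent.
    match = MGID_RE.search(webpage)
    return match.group(0) if match else None


if __name__ == '__main__':
    # Invented markup for illustration; not copied from the real site.
    page = ('<div data-mgid="mgid:arc:episode:southparkstudios.com:'
            '230a4f02-f583-11ea-834d-70df2f866ace"></div>')
    mgid = find_mgid(page)
    print(mgid)
    # The video ID is the last colon-separated field, as in mtv.py's _id_from_uri.
    print(mgid.split(':')[-1])
```

Returning `None` when nothing matches lets the caller report a clear extraction error rather than crashing on `split`.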

@okh-mzny
Author

okh-mzny commented Oct 11, 2020

I fudged the ComedyCentralFullEpisodesIE extractor a little bit by hardcoding an mgid and it works fine. So it's just a question of fixing up which extractor gets used, and how the mgid is parsed out of the webpage.

I applied the patch and can confirm that it works! I downloaded the Pandemic Special with no problem.

@okh-mzny okh-mzny mentioned this issue Oct 12, 2020
5 tasks
@okh-mzny
Author

okh-mzny commented Oct 12, 2020

The issue mentioned above says that this has been fully fixed in youtube-dlc.
I'll test it tomorrow and hope we can merge the fix into youtube-dl as well.

@okh-mzny
Author

I tested it, and downloading South Park content does not work in youtube-dlc yet. They have only implemented a fix for MTV. But as I said, it should be trivial to fix South Park as well.

@okh-mzny
Author

This has been fixed in youtube-dlc with blackjack4494#188.
I'm keeping this open since the fix should be pulled into youtube-dl as well.

@ParadoxGBB

With the latest ComedyCentral / MTV fixes it looks like we're closer but still not there for South Park:

[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['-o', 'F:\GREG\DATA\CinAdmin\data\Subscriptions\i3j24ra1.wp3\%(title)s.%(ext)s', '--fixup', 'detect_or_warn', '--add-metadata', '--write-thumbnail', '--verbose', '--recode-video', 'mp4', '--ffmpeg-location', 'F:\GREG\OneDrive\tools\ffmpeg', '--sub-format', 'best', '--write-sub', 'https://southpark.cc.com/episodes/z4ipl3/south-park-christmas-snow-season-23-ep-10']
[debug] Encodings: locale cp1252, fs utf-8, out utf-8, pref cp1252
[debug] youtube-dl version 2021.01.16
[debug] Python version 3.9.0 (CPython) - Windows-10-10.0.19041-SP0
[debug] exe versions: ffmpeg git-2020-03-15-c467328, ffprobe git-2020-03-15-c467328
[debug] Proxy map: {}
[southpark.cc.com] south-park-christmas-snow-season-23-ep-10: Downloading webpage
[southpark.cc.com] ac8dec94-b355-11e9-9fb2-70df2f866ace: Downloading info
ERROR: Unable to download XML: HTTP Error 404: Not Found (caused by <HTTPError 404: 'Not Found'>); please report this issue on https://yt-dl.org/bug . Make sure you are using the latest version; type youtube-dl -U to update. Be sure to call youtube-dl with the --verbose flag and include its complete output.
  File "F:\GREG\OneDrive\Projects\youtube-dl\youtube-dl\youtube_dl\extractor\common.py", line 632, in _request_webpage
    return self._downloader.urlopen(url_or_request)
  File "F:\GREG\OneDrive\Projects\youtube-dl\youtube-dl\youtube_dl\YoutubeDL.py", line 2275, in urlopen
    return self._opener.open(req, timeout=self._socket_timeout)
  File "C:\Python\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Python\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Python\lib\urllib\request.py", line 555, in error
    result = self._call_chain(*args)
  File "C:\Python\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Python\lib\urllib\request.py", line 747, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Python\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Python\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Python\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Python\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

@dstftw dstftw closed this as completed in 1860d0f Mar 14, 2021
github-actions bot added a commit to hellopony/youtube-dl that referenced this issue Mar 14, 2021
* https://github.com/ytdl-org/youtube-dl:
  [applepodcasts] fix extraction(closes ytdl-org#28445)
  [rtve] improve extraction
  release 2021.03.14
  [ChangeLog] Actualize [ci skip]
  [southpark] Fix extraction and add support for southparkstudios.com (closes ytdl-org#26763, closes ytdl-org#28413)
@andi448

andi448 commented Apr 21, 2021

Still getting ERROR: Unsupported URL: https://www.southpark.de/folgen/okhu48/south-park-viel-frottee-um-nichts-staffel-10-ep-5 with version 2021.04.17.

7 participants
@jtetrault @SkiTheSlicer @ParadoxGBB @andi448 @okh-mzny @prtac and others