[instagram] Add support for GraphSidecar media types #201
Refactor _extract_postpage() to always return a list of media items. Fetch common keywords and gracefully handle the GraphSidecar media type by extracting each individual media item and adding `sidecar_media_id' and `sidecar_shortcode' keywords to indicate the parent of the sidecar's children. While here, join the copyright comment lines into a single one. Closes #178.
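The refactoring described in that commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: it assumes a post dict shaped like Instagram's GraphQL `shortcode_media` (`__typename`, `edge_sidecar_to_children` → `edges` → `node`), and the helper names `extract_postpage` and `_media_from_node`, as well as the chosen "common" keywords, are hypothetical.

```python
from typing import Any, Dict, List


def _media_from_node(node: Dict[str, Any]) -> Dict[str, Any]:
    """Build a flat media dict from a single GraphImage/GraphVideo node (assumed shape)."""
    is_video = bool(node.get("is_video"))
    return {
        "media_id": node["id"],
        "shortcode": node["shortcode"],
        "is_video": is_video,
        # Videos carry a separate video_url; images only have display_url.
        "url": node["video_url"] if is_video else node["display_url"],
        "typename": node.get("__typename"),
    }


def extract_postpage(post: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Always return a list of media items, flattening GraphSidecar posts."""
    # Keywords shared by every media item of the post (hypothetical selection).
    common = {
        "owner_id": post["owner"]["id"],
        "username": post["owner"]["username"],
    }
    medias = []
    if post.get("__typename") == "GraphSidecar":
        for edge in post["edge_sidecar_to_children"]["edges"]:
            media = _media_from_node(edge["node"])
            # Record the parent so each child can be traced back to its sidecar.
            media["sidecar_media_id"] = post["id"]
            media["sidecar_shortcode"] = post["shortcode"]
            medias.append({**common, **media})
    else:
        # A plain GraphImage/GraphVideo post still yields a one-element list.
        medias.append({**common, **_media_from_node(post)})
    return medias
```

Returning a list in both branches means callers no longer need to special-case single-media posts.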
Relevant parts of the Travis CI logs (possibly TL;DR). It seems that all Instagram extractor tests pass:
...and the failure is a komikast one (but I have not investigated further):
Thanks a lot for this. (Ignore the one failed test; it's not important.) There is one small thing that might be an issue for users who have already downloaded a few images and are now re-downloading everything to get the additional images that are now being downloaded: take, for example, https://www.instagram.com/p/BoHk1haB5tM/. Before your change it got 1 image with ID 1875629777499953996. Now it gets the same image (and 4 others), but with a different ID than before (1875628837415270345). This could cause users to end up with the same image downloaded twice.
Reading through your changes again made me realize that the old ID was for the whole sidecar and the new IDs are for the actual media files, meaning they are now "more correct" than before, so I guess it's fine the way it is right now. One can also use some filename shenanigans to give those IDs a bit more meaning:
Add a possible leading `media_id' of the sidecar for GraphSidecar media. Thanks to @mikf for the suggestion!
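The idea of a leading sidecar `media_id' can be illustrated with a small Python helper. This is only a sketch of the naming scheme: `build_filename` and the `extension` key are hypothetical and do not reproduce gallery-dl's own filename format-string syntax. The IDs in the usage line come from the BoHk1haB5tM example above, where 1875629777499953996 is the sidecar and 1875628837415270345 one of its children.

```python
from typing import Any, Dict


def build_filename(media: Dict[str, Any]) -> str:
    """Prefix a child's media_id with its sidecar's media_id, if it has one."""
    name = str(media["media_id"])
    sidecar_id = media.get("sidecar_media_id")
    if sidecar_id:
        # Children of a GraphSidecar get "<sidecar id>_<child id>".
        name = f"{sidecar_id}_{name}"
    return f"{name}.{media['extension']}"


print(build_filename({
    "media_id": "1875628837415270345",
    "sidecar_media_id": "1875629777499953996",
    "extension": "jpg",
}))
```

With this scheme, files from the same sidecar share a common prefix, namely the ID that older downloads were saved under, so they stay easy to match up.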
That's a good idea! I have just adjusted it, thanks again!
And, regarding possible ID conflicts, that's right. Previously only the (now)
I found an issue that's indirectly caused by this PR: GraphSidecar posts with multiple videos will download those videos multiple times, once for each shortcode. Example: https://www.instagram.com/p/BtOvDOfhvRr/ Possible solution: add a (Maybe this should be its own issue/pull request, but I thought it would fit in here as well)
Whoops, nice catch @mikf! I guess the problem is that each child shortcode (e.g. for BtOvDOfhvRr these are: Another possible kludge is to extract just the first GraphVideo in a GraphSidecar (maybe by directly returning the
The ytdl: URLs of GraphSidecar children, when consumed by youtube-dl, redirect to the URL of their parent. In GraphSidecars with multiple GraphVideos this leads to downloading the same video multiple times. Add a `_ytdl_index' field to indicate the index of the youtube-dl playlist entry corresponding to each child of the sidecar. This will be used by the `ytdl' downloader.
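A rough sketch of how a `_ytdl_index' could be assigned and consumed follows. It assumes that youtube-dl turns a sidecar URL into a playlist containing only the sidecar's videos, in order, so the index counts videos only. `assign_ytdl_index` and `ytdl_options` are hypothetical helpers, and the way gallery-dl's `ytdl' downloader actually uses the field may differ. `playlist_items` is a real youtube-dl option that takes a string of indices.

```python
from typing import Any, Dict, List


def assign_ytdl_index(medias: List[Dict[str, Any]]) -> None:
    """Tag each video child with its 1-based position among the sidecar's videos."""
    index = 0
    for media in medias:
        if media.get("is_video"):
            index += 1
            media["_ytdl_index"] = index


def ytdl_options(media: Dict[str, Any]) -> Dict[str, Any]:
    """Restrict youtube-dl to a single playlist entry when _ytdl_index is set."""
    options: Dict[str, Any] = {}
    index = media.get("_ytdl_index")
    if index is not None:
        # youtube-dl's "playlist_items" option takes a string such as "2".
        options["playlist_items"] = str(index)
    return options


medias = [{"is_video": True}, {"is_video": False}, {"is_video": True}]
assign_ytdl_index(medias)
print([ytdl_options(m) for m in medias])
```

Because every child's ytdl: URL resolves to the same parent playlist, selecting one entry per child is what prevents the same video from being fetched once for each shortcode.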
Hello Mike,
Mike Fährmann writes:
Merged #201 into master.
Thank you for quickly merging it and for all the reviews/suggestions!
Still doesn't work ;-(
In what way? Does it crash?
Hello Aris,
Aris Boch writes:
Still doesn't work ;-(
Can you please share the complete gallery-dl incantation (including
the used URL(s)) and maybe also add `--verbose' option?
Thanks!