Weasyl Extractor #977

Korvox · 2020-09-04T16:15:32Z

Closes #813

Supports individual posts, user galleries, folders, a single journal, or all of a user's journals. Uses the Weasyl API for everything except journals, since there is no API endpoint for them.

Currently user extraction is an alias for submissions extraction, but it could be made its own extractor covering both submissions and journals. In general, though, I feel most people who feed in a username would just want the gallery.
There might be someone who wants to download a favorites collection, which is neither a folder nor a gallery. There is no API for this, so it would require iterating over the pages of the collection and grabbing the submissions. If someone actually wants this, it's pretty trivial to implement, but it's a lot of non-API requests, so I left it out for now.
There is no way to filter by file type: if someone has text files, music, videos, and pictures in their submissions, you get all of them. If someone wants filtering, it can be added.
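A minimal sketch of paging through a user gallery via the API. It assumes the `/api/users/{login_name}/gallery` endpoint returns a JSON object with a `submissions` list and a `nextid` cursor that is null on the last page, and that `folderid`/`nextid` are accepted as query parameters. `fetch_json` is a hypothetical stand-in for an HTTP GET, injected so the pagination logic can run offline.

```python
from typing import Callable, Iterator, Optional
from urllib.parse import urlencode

API_ROOT = "https://www.weasyl.com/api"


def gallery_url(login_name: str, nextid: Optional[int] = None,
                folderid: Optional[int] = None) -> str:
    """Build a gallery URL; the query parameter names are assumptions."""
    params = {"folderid": folderid, "nextid": nextid}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{API_ROOT}/users/{login_name}/gallery"
    return f"{url}?{query}" if query else url


def iter_gallery(login_name: str, fetch_json: Callable[[str], dict],
                 folderid: Optional[int] = None) -> Iterator[dict]:
    """Yield each submission, following the assumed `nextid` cursor."""
    nextid = None
    while True:
        page = fetch_json(gallery_url(login_name, nextid, folderid))
        yield from page["submissions"]
        nextid = page.get("nextid")
        if not nextid:  # missing/None cursor: last page reached
            break


# Offline usage with two canned pages instead of real HTTP responses.
fake_pages = {
    f"{API_ROOT}/users/someone/gallery":
        {"submissions": [{"submitid": 3}, {"submitid": 2}], "nextid": 2},
    f"{API_ROOT}/users/someone/gallery?nextid=2":
        {"submissions": [{"submitid": 1}], "nextid": None},
}
ids = [s["submitid"] for s in iter_gallery("someone", fake_pages.__getitem__)]
print(ids)  # [3, 2, 1]
```

Injecting the fetch function keeps the cursor handling separate from networking, which also makes it easy to test without hitting the site.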

docs/supportedsites.rst

kattjevfel · 2020-09-22T22:02:45Z

Decided to give this one a go, and other than the supportedsites.py issue, the only feedback I've got is to perhaps set a different default filename format. Other extractors tend to prefix filenames with the category.

My suggestion would be something like {category}_{submitid}_{title}.{extension} as the default, as that seems to be the most common one.

2020-09-22_darkchibishadow-solanaceae-prologue-chapter-2-5-page-1.png --> weasyl_1948673_Solanaceae - Prologue Chapter 2.5 - Page 1.png
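gallery-dl format strings use Python's replacement-field syntax, so for simple fields like these the suggested default can be checked with plain `str.format_map`. The values below come from the example rename above; the dict is an illustrative stand-in for the metadata the extractor would yield, not its actual output.

```python
# Illustrative metadata for the submission in the rename example.
metadata = {
    "category": "weasyl",
    "submitid": 1948673,
    "title": "Solanaceae - Prologue Chapter 2.5 - Page 1",
    "extension": "png",
}

fmt = "{category}_{submitid}_{title}.{extension}"
filename = fmt.format_map(metadata)
print(filename)  # weasyl_1948673_Solanaceae - Prologue Chapter 2.5 - Page 1.png
```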

Other than that, seems to work great!

Korvox · 2020-09-23T00:58:36Z

My suggestion would be something like {category}_{submitid}_{title}.{extension} as the default, as that seems to be the most common one.

Most extractors format the title (removing spaces, lowercasing, etc.) rather than sticking the raw title into a filename. The {filename} part is nice because Weasyl already does this formatting in its own filenames.

I get adding {category} for consistency, but I do like keeping the date around, since it's data that isn't carried anywhere else (it isn't in the image metadata) and can be useful for sorting. Most extractors just don't have the luxury of getting it via an API.

How's {category}_{filename}_{date}? That respects the folder hierarchy containing it. The submitid is an internal detail the user shouldn't really care about.

kattjevfel · 2020-09-23T11:18:41Z

I was mainly thinking of the imgur, gfycat and furaffinity extractors, which do in fact just use the title, and perhaps others too. Personally, I find the date quite useless: you get the same thing with the ID (which is why it goes after {category}, for sorting).

gallery-dl/gallery_dl/extractor/imgur.py (line 58 in 2184ec5):

filename_fmt = "{category}_{id}{title:?_//}.{extension}"

gallery-dl/gallery_dl/extractor/gfycat.py (line 19 in 2184ec5):

filename_fmt = "{category}_{gfyName}{title:?_//}.{extension}"

gallery-dl/gallery_dl/extractor/furaffinity.py (line 22 in 2184ec5):

filename_fmt = "{id} {title}.{extension}"

Though I see now that the furaffinity one also lacks {category}, so I don't know how important consistency is.
Another reason for using {title} is that Weasyl's {filename} is not very useful for identification and is not unique.
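The `{title:?_//}` fields in the imgur and gfycat formats use gallery-dl's conditional spec `?<before>/<after>/`, which wraps the value in the given text only when it is non-empty and otherwise renders nothing. The class below is a simplified, hypothetical re-implementation for illustration only, not gallery-dl's own formatter; it handles just this one spec and hands everything else to `string.Formatter`.

```python
import string


class ConditionalFormatter(string.Formatter):
    """Hypothetical sketch of a `?<before>/<after>/` format spec."""

    def format_field(self, value, format_spec):
        if format_spec.startswith("?"):
            # "?_//" splits into before="_", after=""
            before, after, _ = format_spec[1:].split("/", 2)
            return f"{before}{value}{after}" if value else ""
        return super().format_field(value, format_spec)


fmt = "{category}_{id}{title:?_//}.{extension}"
f = ConditionalFormatter()
with_title = f.format(fmt, category="imgur", id="abc123", title="cat", extension="jpg")
no_title = f.format(fmt, category="imgur", id="abc123", title="", extension="jpg")
print(with_title)  # imgur_abc123_cat.jpg
print(no_title)    # imgur_abc123.jpg
```

This is why those formats avoid a dangling underscore when a post has no title.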

mikf · 2020-09-23T17:53:41Z

I think it is generally a good idea for sites with one file per post and many posts per user to try to replicate the general structure of the furaffinity module, but it might be a bit too late for that.

As for filename_fmt, {submitid} {title}.{extension} sounds like a good idea.
Usernames can contain '-'. Replace all ([\w\d]+) with ([\w-]+) (\d is already included in \w).
There is a text.parse_datetime() function, which should be used to parse dates into datetime objects.
Why do journals have a completely different filename_fmt than everything else?
Why doesn't retrieve_journal(self) have a journalid parameter, but instead uses self.journalid?
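A sketch of the refactor the last point suggests: the retrieval method takes the journal ID as an argument instead of reading `self.journalid`, so a single-journal extractor and an all-journals extractor can share it. `JournalSketch`, `fetch_page`, and the `/journal/{id}` URL shape are hypothetical; since there is no journal API, real HTML parsing would go where the comment says.

```python
from typing import Callable, Dict, Iterable, Iterator


class JournalSketch:
    """Hypothetical helper; not the extractor's actual class."""

    def __init__(self, fetch_page: Callable[[str], str]):
        self.fetch_page = fetch_page  # stand-in for an HTTP GET returning HTML

    def retrieve_journal(self, journalid: int) -> Dict[str, object]:
        html = self.fetch_page(f"https://www.weasyl.com/journal/{journalid}")
        # Real HTML parsing is omitted; only the ID and raw page are kept.
        return {"journalid": journalid, "html": html}

    def retrieve_journals(self, journalids: Iterable[int]) -> Iterator[Dict[str, object]]:
        # The same method serves every ID; no per-instance state is needed.
        for journalid in journalids:
            yield self.retrieve_journal(journalid)


pages = {
    "https://www.weasyl.com/journal/1": "<p>one</p>",
    "https://www.weasyl.com/journal/2": "<p>two</p>",
}
journals = JournalSketch(pages.__getitem__)
ids = [entry["journalid"] for entry in journals.retrieve_journals([1, 2])]
print(ids)  # [1, 2]
```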

Korvox · 2020-09-23T21:01:01Z

There is a text.parse_datetime() function, which should be used to parse dates into datetime objects

If I'm not encoding the date in filename_fmt anymore, all the date stuff can be dropped. Mind if I take a stab at #374 to embed this stuff? Everything else is in the latest revision. Unrelated question, but is there a way to run tests on just the one extractor?
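For reference, turning an API timestamp into a datetime object only takes the standard library. The sketch uses `datetime.strptime` rather than gallery-dl's `text.parse_datetime()`, whose exact signature isn't shown in this thread, and the timestamp is a hypothetical value in the ISO 8601 shape assumed for the API's date field. A datetime kept in the metadata can still be formatted in a filename (e.g. `{date:%Y-%m-%d}`), since Python format specs on datetimes go through `strftime`.

```python
from datetime import datetime, timezone

# Hypothetical timestamp; the field's exact format is an assumption.
posted_at = "2020-09-22T14:03:51+00:00"

# %z accepts "+00:00" style offsets (Python 3.7+).
date = datetime.strptime(posted_at, "%Y-%m-%dT%H:%M:%S%z")
print(date.isoformat())    # 2020-09-22T14:03:51+00:00
print(f"{date:%Y-%m-%d}")  # 2020-09-22
```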

mikf · 2020-09-25T13:18:01Z

Ok then, thanks a lot @Korvox. Time to merge this.

If I'm not encoding date in the filename_fmt anymore all the date stuff can be dropped

The filename_fmt value of each extractor is just the default. Anyone can change it to their own taste with the filename option. More metadata fields are usually better.
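As an illustration of overriding the default, per-category options sit under "extractor" and then the category name in gallery-dl's JSON config file. The snippet only builds and prints such a fragment; the format string is one example taken from this thread, not a recommended value.

```python
import json

# Config fragment overriding the weasyl default filename format.
config = {
    "extractor": {
        "weasyl": {
            "filename": "{category}_{submitid}_{title}.{extension}",
        },
    },
}
text = json.dumps(config, indent=4)
print(text)
```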

Mind if I take a stab at #374 to embed this stuff?

Sure. Do you know of a good (cross-platform) library that can do this?
(I'd recommend implementing this as a postprocessor module, by the way)

Unrelated question but is there a way to run tests on just the one extractor?

Running test/test_results.py with the category you want to test as an argument is the closest you can get, but you could also modify the script as needed.

$ python test/test_results.py weasyl

tux93 · 2020-09-25T14:06:31Z

First of all, thank you very much for implementing this, @Korvox!

There might be someone who wants to download a favorites collection, which is neither a folder nor a gallery. There is no API for this, so it would require iterating over the pages of the collection and grabbing the submissions. If someone actually wants this, it's pretty trivial to implement, but it's a lot of non-API requests, so I left it out for now.

Should I open a follow-up issue for this, since I think it would be worth having?

mikf · 2020-09-25T17:09:41Z

@tux93 #1032

God-damnit-all · 2020-10-11T03:33:13Z

Is there any way to log in or use cookies? As it stands, without a way to log in, only SFW submissions can be downloaded.

Being able to use an API key generated in my account settings would be good too.

Korvox · 2020-10-11T03:43:40Z

You can generate an API key here: https://www.weasyl.com/control/apikeys

API calls want the X-Weasyl-API-Key header set to it.

I'll look into hooking it into extractor.config. Tumblr does basically the same thing.
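A minimal standard-library sketch of attaching the key to API calls. The `api-key` option name and the `/api/whoami` URL are assumptions for illustration, and the plain dict stands in for the extractor's config; the request is only built, never sent.

```python
import urllib.request
from typing import Optional


def api_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request, adding the key header if set."""
    headers = {}
    if api_key:
        headers["X-Weasyl-API-Key"] = api_key
    return urllib.request.Request(url, headers=headers)


# Hypothetical config lookup standing in for extractor config.
config = {"api-key": "0123456789abcdef"}
req = api_request("https://www.weasyl.com/api/whoami", config.get("api-key"))

# urllib normalizes header names with str.capitalize().
print(req.get_header("X-weasyl-api-key"))  # 0123456789abcdef
```

Without a key, the header is simply omitted, so anonymous (SFW-only) access keeps working.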

God-damnit-all · 2020-10-11T04:06:31Z

You can generate an API key here: https://www.weasyl.com/control/apikeys

API calls want the X-Weasyl-API-Key header set to it.

I'll look into hooking it into extractor.config. Tumblr does basically the same thing.

Does gallery-dl even have a way of using custom header values other than User-Agent?

God-damnit-all · 2020-10-11T06:18:07Z

@Korvox There's another problem I ran into. Usernames are allowed to have tildes in their name, which means two tildes in the URL. This messes up the pattern matching for extraction.
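To see how the URL pattern interacts with usernames, compare the old capture group with the `([\w-]+)` one suggested earlier in the review. Both patterns are hypothetical simplifications of a user-URL matcher, and the tilde URL shape is assumed: a hyphen is now captured, but a tilde inside the name still ends the match.

```python
import re

# Hypothetical user-URL patterns; only the capture group differs.
OLD = re.compile(r"(?:https?://)?(?:www\.)?weasyl\.com/~([\w\d]+)")
NEW = re.compile(r"(?:https?://)?(?:www\.)?weasyl\.com/~([\w-]+)")

hyphen = "https://www.weasyl.com/~some-user"
tilde = "https://www.weasyl.com/~some~user"

print(OLD.match(hyphen).group(1))  # some  (stops at the hyphen)
print(NEW.match(hyphen).group(1))  # some-user
print(NEW.match(tilde).group(1))   # some  (a tilde still ends the match)
```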

Korvox · 2020-10-11T06:21:51Z

Can you link me an example of it?

It's a weird edge case, if it exists at all, because all the API requests use login_names: "A user’s username, lowercase, and omitting all non-alphanumeric, non-ASCII characters."
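Under one reading of that description (keep only ASCII letters and digits, lowercased), a display name maps to its login name as below. `login_name` is a hypothetical helper, not part of the extractor or the API.

```python
def login_name(username: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return "".join(c for c in username.lower() if c.isascii() and c.isalnum())


print(login_name("Dark~Chibi-Shadow"))  # darkchibishadow
print(login_name("Ünïcode_Fox"))        # ncodefox
```

If API lookups always go through this normalized form, a tilde or hyphen in the display name would never reach the request URL.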

Korvox force-pushed the master branch from f4085ed to 9c754fa on September 5, 2020 03:05

kattjevfel reviewed Sep 22, 2020

docs/supportedsites.rst (outdated, resolved)

Commit 2617893: weasyl extractor

Commit e1117c0: @kattjevfel suggested changes

Korvox force-pushed the master branch from 9c754fa to e1117c0 on September 23, 2020 01:38

Commit 0660a5d: @mikf changes

mikf merged commit ebb7737 into mikf:master Sep 25, 2020

mikf mentioned this pull request Sep 25, 2020 in [weasyl] support favorites collections #1032 (closed)

Korvox mentioned this pull request Oct 11, 2020 in [weasyl] api-key authentication #1057 (merged)
