[Site Request] Kemono.party #1216

Closed
ghost opened this issue Jan 2, 2021 · 11 comments

@ghost

ghost commented Jan 2, 2021

The site is a Patreon/DL-Site/Fantia/etc. scraper. It's a yiff.party alternative now that yiff.party is gone. Downloading 300+ images by hand is kind of a bother, so it would be nice if gallery-dl supported this site.

@Hrxn
Contributor

Hrxn commented Jan 3, 2021

New site, already down? 😄

@kattjevfel
Contributor

New site, already down? 😄

It seems they don't serve plain HTTP, not even a redirect, and browsers don't automatically try HTTPS.
https://kemono.party/

@Hrxn
Contributor

Hrxn commented Jan 3, 2021

I did use HTTPS, and I see something, but this is extremely flaky, at least for me. I get 404s and timeouts all the time, some other errors as well.

@kattjevfel
Contributor

Ah, yeah I get timeouts too, and here I got my hopes up for getting free high quality anime tiddies!

@kattjevfel
Contributor

kattjevfel commented Jan 6, 2021

The site now appears to be working reliably, and while it does list an RSS option, it just 404s, so we're gonna have to go with more caveman approaches.

Example user: https://kemono.party/patreon/user/233822 (NSFW)

each entry looks like:

  <a href="/patreon/user/233822/post/39735940" class="thumb-link">
  
    <div class="thumb thumb-with-image thumb-standard">
      <img src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png">
      <div class="thumb-with-image-overlay">
        <h3>Adventure Girls - The Suitor</h3>
        
          <small>2020-07-26 16:57:56</small><br>
        
        
          <small>1 attachments</small>
        
      </div>
    </div>
  
</a>

some metadata is available in the page's <meta> tags:

      <meta name="service" content="patreon"/>
      <meta name="count" content="1063"/>
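As a rough illustration of the "caveman approach", the listing markup above could be scraped with nothing but the standard library. This is a minimal sketch, not what gallery-dl does: `ListingParser` and `parse_listing` are hypothetical names, and it assumes post links keep the `thumb-link` class and metadata stays in `<meta name=... content=...>` tags.

```python
from html.parser import HTMLParser


class ListingParser(HTMLParser):
    """Collect post links and <meta> values from a user listing page."""

    def __init__(self):
        super().__init__()
        self.post_links = []  # hrefs of <a class="thumb-link"> elements
        self.meta = {}        # name -> content for every <meta name=...>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "thumb-link" in classes:
            self.post_links.append(attrs.get("href"))
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content")


def parse_listing(html):
    """Return (post_links, meta) for the HTML of one listing page."""
    parser = ListingParser()
    parser.feed(html)
    parser.close()
    return parser.post_links, parser.meta
```

Feeding it the snippets above would give one post link and the `service` and `count` values as strings.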

posts look like this:

  <div class="page" id="page">
    
      
      
      
      
      
      
      <h1>Adventure Girls - The Suitor</h1>
      
      <p><p></p><p>[Based on Season 5, Episode 21]</p><p>Princess Bubblegum makes a sex robot for local simp Braco, decides to keep it for herself.</p><h3><u><strong>Last post here on Patreon. Moving to Subscribe Star.</strong></u></h3></p>
      
        
          <a class="fileThumb" href="/files/233822/39735940/Adventure_Girls_S5_EP21.png">
            <img
              data-src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png"
              src="/thumbnail/files/233822/39735940/Adventure_Girls_S5_EP21.png"
            >
          </a>
          <br>
        
      
        
          <a class="fileThumb" href="/attachments/233822/39735940/Adventure-Girls-S5-EP21.png">
            <img
              data-src="/thumbnail/attachments/233822/39735940/Adventure-Girls-S5-EP21.png"
              src="/thumbnail/attachments/233822/39735940/Adventure-Girls-S5-EP21.png"
            >
          </a>
          <br>
        
      
    
  </div>

post with only a file attached (and header):

      <h1>June 2020 Art</h1>
      
        <a href="/attachments/233822/38850658/Jun20.rar" target="_blank">
          Download Jun20.rar
        </a>
        <br>
      
      <p><p>Did a lot this month.</p></p>
      
        
          <a class="fileThumb" href="/files/233822/38850658/Pat_Pack.png">
            <img
              data-src="/thumbnail/files/233822/38850658/Pat_Pack.png"
              src="/thumbnail/files/233822/38850658/Pat_Pack.png"
            >
          </a>
          <br>

some metadata here too:

      <meta name="service" content="patreon"/>
        <meta name="published" content="2020-07-26 16:57:56"/>
      <meta name="added" content="2020-09-16 11:13:47.903080"/>
      <meta name="id" content="39735940"/>

Worth noting: as seen above, when a post has only one image, it appears to be linked twice. Looking more closely at another account that posts multiple attachments, the href="/files/ link seems to be the "header" image, and the /attachments/ links seem to be the actual attachments.
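To make the header/attachment distinction concrete, here is a small sketch that sorts a post page's links by path prefix. The names `PostFileParser` and `split_post_files` are made up for illustration, and the `/files/` versus `/attachments/` interpretation is only the observation from this comment, not a documented rule.

```python
from html.parser import HTMLParser


class PostFileParser(HTMLParser):
    """Separate "header" files from attachments by their href prefix."""

    def __init__(self):
        super().__init__()
        self.header_files = []  # hrefs starting with /files/
        self.attachments = []   # hrefs starting with /attachments/

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/files/"):
            self.header_files.append(href)
        elif href.startswith("/attachments/"):
            self.attachments.append(href)


def split_post_files(html):
    """Return (header_files, attachments) found in one post page."""
    parser = PostFileParser()
    parser.feed(html)
    parser.close()
    return parser.header_files, parser.attachments
```

Thumbnail URLs are ignored here because they appear on `<img>` tags, not on `<a>` tags.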

Hope this is of any help!

@mikf
Owner

mikf commented Jan 11, 2021

Added some initial support with e07dfc4. It kind of works, but duplicate files will be a problem. There are posts with 4 listed files, all identical and all with a different filename. And from what I can tell, there is no other way to decide whether two files might be the same before downloading and comparing them.
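For illustration only, the "download and compare" fallback could look like the sketch below, which hashes each downloaded payload and keeps the first file of each distinct content. This is not how e07dfc4 works; `drop_duplicate_payloads` is a hypothetical helper, and it assumes the file contents are already in memory as bytes.

```python
import hashlib


def drop_duplicate_payloads(files):
    """Keep the first filename for each distinct content.

    `files` is an iterable of (filename, data) pairs, where `data` is the
    downloaded content as bytes. Returns the filenames that were kept.
    """
    seen = set()
    kept = []
    for name, data in files:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # identical content already kept under another name
        seen.add(digest)
        kept.append(name)
    return kept
```

The obvious cost is that every file still has to be downloaded before a duplicate can be recognized.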

@kattjevfel
Contributor

I'll take duplicate files over files with conflicting names :P

@ghost

ghost commented Jan 12, 2021

Hello, I maintain Kemono. Just wanted to add that the software has APIs for both users and posts at /api/<service>/user/<id> and /api/<service>/user/<id>/post/<id> respectively. Think that'd be easier to scrape from instead of the HTML, which may change in the future. Let me know if there's any way I can help 👍
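A sketch of how those two routes might be assembled into URLs. The path patterns come from the comment above; serving them from `https://kemono.party` and the helper names `user_api_url` and `post_api_url` are assumptions, and the response format is not described in this thread, so nothing is fetched or parsed here.

```python
from urllib.parse import quote

# Assumption: the API is served from the same host as the site itself.
ROOT = "https://kemono.party"


def user_api_url(service, user_id):
    """URL for /api/<service>/user/<id>."""
    return "{}/api/{}/user/{}".format(
        ROOT, quote(str(service), safe=""), quote(str(user_id), safe=""))


def post_api_url(service, user_id, post_id):
    """URL for /api/<service>/user/<id>/post/<id>."""
    return "{}/post/{}".format(
        user_api_url(service, user_id), quote(str(post_id), safe=""))
```

For the example user from earlier in the thread, `user_api_url("patreon", 233822)` yields `https://kemono.party/api/patreon/user/233822`.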

mikf added a commit that referenced this issue Jan 14, 2021
@mikf
Owner

mikf commented Jan 14, 2021

@kemono-bugs
Thank you very much for mentioning the API endpoints, they are very helpful.
Where should I have been looking for them myself? Is there some sort of documentation anywhere I missed?

And another question, if you don't mind: What is with "duplicated" posts like https://kemono.party/patreon/user/94956/post/2337848? Even the API response contains two entries, both for post ID 2337848. Did something just go wrong when importing that post?

@ghost

ghost commented Jan 14, 2021

@mikf

Where should I have been looking for them myself? Is there some sort of documentation anywhere I missed?

Unfortunately no, I haven't had the time to write documentation proper. All API routes can easily be found in the source code though.

And another question, if you don't mind: What is with "duplicated" posts like https://kemono.party/patreon/user/94956/post/2337848?

The site was originally designed to allow detected revisions of a post to be stored under the same ID as their parent. However, this also let unintended duplicates slip in, mostly due to user error (namely, clicking submit multiple times).
I'm currently working on an update that will clean up the unintentional dupes and limit the number of posts for a single ID to one.
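Until such a cleanup lands, a client could collapse repeated entries itself. The sketch below keeps the first entry per ID. It assumes each API entry is a dict with an `"id"` key, which this thread does not confirm, and `first_entry_per_id` is a hypothetical helper.

```python
def first_entry_per_id(entries, key="id"):
    """Return entries in order, keeping only the first one for each ID.

    Assumes every entry is a mapping that has `key`; the field name "id"
    is a guess, not taken from any API documentation.
    """
    seen = set()
    result = []
    for entry in entries:
        ident = entry[key]
        if ident in seen:
            continue  # a later entry stored under an already-seen ID
        seen.add(ident)
        result.append(entry)
    return result
```

Note that this would also discard genuine revisions stored under the same ID, so it only suits callers that want exactly one entry per post.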

mikf added a commit that referenced this issue Jan 17, 2021
Use metadata from API responses as is and
don't try to detect duplicates by their original filename.
@mikf mikf closed this as completed Jan 23, 2021
@ghost

ghost commented Jan 24, 2021

Unsure if this is out of scope for gallery-dl, but would it be possible to add options to download text posts, as well as the text that accompanies posts? One idea: create a folder for each of a user's posts, containing the images, text, and anything else the post might have had.
