[Bug Report] XPath Scraper shouldn't remove newlines for Detail fields #591

Open
compound-dumbo opened this issue Jun 2, 2020 · 6 comments
Labels
help wanted Extra attention is needed

Comments

@compound-dumbo

Describe the bug
Currently, when a scene scraper is run, the resulting Detail field's newlines get removed.

To Reproduce
Steps to reproduce the behavior:

  1. Go to the edit tab on a scene
  2. Fill in the URL of a scene that has a multiline description and a scraper associated with it
  3. Scrape the scene details
  4. Profit

Expected behavior
Since Detail is presented as a multiline textbox, I would expect newlines to survive.

Stash Version: v0.1.1-167-gdc5efb9

@compound-dumbo compound-dumbo added the help wanted Extra attention is needed label Jun 2, 2020
@bnkai
Collaborator

bnkai commented Jun 2, 2020

This was already mentioned in the Discord channel and in stashapp/CommunityScrapers#49. The problem is that the XPath code applies common post-processing to every field, which collapses repeated spaces and removes newlines. For the details field, I think we can skip the newline ("\n") removal.
To complete this, the scene details panel in the UI needs the pre class defined in the CSS:

.pre {
  white-space: pre-line;
}

Was advised
I had some code that wasn't working the way I wanted, but I forgot to revisit it. I'll have another look when I can.
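
For readers following along, here is a minimal sketch of the idea described above: one shared clean-up step for every field, with an opt-out that keeps line breaks for the details field. It is illustrative Python, not Stash's actual code; the `clean_field` name and the `keep_newlines` flag are hypothetical.

```python
import re


def clean_field(text: str, keep_newlines: bool = False) -> str:
    """Collapse whitespace in a scraped field (hypothetical helper)."""
    if not keep_newlines:
        # Default behaviour: every run of whitespace, newlines included,
        # becomes a single space.
        return re.sub(r"\s+", " ", text).strip()
    # Details-style behaviour: tidy spaces and tabs within each line,
    # but leave the line breaks themselves in place.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()


raw = "First  paragraph.\n\nSecond   paragraph.\n"
print(repr(clean_field(raw)))                      # single line
print(repr(clean_field(raw, keep_newlines=True)))  # line breaks kept
```

With `white-space: pre-line` on the details panel, as in the CSS above, the preserved line breaks would then also be rendered in the UI.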

@bnkai
Collaborator

bnkai commented Jun 6, 2020

related to #579

@WithoutPants
Collaborator

@bnkai is this still reproducible?

@bnkai
Collaborator

bnkai commented Aug 19, 2020

@WithoutPants I wouldn't say reproducible, since I don't have a test sample available, but it's not yet 100% resolved. The nodeText function that processes every field still removes newlines. #579 works for newlines that are added by the user or are part of an element attribute, but not for newlines that have already been processed by the nodeText function.

@WithoutPants
Collaborator

Closed as presumably stale. If we're still getting this, we can reopen or open a new issue.

@bkbd3177

@WithoutPants Please see comments in this issue:
stashapp/CommunityScrapers#123 (comment)
stashapp/CommunityScrapers#123 (comment)

There is still an issue with how the nodeText function handles text that contains newlines in the HTML source. I provided instructions to reproduce it. Can this issue be reopened?

Scraper BluMedia.yml

When scraping scenes from collegedudes.com, the Details selector (under cdScraper) seems to be smashing together text broken up with line breaks in the HTML source, so that there's no space separating the words. For instance, on https://www.collegedudes.com/play/MTgz/cody-busts-a-nut the text that says "around to watch us" becomes "around towatch us".
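
The "towatch" symptom is what you would expect if line breaks are deleted outright rather than replaced with a space. The sketch below illustrates that difference in Python. It assumes the HTML source breaks the line directly after "to" with no trailing space, which the thread does not confirm, and both function names are hypothetical rather than taken from Stash.

```python
import re

# Assumed shape of the scraped text: a hard line break between "to" and "watch".
scraped = "around to\nwatch us"


def strip_newlines(text: str) -> str:
    # Deleting the newline glues the neighbouring words together.
    return text.replace("\n", "")


def newlines_to_spaces(text: str) -> str:
    # Replacing each line break (plus any whitespace around it) with one
    # space keeps the words apart.
    return re.sub(r"\s*\n\s*", " ", text)


print(strip_newlines(scraped))      # around towatch us
print(newlines_to_spaces(scraped))  # around to watch us
```

If the line breaks should survive for the Detail field, as the original report asks, the second variant would still flatten them; it only addresses the missing-space symptom.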
