[Bug Report] XPath Scraper shouldn't remove newlines for Detail fields #591

compound-dumbo · 2020-06-02T20:28:13Z

Describe the bug
Currently, when a scene scraper is run, the resulting Detail field's newlines get removed.

To Reproduce
Steps to reproduce the behavior:

1. Go to the edit tab on a scene
2. Fill in a URL of a scene with a multiline description that has a scraper associated with it
3. Scrape the scene details
4. Profit

Expected behavior
Since Detail is presented as a multiline textbox, I would expect newlines to survive.

Stash Version: v0.1.1-167-gdc5efb9

bnkai · 2020-06-02T21:09:53Z

This was already mentioned in the Discord channel and in stashapp/CommunityScrapers#49. The problem is that the XPath code applies some common post-processing that removes multiple spaces and newlines for every field. For the Details field, I think we can skip the newline ("\n") removal.
To complete this, the scene details panel in the UI needs the pre class defined in the CSS:

.pre {
  white-space: pre-line;
}

Was advised
I had some code that wasn't working as I wanted but I forgot to revisit that, I'll have another look when I can.
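
To illustrate the idea of skipping newline removal for a single field, here is a minimal Python sketch. It is not the actual stash code (which is written in Go); the function name `clean_text` and the `keep_newlines` flag are hypothetical and only stand in for the common post-processing described above.

```python
import re


def clean_text(text: str, keep_newlines: bool = False) -> str:
    """Hypothetical stand-in for the scraper's common post-processing."""
    if not keep_newlines:
        # Default behaviour: collapse every whitespace run,
        # newlines included, into a single space.
        return re.sub(r"\s+", " ", text).strip()
    # Details-style behaviour: collapse runs of spaces and tabs
    # within each line, but keep the line breaks themselves.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines).strip()


if __name__ == "__main__":
    raw = "First  line\n  Second line"
    print(repr(clean_text(raw)))
    print(repr(clean_text(raw, keep_newlines=True)))
```

With something like this, only the Details field would be cleaned with `keep_newlines=True`, and the `white-space: pre-line` rule would then let the UI render the preserved line breaks.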

bnkai · 2020-06-06T10:49:32Z

related to #579

WithoutPants · 2020-08-19T05:26:52Z

@bnkai is this still reproducible?

bnkai · 2020-08-19T14:45:58Z

@WithoutPants I wouldn't say reproducible, since I don't have a test sample available, but it's not yet 100% resolved. The nodeText function that processes every field still removes newlines. #579 works for newlines that are added by the user or are part of an element attribute, but not for newlines that have already been processed by the nodeText function.

WithoutPants · 2021-08-30T01:29:51Z

Closed as presumably stale. If we're still getting this, we can reopen or open a new issue.

bkbd3177 · 2024-03-22T14:44:03Z

@WithoutPants Please see comments in this issue:
stashapp/CommunityScrapers#123 (comment)
stashapp/CommunityScrapers#123 (comment)

There is still an issue with how the nodeText function scrapes text when there are newlines in the HTML. I provided instructions to reproduce it. Can this issue be reopened?

Scraper BluMedia.yml

When scraping scenes from collegedudes.com, the Details selector (under cdScraper) seems to be smashing together text broken up with line breaks in the HTML source, so that there's no space separating the words. For instance, on https://www.collegedudes.com/play/MTgz/cody-busts-a-nut the text that says "around to watch us" becomes "around towatch us".
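
The "towatch" symptom is what one would expect if newlines are deleted outright instead of being replaced with a space. The Python sketch below shows the difference. It is only an illustration: the sample string assumes the HTML source breaks the line between "to" and "watch", and both helper names are hypothetical, not part of stash.

```python
import re

# Assumed shape of the source text: a line break between "to" and "watch".
sample = "around to\nwatch us"


def strip_newlines(text: str) -> str:
    # Deleting the newline joins the neighbouring words together.
    return text.replace("\n", "")


def newlines_to_spaces(text: str) -> str:
    # Replacing the newline (and any whitespace around it) with one
    # space keeps the words separated.
    return re.sub(r"\s*\n\s*", " ", text)


if __name__ == "__main__":
    print(strip_newlines(sample))
    print(newlines_to_spaces(sample))
```

If nodeText behaves like the first helper, swapping the deletion for a single-space replacement would at least keep words from being smashed together, even where line breaks are not preserved.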

compound-dumbo added the help wanted Extra attention is needed label Jun 2, 2020

WithoutPants closed this as completed Aug 30, 2021

Maista6969 mentioned this issue Mar 22, 2024

Broken Scrapers stashapp/CommunityScrapers#123

Closed

WithoutPants reopened this Mar 22, 2024

Maista6969 added this to Scraper system improvements Mar 22, 2024
