Issues to be fixed in htmlparse in Telegraaf and Metro rss scrapers:
The Metro htmlparser for text also picks up some 'invisible' HTML that is not part of the main article text. (These elements likely have CSS display: none applied?)
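A minimal sketch of one way to skip such invisible elements, using only the standard library. It assumes the elements are hidden through an inline `style="display: none"` or a `hidden` attribute; hiding applied through a stylesheet class would not be detected this way. The names `VisibleTextParser` and `visible_text` are hypothetical and not part of the existing scraper.

```python
import re
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not affect depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
HIDDEN_STYLE_RE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


class VisibleTextParser(HTMLParser):
    """Collect text, skipping anything nested inside a hidden element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.hidden_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        is_hidden = ("hidden" in attrs
                     or HIDDEN_STYLE_RE.search(attrs.get("style") or "")
                     or tag in ("script", "style"))
        # Once inside a hidden element, count every nested opening tag too.
        if self.hidden_depth or is_hidden:
            self.hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the text of html without the content of hidden elements."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

The depth counter relies on tags being properly closed; badly nested markup inside a hidden element could throw it off, so this is a starting point rather than a complete fix.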
The Telegraaf htmlparser is unable to parse some texts, because they are not included in the HTML markup and only load after a script is run on the website. Possible solution: htmlsource is a string that does contain the text, embedded in the script: "articleBody": "HERE IS THE TEXT.","author":
```python
if text.strip() == "":
    logger.warning("Trying alternative method...")
    # parse the text from htmlsource
```
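The alternative method could be sketched roughly as follows. This assumes the script data in htmlsource is JSON, so that the value after "articleBody" is a JSON string literal; the helper name `extract_article_body` is hypothetical.

```python
import json
import re

# Matches the JSON string value of the "articleBody" key, allowing escaped
# characters (such as \") inside the value.
ARTICLE_BODY_RE = re.compile(r'"articleBody"\s*:\s*"((?:[^"\\]|\\.)*)"')


def extract_article_body(htmlsource):
    """Return the articleBody text embedded in htmlsource, or "" if absent."""
    match = ARTICLE_BODY_RE.search(htmlsource)
    if match is None:
        return ""
    try:
        # Re-wrap the captured value in quotes and let the JSON parser
        # decode escapes such as \" and \n.
        return json.loads('"' + match.group(1) + '"')
    except json.JSONDecodeError:
        # The value was not a valid JSON string after all.
        return ""
```

In the snippet above, the comment line would then become something like `text = extract_article_body(htmlsource)`.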