Bypassing bot detection #4828
tom-anders
started this conversation in Ideas
Replies: 1 comment
I would be willing to contribute to this feature. I'm sure more sites will get blocked by some kind of bot protection in the future, including Cloudflare. If I were to approach this with a PR, I would suggest a SeleniumBase solution: detect failures in normal request-based processing and, if bot protection is identified, fall back to SeleniumBase.
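The proposed fallback flow could look roughly like the sketch below. It is not Mealie's code. `Response`, `looks_like_bot_challenge` and `fetch_with_fallback` are hypothetical names. The detection heuristics are assumptions that would need checking against real blocked pages: a `cf-mitigated: challenge` header, or a 403/503 status combined with Cloudflare-style markers such as a "Just a moment..." page title. The browser-based fetcher (e.g. SeleniumBase) is injected as a plain callable, so the control flow can be shown without depending on it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    body: str
    headers: Dict[str, str] = field(default_factory=dict)


# Assumed markers of a bot-protection challenge page; verify against real pages.
CHALLENGE_MARKERS = ("just a moment...", "cf-chl", "attention required! | cloudflare")


def looks_like_bot_challenge(resp: Response) -> bool:
    """Heuristically decide whether a response is a bot-protection challenge."""
    headers = {k.lower(): v.lower() for k, v in resp.headers.items()}
    if headers.get("cf-mitigated") == "challenge":
        return True
    if resp.status in (403, 503):
        body = resp.body.lower()
        return any(marker in body for marker in CHALLENGE_MARKERS)
    return False


def fetch_with_fallback(
    url: str,
    primary: Callable[[str], Response],
    fallback: Callable[[str], str],
) -> str:
    """Try the cheap request-based fetcher first; use the browser fetcher only if blocked."""
    resp = primary(url)
    if looks_like_bot_challenge(resp):
        # Only pay the cost of launching a real browser when the plain request was challenged.
        return fallback(url)
    return resp.body
```

Passing both fetchers in as callables keeps the detection logic testable without a browser. It also means a plain 403 without challenge markers is not retried, so genuinely forbidden pages still fail fast.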
Some sites (for example www.rewe.de/rezepte) currently cannot be scraped because of their bot detection. In the case of rewe.de/rezepte, `recipe_scrapers` even has a working scraper, but we cannot use it with Mealie right now because of this issue. I found that https://pypi.org/project/cloudscraper/ is able to get past the bot detection and retrieve the full HTML. Could Mealie (optionally) use this library to fetch the HTML of a recipe page?
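One way to make cloudscraper optional is to treat it as an extra dependency and fall back to a plain HTTP client when it is disabled or not installed. The sketch below assumes a boolean setting called `use_cloudscraper`, which is a hypothetical name and not an existing Mealie option. On the cloudscraper side it relies only on `cloudscraper.create_scraper()`, which returns a `requests.Session`-compatible object. The plain path uses the standard library's `urllib.request`, not whatever HTTP client Mealie actually uses, and `build_html_fetcher` and `USER_AGENT` are illustrative names.

```python
import importlib.util
import urllib.request
from typing import Callable, Tuple

USER_AGENT = "Mozilla/5.0 (compatible; recipe-fetcher-sketch)"  # assumed UA string


def _fetch_with_urllib(url: str, timeout: float = 15.0) -> str:
    """Plain fetch with the standard library; may be blocked by bot protection."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def build_html_fetcher(use_cloudscraper: bool) -> Tuple[Callable[[str], str], str]:
    """Return (fetch_function, backend_name) for the configured backend.

    Falls back to urllib when cloudscraper is disabled or not installed.
    """
    if use_cloudscraper and importlib.util.find_spec("cloudscraper") is not None:
        import cloudscraper  # optional dependency, only imported when enabled

        scraper = cloudscraper.create_scraper()

        def fetch(url: str) -> str:
            resp = scraper.get(url, timeout=15)
            resp.raise_for_status()
            return resp.text

        return fetch, "cloudscraper"
    return _fetch_with_urllib, "urllib"
```

With the flag off by default, existing behavior is unchanged, and the extra dependency only matters for users who need to scrape protected sites.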