Bypassing bot detection #4828

tom-anders · 2025-01-04T15:00:09Z

Some sites (for example, www.rewe.de/rezepte) currently cannot be scraped because of their bot detection. For rewe.de/rezepte, recipe_scrapers even has a working scraper, but we cannot use it with Mealie right now because of this issue.

I found that https://pypi.org/project/cloudscraper/ is able to circumvent the bot detection and retrieve the full HTML. Could Mealie (optionally) use this library to fetch the HTML of a recipe page?
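For reference, here is a minimal sketch of what fetching a page through cloudscraper might look like. `fetch_recipe_html` and its `scraper` parameter are hypothetical names used for illustration, not existing Mealie code. The only cloudscraper call used is `create_scraper()`, which returns a requests-style session.

```python
def fetch_recipe_html(url, scraper=None, timeout=15.0):
    """Return the HTML of `url`, fetched through a cloudscraper session.

    Hypothetical helper, not Mealie code. `scraper` may be any object with a
    requests-style `get()`; it defaults to `cloudscraper.create_scraper()`.
    """
    if scraper is None:
        # Imported lazily so the dependency can stay optional.
        import cloudscraper

        scraper = cloudscraper.create_scraper()
    response = scraper.get(url, timeout=timeout)
    response.raise_for_status()  # surface HTTP error statuses instead of returning a block page
    return response.text
```

The returned HTML could then be handed to recipe_scrapers the same way as HTML fetched with a plain request. Whether cloudscraper keeps working for a given site depends on that site's protection, so this would probably need to stay opt-in.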

roachadam · 2025-01-04T16:44:25Z

I would be willing to contribute to this feature. I'm sure more sites will end up behind some kind of bot protection in the future, including Cloudflare.

If I were to approach this with a PR, I would suggest a SeleniumBase solution: detect failures in the normal request-based processing and, if bot protection is identified, fall back to SeleniumBase.
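A rough sketch of that fallback flow, under stated assumptions: `Page`, `looks_blocked`, and `fetch_with_fallback` are hypothetical names, the status codes and text markers are guessed heuristics rather than a reliable detector, and `browser_fetch` stands in for whatever SeleniumBase-based fetcher a PR would add.

```python
from typing import Callable, NamedTuple


class Page(NamedTuple):
    status: int
    body: str


# Assumed heuristics; real bot-protection pages vary by vendor and over time.
BLOCK_STATUSES = {403, 429, 503}
BLOCK_MARKERS = ("just a moment", "captcha", "access denied")


def looks_blocked(page: Page) -> bool:
    """Guess whether a response is a bot-protection page rather than content."""
    if page.status in BLOCK_STATUSES:
        return True
    body = page.body.lower()
    return any(marker in body for marker in BLOCK_MARKERS)


def fetch_with_fallback(
    url: str,
    plain_fetch: Callable[[str], Page],
    browser_fetch: Callable[[str], str],
) -> str:
    """Try the cheap request-based fetch first; use the browser only if blocked."""
    page = plain_fetch(url)
    if looks_blocked(page):
        # Placeholder for a SeleniumBase-driven fetch returning the page source.
        return browser_fetch(url)
    return page.body
```

Passing both fetchers in as callables keeps the heavier browser dependency out of the normal path, so sites that already scrape fine would not pay for it.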
