Thanks #1
Hi @NikolaiT, the list of fingerprinting/detection surfaces that I have covered so far is barely the tip of the iceberg. In the upcoming weeks I will make some more updates. Stay tuned 😎 Generally, all bot detection technologies work in three "dimensions" and aim to find irregularities:
At first sight it may sound overwhelming, but keep in mind that no anti-bot system should block access for regular users. To put it differently, if the anti-bot system is not 100% sure you are a bot, it will very likely treat you as a regular user, and you will pass the test. The system may assign you a score and, based on that score, apply countermeasures, e.g. slow down your requests, display "shadowed" data, or present a captcha gateway. At this point your job is to polish your scraper and proxy setup until it perfectly resembles a real browser. Now, to your question:
I suggest addressing all three points mentioned above:
Using the original Chrome is a good idea. I can't say more than that, because I am not sure if |
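To make the scoring idea mentioned above a bit more concrete, here is a minimal sketch of how an anti-bot system might map a suspicion score to a response. Everything in it is an assumption for illustration: the `Action` names, the `choose_action` function, the 0 to 1 score range, and the thresholds are all made up, and real systems are certainly more involved.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical responses, loosely following the ones named in the thread."""
    ALLOW = "allow"
    THROTTLE = "throttle"        # slow down the client's requests
    SHADOW_DATA = "shadow_data"  # serve "shadowed" (altered) data
    CAPTCHA = "captcha"          # send the client to a captcha gateway


def choose_action(bot_score: float) -> Action:
    """Map a suspicion score in [0, 1] to a response.

    The thresholds are invented for this sketch. The only point is that
    low-confidence scores let the visitor through, so regular users are
    not blocked.
    """
    if not 0.0 <= bot_score <= 1.0:
        raise ValueError("bot_score must be between 0 and 1")
    if bot_score < 0.5:
        return Action.ALLOW
    if bot_score < 0.7:
        return Action.THROTTLE
    if bot_score < 0.9:
        return Action.SHADOW_DATA
    return Action.CAPTCHA


if __name__ == "__main__":
    for score in (0.1, 0.6, 0.8, 0.95):
        print(score, choose_action(score).value)
```

Seen from the scraper's side, the goal is simply to stay in the lowest band, which is why small irregularities matter less than consistently looking like a real browser.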
Just wanted to drop by and say thanks. It's good to be aware of those techniques. It's insanely complex to not get detected.
What kind of scraping setup do you suggest?
I am currently going with something like this. What do you think?