So many 'failed to GET url' #323

Open
minotaurrr opened this issue Nov 15, 2017 · 6 comments

Comments

@minotaurrr

I'm just calling horseman.open('https://www.google.com') for testing, but I keep getting 'failed to GET url' at random times. About 7 out of 10 attempts fail.

Any idea why?

@nelsonwittwer

nelsonwittwer commented Nov 17, 2017

Kicked the tires on this library, following the project docs, and saw a similar thing. Both the Twitter and Google examples failed to run.

horseman v3.3.0
node v8.9.1

@minotaurrr
Author

Tried on multiple hosts and noticed that the failure frequency varies, but I still get the same error eventually.

@grohsfabian

Bumping this topic; the same thing is happening to me.

@NoelDavies

Bumping this. I'm getting it repeatedly, and I can't catch the errors either.

@t0ursene

t0ursene commented Jan 4, 2018

minotaurrr, Google detects scrapers and bans your IP address very quickly.
That means you can only call horseman.open('http://google.com') ONCE every 5 minutes. If you want to scrape it more than once per 5 minutes, you need to:

  • set up a proxy in the horseman options
  • clear cookies with horseman.cookies()
  • change the User-Agent in horseman
  • also vary the value you pass to horseman.wait(value). If the interval between your requests is always the same, Google will flag it.
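
A rough sketch of the timing and User-Agent points above, in plain JavaScript. The helper names (randomDelay, pickRandom), the delay bounds, and the User-Agent strings are made up for illustration and are not part of horseman. The only idea shown is that the wait value and the User-Agent should differ from one request to the next.

```javascript
// Hypothetical helpers for illustration; not part of horseman.

// Return a whole number of milliseconds in [minMs, maxMs], so that
// consecutive requests are not spaced at a fixed interval.
function randomDelay(minMs, maxMs) {
  return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

// Pick one entry of a non-empty array at random.
function pickRandom(items) {
  return items[Math.floor(Math.random() * items.length)];
}

// Example values only; substitute real browser User-Agent strings.
const userAgents = [
  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0',
  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13) ExampleBrowser/2.0',
];

const delay = randomDelay(3000, 8000);
const agent = pickRandom(userAgents);
console.log('waiting ' + delay + ' ms, using User-Agent: ' + agent);
```

The delay would be the value given to horseman.wait(value), and the chosen string would go into whatever User-Agent setting you use in horseman, both picked fresh before each open call.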

@jorgerosal

Google must have banned your IP. Either set a time interval between GET requests, or set up a list of proxies and cycle through them randomly.
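
One possible way to cycle through a proxy list randomly, sketched in plain JavaScript. The names shuffled and makeProxyCycler and the proxy addresses are assumptions for illustration (the addresses come from the 192.0.2.0/24 documentation range), and nothing here calls horseman itself.

```javascript
// Hypothetical helpers for illustration; not part of horseman.

// Return a shuffled copy of the list (Fisher-Yates); the input is not modified.
function shuffled(items) {
  const copy = items.slice();
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    const tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy;
}

// Build a function that hands out proxies in random order and reshuffles
// once every proxy has been used, so no proxy repeats within one pass.
function makeProxyCycler(proxies) {
  if (proxies.length === 0) {
    throw new Error('proxy list is empty');
  }
  let queue = [];
  return function nextProxy() {
    if (queue.length === 0) {
      queue = shuffled(proxies);
    }
    return queue.pop();
  };
}

// Placeholder addresses; replace with proxies you actually control.
const proxyPool = ['192.0.2.10:8080', '192.0.2.11:8080', '192.0.2.12:8080'];
const nextProxyFromPool = makeProxyCycler(proxyPool);
console.log(nextProxyFromPool(), nextProxyFromPool(), nextProxyFromPool());
```

Each returned value would be used as the proxy in the horseman options for the next instance you create. Reshuffling only after a full pass spreads requests evenly across the list, at the cost of an occasional back-to-back repeat at the boundary between passes.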
