
Add more Proxy Providers #36

Open
pgaref opened this issue Jul 31, 2017 · 4 comments

Comments

pgaref (Owner) commented Jul 31, 2017

Possible Proxy Lists

Providers that require an API key:

Could also parse the related forum thread on blackhatworld:

la55u (Contributor) commented Jul 2, 2018

I may add a couple of these later if you provide some info on how to do it, what needs to be implemented, etc.

pgaref (Owner, Author) commented Jul 3, 2018

Hello @la55u

Every proxy parser currently extends the base UrlParser class, which represents any URL containing proxy information.
If you look at an implementation such as the SamairProxyParser class, it mainly overrides the parse_proxyList method, which does three things: 1) parses the page HTML, 2) retrieves the proxy information from the HTML, and 3) returns a list of proxy objects. The HTML parsing part is automated by BeautifulSoup, which should make it a bit easier.
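To make the three steps concrete, here is a minimal sketch of what such a parser could look like. The class name, the table layout, and the cell order are hypothetical; in the repo the class would extend UrlParser and follow the signature of parse_proxyList there, and BeautifulSoup is assumed to be installed (the project already depends on it):

```python
from bs4 import BeautifulSoup  # assumed available, as the repo already uses it


class CoolProxyParser:
    """Hypothetical sketch; a real implementation would extend UrlParser."""

    def parse_proxyList(self, html):
        # 1) parse the page HTML
        soup = BeautifulSoup(html, "html.parser")
        proxies = []
        # 2) retrieve proxy information from the HTML table rows
        for row in soup.find_all("tr"):
            cells = [td.get_text(strip=True) for td in row.find_all("td")]
            if len(cells) >= 2 and cells[0] and cells[1]:
                # assuming the first cell is the IP and the second the port
                proxies.append("{0}:{1}".format(cells[0], cells[1]))
        # 3) return the list of proxy entries
        return proxies
```

A real parser would also handle header rows, pagination, and any obfuscated fields, as discussed below.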

If you want to support a new provider, for instance coolProxy, you would create a new class extending UrlParser. Then, by inspecting the HTML fields you need and using BeautifulSoup, you could retrieve the proxy information. You might also need to decode hidden information: for example, the IPs on this specific provider are encoded. You would need to do something like:

base64.b64decode(codecs.getencoder("rot-13")("IP_string")[0])

(Note the [0]: codecs.getencoder returns an encoder function whose result is an (output, length) tuple, so you need the first element.)

PS: Some of the existing providers have updated their websites, adding extra JavaScript or encodings to hide proxy information (that's why some existing providers currently fail). However, this does not mean there is no way around it :)

Let me know if this makes sense - I would be happy to help!

la55u (Contributor) commented Jul 3, 2018

I don't really get this encoding. For example, the IPs that are listed on the coolProxy website are not the actual proxy IPs that we need?
Edit: oh wait, I think I get it; the IPs are not present in the HTML when we query it from Python!
I'll add this site later this week.

pgaref (Owner, Author) commented Jul 3, 2018

Hey @la55u

When you view the source code of the provider (Ctrl+U in Google Chrome) you will realise that every proxy is a row in an HTML table. In that table, most of the information can be read directly, but the IPs, for example, are 'text/javascript' elements, meaning you need to do a bit more work to decode them :)

In the provider above, for example, the first cell (td) in the first table row I found looks like:

<script type="text/javascript">document.write(Base64.decode(str_rot13("BGZhBGxhAv4kAGt=")))</script>
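A script tag in this shape can be matched with a small regex to pull out the encoded payload. This is a sketch based only on the example tag above; a real page may vary the markup:

```python
import re

# Matches the Base64.decode(str_rot13("...")) call and captures the payload
SCRIPT_RE = re.compile(r'Base64\.decode\(str_rot13\("([^"]+)"\)\)')

html = ('<script type="text/javascript">'
        'document.write(Base64.decode(str_rot13("BGZhBGxhAv4kAGt=")))'
        '</script>')

match = SCRIPT_RE.search(html)
payload = match.group(1)  # "BGZhBGxhAv4kAGt="
```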

Now, if we ROT13-encode the string above, codecs.getencoder("rot-13")("BGZhBGxhAv4kAGt=")[0], we get "OTMuOTkuNi4xNTg=".
Then, if we base64-decode that, we get 93.99.6.158, which is the IP we were actually looking for!
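The two steps above combine into a small stdlib-only helper (the function name is ours; the ROT13-then-Base64 scheme is taken from the script tag shown above):

```python
import base64
import codecs


def decode_ip(encoded: str) -> str:
    """Reverse the provider's obfuscation: ROT13 first, then Base64-decode."""
    # codecs.getencoder returns an (output, length) tuple, hence the [0]
    rot13 = codecs.getencoder("rot-13")(encoded)[0]
    return base64.b64decode(rot13).decode("ascii")


print(decode_ip("BGZhBGxhAv4kAGt="))  # -> 93.99.6.158
```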

Does it make sense?
