
Parallel HTTP requests in Python


Edit: thankfully this is now outdated! See Parallel asynchronous GET requests with asyncio instead.
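
The linked page covers the asyncio approach in detail; as a quick taste, a minimal sketch (assuming the third-party aiohttp package is installed, which that page uses; the URLs below are placeholders) looks roughly like:

    import asyncio
    from aiohttp import ClientSession

    async def fetch(session, url):
        # awaiting the response lets the other requests proceed concurrently
        async with session.get(url) as response:
            return url, await response.text()

    async def multi_get_async(urls):
        async with ClientSession() as session:
            return await asyncio.gather(*(fetch(session, url) for url in urls))

    # results = asyncio.run(multi_get_async(["https://example.com", "https://www.python.org"]))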

  • I've come across this in Go and thought it looked hairy, but the solution in Python seems simple (and old news even!)

  • Will Larson's 2008 blog post shares the following (which I've Py3-ified, replacing urllib with requests):

    from threading import Thread
    from requests import get
    from time import sleep
    from functools import reduce

    UPDATE_INTERVAL = 0.01

    class URLThread(Thread):
        def __init__(self, url):
            super().__init__()
            self.url = url
            self.response = None

        def run(self):
            # each thread makes one blocking GET and stores the result on itself
            self.response = get(self.url)

    def multi_get(uris, timeout=2.0):
        def alive_count(lst):
            # count threads still running (Thread.isAlive was removed in Python 3.9; use is_alive)
            alive = map(lambda x: 1 if x.is_alive() else 0, lst)
            return reduce(lambda a, b: a + b, alive, 0)
        threads = [URLThread(uri) for uri in uris]
        for thread in threads:
            thread.start()
        # poll until every thread has finished or the timeout elapses
        while alive_count(threads) > 0 and timeout > 0.0:
            timeout = timeout - UPDATE_INTERVAL
            sleep(UPDATE_INTERVAL)
        return [(x.url, x.response) for x in threads]
  • simply call multi_get with a list of URLs and it returns a list of (url, response) tuples in the same order as the input, once the last request has returned (or the timeout has elapsed); see the sketch below
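
A minimal usage sketch (the URLs here are placeholders; response is None for any request that hadn't finished when the timeout elapsed):

    urls = ["https://example.com", "https://www.python.org"]
    for url, response in multi_get(urls):
        # response is a requests.Response, or None if the timeout elapsed first
        status = response.status_code if response is not None else "timed out"
        print(url, status)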
