This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Asynchronous Methods and Jobs

Graham Wheeler edited this page Jun 2, 2016 · 3 revisions

Datalab has a unified, general mechanism for handling asynchronous tasks via the datalab.utils.Job class. Jobs can be waited on by calling their wait() or result() methods, or by passing them to the top-level functions wait_one() or wait_all().
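Since the base Job class is built on the standard library's concurrent.futures (see below), the waiting behavior can be sketched with plain Futures. This is an illustration only, not the datalab API: result() on a Future blocks like a Job's result(), and concurrent.futures.wait plays a role analogous to wait_all().

```python
from concurrent.futures import ThreadPoolExecutor, wait

# Illustration only: datalab's base Job wraps a concurrent.futures.Future,
# so waiting on Jobs resembles waiting on Futures.
executor = ThreadPoolExecutor(max_workers=2)
f1 = executor.submit(lambda: 6 * 7)
f2 = executor.submit(lambda: 'done')

print(f1.result())  # result() blocks until this task completes
wait([f1, f2])      # rough analogue of wait_all(): block on every job
print(f2.result())
```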

The base Job class uses a Python concurrent.futures.Future object to support async behavior. Subclasses can use the Future mechanism or implement their own way of checking job status and blocking until completion. For example, BigQuery jobs (such as running a query) poll the service via HTTP requests to check job status, so the subclass datalab.bigquery.Job makes no use of the Future.
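The polling variant can be sketched as follows. This is a simplified, hypothetical illustration of the pattern, not the real datalab.bigquery.Job: the poll function here stands in for an HTTP status check, and the method names mirror the wiki's examples rather than the actual class.

```python
import time

class Job(object):
  """Minimal sketch of a Job-like base class (not the real datalab API)."""
  def is_complete(self):
    raise NotImplementedError

  def wait(self, poll_interval=0.01):
    # Block until the job reports completion, checking periodically.
    while not self.is_complete():
      time.sleep(poll_interval)
    return self

class PollingJob(Job):
  """A job that determines its status by calling an external poll function,
  the way a service-backed job would poll over HTTP."""
  def __init__(self, poll_fn):
    self._poll_fn = poll_fn  # hypothetical: returns True once the job is done

  def is_complete(self):
    return self._poll_fn()

# Simulate a remote job that finishes after a few status checks.
remaining = [3]
def fake_poll():
  remaining[0] -= 1
  return remaining[0] <= 0

job = PollingJob(fake_poll).wait()
print(job.is_complete())  # True once wait() returns
```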

The datalab.utils module implements two decorators, @async_function and @async_method, that turn arbitrary functions or methods into asynchronous Jobs. A decorated function or method returns a Job object immediately, and the Job's result() method blocks until completion and returns the result. For example:

@async_function
def double(x):
  return x + x

defines an asynchronous function that doubles a number. This is of course overkill for such a trivial computation, but it illustrates the point. To call the function we would write something like:

job = double(10)
while not job.is_complete():
  # Do something else
  print "waiting..."

if job.failed():
  print "Failed! %s" % (', '.join(job.errors))
else:
  print job.result()
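For intuition, such a decorator can be approximated with the standard library. This is a rough sketch, not the real @async_function: calling the wrapped function returns a Future (standing in for a Job) instead of the result.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=4)

def async_function_sketch(fn):
  """Hypothetical stdlib analogue of @async_function: the wrapped
  function returns a Future rather than blocking for its result."""
  @functools.wraps(fn)
  def wrapper(*args, **kwargs):
    return _executor.submit(fn, *args, **kwargs)
  return wrapper

@async_function_sketch
def double(x):
  return x + x

job = double(10)        # returns immediately with a Future
print(job.result())     # blocks until the computation finishes
```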

Note that Datalab does not currently support any of IPython's parallel programming features. If that changes in the future, Jobs should perhaps be reimplemented in terms of that framework.