Asynchronous Methods and Jobs
Datalab has a unified, general mechanism for handling asynchronous tasks via the `datalab.utils.Job` class. Jobs can be waited on by calling their `wait()` or `result()` methods, or by calling either of the top-level functions `wait_one()` or `wait_all()`.
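For example, a caller that starts several jobs can block until they have all completed with `wait_all()`. Here is a minimal sketch, assuming `wait_all()` lives in `datalab.utils` alongside `Job` and accepts a list of `Job` objects (exact signatures are an assumption), and using the `@async_function` decorator described further down this page:

```python
from datalab.utils import async_function, wait_all

@async_function
def square(x):
    return x * x

# Start several asynchronous jobs, then block until every one has finished.
jobs = [square(n) for n in range(4)]
wait_all(jobs)  # assumed to accept a list of Job objects and block until all are done

print([job.result() for job in jobs])  # each result() returns immediately once complete
```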
The base `Job` class uses a Python `concurrent.futures.Future` object to support async behavior. Subclasses can use the `Future` mechanism or implement their own way of checking job status and blocking until completion. For example, BigQuery jobs (such as running a query) use polling via an HTTP request to check on job status, so the subclass `datalab.bigquery.Job` makes no use of the `Future`.
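As a rough illustration of that split, the sketch below shows the shape such a polling implementation might take. It is a standalone Job-like class, not the actual `datalab.bigquery.Job` source; the `_refresh_state()` hook, the status URL, and the response fields are all hypothetical:

```python
import time
import requests  # stands in for whatever HTTP client the real class uses


class PollingJob:
    """Job-like class that polls a REST endpoint for completion instead of
    wrapping a concurrent.futures.Future (a sketch of the pattern described
    above, not the real datalab.bigquery.Job implementation)."""

    def __init__(self, status_url):
        self._status_url = status_url
        self._complete = False
        self._result = None

    def _refresh_state(self):
        # One HTTP request asking the service whether the job has finished.
        response = requests.get(self._status_url).json()
        if response.get('state') == 'DONE':
            self._complete = True
            self._result = response.get('result')

    def is_complete(self):
        self._refresh_state()
        return self._complete

    def wait(self, poll_interval=1.0):
        # Block until the service reports completion.
        while not self.is_complete():
            time.sleep(poll_interval)
        return self

    def result(self):
        self.wait()
        return self._result
```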
The `datalab.utils` module implements two decorators which allow arbitrary functions or methods to be turned into asynchronous Jobs, namely `@async_function` and `@async_method`. If you use these to decorate a function or method, calling it will return a `Job` object, and its `result()` method will block until completion and return the result. For example:
```python
from datalab.utils import async_function

@async_function
def double(x):
    return x + x
```
will define an asynchronous function that doubles a number. This is of course overkill here, but it serves to illustrate the point. To call this function we would use something like:
```python
job = double(10)
while not job.is_complete():
    # Do something else while the job runs
    print("waiting...")
if job.failed():
    print("Failed! %s" % ', '.join(job.errors))
else:
    print(job.result())
```
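`@async_method` is the analogous decorator for instance methods. Here is a minimal sketch, assuming it behaves like `@async_function` but for methods; the class and method below are illustrative, not part of Datalab:

```python
from datalab.utils import async_method

class Accumulator:
    def __init__(self):
        self.total = 0

    @async_method
    def add_many(self, values):
        # Runs asynchronously; calling add_many() returns a Job.
        for v in values:
            self.total += v
        return self.total

acc = Accumulator()
job = acc.add_many(range(100))
print(job.result())  # blocks until the method has finished, then prints 4950
```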
Note that Datalab currently does not support any of the IPython parallel programming features. If that changes in the future, then Jobs should perhaps be reimplemented in terms of that framework.