This repository has been archived by the owner on Sep 3, 2022. It is now read-only.
Graham Wheeler edited this page Jun 2, 2016 · 5 revisions

Datalab supports JavaScript user-defined functions (UDFs) for BigQuery. A UDF is a JavaScript function that takes a row object and an emitter function as input, performs some computation, and then calls the emitter to output a transformed row object (or possibly several rows). A UDF is thus similar to the Map function in MapReduce.

The BigQuery UDF documentation explains that a defineFunction() call is needed to define a UDF, including its input fields, output schema, and so forth. Datalab is a bit simpler: it uses JSDoc-style @param comments to declare the same information. Also note that the UDF should not have a name or be assigned to a variable. It should ideally be stateless, because state is not guaranteed to be consistent across the multiple nodes that run it, but it can call support functions.
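
For comparison, here is a rough sketch of what the explicit registration looks like when written against BigQuery's legacy SQL UDF interface directly. It is illustrative only: the function name `weightedCount` and the column names are hypothetical, and the small `bigquery` stub is an assumption added purely so the snippet can run outside BigQuery (inside BigQuery that object is supplied for you).

```javascript
// Stand-in for the `bigquery` object that BigQuery supplies to UDF code.
// Defined here only so this sketch runs locally; omit it in a real UDF.
var registered = {};
var bigquery = {
  defineFunction: function (name, inputColumns, outputSchema, fn) {
    registered[name] = {
      inputColumns: inputColumns,
      outputSchema: outputSchema,
      fn: fn
    };
  }
};

// The UDF itself: takes a row and an emitter, emits one transformed row.
function weightedCount(r, emitFn) {
  emitFn({label: r.label, weighted_count: r.count * r.weight});
}

// Explicit registration: SQL-visible name, input columns, output schema,
// and a reference to the function.
bigquery.defineFunction(
  'weightedCount',
  ['label', 'count', 'weight'],
  [{name: 'label', type: 'string'},
   {name: 'weighted_count', type: 'float'}],
  weightedCount
);
```

In Datalab, the input column list and the output schema are instead read from the two @param comments, so none of this registration code has to be written by hand.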

The basic skeleton of a UDF in Datalab looks like:

%%bigquery udf -m sample_udf
/**
 * A UDF function should take an input row `r` and emitter function `emitFn()`.
 * It should create one or more transformed forms of the row and output
 * them by calling the emitter.
 *
 * We define two parameters, below, which specify the schema of the input row and
 * the output row.
 *
 * In our example we will output a row with one fewer column: count and weight
 * are replaced by their product, weighted_count.
 *
 * @param {{timestamp: timestamp, label: string, count: integer, weight: float}} r
 * @param function({{timestamp: timestamp, label: string, weighted_count:float}}) emitFn
 */
function(r, emitFn) {
  emitFn({
    timestamp: r.timestamp,
    label: r.label,
    weighted_count: r.count * r.weight
  });
}

To apply this UDF to a table [foo] we can use:

%%sql
SELECT * FROM sample_udf([foo])
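
Because a UDF is just a JavaScript function of a row and an emitter, its logic can be sanity-checked outside Datalab before it is run against a table. The following is a minimal sketch, not part of Datalab: `runUdf`, `sampleUdf`, and `splitLabels` are hypothetical names, the input row is made-up sample data, and the functions are assigned to variables only so they can be called locally (in a Datalab UDF cell the function would stay anonymous). `splitLabels` also illustrates a UDF that emits several rows for a single input row.

```javascript
// Hypothetical local harness: feeds each row to the UDF and collects
// everything the UDF passes to the emitter.
function runUdf(udf, rows) {
  var out = [];
  rows.forEach(function (row) {
    udf(row, function (emitted) { out.push(emitted); });
  });
  return out;
}

// Same logic as sample_udf, named here only so it can be called locally.
var sampleUdf = function (r, emitFn) {
  emitFn({
    timestamp: r.timestamp,
    label: r.label,
    weighted_count: r.count * r.weight
  });
};

// Hypothetical variant: emits one output row per comma-separated label.
var splitLabels = function (r, emitFn) {
  r.label.split(',').forEach(function (part) {
    emitFn({timestamp: r.timestamp, label: part.trim(), count: r.count});
  });
};

// Made-up input row for illustration.
var rows = [{timestamp: 1464825600, label: 'a, b', count: 4, weight: 0.25}];

console.log(runUdf(sampleUdf, rows));   // one row out per row in
console.log(runUdf(splitLabels, rows)); // two rows out for the one row in
```

Keeping the emitter as the only output channel, as here, is what lets the same function body work both under the local harness and inside BigQuery.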