UDFs
Datalab supports JavaScript user-defined functions (UDFs) for BigQuery. A UDF is a JavaScript function that takes a row object and an emitter function as input, performs some computation, and then calls the emitter to output a transformed row object (or possibly several rows). A UDF is thus similar to the Map function in MapReduce.
The BigQuery UDF documentation explains that a defineFunction() call is needed to define a UDF, including its input fields, output schema, and so forth. Datalab is a bit simpler: it uses JSDoc-style @param comments to achieve the same result. Also note that in Datalab the UDF function should not have a name or be assigned to a variable. It should ideally be stateless, because state is not guaranteed to be consistent across multiple nodes, but it can call support functions.
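For comparison, the sketch below shows roughly what the explicit registration looks like in plain BigQuery (legacy SQL), which Datalab derives from the @param comments instead. The field names match the skeleton that follows. The `bigquery` object defined at the top is only a stand-in so the snippet can run outside BigQuery, where the UDF runtime provides that global itself, and the name `weightedCount` is an illustrative choice (plain BigQuery uses a named function here, unlike Datalab).

```javascript
// Stand-in (assumption) for the `bigquery` global supplied by BigQuery's
// UDF runtime. It only records the registration so this sketch is runnable.
const registered = {};
const bigquery = {
  defineFunction(name, inputColumns, outputSchema, fn) {
    registered[name] = { inputColumns, outputSchema, fn };
  },
};

// The UDF itself: takes a row and an emitter, emits one transformed row.
function weightedCount(r, emitFn) {
  emitFn({
    timestamp: r.timestamp,
    label: r.label,
    weighted_count: r.count * r.weight,
  });
}

// Explicit registration: UDF name, input columns, output schema, function.
bigquery.defineFunction(
  'sample_udf',
  ['timestamp', 'label', 'count', 'weight'],
  [
    { name: 'timestamp', type: 'timestamp' },
    { name: 'label', type: 'string' },
    { name: 'weighted_count', type: 'float' },
  ],
  weightedCount
);
```

In Datalab, the input column list and output schema above are expressed by the two @param annotations, and the UDF name comes from the cell magic.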
The basic skeleton of a UDF in Datalab looks like:
%%bigquery udf -m sample_udf
/**
 * A UDF function should take an input row `r` and emitter function `emitFn()`.
 * It should create one or more transformed forms of the row and output
 * them by calling the emitter.
 *
 * We define two parameters, below, which specify the schema of the input row and
 * the output row.
 *
 * In our example we output a row with one fewer column: count and weight are
 * replaced by their product.
 *
 * @param {{timestamp: timestamp, label: string, count: integer, weight: float}} r
 * @param function({{timestamp: timestamp, label: string, weighted_count: float}}) emitFn
 */
function(r, emitFn) {
  emitFn({
    timestamp: r.timestamp,
    label: r.label,
    weighted_count: r.count * r.weight
  });
}
To apply this UDF to a table [foo], we can use:
%%sql
SELECT * FROM sample_udf([foo])
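Because a UDF is an ordinary JavaScript function, its logic can be checked locally before running it against a table. The following is a minimal sketch under stated assumptions: `applyUdf` is a hypothetical helper (not part of Datalab or BigQuery), the sample rows are made up, and the function is bound to a name here only so it can be called outside a Datalab cell.

```javascript
// Same body as the sample UDF; named here only for local testing.
const sampleUdf = function (r, emitFn) {
  emitFn({
    timestamp: r.timestamp,
    label: r.label,
    weighted_count: r.count * r.weight,
  });
};

// Hypothetical helper: calls the UDF once per input row and collects
// every row passed to the emitter (a UDF may emit zero, one, or many).
function applyUdf(udf, rows) {
  const out = [];
  for (const row of rows) {
    udf(row, (emitted) => out.push(emitted));
  }
  return out;
}

// Made-up input rows matching the input schema of the sample UDF.
const rows = [
  { timestamp: 1457000000, label: 'a', count: 3, weight: 0.5 },
  { timestamp: 1457000060, label: 'b', count: 4, weight: 2 },
];

const result = applyUdf(sampleUdf, rows);
console.log(result);
```

Each output row carries `timestamp`, `label`, and `weighted_count`, with `count` and `weight` dropped, which mirrors the output schema declared in the second @param annotation.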