Charting and Chart Controls
The chart code currently generates both static and dynamic charts. The dynamic charts are generated by Google Charts (or Plotly) as SVG objects. The static charts are created by having the SVG objects render to a PNG.
The reason we do both is so that the saved notebook has a static rendering of the chart to display without having to execute JavaScript. This is especially useful in GitHub, which supports viewing static notebooks.
In an environment that can execute JavaScript, we don't want to show the static chart if the dynamic one is available. For this reason we add the static chart as an output in the notebook JSON but don't render it. When the notebook is opened in a non-executing environment, the PNG output will be shown. In an environment that executes JavaScript, the PNG will be shown briefly but will be replaced by the dynamic chart once that is available (which will in turn add, but not display, a new static PNG). This explains why charts/tables can seem to 'flash' when opening a notebook; the static PNGs are shown momentarily before being replaced by the dynamic charts.
It is possible to add controls to a chart. Changing the value of the control will change the value of a bound variable and re-execute the chart code; if the chart is plotting the result of a query that has a reference to the variable(s) in it then the query results will be changed and likewise the plot will update.
The chart code keeps a cache of the data that is plotted, keyed off the value of any control variables. This can improve performance considerably if viewing a previously viewed chart/variable combination, but there is no current limit on the size of the cache, so there is some risk of memory exhaustion.
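One way to remove the memory-exhaustion risk would be to bound the cache and evict the least recently used entry. The sketch below is not the Datalab implementation; the class name ChartDataCache, the max_entries parameter and the compute callback are all hypothetical, and it only illustrates keying cached data off the control variable values.

```python
from collections import OrderedDict


class ChartDataCache:
    """Bounded cache of chart data keyed by control-variable values.

    Hypothetical sketch: the real chart code has no size limit.
    """

    def __init__(self, max_entries=32):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(variables):
        # Sort by name so the key does not depend on dict ordering; lists
        # (as used by 'set' controls) become tuples so they are hashable.
        return tuple(
            (name, tuple(value) if isinstance(value, list) else value)
            for name, value in sorted(variables.items())
        )

    def get_or_compute(self, variables, compute):
        key = self._key(variables)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = compute(variables)  # e.g. re-run the query with these values
        self._entries[key] = data
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return data

    def __len__(self):
        return len(self._entries)
```

With a cache like this, returning to a previously viewed state/married combination is still served from memory, while rarely used combinations are eventually dropped and recomputed on demand.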
The controls are specified by YAML in the body of the %chart cell (handled by _utils/parse_control_options in datalab.utils.commands). The different types currently supported are set, picker, textbox, checkbox and slider. As this area is usually used to pass options to Google Charts, the controls must all be specified in a variables list. For example, if we define the SQL module natality thus:
%%sql -m natality
state='WA'
married=True
SELECT mother_age, COUNT(*) AS count FROM [publicdata:samples.natality]
WHERE state==$state AND mother_married==$married
GROUP BY mother_age
ORDER BY mother_age
We can use the cell below to chart it:
%chart columns -d natality
title: Births by Age of Mother
hAxis:
  title: Age
variables:
  state:
    type: picker
    choices: [AZ, TX, WA]
  married:
    type: boolean
This gives us a drop-down picker bound to the state variable, allowing us to choose between three states, as well as a checkbox bound to the married variable.
Note that for the checkbox we did not specify the control type; checkbox is the default if the variable type is Boolean. Similarly, textbox is the default control type for string variables (whether because we specified the type field or because we specified no type, just a value), and set is the default control type for list variables.
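The defaulting rules above can be sketched roughly as follows. This is an illustration only, not the code in parse_control_options: the function name resolve_control_type is hypothetical, and treating the type field as either a control type or the variable type boolean is an assumption based on the example above.

```python
# Control types named in the text above.
CONTROL_TYPES = {"set", "picker", "textbox", "checkbox", "slider"}


def resolve_control_type(spec, default_value=None):
    """Return the control type for one entry under 'variables'.

    spec is the parsed YAML mapping for the variable (hypothetical shape);
    default_value is the variable's value from the %%sql module, if any.
    """
    declared = spec.get("type")
    if declared in CONTROL_TYPES:
        return declared  # an explicit control type always wins
    # Assumption: 'boolean' names a variable type, as in the example above.
    if declared == "boolean" or isinstance(default_value, bool):
        return "checkbox"
    if isinstance(default_value, (list, tuple)):
        return "set"
    if isinstance(default_value, str):
        return "textbox"
    return None  # the text describes no default for other variable types
```

For the natality example, the state entry resolves to picker because it is declared explicitly, while married resolves to checkbox from its Boolean type.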
In general, each control will have a name, which should be the same as the variable to which it is bound, an optional type, an optional label (if not specified, the variable name will be used), an optional initial value, and then some type-specific attributes:
- for picker and set (which is a multi-picker), a list of choices
- for slider, a min and max
For set or picker controls it is possible to specify the set of choices in the YAML or in the %sql cell.
For example, we could declare the variable in the %sql module like this:
%%sql -m natality
state = ['AZ', 'TX', 'WA']
married=True
SELECT mother_age, COUNT(*) AS count FROM [publicdata:samples.natality]
WHERE state IN $state AND mother_married==$married
GROUP BY mother_age
ORDER BY mother_age
and then just do this:
%chart columns -d natality
title: Births by Age of Mother
hAxis:
  title: Age
variables:
  state:
    type: set
  married:
    type: boolean
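A rough sketch of how a control's type-specific attributes might be filled in, taking choices from the YAML when present and otherwise from the list declared in the %sql module. The function name resolve_control, the shape of the returned dictionary and the precedence of YAML over the module value are all assumptions made for illustration; the source does not describe how the real parser orders these.

```python
def resolve_control(name, control_type, spec, module_default=None):
    """Build a control description from a YAML entry (hypothetical shape).

    spec is the parsed YAML mapping for the variable; module_default is the
    value the variable was given in the %%sql module, if any.
    """
    control = {
        "name": name,
        "type": control_type,
        "label": spec.get("label", name),  # label falls back to the name
    }
    if control_type in ("picker", "set"):
        if "choices" in spec:
            # Assumption: choices given in the YAML take precedence.
            choices = list(spec["choices"])
        elif isinstance(module_default, (list, tuple)):
            choices = list(module_default)
        else:
            raise ValueError("control %r needs a list of choices" % name)
        control["choices"] = choices
    elif control_type == "slider":
        missing = [key for key in ("min", "max") if key not in spec]
        if missing:
            raise ValueError("slider %r is missing %s" % (name, ", ".join(missing)))
        control["min"], control["max"] = spec["min"], spec["max"]
    return control
```

Under these assumptions, the second natality example yields a set control for state whose choices are AZ, TX and WA, even though the YAML lists none.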
One of the complaints we have had about the charts subsystem is that it is not very easy to figure out how to use; the help generated for %chart is minimal.
It is worth pointing out that a substantial attempt was made to address this by adding both better help text and data schema validation to the chart subsystem, although that PR was never reviewed and thus never merged. This PR can be found here.
Plotly support
The charts code in JS uses an abstract base class for a chart 'driver'. This has a Google Charts subclass as well as a Plotly subclass. The Plotly subclass is currently very limited and was added to allow us to draw confusion matrices, something not supported by Google Charts.
The code for this is quite ugly because the format of the data passed to the chart render() function is currently optimized for Google Charts, and needs to be quite significantly massaged to be used by Plotly. This may be okay for the current use case, but if we were to extend the Plotly support we would probably want to revisit the data format in order to come up with one that is more general and less Google Charts-specific.
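To give a feel for the kind of massaging involved, here is a small sketch of turning column/row data into Plotly traces. It assumes the data resembles the Google Charts DataTable literal layout (cols plus rows of c cells with v values); the exact format Datalab passes to render() is not shown here, the real driver is written in JS rather than Python, and the function name datatable_to_plotly_traces is hypothetical.

```python
def datatable_to_plotly_traces(table, trace_type="bar"):
    """Convert DataTable-style data into a list of Plotly trace dicts.

    Assumes the first column holds the x values and every remaining
    column is one series (an illustrative convention, not Datalab's).
    """
    cols = table["cols"]
    rows = table["rows"]

    def cell_value(row, index):
        # A cell may be None (null) in the DataTable literal layout.
        cell = row["c"][index]
        return None if cell is None else cell.get("v")

    x_values = [cell_value(row, 0) for row in rows]
    traces = []
    for index, col in enumerate(cols[1:], start=1):
        traces.append({
            "type": trace_type,
            "name": col.get("label") or col.get("id", ""),
            "x": x_values,
            "y": [cell_value(row, index) for row in rows],
        })
    return traces
```

The row-oriented input has to be pivoted into per-series x and y arrays, which is the main reason a more neutral data format would make the Plotly driver simpler.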
Some things that are worth considering in the future for the charts subsystem:
- it may make sense to rebuild the chart system using ipywidgets. Right now ipywidgets isn't working 100% in Datalab, possibly due to CSS issues. It is close though. The benefit here is that ipywidgets has a number of existing control types that we could use instead of building our own.
- an interesting recent project on GitHub that provides similar functionality but with a different idiom is this.