This repository has been archived by the owner on Sep 3, 2022. It is now read-only.

Charting and Chart Controls

Graham Wheeler edited this page Jun 2, 2016 · 5 revisions

Static and Dynamic Charts

The chart code currently generates both static and dynamic charts. The dynamic charts are generated by Google Charts (or Plotly) as SVG objects. The static charts are created by having the SVG objects render to a PNG.

The reason we do both is so that the saved notebook has a static rendering of the chart to display without having to execute JavaScript. This is especially useful on GitHub, which supports viewing static notebooks.

In an environment that can execute JavaScript, we don't want to show the static chart if the dynamic one is available. For this reason, when a chart is rendered we add the static PNG as an output in the notebook JSON but don't display it. When the notebook is opened in a non-executing environment, the PNG output is shown. In an environment that executes JavaScript, the PNG is shown briefly and then replaced by the dynamic chart once that is available (which in turn adds, but does not display, a new static PNG). This explains why charts and tables can seem to 'flash' when a notebook is opened: the static PNGs are shown momentarily before being replaced by the dynamic charts.

Chart controls

It is possible to add controls to a chart. Changing the value of a control changes the value of the bound variable and re-executes the chart code; if the chart plots the result of a query that references the variable(s), the query results change and the plot updates accordingly.
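
As a rough illustration of the binding between a control and a query variable, the sketch below substitutes control values into a query template that uses `$name` references. It is not Datalab's implementation: `to_sql_literal` and `expand_query` are hypothetical helpers, the literal formatting and quote escaping are simplifying assumptions, and Python's `string.Template` stands in for whatever substitution the SQL module code actually performs.

```python
from string import Template


def to_sql_literal(value):
    """Format a Python value as a SQL literal (illustrative, not Datalab's rules)."""
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return 'true' if value else 'false'
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return '(' + ', '.join(to_sql_literal(v) for v in value) + ')'
    # Treat anything else as a string; naive escaping of embedded single quotes.
    return "'" + str(value).replace("'", "\\'") + "'"


def expand_query(sql_template, variables):
    """Replace each $name reference with the current value of that control."""
    literals = {name: to_sql_literal(v) for name, v in variables.items()}
    return Template(sql_template).substitute(literals)


QUERY = 'WHERE state==$state AND mother_married==$married'

# Moving the picker from WA to TX yields a different query, hence a new plot.
before = expand_query(QUERY, {'state': 'WA', 'married': True})
after = expand_query(QUERY, {'state': 'TX', 'married': True})
```

Each change to a control produces a new expanded query, which is why the chart has to be re-executed rather than merely redrawn.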

The chart code keeps a cache of the data that is plotted, keyed off the value of any control variables. This can improve performance considerably if viewing a previously viewed chart/variable combination, but there is no current limit on the size of the cache, so there is some risk of memory exhaustion.
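
One way to keep the benefit of this cache while removing the memory risk would be to bound it with a least-recently-used policy. The sketch below is an assumption about how that might look, not the existing code: `ChartDataCache`, `max_entries` and the `fetch` callback are all hypothetical names.

```python
from collections import OrderedDict


class ChartDataCache:
    """Cache of plotted data keyed by control values, with LRU eviction."""

    def __init__(self, max_entries=32):
        self._max_entries = max_entries
        self._entries = OrderedDict()

    @staticmethod
    def _key(variables):
        # Sort by name so the key does not depend on dict ordering, and turn
        # list values (e.g. from a set control) into hashable tuples.
        return tuple(sorted(
            (name, tuple(value) if isinstance(value, list) else value)
            for name, value in variables.items()))

    def get(self, variables, fetch):
        """Return cached data for these control values, calling fetch on a miss."""
        key = self._key(variables)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        data = fetch(variables)
        self._entries[key] = data
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict the least recently used
        return data

    def __len__(self):
        return len(self._entries)
```

Here `fetch` would be whatever runs the query for a given set of control values; a previously viewed combination is served from the cache, and the oldest combination is dropped once `max_entries` is exceeded.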

The controls are specified by YAML in the body of the %chart cell (handled by _utils/parse_control_options in datalab.utils.commands). The different types currently supported are set, picker, textbox, checkbox and slider. As this area is usually used to pass options to Google Charts, the controls must all be specified in a variables list. For example, if we define the SQL module natality thus:

%%sql -m natality

state='WA'
married=True

SELECT mother_age, COUNT(*) AS count FROM [publicdata:samples.natality]
WHERE state==$state AND mother_married==$married
GROUP BY mother_age
ORDER BY mother_age

We can use the cell below to chart it:

%chart columns -d natality
title: Births by Age of Mother
hAxis:
  title: Age
variables:
  state:
    type: picker
    choices: [AZ, TX, WA]
  married:
    type: boolean

This gives us a drop-down picker bound to the state variable, allowing us to choose between three states, as well as a checkbox bound to the married variable. Note that for the checkbox we did not specify the control type; checkbox is the default if the variable type is Boolean. Similarly, textbox is the default control type for string variables (whether the variable is a string because we specified the type field or because we specified no type, just a value), and set is the default control type for list variables.

In general, each control will have a name, which should be the same as the variable to which it is bound, an optional type, an optional label (if not specified the variable name will be used), an optional initial value, and then some type-specific attributes:

  • for picker and set (which is a multi-picker), a list of choices
  • for slider, a min and max

For set or picker controls it is possible to specify the set of choices in the YAML or in the %sql cell. For example, we could declare the variable in the %sql module like this:

%%sql -m natality

state = ['AZ', 'TX', 'WA']
married=True

SELECT mother_age, COUNT(*) AS count FROM [publicdata:samples.natality]
WHERE state IN $state AND mother_married==$married
GROUP BY mother_age
ORDER BY mother_age

and then just do this:

%chart columns -d natality
title: Births by Age of Mother
hAxis:
  title: Age
variables:
  state:
    type: set
  married:
    type: boolean
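
The defaulting rules described above can be summarized in a short sketch. This is an illustration only and not the logic of parse_control_options: `default_control_type` and `resolve_control` are hypothetical names, the `min` and `max` key names are assumed, and `type` is treated purely as a control type (the sketch does not handle a variable type such as boolean in that field).

```python
CONTROL_TYPES = ('set', 'picker', 'textbox', 'checkbox', 'slider')


def default_control_type(value):
    """Pick a control type from the Python type of the variable's value."""
    if isinstance(value, bool):
        return 'checkbox'
    if isinstance(value, str):
        return 'textbox'
    if isinstance(value, (list, tuple)):
        return 'set'
    return None  # the page does not say what other types default to


def resolve_control(name, spec, value):
    """Return a copy of spec with the type, label and choices defaults filled in."""
    resolved = dict(spec or {})
    control_type = resolved.get('type') or default_control_type(value)
    if control_type not in CONTROL_TYPES:
        raise ValueError('cannot choose a control for %r' % name)
    resolved['type'] = control_type
    resolved.setdefault('label', name)  # label defaults to the variable name
    if control_type in ('picker', 'set') and 'choices' not in resolved:
        # Fall back to the list declared for the variable in the SQL module.
        if not isinstance(value, (list, tuple)):
            raise ValueError('%r needs a list of choices' % name)
        resolved['choices'] = list(value)
    if control_type == 'slider' and not {'min', 'max'} <= set(resolved):
        raise ValueError('%r needs min and max' % name)
    return resolved
```

For instance, `resolve_control('state', {'type': 'set'}, ['AZ', 'TX', 'WA'])` would take its choices from the list declared in the SQL module, while `resolve_control('married', {}, True)` would fall back to a checkbox labeled with the variable name.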

Improving the Help

One of the complaints we have had about the charts subsystem is that it is not very easy to figure out how to use; the help generated for %chart is minimal.

It is worth pointing out that a substantial attempt was made to address this by adding both better help text and data schema validation to the chart subsystem, although the change was never reviewed and thus never merged. This PR can be found here.

Plotly support

The charts code in JS uses an abstract base class for a chart 'driver'. This has a Google Charts subclass as well as a Plotly subclass. The Plotly subclass is currently very limited and was added to allow us to draw confusion matrices, something not supported by Google Charts.

The code for this is quite ugly because the format of the data passed to the chart render() function is currently optimized for Google Charts, and needs to be quite significantly massaged to be used by Plotly. This may be okay for the current use case, but if we were to extend the Plotly support we would probably want to revisit the data format in order to come up with one that is more general and less Google Charts-specific.
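
To give a feel for the kind of massaging involved, the sketch below converts a table in the Google Charts DataTable literal layout (`cols`, plus `rows` of `c`/`v` cells) into a Plotly heatmap trace of the sort used for a confusion matrix. It assumes, without confirmation from the code, that this is the layout passed to render(); the real conversion lives in the JavaScript driver, `datatable_to_heatmap` is a hypothetical name, and Python is used here only for illustration.

```python
def datatable_to_heatmap(table):
    """Convert a DataTable-style dict into a Plotly heatmap trace dict.

    Assumes the first column holds the row labels and every remaining
    column holds the counts for one predicted class. Null cells are not handled.
    """
    cols = table['cols']
    rows = table['rows']
    x = [col['label'] for col in cols[1:]]       # predicted classes
    y = [row['c'][0]['v'] for row in rows]       # actual classes
    z = [[cell['v'] for cell in row['c'][1:]] for row in rows]
    return {'type': 'heatmap', 'x': x, 'y': y, 'z': z}


# A two-class confusion matrix in the assumed Google Charts layout.
table = {
    'cols': [
        {'id': 'actual', 'label': 'Actual', 'type': 'string'},
        {'id': 'cat', 'label': 'cat', 'type': 'number'},
        {'id': 'dog', 'label': 'dog', 'type': 'number'},
    ],
    'rows': [
        {'c': [{'v': 'cat'}, {'v': 5}, {'v': 1}]},
        {'c': [{'v': 'dog'}, {'v': 2}, {'v': 7}]},
    ],
}

trace = datatable_to_heatmap(table)
```

A more neutral intermediate format (for example, plain column names plus a list of row values) would let each driver do a small conversion of its own rather than having Plotly unpick a Google Charts structure.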

Future

Some things that are worth considering in the future for the charts subsystem:

  • it may make sense to rebuild the chart system using ipywidgets. Right now ipywidgets isn't working 100% in Datalab, possibly due to CSS issues. It is close though. The benefit here is that ipywidgets has a number of existing control types that we could use instead of building our own.
  • an interesting recent project on GitHub that provides similar functionality but with a different idiom is this.