Notebook guidelines

Guidelines for writing notebooks

General guidelines

Specific points to keep in mind (most of this is part of PEP8 already):

Limit line length to 80 characters when reasonably possible
For functions that are more than a few lines (i.e. not easily self-describing), add a docstring
Use the logging module to give status updates from your code, rather than print (see the section on the logging module below)
Don't comment on obvious things, i.e. don't do this:
```
dataframe.plot()  # Plot the dataframe
```
Function and variable names should be all lowercase, with words separated by underscores
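
As a rough illustration of the docstring and logging points above, here is a minimal sketch. The helper `celsius_to_kelvin` and the NumPy-style docstring layout are assumptions chosen for the example; these guidelines do not prescribe them.

```python
import logging

logger = logging.getLogger(__name__)


def celsius_to_kelvin(values):
    """Convert temperatures from degrees Celsius to Kelvin.

    Parameters
    ----------
    values : iterable of float
        Temperatures in degrees Celsius.

    Returns
    -------
    list of float
        The same temperatures in Kelvin.
    """
    result = [value + 273.15 for value in values]
    # Status update via logging instead of print()
    logger.info('Converted %d values to Kelvin', len(result))
    return result
```

The log message is only shown once the logging level has been set to INFO or lower.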

Examples and recipes

Dealing with German number formats in pandas

When reading data, e.g. from a CSV file, pandas tries to automatically convert columns into the correct datatypes, i.e. parse numerical values.

Columns with the object dtype in a DataFrame usually hold strings, so it is worth checking that all numerical data have been parsed as either float or int dtypes. With German number formats (period as thousands separator, comma as decimal separator) this parsing often fails. Often, the quickest way to fix this is to set the thousands and decimal arguments of pd.read_csv:

df = pd.read_csv(path_to_my_file, thousands='.', decimal=',')
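
To see the effect on a small scale, the sketch below reads a made-up CSV snippet from memory. The column names, the values and the `;` column separator are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical CSV content in German notation, ';' as column separator
raw = io.StringIO(
    "station;capacity;lat\n"
    "A;1.234,5;52,5\n"
    "B;987,0;48,25\n"
)

df = pd.read_csv(raw, sep=';', thousands='.', decimal=',')

# Columns that were not parsed as numbers and may need a closer look
non_numeric = [
    column for column in df.columns
    if not pd.api.types.is_numeric_dtype(df[column])
]
```

Here only the `station` column should remain non-numeric; any other column in that list would indicate values that could not be parsed.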

An alternative approach is to manually process a column after reading the file, which is usually more appropriate in complex cases where multiple values need to be replaced or some other logic has to happen:

df['lat'] = df['lat'].str.replace('.', '', regex=False).astype('float64')

pandas provides a large number of vectorized (i.e. fast) string methods via the str accessor; see the documentation for a complete list.
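
When a column contains both thousands separators and decimal commas, the manual approach needs two replacements. The following is a sketch with a hypothetical `capacity` column. Passing `regex=False` makes it explicit that `.` is meant literally, whichever pandas version is in use.

```python
import pandas as pd

# Hypothetical column that was read as strings in German notation
df = pd.DataFrame({'capacity': ['1.234,5', '987,0', '12.000.000,25']})

df['capacity'] = (
    df['capacity']
    .str.replace('.', '', regex=False)   # drop thousands separators
    .str.replace(',', '.', regex=False)  # decimal comma -> decimal point
    .astype('float64')
)
```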

Using the `logging` module

You can set the logging level at the beginning of a notebook to only print messages that are at that level or higher:

import logging
logger = logging.getLogger()
logger.setLevel('INFO')

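Then log, for example, with logging.error('An error occurred').

To add a timestamp to each log entry, add the following after the initialization code above. This assumes that the root logger already has at least one handler; if logger.handlers is empty, logger.handlers[0] raises an IndexError.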

formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                              datefmt='%d %b %Y %H:%M:%S')
logger.handlers[0].setFormatter(formatter)

Complex configuration, e.g. long list of different URLs to download

To make such a configuration more readable, we can use a triple-quoted string (""") to write multi-line YAML and then parse it into a Python dict.

Note: if dates are given in the form YYYY-MM-DD, the YAML parser automatically converts them into datetime.date objects, which simplifies later code. If they should be treated as strings instead, they have to be quoted: 'YYYY-MM-DD'

import yaml

conf = """

Fingrid:
    wind:
        url_template: http://www.fingrid.fi/en/electricity-market/load-and-generation/windpower/Pages/default.aspx/Excel/TimeSeries.xls?beginDate=20150101&endDate=20151231&variables=H4sIAAAAAAAEAO29B2AcSZYlJi9tynt_SvVK1-B0oQiAYBMk2JBAEOzBiM3mkuwdaUcjKasqgcplVmVdZhZAzO2dvPfee--999577733ujudTif33_8_XGZkAWz2zkrayZ4hgKrIHz9-fB8_Ih7_ZFYX2aTMmyP769nyvDp6_CJb5EfPzn7_N1999fzsJ788--L49_99vv3m8V3-_PHZ8jKv25_MynV-dJ6VTf74rv_RY_7xrKoXWfu6rYvlxdGPj378x3ce3-1_8firZdG-yd-1R198d353_viu_fvxm6It86PvFstZuqqu8jq9yJd5nbVFtUy3uPWdx3el0ePnxTI_qcqqTu96f3y7XZRHP_7k5MnDHeo8_Pjx63l19eWyvH62btd1_jRrMzOWyDdA3aeP_bM5-n8AhQmq0kUBAAA1&cultureId=en-US&dataTimePrecision=5
        bundle: YEARLY
        start: 2014-11-28
        end: 2015-01-01
    ...
...

"""

conf = yaml.safe_load(conf)
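
The date handling described in the note can be checked with a small sketch. The two-key snippet below is made up for illustration, and `yaml.safe_load` is used because it does not require an explicit `Loader` argument.

```python
import datetime

import yaml

# Hypothetical configuration: one bare date, one quoted date
snippet = """
start: 2014-11-28
end: '2015-01-01'
"""

parsed = yaml.safe_load(snippet)

# The bare value becomes a date object, the quoted one stays a string
start_is_date = isinstance(parsed['start'], datetime.date)
end_is_str = isinstance(parsed['end'], str)
```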
