Brainbox - experiment successful (int-brain-lab#85)

* Initialisation of brainbox package with one data example
* examples import from brainbox
* synchronisation example
* synchronisation example: read full traces
* Added brainbox folder structure and readmes from the other repos
* Update README.md
  Updated link to CONTRIBUTING.md in the brainbox readme
* WIP: add xcorr_numpy example
* Create IBL_sync_check.py
* Update IBL_sync_check.py
* Update IBL_sync_check.py
* Create bin_multiple_types.py
* Update IBL_sync_check.py
* Added IBL-specific task module in brainbox
* Initialisation of brainbox package with one data example
* examples import from brainbox
* synchronisation example
* synchronisation example: read full traces
* Added brainbox folder structure and readmes from the other repos
* Update README.md
  Updated link to CONTRIBUTING.md in the brainbox readme
* WIP: add xcorr_numpy example
* Create IBL_sync_check.py
* Update IBL_sync_check.py
* Update IBL_sync_check.py
* Create bin_multiple_types.py
* Update IBL_sync_check.py
* Create IBL_sync_check.py
* Update IBL_sync_check.py
* Init core and proc modules, started on syncbin
* Update IBL_sync_check.py
* Update IBL_sync_check.py
* Fixed CI problem with brainbox processing.sync
* brainbox.behaviour.wheel move from ibllib
* Commit for the heck of it
* Commit for the heck of it
* Fleshed out core.sync and TimeSeries. Fixed the british problem.
* Finished up sync function and tests
* add __init__.py to brainbox tests
* added brainbox behavior init
* Delete bin_multiple_types.py
  bin_multiple_types is redundant given processing.sync, which uses a more robust behavior for interpolation. Also bin_multiple types had hard-coded IO and explicitly took wheels, trials, and time series information which are IBL specific structures.
* Update IBL_sync_check.py
* Added bin_spikes function shuffled chairs around
* refactor behavior module
* Rectified another import problem from bad renaming
* Added docstrings to core types and sync
* More documentation, added extrap behavior to sync
* Expanded docstrings and set a fixed testing seed
* flake
* move sync check to iblscripts
* Added numpy array functionality to sync
* flake...
Yiman00 · Jul 25, 2019 · 74aa495
1 parent 38cde2f

Showing 30 changed files with 545 additions and 80 deletions.
diff --git a/brainbox/CONTRIBUTING.md b/brainbox/CONTRIBUTING.md
@@ -0,0 +1,80 @@
+Table of Contents
+=================
+
+   * [Contributing to Brainbox](#contributing-to-brainbox)
+   * [Installing the right python environment (10 minutes)](#installing-the-right-python-environment-10-minutes)
+   * [Git, GitFlow, and you (15 minutes)](#git-gitflow-and-you-15-minutes)
+   * [Writing code for Brainbox](#writing-code-for-brainbox)
+
+# Contributing to Brainbox
+
+Things you need to be familiar with before contributing to Brainbox:
+* Fundamentals of Python programming and how to use NumPy for math
+* How to use Git and GitFlow to contribute to a repository
+* Our guidelines on how to write readable and understandable code
+
+Below is a guide that takes you from the ground up through the process of contributing to the Brainbox software package. Some of these sections may already be familiar to you, but it may be worth skimming them again in case you've forgotten some of the nuances of using Python, Git, GitHub, or unit tests.
+
+# Installing the right python environment (10 minutes)
+
+**TL;DR: We provide an `environment.yml` file. Use Anaconda to create an environment which only contains the packages Brainbox needs.**
+
+We suggest using [Anaconda](https://www.anaconda.com/distribution/), which is developed by continuum.io, as your basis for developing Brainbox. Begin by downloading the most recent version of Python 3 Anaconda for your operating system. Install using the installation instructions on the Anaconda website, and make sure that you can interact successfully with the `conda` command in either a terminal (OS X, Linux) or in the Anaconda Prompt provided on Windows.
+
+Once you have installed Anaconda, the next step is to create an environment for working with Brainbox. This requires the `environment.yml` file, which lives in the top directory of this repository. Since you will need the rest of the repository later anyway, clone the whole thing now with the following command on *nix systems:
+
+```bash
+git clone https://github.com/int-brain-lab/brainbox
+```
+
+Note: before running this command, navigate to the folder where you want the repository to live, e.g. `/home/username/Documents` if you want the `brainbox` repository in your Documents folder.
+
+For Windows users we recommend [Git for Windows](https://gitforwindows.org/), which provides a Bash terminal emulator (Git Bash) that lets you run the above command without any changes. It also includes a graphical Git interface, which can help new users.
+
+Once you have cloned the repository and downloaded Anaconda, navigate to the top level of Brainbox where the `environment.yml` file is, then run the following command in a terminal or Anaconda prompt session:
+
+```bash
+conda env create -f environment.yml
+```
+
+Type "yes" when prompted and conda will install everything you need to get working on brainbox! After this you will need to run
+
+```bash
+conda activate bbx
+```
+
+if you are developing from the terminal, in order to activate the environment you just created. **Always do this when you open a new terminal to develop Brainbox! This way you won't write code that depends on packages Brainbox doesn't support!**
+
+# Git, GitFlow, and you (15 minutes)
+
+**TL;DR: We use [Git](https://rogerdudler.github.io/git-guide/) with a [GitFlow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) workflow to develop Brainbox. Please create new feature branches off `develop` for writing code and then make a pull request to have them merged into `develop`.**
+
+For those unfamiliar with it, Git is a system for *version control*, which allows you to make changes to whatever you put into it (Git isn't limited to just code!) that are:
+
+* Tracked (When?)
+* Revertible
+* Identifiable (Who? Why?)
+* Branching
+
+That last bit is crucial to how we develop brainbox and how Git works.
+
+Git allows for multiple versions of a repository (which is a glorified name for a folder of stuff) that can exist at the same time, in parallel. Each version, called a branch, contains its own internal history and lets you undo changes.
+
+This way you can keep a version of your code that you know works (called `master`), a version where you have new stuff you're still working on (called `develop` in our repository), and branches for trying out specific ideas all at the same time.
+
+For an introduction to Git, [this guide by Roger Dudler](http://git.huit.harvard.edu/guide/) is an essential five-minute read on the basics of manipulating a repository.
+
+Brainbox uses [GitFlow](https://www.atlassian.com/git/tutorials/comparing-workflows/gitflow-workflow) as a model for how to organize our repository. This means that there are two main branches that always exist, `master` and `develop`, the latter of which is the basis for all development of the toolbox. If you want to incorporate a new feature into the repository, e.g. a raster plot, you can run the following git command in your repository:
+
+```bash
+git flow feature start rasterplot
+```
+
+and git-flow will create a new branch (by default named `feature/rasterplot`) off `develop` and check it out for you; note that the `git flow` commands require the git-flow extension to be installed alongside Git. Once you've created a few commits and feel confident that your code is working well, you can create a pull request on GitHub so we can incorporate your code into the `develop` branch for a future release.
+
+
+# Writing code for Brainbox
+
+We require all code in Brainbox to conform to [PEP8](https://www.python.org/dev/peps/pep-0008/) guidelines, with [Numpy-style](https://numpydoc.readthedocs.io/en/latest/format.html) docstrings. We require all contributors to use `flake8` as a linter to check their code before opening a pull request.
+
+[MORE GUIDELINES HERE PLEASE]
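The PEP8 and NumPy-docstring requirement above is easiest to see on a concrete function. The sketch below uses a hypothetical `mean_firing_rate` helper (it is not part of Brainbox) only to illustrate the numpydoc section layout (`Parameters`, `Returns`, `Raises`). Note that the Python modules added in this commit still use reST-style `:param:` fields instead.

```python
import numpy as np


def mean_firing_rate(spike_times, t_start, t_stop):
    """
    Compute the mean firing rate of a spike train within a time window.

    Parameters
    ----------
    spike_times : array_like
        Spike times, in seconds.
    t_start : float
        Start of the window, in seconds (inclusive).
    t_stop : float
        End of the window, in seconds (exclusive).

    Returns
    -------
    float
        Spike count in [t_start, t_stop) divided by the window length (Hz).

    Raises
    ------
    ValueError
        If `t_stop` is not greater than `t_start`.
    """
    if t_stop <= t_start:
        raise ValueError('t_stop must be greater than t_start')
    spike_times = np.asarray(spike_times)
    # Count spikes falling inside the half-open window [t_start, t_stop)
    in_window = (spike_times >= t_start) & (spike_times < t_stop)
    return np.count_nonzero(in_window) / (t_stop - t_start)
```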
diff --git a/brainbox/README.md b/brainbox/README.md
@@ -0,0 +1,5 @@
+# brainbox
+
+## Contributing
+
+To contribute to this repository, please read [our guide to contributing](https://github.com/int-brain-lab/ibllib/blob/brainbox/brainbox/CONTRIBUTING.md).
diff --git a/ibllib/behaviour/__init__.py b/brainbox/__init__.py
diff --git a/brainbox/behavior/__init__.py b/brainbox/behavior/__init__.py
@@ -0,0 +1 @@
+from .behavior import *
diff --git a/brainbox/behavior/behavior.py b/brainbox/behavior/behavior.py
diff --git a/ibllib/behaviour/wheel.py b/brainbox/behavior/wheel.py
diff --git a/brainbox/core/__init__.py b/brainbox/core/__init__.py
@@ -0,0 +1 @@
+from .core import *
diff --git a/brainbox/core/core.py b/brainbox/core/core.py
@@ -0,0 +1,64 @@
+'''
+Core data types and functions which support all of brainbox.
+'''
+import numpy as np
+
+
+class Bunch(dict):
+    """A subclass of dictionary with an additional dot syntax."""
+
+    def __init__(self, *args, **kwargs):
+        super(Bunch, self).__init__(*args, **kwargs)
+        self.__dict__ = self
+
+    def copy(self):
+        """Return a new Bunch instance which is a copy of the current Bunch instance."""
+        return Bunch(super(Bunch, self).copy())
+
+
+class TimeSeries(dict):
+    """A subclass of dict with dot syntax, enforcement of time stamping"""
+
+    def __init__(self, times, values, columns=None, *args, **kwargs):
+        """TimeSeries objects are explicitly for storing time series data in which each entry
+        (row) has an associated time stamp. TS objects have obligatory 'times' and 'values'
+        entries which must be passed at construction and whose lengths must match. TimeSeries
+        takes an optional 'columns' argument, which defaults to None, that is a set of labels
+        for the columns in 'values'.
+
+        :param times: an ordered object containing a list of timestamps for the time series data
+        :param values: an ordered object containing the associated measurements for each time stamp
+        :param columns: a tuple or list of column labels, defaults to None. Each column name will
+            be exposed as ts.colname in the TimeSeries object unless colnames are not strings.
+
+        Also can take any additional kwargs beyond times, values, and columns for additional data
+        storage like session date, experimenter notes, etc.
+
+        Example:
+        timestamps, mousepos = load_my_data()  # mousepos is a T x 2 array of x, y coordinates
+        positions = TimeSeries(times=timestamps, values=mousepos, columns=('x', 'y'),
+                               analyst='John Cleese', petshop=True,
+                               notes=("Look, matey, I know a dead mouse when I see one, "
+                                      "and I'm looking at one right now."))
+        """
+        super(TimeSeries, self).__init__(times=np.array(times), values=np.array(values),
+                                         columns=columns, *args, **kwargs)
+        self.__dict__ = self
+        self.columns = columns
+        if self.values.ndim == 1:
+            self.values = self.values.reshape(-1, 1)
+
+        # Enforce one row of values per time stamp
+        if len(self.times) != len(self.values):
+            raise ValueError('Time and values must be of the same length')
+
+        # If column labels are passed ensure same number of labels as columns, then expose
+        # each column label using the dot syntax of a Bunch
+        if isinstance(self.values, np.ndarray) and columns is not None:
+            if self.values.shape[1] != len(columns):
+                raise ValueError('Number of column labels must equal number of columns in values')
+            self.update({col: self.values[:, i] for i, col in enumerate(columns)})
+
+    def copy(self):
+        """Return a new TimeSeries instance which is a copy of the current TimeSeries instance."""
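+        # A plain dict copy cannot be passed to __init__, which requires times and values, so
+        # rebuild from copies of the arrays; column entries are regenerated by __init__
+        skip = {'times', 'values', 'columns'}
+        if self.columns is not None:
+            skip.update(self.columns)
+        extras = {k: v for k, v in self.items() if k not in skip}
+        return TimeSeries(self.times.copy(), self.values.copy(), columns=self.columns, **extras)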
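The dot syntax in `Bunch` and `TimeSeries` comes from pointing the instance's `__dict__` at the dict itself (`self.__dict__ = self`). The minimal sketch below re-declares a `Bunch` with the same two lines from `core.py`, so it runs without Brainbox installed, and then mimics how `TimeSeries.__init__` exposes named columns. The data are made up.

```python
import numpy as np


class Bunch(dict):
    """Dict whose keys are also readable and writable as attributes."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Share one namespace: attribute access and key access hit the same dict
        self.__dict__ = self


b = Bunch(times=np.array([0.0, 0.1, 0.2]), analyst='John Cleese')
b.notes = 'attribute assignment creates a key'
b['units'] = 'cm'  # key assignment creates an attribute

# Mimic TimeSeries: expose each column of a T x 2 array under its label
values = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
b.update({col: values[:, i] for i, col in enumerate(('x', 'y'))})

# Column entries are views, so editing values in place also changes b.x
values[0, 0] = 9.0
```

Because column labels become ordinary keys, the `self.update(...)` call in `TimeSeries.__init__` would overwrite the `times` or `values` entries if a column were given one of those names.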
diff --git a/brainbox/experimental/__init__.py b/brainbox/experimental/__init__.py
diff --git a/brainbox/io/__init__.py b/brainbox/io/__init__.py
diff --git a/brainbox/misc.py b/brainbox/misc.py
diff --git a/brainbox/population/__init__.py b/brainbox/population/__init__.py
diff --git a/brainbox/processing/__init__.py b/brainbox/processing/__init__.py
@@ -0,0 +1 @@
+from .processing import *
diff --git a/brainbox/processing/processing.py b/brainbox/processing/processing.py
@@ -0,0 +1,182 @@
+'''
+Set of functions for processing data from one form into another,
+for example taking spike times and then binning them into non-overlapping
+bins or convolving with a Gaussian kernel.
+'''
+import numpy as np
+import pandas as pd
+from scipy import interpolate
+from brainbox import core
+
+
+def sync(dt, timeseries=None, times=None, values=None, offsets=None, interp='zero',
+         fillval=np.nan):
+    """
+    Resample one or more time series onto a single, evenly spaced time grid with spacing dt
+    between grid points. Uses interpolation to find values.
+
+    Can be used on raw numpy arrays of timestamps and values using the 'times' and 'values' kwargs
+    and/or on brainbox.core.TimeSeries objects passed to the 'timeseries' kwarg. If passing both
+    TimeSeries objects and numpy arrays, the offsets passed should be for the TS objects first and
+    then the numpy arrays.
+
+    Uses scipy's interpolation library to perform interpolation.
+    See scipy.interpolate.interp1d for details on the interp and fillval parameters.
+
+    :param dt: Separation of points which the output timeseries will be sampled at
+    :type dt: float
+    :param timeseries: A group of time series to perform alignment or a single time series.
+        Must have time stamps.
+    :type timeseries: tuple of TimeSeries objects, or a single TimeSeries object.
+    :param times: time stamps for the observations in 'values'
+    :type times: np.ndarray or list of np.ndarrays
+    :param values: observations corresponding to the timestamps in 'times'
+    :type values: np.ndarray or list of np.ndarrays
+    :param offsets: tuple of offsets for time stamps of each time series. Offsets for passed
+        TimeSeries objects first, then offsets for passed numpy arrays. defaults to None
+    :type offsets: tuple of floats, optional
+    :param interp: Type of interpolation to use. Refer to scipy.interpolate.interp1d for possible
+        values, defaults to 'zero'
+    :type interp: str
+    :param fillval: Fill values to use when interpolating outside of range of data. See interp1d
+        for possible values, defaults to np.nan
+    :type fillval: float, tuple, or 'extrapolate', optional
+    :return: TimeSeries object with each row representing synchronized values of all
+        input TimeSeries. Will carry column names from input time series if all of them have column
+        names.
+    """
+    #########################################
+    # Checks on inputs and input processing #
+    #########################################
+
+    # Initialize a list to contain times/values pairs if no TS objs are passed
+    if timeseries is None:
+        timeseries = []
+    # If a single time series is passed for resampling, wrap it in an iterable
+    elif isinstance(timeseries, core.TimeSeries):
+        timeseries = [timeseries]
+    # Yell at the user if they try to pass stuff to timeseries that isn't a TimeSeries object
+    elif not all([isinstance(ts, core.TimeSeries) for ts in timeseries]):
+        raise TypeError('All elements of \'timeseries\' argument must be brainbox.core.TimeSeries '
+                        'objects. Please use \'times\' and \'values\' for np.ndarray args.')
+    # Check that if something is passed to times or values, there is a corresponding equal-length
+    # argument for the other element.
+    if (times is not None) or (values is not None):
+        if times is None or values is None or len(times) != len(values):
+            raise ValueError('\'times\' and \'values\' must both be passed and have the same '
+                             'number of elements.')
+        if type(times[0]) is np.ndarray:
+            if not all([t.shape == v.shape for t, v in zip(times, values)]):
+                raise ValueError('All arrays in \'times\' must match the shape of the'
+                                 ' corresponding entry in \'values\'.')
+            # If all checks are passed, convert all times and values args into TimeSeries objects
+            timeseries.extend([core.TimeSeries(t, v) for t, v in zip(times, values)])
+        else:
+            # Otherwise times and values hold a single series; pair them and add
+            timeseries.append(core.TimeSeries(times, values))
+
+    # Adjust each timeseries by the associated offset if necessary then load into a list
+    if offsets is not None:
+        tstamps = [ts.times + os for ts, os in zip(timeseries, offsets)]
+    else:
+        tstamps = [ts.times for ts in timeseries]
+    # If all input timeseries have column names, put them together for the output TS
+    if all([ts.columns is not None for ts in timeseries]):
+        colnames = []
+        for ts in timeseries:
+            colnames.extend(ts.columns)
+    else:
+        colnames = None
+
+    #################
+    # Main function #
+    #################
+
+    # Get the min and max values for all timeseries combined after offsetting
+    tbounds = np.array([(np.amin(ts), np.amax(ts)) for ts in tstamps])
+    if not np.all(np.isfinite(tbounds)):
+        # If there is a np.inf or np.nan in the time stamps for any of the timeseries this will
+        # break any further code so we check for all finite values and throw an informative error.
+        raise ValueError('NaN or inf encountered in passed timeseries. '
+                         'Please either drop or fill these values.')
+    tmin, tmax = np.amin(tbounds[:, 0]), np.amax(tbounds[:, 1])
+    if fillval == 'extrapolate':
+        # If extrapolation is enabled we can ensure we have a full coverage of the data by
+        # extending the t max to be a whole integer multiple of dt above tmin.
+        # The 0.01% fudge factor is to account for floating point arithmetic errors.
+        newt = np.arange(tmin, tmax + 1.0001 * (dt - (tmax - tmin) % dt), dt)
+    else:
+        newt = np.arange(tmin, tmax, dt)
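+    # Build interpolators on the offset time stamps so offsets shift the data, not just the grid.
+    # bounds_error=False fills points outside a series' range with fillval instead of raising.
+    tsinterps = [interpolate.interp1d(t, ts.values, kind=interp, fill_value=fillval, axis=0,
+                                      bounds_error=False)
+                 for t, ts in zip(tstamps, timeseries)]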
+    syncd = core.TimeSeries(newt, np.hstack([tsi(newt) for tsi in tsinterps]), columns=colnames)
+    return syncd
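To make the resampling idea in `sync` concrete without depending on Brainbox or SciPy, here is a numpy-only sketch. The helper name `hold_resample` is hypothetical, and it swaps `scipy.interpolate.interp1d` for a previous-value (zero-order) hold built on `np.searchsorted`, which behaves much like `interp='zero'` inside each series' range. As in `sync`, the grid spans the union of all time ranges and excludes `tmax`, and grid points outside a series' own range get `fillval` (NaN here).

```python
import numpy as np


def hold_resample(dt, series, fillval=np.nan):
    """Resample (times, values) pairs onto one evenly spaced grid with a zero-order hold.

    Hypothetical helper illustrating the idea behind brainbox's sync().
    """
    tmin = min(np.min(t) for t, _ in series)
    tmax = max(np.max(t) for t, _ in series)
    newt = np.arange(tmin, tmax, dt)  # like sync(), the grid excludes tmax itself
    columns = []
    for t, v in series:
        t = np.asarray(t, dtype=float)
        v = np.asarray(v, dtype=float)
        order = np.argsort(t)
        t, v = t[order], v[order]
        # Index of the last observation at or before each grid point
        idx = np.searchsorted(t, newt, side='right') - 1
        out = np.full(newt.shape, fillval, dtype=float)
        # Only hold values between this series' first and last sample
        inside = (idx >= 0) & (newt <= t[-1])
        out[inside] = v[idx[inside]]
        columns.append(out)
    return newt, np.column_stack(columns)


t_a = np.array([0.0, 1.0, 2.0, 3.0])
v_a = np.array([10.0, 11.0, 12.0, 13.0])
t_b = np.array([1.5, 2.5, 4.0])
v_b = np.array([-1.0, -2.0, -3.0])
newt, vals = hold_resample(0.5, [(t_a, v_a), (t_b, v_b)])
```

Both series share the grid `newt` (0 to 3.5 s in 0.5 s steps). The first column is NaN at 3.5 s, after its last sample at 3.0 s, and the second is NaN before its first sample at 1.5 s.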
+
+
+def bincount2D(x, y, xbin=0, ybin=0, xlim=None, ylim=None, weights=None):
+    """
+    Computes a 2D histogram by aggregating values in a 2D array.
+
+    :param x: values to bin along the 2nd dimension (c-contiguous)
+    :param y: values to bin along the 1st dimension
+    :param xbin: bin size along 2nd dimension (set to 0 to aggregate according to unique values)
+    :param ybin: bin size along 1st dimension (set to 0 to aggregate according to unique values)
+    :param xlim: (optional) 2 values (array or list) that restrict range along 2nd dimension
+    :param ylim: (optional) 2 values (array or list) that restrict range along 1st dimension
+    :param weights: (optional) defaults to None, weights to apply to each value for aggregation
+    :return: 3 numpy arrays MAP [ny,nx] image, xscale [nx], yscale [ny]
+    """
+    # if no bounds provided, use min/max of vectors
+    if xlim is None:
+        xlim = [np.min(x), np.max(x)]
+    if ylim is None:
+        ylim = [np.min(y), np.max(y)]
+
+    # create the indices on which to aggregate: binning is different from aggregating
+    if xbin:
+        xscale = np.arange(xlim[0], xlim[1] + xbin / 2, xbin)
+        xind = (np.floor((x - xlim[0]) / xbin)).astype(np.int64)
+    else:  # if bin size = 0 , aggregate over unique values
+        xscale, xind = np.unique(x, return_inverse=True)
+    if ybin:
+        yscale = np.arange(ylim[0], ylim[1] + ybin / 2, ybin)
+        yind = (np.floor((y - ylim[0]) / ybin)).astype(np.int64)
+    else:  # if bin size = 0 , aggregate over unique values
+        yscale, yind = np.unique(y, return_inverse=True)
+
+    # aggregate by using bincount on absolute indices for a 2d array
+    nx, ny = [xscale.size, yscale.size]
+    ind2d = np.ravel_multi_index(np.c_[yind, xind].transpose(), dims=(ny, nx))
+    r = np.bincount(ind2d, minlength=nx * ny, weights=weights).reshape(ny, nx)
+    return r, xscale, yscale
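The core trick in `bincount2D` is to turn each (row, column) bin pair into one flat index with `np.ravel_multi_index`, count with `np.bincount`, then reshape back to 2D. The sketch below walks through those steps on six made-up spikes. It bins time at 0.5 s (the `xbin > 0` branch) and aggregates cluster ids by unique value (the `ybin == 0` branch). The times are exact binary fractions, so the `np.floor` binning is not affected by floating-point rounding.

```python
import numpy as np

# Toy spikes: times go along the 2nd dimension (x), cluster ids along the 1st (y)
times = np.array([0.0, 0.25, 0.75, 1.25, 1.5, 1.75])
clusters = np.array([3, 7, 3, 7, 7, 3])
xbin = 0.5

# Time axis: fixed-width bins starting at the first spike (xbin > 0 branch)
xlim = [times.min(), times.max()]
xscale = np.arange(xlim[0], xlim[1] + xbin / 2, xbin)
xind = np.floor((times - xlim[0]) / xbin).astype(np.int64)

# Cluster axis: one row per unique id (ybin == 0 branch)
yscale, yind = np.unique(clusters, return_inverse=True)

ny, nx = yscale.size, xscale.size
# Each (row, col) pair becomes one flat index; count them and fold back to 2D
flat = np.ravel_multi_index((yind, xind), dims=(ny, nx))
counts = np.bincount(flat, minlength=ny * nx).reshape(ny, nx)
```

Here `counts` is a 2 x 4 array (clusters by time bins), which is why `bin_spikes` transposes the result to get the T x N layout.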
+
+
+def bin_spikes(spikes, binsize, interval_indices=False):
+    """
+    Wrapper for bincount2D which is intended to take in a TimeSeries object of spike times
+    and cluster identities and spit out spike counts in bins of a specified width binsize, also in
+    another TimeSeries object. Can either return a TS object with each row labeled with the
+    corresponding interval or the value of the left edge of the bin.
+
+    :param spikes: Spike times and cluster identities of sorted spikes
+    :type spikes: TimeSeries object with \'clusters\' column and timestamps
+    :param binsize: Width of the non-overlapping bins in which to bin spikes
+    :type binsize: float
+    :param interval_indices: Whether to use intervals as the time stamps for binned spikes, rather
+        than the left edge value of the bins, defaults to False
+    :type interval_indices: bool, optional
+    :return: Object with 2D array of shape T x N, for T timesteps and N clusters, and the
+        associated time stamps.
+    :rtype: TimeSeries object
+    """
+    if not isinstance(spikes, core.TimeSeries):
+        raise TypeError('Input spikes need to be in TimeSeries object format')
+
+    if not hasattr(spikes, 'clusters'):
+        raise AttributeError('Input spikes need to have a clusters attribute. Make sure you set '
+                             'columns=(\'clusters\',) when constructing spikes.')
+
+    rates, tbins, clusters = bincount2D(spikes.times, spikes.clusters, binsize)
+    if interval_indices:
+        intervals = pd.interval_range(tbins[0], tbins[-1], freq=binsize, closed='left')
+        return core.TimeSeries(times=intervals, values=rates.T[:-1], columns=clusters)
+    else:
+        return core.TimeSeries(times=tbins, values=rates.T, columns=clusters)
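With `interval_indices=True`, `bin_spikes` labels rows with `pd.interval_range` running from the first to the last left bin edge. That range yields one interval fewer than there are left edges, so the code drops the last row of counts (`rates.T[:-1]`). As a result, whatever counts fall in the final bin are not included in that output. A small check with made-up edges and counts:

```python
import numpy as np
import pandas as pd

binsize = 0.5
tbins = np.array([0.0, 0.5, 1.0, 1.5])  # left bin edges, as bincount2D would return
counts = np.array([[1, 0], [2, 1], [0, 3], [4, 1]])  # made-up T x N spike counts

intervals = pd.interval_range(start=tbins[0], end=tbins[-1], freq=binsize, closed='left')
# One interval per consecutive pair of left edges, so the last row of counts has no label
kept = counts[:-1]
```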
diff --git a/brainbox/simulation/__init__.py b/brainbox/simulation/__init__.py
diff --git a/brainbox/singlecell/__init__.py b/brainbox/singlecell/__init__.py
diff --git a/brainbox/task/__init__.py b/brainbox/task/__init__.py
diff --git a/examples/ibllib/raster_clusters.py b/examples/brainbox/raster_clusters.py
@@ -3,10 +3,11 @@
 import numpy as np
 
 from oneibl.one import ONE
-from ibllib.misc import bincount2D
 import alf.io as ioalf
 import ibllib.plots as iblplt
 
+from brainbox.misc import bincount2D
+
 T_BIN = 0.01
 
 # get the data from flatiron and the current folder

diff --git a/examples/ibllib/raster_depths.py b/examples/brainbox/raster_depths.py
@@ -3,10 +3,11 @@
 import numpy as np
 
 from oneibl.one import ONE
-from ibllib.misc import bincount2D
 import alf.io as ioalf
 import ibllib.plots as iblplt
 
+from brainbox.misc import bincount2D
+
 T_BIN = 0.01
 D_BIN = 20