DOC: Add animations tutorial #547
@@ -0,0 +1,179 @@

.. _animations:

.. index:: animations

Creating an animation using Emperor
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||
In this tutorial we describe how to create a principal coordinates analysis
(PCoA) plot and display animated traces of the samples, sorted by a metadata
category. We first walk through a `Synthetic Example` (which explains the
concepts) and then a `Real Example` (which covers data curation and the actual
plot generation).

An animation requires two metadata categories: a *gradient* category and a
*trajectory* category. The *gradient* category determines the order in which
samples are connected, and the *trajectory* category determines how samples
are grouped.
|
||
Synthetic Example
=================

In most cases the *trajectory* and *gradient* columns already exist as part of
your sample information; however, you may need to do some curation to make
them compatible with Emperor.
|
||
----
Data
----

In this example, consider a longitudinal study in which you track changes in
the oral microbiome of a cohort of 3 mice over the course of 5 weeks. Each
sample is described by the following columns:

* ``cage_number``: the cage where each mouse was housed; more than one mouse
  could have resided in the same cage.

* ``age_in_years``: the age of each mouse in years.

* ``week``: the number of the week in this experiment.

* ``sex``: the sex of each mouse.

* ``mice_identifier``: a unique identifier assigned to each mouse.

Review comment (on ``cage_number``): AFAIK, you don't need this for the
animation, right? But perhaps I need to read further down.

Review comment: I guess we are planning to describe 2 trajectory examples, not
only one, right?

Review comment (on ``sex``): Is this category really needed by the animation?
Perhaps adding some text after the description saying why we want it would be
useful.

Reply: I've added these two paragraphs after the description of these columns.
|
||
----------
Processing
----------

Here, we can use the ``week`` column as our *gradient* category, as long as
all the values are numerical. To be more precise, a column where values were
indicated as ``pre-treatment, first, second, third and last`` would not be
appropriate and would instead need to be converted into (for example): ``-1, 1,
2, 3 and 4`` (remember we have 5 weeks of data).
|
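The conversion described above can be sketched in a few lines of Python. This is only an illustration: the ``WEEK_ORDER`` table and the ``to_numeric_gradient`` helper are hypothetical names, and the numeric values are the example values from the paragraph above.

```python
# Hypothetical lookup table from text labels to numeric gradient values.
WEEK_ORDER = {
    "pre-treatment": -1,
    "first": 1,
    "second": 2,
    "third": 3,
    "last": 4,
}


def to_numeric_gradient(labels):
    """Translate ordinal text labels into numbers, failing on unknown labels."""
    missing = sorted(set(labels) - set(WEEK_ORDER))
    if missing:
        raise ValueError(f"no numeric value defined for: {missing}")
    return [WEEK_ORDER[label] for label in labels]


# The labels may appear in any order in the sample information.
weeks = to_numeric_gradient(["first", "pre-treatment", "last", "second"])
print(weeks)
```

Raising an error on unknown labels is a design choice for this sketch: a label that silently stays as text would make the whole column non-numeric.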
||
As for the *trajectory* category, the natural choice is the
``mice_identifier`` column, because it uniquely identifies every mouse and
stays the same throughout the experiment.

The remaining columns (``cage_number``, ``age_in_years`` and ``sex``) are
not needed to create an animation, but can be used to change the
color, visibility and size of the samples.

The following figure shows what we expect to observe when we press the play
button (week numbers are shown only as a reference).
|
||
.. figure:: trajectories.png
   :alt: Cartoon representation of the example above.

   Cartoon representation of the synthetic example. On the left, the unmodified
   ordination with samples colored by mouse. In the center, the same ordination
   with a label for each sample, corresponding to the week in which the sample
   was collected. On the right, samples connected by a line, where the order is
   determined by the collection time (all trajectories begin at ``-1``).
|
||
From the trajectories, you can see that samples are connected according to the
numerical order in the *gradient* category, and that missing data is simply
ignored. For example, the red samples are missing timepoint ``2``, so
sample ``1`` is connected directly to sample ``3``.
|
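The grouping and ordering rule can be illustrated with a short Python sketch. It is a conceptual model only, not Emperor's implementation; the sample identifiers and the ``build_trajectories`` helper are made up for this example.

```python
from collections import defaultdict

# Made-up samples: (sample id, trajectory value, gradient value).
# "red" has no sample for week 2, mirroring the figure above.
samples = [
    ("red.w3", "red", 3),
    ("red.w-1", "red", -1),
    ("blue.w2", "blue", 2),
    ("red.w1", "red", 1),
    ("blue.w-1", "blue", -1),
    ("red.w4", "red", 4),
    ("blue.w1", "blue", 1),
]


def build_trajectories(samples):
    """Group samples by trajectory, then order each group by gradient."""
    grouped = defaultdict(list)
    for sample_id, trajectory, gradient in samples:
        grouped[trajectory].append((gradient, sample_id))
    # Sorting the (gradient, sample id) pairs orders each group numerically.
    return {
        name: [sample_id for _, sample_id in sorted(points)]
        for name, points in grouped.items()
    }


trajectories = build_trajectories(samples)
print(trajectories["red"])
```

Because only the samples that exist are sorted, the week ``1`` sample of the red trajectory ends up next to the week ``3`` sample, which is the "missing data is ignored" behavior described above.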
||
In the next section we will go through an example using published data from
`Weingarden et al. 2015 <https://www.ncbi.nlm.nih.gov/pubmed/25825673>`_.

Real Example
============

----
Data
----

Review comment: Why do we need 2 examples that are different (mice/human)? If
you want to keep both, I suggest splitting them clearly.

Reply: We don't really need both, but I think that if you are unfamiliar with
the topic, it may be more helpful to look at a small synthetic example with
just a handful of data points. As per your other comments, I've made the
distinction between these two more explicit; hopefully this will help.

Review comment: Could you use these same divisions (data, processing,
filtering) in the synthetic example? Also, could you add a small description of
each section before starting the examples?

Reply: I've now separated the sections of the two examples. The filtering
section is really a "you should know" thing, not really part of the second
example. I've changed the headers to make this distinction clearer.
||
This example will help us visualize the short- and long-term changes in four
patients as they undergo a fecal microbiota transplant (FMT). To contextualize
these changes, we are going to use the data from the Human Microbiome Project
(HMP), an initiative that characterized the microbial communities of 252
**healthy** human adults in four different supersites (fecal, skin, oral and
vaginal communities).

For convenience, we combined the two datasets using `Qiita
<https://qiita.ucsd.edu>`_. Specifically, the studies we used are `study 10057
<https://qiita.ucsd.edu/study/description/10057>`_ (FMT) and `study 1928
<https://qiita.ucsd.edu/study/description/1928>`_ (HMP). Remember that you need
to be logged in to access the studies.

The files needed for this tutorial can be downloaded from this `link
<http://emperor.microbio.me/animations-tutorial.zip>`_.
|
||
----------
Processing
----------

As discussed before, we need to identify two columns that allow us to sort
samples and to group them. We only want to focus on the observed changes in
the microbiome of patients who undergo an FMT; therefore the subjects from the
HMP data won't need to be animated, and their samples are instead used as a
frame of reference.

Notice that in ``mapping-file.txt`` there are two columns that describe this
information. As the *gradient* category, we can use
``day_relative_to_fmt`` (a column that describes the number of days before or
after the FMT), and as the *trajectory* category we can use ``host_subject_id``
(a column with unique identifiers for each individual participating in both
studies).
|
||
One thing you will notice is that samples from the HMP lack a value for the
``day_relative_to_fmt`` column, since these subjects did not undergo a
transplant. Instead, they are all labeled ``unknown``. To use this column, we
will replace the label ``unknown`` with ``0``, so that the mapping file passes
Emperor's validations. You can do this using a spreadsheet program like
Excel, or alternatively with a scripting language like R or Python
(using Pandas is recommended). We suggest that you store the modified values
in a new column and name it ``animations_gradient``.
|
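As a rough sketch of this replacement, the following Python snippet uses only the standard library (the same idea carries over to Pandas). The three-row table is a made-up stand-in for ``mapping-file.txt``, and ``add_gradient_column`` is a hypothetical helper; only the column names and the ``unknown`` label come from the text above.

```python
import csv
import io

# Tiny tab-separated stand-in for mapping-file.txt; sample ids and values
# are invented for illustration.
MAPPING = (
    "#SampleID\tday_relative_to_fmt\thost_subject_id\n"
    "fmt.1\t-2\tP1\n"
    "fmt.2\t14\tP1\n"
    "hmp.1\tunknown\tH7\n"
)


def add_gradient_column(rows, source="day_relative_to_fmt",
                        target="animations_gradient"):
    """Copy `source` into `target`, replacing 'unknown' with '0'."""
    for row in rows:
        value = row[source]
        row[target] = "0" if value == "unknown" else value
    return rows


rows = list(csv.DictReader(io.StringIO(MAPPING), delimiter="\t"))
rows = add_gradient_column(rows)
gradient = [row["animations_gradient"] for row in rows]
print(gradient)
```

Writing to a new column leaves ``day_relative_to_fmt`` untouched, so the original information remains available in the plot.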
||
.. note::
   When plots are generated with Emperor, only columns where all values are
   numeric will be accessible as a *gradient* category.
|
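Before generating the plot, it can be handy to check whether a column is entirely numeric. The helper below is a hypothetical pre-flight check written for this tutorial; it assumes that "numeric" means "parseable as a float", which may not match Emperor's own validation exactly.

```python
def is_numeric_column(values):
    """Return True when every value can be parsed as a float.

    Assumption: this approximates Emperor's rule. Note that float() also
    accepts strings such as 'nan' and 'inf'.
    """
    try:
        for value in values:
            float(value)
    except ValueError:
        return False
    return True


print(is_numeric_column(["-2", "14", "0"]))
print(is_numeric_column(["-2", "unknown"]))
```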
||
As for the *trajectory* category, we will ignore all subjects except the ones
that underwent an FMT, so for all other samples (in both the HMP and FMT
studies) we will set the ``host_subject_id`` value to ``NA``. Again, we will
create a new column to store this modified information, and we will name it
``animations_subject``.
|
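A minimal sketch of this step in plain Python follows. The subject identifiers in ``FMT_RECIPIENTS`` and the sample rows are invented for illustration (the real identifiers are in the mapping file), and ``add_trajectory_column`` is a hypothetical helper.

```python
# Hypothetical identifiers of the subjects that received an FMT.
FMT_RECIPIENTS = {"P1", "P2"}


def add_trajectory_column(rows, keep, source="host_subject_id",
                          target="animations_subject"):
    """Copy `source` into `target`, writing 'NA' for subjects not in `keep`."""
    for row in rows:
        row[target] = row[source] if row[source] in keep else "NA"
    return rows


# Made-up rows: one FMT recipient, one other FMT-study subject, one HMP subject.
rows = [
    {"#SampleID": "fmt.1", "host_subject_id": "P1"},
    {"#SampleID": "fmt.other", "host_subject_id": "D1"},
    {"#SampleID": "hmp.1", "host_subject_id": "H7"},
]
rows = add_trajectory_column(rows, FMT_RECIPIENTS)
subjects = [row["animations_subject"] for row in rows]
print(subjects)
```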
||
.. note::
   The names of the columns can be chosen arbitrarily, but we recommend names
   that clearly convey their purpose.

After you've done this, the result will be a new metadata mapping file that
includes two new columns, ``animations_gradient`` and ``animations_subject``
(for an example see ``mapping-file.animations.txt``). All that's left is to
create the plot itself with ``make_emperor.py``::

    make_emperor.py -i unweighted-unifrac-pc.txt -m mapping-file.animations.txt -o animations --add_unique_columns
|
||
After you do this, open the plot (by opening ``animations/index.html``) and
select ``body_habitat`` as the color category (under the Colors tab). Next, go
to the animations tab on the right. In the *Gradient Category* menu select
*animations_gradient*, and in the *Trajectory Category* menu select
*animations_subject*. Now you can click the play button and visualize the
changes in the microbiome of the four patients. As you do this, you can
continue to interact with the plot and change any colors as needed.

The resulting plot can be found `here
<http://emperor.microbio.me/animation/>`_. Please note that this plot includes
a few presets that will differ from the plot you generated above; however,
both plots are fundamentally the same.
|
||
Filtering out data
==================

In some situations, we want to focus on only one or a handful of the existing
trajectories in a dataset. In such a case, you can hide any trajectories you
want by creating a new column in your sample information, for example
``animation_one_trajectory``, and then setting the values of the samples that
you do not wish to see animated to ``0``.

The same idea applies to blanks or other types of technical samples
that do not need to be animated.
Review comment: Have we always called it trajectory? IMO it sounds confusing
because the trajectory is given by the gradient, right? Perhaps grouping is a
better name, but OK if this is what we have always called it.

Reply: I think we have always called it that, or at least that's the name given
in the menu, and it's the name we used in the manuscript.