Trinh Viet Doan (Technical University of Munich) | Vaibhav Bajpai (Technical University of Munich) | Sam Crawford (SamKnows)
IEEE INFOCOM 2020, July 6–9, 2020.
The dataset is collected from ~100 SamKnows probes:
The raw datasets are available at:
The data consists of two `sqlite3` databases: one for the measurements by the Netflix test (`netflix-data.db`), the other for the throughput measurements toward MLab (`mlab-data.db`).
The schemas of the tables can be found under `./data/netflix-schema.sql` and `./data/mlab-schema.sql`.
This repository contains (most of) the required code and metadata to reproduce the results; see below for further instructions.
To read from the databases (see above), `sqlite3` is needed.
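
For a quick look at the contents, a minimal sketch for inspecting one of the databases from Python is shown below; no table names are hard-coded, since they depend on the schemas in `./data/`.

```python
# Minimal sketch: list the tables in one of the measurement databases
# and preview a few rows. Table names are taken from the database itself,
# since they depend on the schemas in ./data/.
import sqlite3

conn = sqlite3.connect("./data/netflix-data.db")
cur = conn.cursor()

# Enumerate all tables defined in the database.
cur.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
tables = [row[0] for row in cur.fetchall()]
print(tables)

# Preview the first rows of each table.
for table in tables:
    cur.execute("SELECT * FROM {} LIMIT 5".format(table))
    for row in cur.fetchall():
        print(table, row)

conn.close()
```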
The analyses were performed using `jupyter` notebooks on Python 2.7.
Required Python dependencies are listed in `requirements.txt` and can be installed using `pip install -r requirements.txt`.
For the calculation of CDFs and drawing of the corresponding plots, `Pmf.py` and `Cdf.py` from Think Stats are used.
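
For reference, the same kind of empirical CDF can also be computed without the Think Stats modules; the sketch below uses only `numpy` and `matplotlib`, with placeholder throughput values and an illustrative output filename rather than data from the actual databases.

```python
# Minimal sketch of an empirical CDF, equivalent in spirit to what
# Pmf.py/Cdf.py provide; uses only numpy and matplotlib.
import numpy as np
import matplotlib.pyplot as plt

def empirical_cdf(samples):
    """Return sorted sample values and their cumulative probabilities."""
    xs = np.sort(np.asarray(samples))
    ps = np.arange(1, len(xs) + 1) / float(len(xs))
    return xs, ps

# Placeholder throughput values (Mbit/s), purely illustrative.
xs, ps = empirical_cdf([12.3, 48.1, 25.7, 95.0, 33.4])
plt.step(xs, ps, where="post")
plt.xlabel("Throughput [Mbit/s]")
plt.ylabel("CDF")
plt.savefig("example-cdf.png")  # illustrative filename
```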
Move the required datasets and modules to the following locations (a small helper sketch follows this list):

- `netflix-data.db` → `./data/`
- `mlab-data.db` → `./data/`
- `Pmf.py` → `./notebooks/`
- `Cdf.py` → `./notebooks/`
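
The following is a small, purely illustrative helper that performs the same moves from Python (a plain `mv` works just as well); it assumes the four files sit in the repository root before running.

```python
# Illustrative helper: move the datasets and Think Stats modules into
# the locations expected by the notebooks. Assumes the four files are
# in the current directory before running.
import shutil

moves = {
    "netflix-data.db": "./data/",
    "mlab-data.db": "./data/",
    "Pmf.py": "./notebooks/",
    "Cdf.py": "./notebooks/",
}

for filename, destination in moves.items():
    shutil.move(filename, destination)
```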
Run the `aggregation.ipynb` notebook to process and aggregate the raw dataset, which will store the results in a separate database. After that, the other notebooks (`fig-*.ipynb`) can be used to draw the plots presented in the paper.
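
As an alternative to opening the notebooks interactively, they can be executed headlessly; the sketch below uses `nbformat` together with `nbconvert`'s `ExecutePreprocessor`, and assumes the notebooks live under `./notebooks/` and that a `python2` kernel is registered (adjust both if your setup differs).

```python
# Minimal sketch: execute the aggregation notebook without opening Jupyter.
# The notebook path and kernel_name are assumptions about the local setup.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

with open("./notebooks/aggregation.ipynb") as f:
    nb = nbformat.read(f, as_version=4)

ep = ExecutePreprocessor(timeout=None, kernel_name="python2")
ep.preprocess(nb, {"metadata": {"path": "./notebooks/"}})

with open("./notebooks/aggregation.ipynb", "w") as f:
    nbformat.write(nb, f)
```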
All plots are saved under `./plots/`.
Note: the metadata lookup has already been performed; however, it can be repeated by running `./metadata/netflix-metadata-lookup.py` and `./metadata/probe-to-timezone.ipynb`.
Please feel welcome to contact the authors for further details.
- Trinh Viet Doan ([email protected]) (corresponding author)
- Vaibhav Bajpai ([email protected])
- Sam Crawford ([email protected])