Link test #7

Open: wants to merge 4 commits into base: master
8 changes: 8 additions & 0 deletions .dockerignore
@@ -0,0 +1,8 @@
code/
data/
graphs/
img/
output/
__pycache__
*.pyc
.git/
2 changes: 2 additions & 0 deletions .gitignore
@@ -4,3 +4,5 @@ code/secrets
*.pyc
.RData
.Rhistory
__pycache__/
.pytest_cache/
9 changes: 9 additions & 0 deletions .travis.yml
@@ -1,7 +1,16 @@
language: node_js

sudo: required

services:
- docker

node_js:
- "node"

install:
- npm i markdown-spellcheck -g

script:
- docker build -t tester .
- docker run -i tester /bin/bash -c "pytest -v --tests-per-worker auto"
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM alpine:edge

RUN apk update && apk add --no-cache \
    python3 \
    bash \
    py3-lxml && \
    python3 -m ensurepip

ADD ./tests/requirements.txt /tmp/requirements.txt

RUN pip3 install -qr /tmp/requirements.txt

ADD . /src/
WORKDIR /src
2 changes: 1 addition & 1 deletion deep-learning-libraries.md
@@ -17,7 +17,7 @@ The ranking is based on equally weighing its three components: Github (stars and
`TensorFlow` is at least two standard deviations above the mean on all calculated metrics. `TensorFlow` has almost three times as many Github forks and more than six times as many Stack Overflow questions than the second most popular framework, `Caffe`. First open-sourced by the Google Brain team in 2015, `TensorFlow` has climbed over more senior libraries such as `Theano` (4) and `Torch` (8) for the top spot on our list. While `TensorFlow` is distributed with a Python API running on a C++ engine, several of the libraries on our list can utilize `TensorFlow` as a back-end and offer their own interfaces. These include `Keras` (2), which will [soon be part of core TensorFlow](https://twitter.com/fchollet/status/820746845068505088) and `Sonnet` (6). The popularity of `TensorFlow` is likely due to a combination of its general-purpose deep learning framework, flexible interface, good-looking computational graph visualizations, and Google’s significant developer and community resources.

## `Caffe` has yet to be replaced by `Caffe2`
-`Caffe` takes a strong third place on our list with more Github activity than all of its competitors (excluding `TensorFlow`). `Caffe` is traditionally thought of as more specialized than `Tensorflow` and was developed with a focus on image processing, objection recognition, and pre-trained convolutional neural networks. Facebook released `Caffe2` (11) in April 2017, and it already ranks in the top half the deep learning libraries. `Caffe2` is a more lightweight, modular, and scalable version of `Caffe` that includes recurrent neural networks. `Caffe` and `Caffe2` are separate repos, so data scientists can continue to use the original `Caffe`. However, there are migration tools such as [Caffe Translator](https://github.com/caffe2/caffe2/blob/master/caffe2/python/caffe_translator.py) that provide a means of using `Caffe2` to drive existing `Caffe` models.
+`Caffe` takes a strong third place on our list with more Github activity than all of its competitors (excluding `TensorFlow`). `Caffe` is traditionally thought of as more specialized than `Tensorflow` and was developed with a focus on image processing, objection recognition, and pre-trained convolutional neural networks. Facebook released `Caffe2` (11) in April 2017, and it already ranks in the top half the deep learning libraries. `Caffe2` is a more lightweight, modular, and scalable version of `Caffe` that includes recurrent neural networks. `Caffe` and `Caffe2` are separate repos, so data scientists can continue to use the original `Caffe`. However, there are migration tools such as [Caffe Translator](https://github.com/pytorch/pytorch/blob/master/caffe2/python/caffe_translator.py) that provide a means of using `Caffe2` to drive existing `Caffe` models.

## `Keras` is the most popular front-end for deep learning
`Keras` (2) is highest ranked non-framework library. `Keras` can be used as a front-end for `TensorFlow` (1), `Theano` (4), `MXNet` (7), `CNTK` (9), or `deeplearning4j` (14). `Keras` performed better than average on all three metrics measured. The popularity of `Keras` is likely due to its simplicity and ease-of-use. `Keras` allows for fast prototyping at the cost of some of the flexibility and control that comes from working directly with a framework. `Keras` is favorited by data scientists experimenting with deep learning on their data sets. The development and popularity of `Keras` continues with R Studio recently releasing [an interface](https://keras.rstudio.com) in `R` for `Keras`.
2 changes: 1 addition & 1 deletion js-viz-packages.md
@@ -48,7 +48,7 @@ The data presented a few difficulties:

All source code and data is on [our Github Page](https://github.com/thedataincubator/data-science-blogs).

-We first generated a list of 141 Data Science packages [from](https://github.com/fasouto/awesome-dataviz') [these](https://github.com/wbkd/awesome-d3) [four](https://en.wikipedia.org/wiki/Comparison_of_JavaScript_charting_frameworks) [sources](https://cssauthor.com/javascript-charting-libraries), and then collected metrics for all of them, to come up with the ranking. Github data is based on both stars and forks, while Stack Overflow data is based on tags and questions containing the package name. Downloads data is from npmjs. Downloads were totaled over a six month period, and the compound monthly growth rate was calculated over the same period. After scraping other sites for JS visualization package names, we had gathered over 200 package names. Many of them were aliases for the same packages (d3, D3JS). If a the first result of Github search returned the same repo as another package, we treated them as the same package, but saved the aliases to search Stack Overflow questions.
+We first generated a list of 141 Data Science packages [from](https://github.com/fasouto/awesome-dataviz) [these](https://github.com/wbkd/awesome-d3) [four](https://en.wikipedia.org/wiki/Comparison_of_JavaScript_charting_frameworks) [sources](https://cssauthor.com/javascript-charting-libraries), and then collected metrics for all of them, to come up with the ranking. Github data is based on both stars and forks, while Stack Overflow data is based on tags and questions containing the package name. Downloads data is from npmjs. Downloads were totaled over a six month period, and the compound monthly growth rate was calculated over the same period. After scraping other sites for JS visualization package names, we had gathered over 200 package names. Many of them were aliases for the same packages (d3, D3JS). If a the first result of Github search returned the same repo as another package, we treated them as the same package, but saved the aliases to search Stack Overflow questions.

A few other notes:

2 changes: 1 addition & 1 deletion python-packages.md
@@ -100,7 +100,7 @@ tables), and `shogun` (machine learning). They were all below average compared
to the ranked packages, in all categories.

Importantly,
-the [Anaconda distribution](https://www.continuum.io/anaconda-overview) bundles
+the [Anaconda distribution](https://www.anaconda.com/what-is-anaconda/) bundles
together many of these packages, and this was not considered.

Further, naturally, some packages that have been around longer will have higher
5 changes: 5 additions & 0 deletions tests/requirements.txt
@@ -0,0 +1,5 @@
requests
pytest
beautifulsoup4
markdown
pytest-parallel
56 changes: 56 additions & 0 deletions tests/test_links.py
@@ -0,0 +1,56 @@
import os
import re
import codecs
import pytest
import requests
from bs4 import BeautifulSoup
import markdown

def _get_files():
    return [i for i in os.listdir()
            if i.split('.')[-1] == 'md']

def _parse_links(filename):
    input_file = codecs.open(filename,
                             mode="r",
                             encoding="utf-8")
    text = input_file.read()
    soup = BeautifulSoup(markdown.markdown(text), "lxml")
    return [link['href'] for link in soup.find_all('a', href=True)]

def _valid(link):
    if '.md' in link:
        return False
    if '.csv' in link:
        return False
    if '.' not in link:
        return False
    return True

def _get_links_from_page(filename):
    links = []
    for link in _parse_links(filename):
        if _valid(link):
            links.append((filename, link))
    return links

def _get_links():
    links = []
    for filename in _get_files():
        links += _get_links_from_page(filename)
    return links

@pytest.mark.parametrize("filename,link", _get_links())
def test_link(filename, link):
    request_success = False
    for i in range(3):
        if request_success:
            break
        try:
            r = requests.get(link, timeout=10)
            request_success = True
        except Exception:
            pass

    assert request_success
    assert r.status_code == 200
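
Review note on `tests/test_links.py`: two parts of the test might be worth tightening. `_valid` uses substring checks, so any URL that merely contains `.md` or `.csv` anywhere (for example in a host name) is skipped, and the retry loop in `test_link` discards the exception that caused a failure. Below is a minimal sketch of one possible alternative, not the PR's code. The names `is_checkable`, `fetch_with_retries`, and `SKIPPED_EXTENSIONS` are hypothetical, and the filter is deliberately stricter than `_valid`: it assumes only absolute `http`/`https` links should be checked.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical names for illustration; none of these exist in the PR.
SKIPPED_EXTENSIONS = {".md", ".csv"}


def is_checkable(link):
    """Return True for absolute http(s) links whose path is not a .md/.csv file.

    Assumption: relative links and non-HTTP schemes are skipped entirely,
    which is stricter than the substring checks in _valid.
    """
    parsed = urlparse(link)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Look only at the extension of the URL path, not the whole string.
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension not in SKIPPED_EXTENSIONS


def fetch_with_retries(fetch, link, attempts=3):
    """Call fetch(link) up to `attempts` times and return the first result.

    If every attempt raises, the last exception is re-raised so the test
    report shows why the link failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(link)
        except Exception as error:  # broad on purpose, mirroring test_link
            last_error = error
    raise last_error
```

Inside `test_link`, this could be used as `r = fetch_with_retries(lambda url: requests.get(url, timeout=10), link)` followed by the existing status-code assertion. Passing the fetch function in as an argument keeps the retry logic testable without network access.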