Reproduction of Text Categorization with SVMs

Paper: Text Categorization with Support Vector Machines: Learning with Many Relevant Features
Source: https://dl.acm.org/citation.cfm?id=649721

The paper suggests the suitability of Support Vector Machines for text classification purposes. In this jupyter notebook I tried to reproduce the results obtained in the paper. Here is a small sample from the notebook:

F1-Scores from reproduction

Datasets:

Reuters: http://disi.unitn.it/moschitti/corpora/Reuters21578-Apte-90Cat.tar.gz
Ohsumed: http://disi.unitn.it/moschitti/corpora/ohsumed-first-20000-docs.tar.gz

Unzip the datasets into a directory called ./datasets

Note: This notebook needs a Python 2 kernel because of the svmlight package and if you don't want to do the training yourself the results are stored in ./results.tar.gz

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Reproduction of Text Categorization with SVMs

Files

README.md

Latest commit

History

README.md

File metadata and controls

Reproduction of Text Categorization with SVMs