kaggle_Microsoft_Malware

Code that won the [Kaggle Microsoft Malware Classification Competition](https://www.kaggle.com/c/malware-classification). Much of the credit goes to teammate daxiongshu for organizing everything!

Please see the PDF for a description of our methods and instructions for running the code. The solution relies heavily on [XGBoost](https://github.com/dmlc/xgboost).

This is a fork of the original winning code, ported to Python 3 with updated dependencies. This fork requires Python 3.6.1 to run properly.
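A hypothetical helper (not part of this repository) can warn when the running interpreter differs from 3.6.1. It assumes that "requires Python 3.6.1" means the fork was tested on exactly that release, so it only prints a warning instead of refusing to run.

```python
import sys
from typing import Sequence, Tuple

# Assumption: the fork was tested on exactly Python 3.6.1.
EXPECTED: Tuple[int, int, int] = (3, 6, 1)


def version_matches(actual: Sequence, expected: Tuple[int, int, int] = EXPECTED) -> bool:
    """Compare only (major, minor, micro); extra fields such as 'final' are ignored."""
    return tuple(actual[:3]) == expected


if __name__ == "__main__":
    running = tuple(sys.version_info[:3])
    if version_matches(running):
        print("Python", ".".join(map(str, running)), "matches the expected version.")
    else:
        print("Warning: running Python", ".".join(map(str, running)),
              "but this fork expects", ".".join(map(str, EXPECTED)))
```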

Installation

  1. Clone the repository

  2. (Optional, highly recommended) Create a virtual environment

  3. Install the required packages:

    python -m pip install -r requirements.txt
    
  4. Install PyPy for Python 3 (PyPy3)

  5. Add the directory containing the pypy executable to your PATH, so that pypy can be run as `pypy [arguments]` without specifying its full path
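A quick way to confirm steps 4 and 5 is to check that `pypy` resolves on your PATH. The sketch below is not part of the repository; it assumes the executable is named `pypy` as described above (some PyPy3 installations name it `pypy3`, in which case pass that name instead). It sticks to Python 3.6-compatible `subprocess` arguments.

```python
import shutil
import subprocess
from typing import Optional


def find_pypy(executable: str = "pypy") -> Optional[str]:
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


def pypy_version(executable: str = "pypy") -> Optional[str]:
    """Run `<executable> --version` and return its output, or None if not on PATH."""
    path = find_pypy(executable)
    if path is None:
        return None
    # Merge stderr into stdout: some interpreters print the version to stderr.
    result = subprocess.run([path, "--version"], stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.stdout.strip()


if __name__ == "__main__":
    version = pypy_version()
    if version is None:
        print("pypy was not found on PATH; revisit step 5.")
    else:
        print("Found:", version)
```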

Usage

To train a model and generate predictions, see the PDF.

If you split the dataset into your own train and test sets and want to assess prediction performance, run:

    prediction_performance.py [path to predictions] [path to true labels]

where [path to predictions] is the path to the CSV file of predictions generated by one of the models, and [path to true labels] is the path to the CSV file of test labels (with the same structure as the train labels CSV on the Kaggle site).
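To illustrate what such an evaluation can involve, here is a minimal, hypothetical sketch; it is not necessarily what `prediction_performance.py` computes. It assumes the competition's metric, multi-class logarithmic loss over the nine malware classes, a predictions CSV with columns `Id,Prediction1,...,Prediction9`, and a labels CSV with columns `Id,Class` (as in `trainLabels.csv`). Each prediction row is rescaled to sum to 1 and clipped to [1e-15, 1 - 1e-15] before taking the log, so a single confident mistake cannot make the loss infinite.

```python
import csv
import math
import sys
from typing import Dict, List

# Assumptions (not taken from this repository): classes labeled 1..9,
# prediction columns named Prediction1..Prediction9, label column named Class.
N_CLASSES = 9
EPS = 1e-15  # clip probabilities away from 0 and 1 so log() stays finite


def read_true_labels(path: str) -> Dict[str, int]:
    """Read an Id,Class CSV into {Id: class}."""
    with open(path, newline="") as f:
        return {row["Id"]: int(row["Class"]) for row in csv.DictReader(f)}


def read_predictions(path: str) -> Dict[str, List[float]]:
    """Read an Id,Prediction1..Prediction9 CSV into {Id: [p1, ..., p9]}."""
    with open(path, newline="") as f:
        return {
            row["Id"]: [float(row[f"Prediction{k}"]) for k in range(1, N_CLASSES + 1)]
            for row in csv.DictReader(f)
        }


def multiclass_log_loss(preds: Dict[str, List[float]], labels: Dict[str, int]) -> float:
    """Mean of -log(p) over samples, where p is the probability given to the true class."""
    total = 0.0
    for sample_id, true_class in labels.items():
        probs = preds[sample_id]
        p = probs[true_class - 1] / sum(probs)  # rescale the row so it sums to 1
        p = min(max(p, EPS), 1 - EPS)           # then clip to [EPS, 1 - EPS]
        total -= math.log(p)
    return total / len(labels)


if __name__ == "__main__" and len(sys.argv) == 3:
    predictions = read_predictions(sys.argv[1])
    true_labels = read_true_labels(sys.argv[2])
    print(f"multi-class log loss: {multiclass_log_loss(predictions, true_labels):.5f}")
```

Lower is better: a model that assigns probability 1/9 to every class scores ln 9, roughly 2.197.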