DGA Detection via AppShield

Use Case

Detection of domains created by domain generation algorithms

Version

1.0

Model Overview

This model is a convolutional neural network (CNN) trained to classify domains generated by domain generation algorithms (DGAs). DGAs appear in various malware families, where they periodically generate large numbers of domain names that serve as rendezvous points with the malware's command and control servers. The sheer number of potential rendezvous points makes it difficult for law enforcement to shut down botnets effectively, because infected computers attempt to contact some of these domain names every day to receive updates or commands.

Model Architecture

This use case has two models. The first is a CNN binary classifier that labels a domain as DGA or benign. The second is a Siamese network that identifies the specific DGA family a domain belongs to.
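A CNN over domain names needs each string turned into a fixed-length numeric sequence first. This README does not describe the preprocessing, so the following is only a minimal sketch of one common approach (character-level integer encoding with padding). The vocabulary, the `MAX_LEN` value, and the `encode_domain` helper are all assumptions made for illustration, not the project's actual code.

```python
import string

# Assumed vocabulary: lowercase letters, digits, hyphen, dot, underscore.
# Index 0 is reserved for padding and index 1 for unknown characters.
VOCAB = string.ascii_lowercase + string.digits + "-._"
PAD_ID, UNK_ID = 0, 1
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(VOCAB)}
MAX_LEN = 75  # hypothetical fixed input length


def encode_domain(domain, max_len=MAX_LEN):
    """Map a domain string to a list of max_len integer character IDs."""
    # Truncate long domains, map unknown characters to UNK_ID.
    ids = [CHAR_TO_ID.get(ch, UNK_ID) for ch in domain.lower()[:max_len]]
    # Right-pad short domains so every sequence has the same length.
    return ids + [PAD_ID] * (max_len - len(ids))


encoded = encode_domain("example.com")
print(len(encoded), encoded[:11])
```

A sequence like this would typically feed an embedding layer followed by 1D convolutions, but that layout is likewise an assumption here.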

Requirements

To run this example, additional requirements must be installed into your environment. A supplementary requirements file has been provided in this example directory.

pip install -r requirements.txt

Training

Training data

The training data consists of 320K domains labeled as DGA, drawn from 17 known DGA families, and 710K domains labeled as non-DGA.

Training epochs

Binary model = 30 epochs

Family classification model = 20 epochs

Training batch size

Binary model = 1000

Family classification model = 500
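The figures above imply a rough training budget for the binary model. The short calculation below works it out, assuming the quoted sample counts are exact and that the binary model trains on every labeled domain with no held-out split (neither is stated in this README). The family model is left out because a Siamese network trains on pairs, and the number of pairs is not given.

```python
import math

# Sample counts as quoted in the training data section (rounded figures).
N_DGA = 320_000
N_BENIGN = 710_000

BINARY_BATCH_SIZE = 1000
BINARY_EPOCHS = 30

# Assumption: all labeled domains are used for training.
n_total = N_DGA + N_BENIGN
binary_steps = math.ceil(n_total / BINARY_BATCH_SIZE)  # batches per epoch
total_updates = binary_steps * BINARY_EPOCHS           # batches over the full run
dga_fraction = N_DGA / n_total                         # share of positive samples

print(f"batches per epoch: {binary_steps}")
print(f"total batches:     {total_updates}")
print(f"DGA fraction:      {dga_fraction:.3f}")
```

Under these assumptions roughly 31% of the samples are DGA domains, so the classes are imbalanced and accuracy alone is not a sufficient quality measure.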

GPU Model

V100

Model accuracy

Binary model precision = 0.9

Binary model accuracy = 0.9
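Precision and accuracy measure different things, so it helps to see how each is derived from confusion-matrix counts. The sketch below uses made-up counts chosen only so that both metrics come out to 0.9; they are not the model's actual evaluation results, which this README does not list.

```python
def precision(tp, fp):
    """Share of domains flagged as DGA that really are DGA."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


def accuracy(tp, tn, fp, fn):
    """Share of all domains (DGA and benign) that were labeled correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts for illustration only.
tp, fp, tn, fn = 90, 10, 180, 20

print(f"precision = {precision(tp, fp):.2f}")
print(f"accuracy  = {accuracy(tp, tn, fp, fn):.2f}")
```

Note that these two numbers say nothing about recall: in the hypothetical counts above, 20 of 110 DGA domains are still missed.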

Training script

To train the models, run the following script from the training-tuning directory.

cd ${MORPHEUS_EXPERIMENTAL_ROOT}/appshield-dga-detection/training-tuning

# Run training script and save models

python dga-appshield-cnn-training.py

This saves the trained model files under the ../models directory, where the inference script can load them for later inference.

How To Use This Model

Combined with host data from DOCA AppShield, this model can be used to detect DGA malware. A training notebook is also included so that users can update the model as more labeled data is collected.

Input

This model is based on DOCA AppShield. Its input is the output of the AppShield URL plugin, which contains a list of the URLs connected to host processes.
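Since the classifier works on domains while the plugin reports URLs, some step has to extract the host name from each URL. The sketch below shows one way to do that with the Python standard library. The record layout (`pid`, `process`, `url` keys) and both helper names are hypothetical; the actual AppShield URL plugin schema is not described in this README.

```python
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercased host name of a URL, or None if there is none."""
    url = url.strip()
    # urlsplit only recognizes the host when a scheme or leading "//" is present.
    if "://" not in url:
        url = "//" + url
    try:
        return urlsplit(url).hostname
    except ValueError:
        # Raised for malformed hosts such as an unterminated IPv6 literal.
        return None


def domains_by_process(records):
    """Group the unique domains seen in URL records by (pid, process)."""
    grouped = {}
    for rec in records:
        domain = extract_domain(rec["url"])
        if domain:
            grouped.setdefault((rec["pid"], rec["process"]), set()).add(domain)
    return grouped


# Hypothetical plugin records, for illustration only.
records = [
    {"pid": 4120, "process": "chrome.exe", "url": "https://www.example.com/index.html"},
    {"pid": 4120, "process": "chrome.exe", "url": "example.org/login"},
    {"pid": 988, "process": "svchost.exe", "url": "http://Example.NET:8080/update"},
]
print(domains_by_process(records))
```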

Output

The binary classifier outputs each process together with its URLs, each classified as DGA or benign. The DGA family classifier outputs the name of the DGA family.
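A per-process report like the one described above could be assembled from per-domain scores as follows. This is only a sketch: the `(process, domain, probability)` tuple layout, the 0.5 threshold, and the `summarize` helper are assumptions, and the scores in the example are invented rather than produced by the model.

```python
from collections import defaultdict

THRESHOLD = 0.5  # hypothetical decision threshold on the DGA probability


def summarize(predictions, threshold=THRESHOLD):
    """Group scored domains by process and label each as 'dga' or 'benign'.

    predictions: iterable of (process, domain, dga_probability) tuples.
    """
    report = defaultdict(lambda: {"dga": [], "benign": []})
    for process, domain, prob in predictions:
        label = "dga" if prob >= threshold else "benign"
        report[process][label].append(domain)
    return dict(report)


# Invented scores for illustration only.
predictions = [
    ("svchost.exe", "xjwqpzkfh.net", 0.97),
    ("svchost.exe", "example.com", 0.02),
    ("chrome.exe", "example.org", 0.10),
]
for process, labels in summarize(predictions).items():
    print(process, labels)
```

Grouping by process keeps the host context from AppShield attached to each flagged domain, which is what makes the result actionable for an analyst.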

Out-of-scope use cases

N/A

Ethical considerations

N/A

References