Rules Refine the Riddle: Global Explanation for Deep Learning-Based Anomaly Detection in Security Applications. Accepted by CCS'24.
⚠️ Notice for AEC:
- The files in this GitHub repository are incomplete (due to restrictions on large files).
- Please download the complete files from Zenodo.
This artifact contains the implementation and experiment results of GEAD, proposed in our CCS'24 paper. In short, GEAD is a method for extracting rules from deep learning-based anomaly detection models. As shown in the figure below, it consists of several core steps (a minimal usage sketch follows this list):
- Root regression tree generation: leveraging black-box knowledge distillation to extract a raw rule tree
- Low-confidence region identification: finding regions that cause inconsistencies between the original model and the tree-based explainable model
- Low-confidence augmentation: augmenting data that can lead to inconsistent decisions between the two models
- Low-confidence rule generation: using the augmented data to expand the original rules
- Tree merging and discretization: simplifying the rules to improve readability for operators
- Rule generation (optional): converting the rule tree into a readable rule set
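For orientation, the sketch below shows how these steps map onto the main entry points in `code/gead.py` (`get_roottree()`, `lc_data_augment()`, `get_lc_trees()`, `get_merged_tree()`; see the code introduction further down). The call shapes and argument names here are illustrative assumptions, not the actual signatures; please refer to `demo/demo.ipynb` for the exact usage.

```python
# Illustrative sketch of the GEAD workflow. The commented calls only indicate
# the order of the steps; their signatures are assumptions, not the real API.
import numpy as np

def anomaly_score(x):
    """Black-box scoring function of the model to be explained (placeholder)."""
    return np.zeros(len(x))

X_train = np.random.rand(1000, 20)   # data used for distillation (placeholder)
y_score = anomaly_score(X_train)     # anomaly scores from the black-box model

# 1. Root regression tree generation (+ low-confidence region identification)
# root_tree, lc_regions = get_roottree(X_train, y_score)

# 2. Low-confidence augmentation: sample extra data inside the low-confidence regions
# X_aug = lc_data_augment(lc_regions, anomaly_score)

# 3. Low-confidence rule generation: fit additional trees on the augmented data
# lc_trees = get_lc_trees(X_aug, anomaly_score)

# 4. Tree merging and discretization: merge everything into one simplified rule tree
# merged_tree = get_merged_tree(root_tree, lc_trees)

# 5. (Optional) convert the merged tree into a human-readable rule set
```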
The following is a brief introduction to the directory structure of this artifact:
- baseline/ ; code of baselines
- code/
  - gead.py ; code of GEAD
  - gead_seq.py ; code of GEAD (for RNNs)
  - ...
- demo/
  - demo.ipynb ; demo showing how to use GEAD
  - ...
- experiment/
  - results/ ; reproduced experiment results
  - Fidelity_Evaluation.ipynb ; experiment 1
  - Usage_2.ipynb ; experiment 2
- setup/ ; environment setup files
- doc/ ; images used in the README
- README.md ; instructions for this artifact
This implementation has been successfully tested on an Ubuntu 16.04 server with Python 3.7.16. To ensure compatibility, the PyTorch-based parts of this artifact can run entirely on CPU (GPU/CUDA is not required).
To set up the environment for this artifact, please follow the steps below:
- Ensure that you have `conda` installed on your system. If you do not have `conda`, you can install it as part of the Anaconda distribution or Miniconda.
- Open a terminal or command prompt.
- Create a new conda environment with a name of your choice (e.g., `GEAD`) and specify the Python version: `conda create -n GEAD python=3.7.16`
- Once the environment is created, activate it by running `conda activate GEAD`. This switches your command-line session to the newly created conda environment.
- Run `pip install -r setup/requirements.txt` to install all required packages. This command tells `pip` to install every package listed in the `requirements.txt` file. A quick sanity check you can run afterwards is sketched below.
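After installation, a quick check such as the following can confirm that the environment is usable. The authoritative package list is `setup/requirements.txt`; the package names below (NumPy, scikit-learn, PyTorch, Jupyter) are assumptions based on this README's description of the artifact, so adjust them as needed.

```python
# Quick sanity check for the GEAD conda environment.
# The required packages are defined by setup/requirements.txt; the names below
# are assumptions based on this README (PyTorch-based, notebook-driven).
import importlib
import sys

print("Python:", sys.version)  # expected: 3.7.16

for pkg in ("numpy", "sklearn", "torch", "jupyter"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}: {getattr(mod, '__version__', 'ok')}")
    except ImportError:
        print(f"{pkg}: NOT INSTALLED -- check setup/requirements.txt")
```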
The experiments and demos in this artifact mainly use Jupyter notebooks, so make sure you can view and execute the notebook (.ipynb) files.
Instructions for using Jupyter Notebook can be found on the official website.
In short, select the right kernel (namely, the `GEAD` environment created above) and then execute all cells (except markdown cells) in sequence. All cells in this artifact have been pre-executed with their output shown. If all goes well, you should get consistent output in your environment. A programmatic alternative to the Jupyter UI is sketched below.
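If you prefer to run a notebook non-interactively (e.g., on a remote server), a script like the following can execute all of its cells in order. This assumes `nbformat` and `nbconvert` are available in the `GEAD` environment, which is not guaranteed by `requirements.txt`; the kernel name may also need to be adjusted to match your setup.

```python
# Execute a notebook programmatically instead of through the Jupyter UI.
# Assumes nbformat/nbconvert are installed in the GEAD environment.
import nbformat
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read("demo/demo.ipynb", as_version=4)
ep = ExecutePreprocessor(timeout=1800, kernel_name="python3")
ep.preprocess(nb, {"metadata": {"path": "demo/"}})  # run all cells in sequence
nbformat.write(nb, "demo/demo_executed.ipynb")      # save the executed copy
```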
We provide a step-by-step demo of explaining an autoencoder-based anomaly detection model with GEAD, which also corresponds to the results in Section 4.3.1 of our paper.
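At a high level, the demo treats the autoencoder's reconstruction error as the anomaly score that GEAD distills rules from. The snippet below is a minimal, self-contained sketch of that black-box interface; the architecture, feature dimension, and thresholding are illustrative assumptions and do not match the demo exactly (see `demo/demo.ipynb` for the actual model and data).

```python
# Minimal sketch of an autoencoder-based anomaly detector as the black box
# that GEAD explains. Dimensions and layer sizes are illustrative only.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features=20, hidden=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(hidden, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
model.eval()

def anomaly_score(x_numpy):
    """Per-sample reconstruction error -- the signal GEAD distills rules from."""
    with torch.no_grad():
        x = torch.as_tensor(x_numpy, dtype=torch.float32)
        recon = model(x)
        return ((recon - x) ** 2).mean(dim=1).numpy()

# A sample is flagged as anomalous when its score exceeds a chosen threshold.
```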
The output of the experiments will validate the following claims:
- (Quality of GEAD) Figure 3 on page 7 is reproduced in experiment/Fidelity_Evaluation.ipynb and the results are shown in:
experiment/results/figure3_part1(Positive Consistency).png
experiment/results/figure3_part2(Negative Credibility).png
- (Quality of GEAD) Table 1 on page 7 is reproduced in experiment/Fidelity_Evaluation.ipynb and the results are shown in:
experiment/results/Table1_part1(CICIDS2017).csv
experiment/results/Table1_part2(Kitsune).csv
experiment/results/Table1_part3(Kyoto2006)
experiment/results/Table1_part4(HDFS).csv
- (Usage of GEAD) The claim of an inverse relationship between AUC and the number of rules (see Insight 5 on page 11 and Section 4.3.2) is verified in experiment/Usage_2.ipynb, and the reproduced cases of Figure 6 on page 9 are shown in:
experiment/results/figure6_case2_part1(loss-auc).png
experiment/results/figure6_case2_part2(rule-auc).png
Our artifact does not validate the following claims/experiments:
- Usage 3 in Section 4.3.3 (also Section 5). These experiments were conducted with the State Grid Corporation of China. Due to data compliance and privacy restrictions, we cannot release any of the data, even in a private or certified manner.
- Other insights and experiments that require case-by-case analysis with various background knowledge (such as Usage 1 in Section 4.3.1). This part of the experiments does not have clear verifiable indicators but requires analysis of the specific circumstances (the analysis process is described in the paper). Moreover, these results are not claims about GEAD itself but GEAD-based analyses of existing systems, so we leave them out of the scope of artifact evaluation.
To facilitate AEC review, we summarize the materials and instructions related to the three badges as follows:
This artifact will be uploaded to Zenodo and made public on GitHub.
We include a demo showing how to use GEAD here, which can help you understand the usage and workflow of GEAD.
To facilitate understanding of the GEAD code and verification of its integrity, we briefly map the relevant code in `code/gead.py` (the same applies to `gead_seq.py`) to the GEAD steps introduced in the "Introduction":
- Root regression tree generation: see the `get_roottree()` function on line 112 of `code/gead.py`
- Low-confidence region identification: also in the `get_roottree()` function on line 112 of `code/gead.py`
- Low-confidence augmentation: see the `lc_data_augment()` function on line 140 of `code/gead.py`
- Low-confidence rule generation: see the `get_lc_trees()` function on line 386 of `code/gead.py`
- Tree merging and discretization: see the `get_merged_tree()` function on line 717 of `code/gead.py`
- Rule generation (optional): we leave this step to specific applications since it is optional (as stated in our paper)
Please see the "Experiment Results" section above in this README and the specific notes in the Jupyter notebook files.