CLAP
is a reinforcement learning PPO agent performs Penetration Testing in simulated computer network environment (we use Network Attack Simulator (NASim)). The agent is trained to scan for vulnerabilities in the network and exploit them to gain access to various network resources. CLAP
was initially poposed in our paper Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach.
Simulated Network Enviornment: Network Attack Simulator (NASim)
Network Attack Simulator (NASim) is a simulated computer network complete with vulnerabilities, scans and exploits designed to be used as a testing environment for AI agents and planning techniques applied to network penetration testing.
However, compared to the original paper, this repo has made following changes
- Developed based on CleanRL
- Add LSTM for POMDP scenarios
- To Support NASim 2D observation space,
Transformer
was implementated as preceptions- However, they are extremely unstable to train
- To learn more about the transformer enocder: Check Yekun's Note
To run this code, you will need to have the following installed on your system:
- Python 3.5 or later
- Pytorch 2.0 or later
- OpenAI Gym 0.21.0 (huge change 0.25)
- NASim 0.91
It's important to be aware that OpenAI Gym underwent a significant update after version 0.25.0, which included a new
step
API.
Use Conda
to manage python
environmnent and Poetry
to manage packages.
Get Started
Clone this repo:
git clone https://github.com/yyzpiero/RL4RedTeam.git
Create conda environment:
conda create -p ./venv python==X.X
and use poetry to install all Python packages:
poetry install
To train the agent, you can use the following command:
cd ./algo
python clap.py
This will start the training process, which will run until the agent reaches a satisfactory level of performance. The performance of the agent will be printed to the console at regular intervals, so you can monitor its progress.
The ppo implementation is heavily based on Costa Huang's fantasitc library CleanRl
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
@article{yang2022behaviour,
title={Behaviour-Diverse Automatic Penetration Testing: A Curiosity-Driven Multi-Objective Deep Reinforcement Learning Approach},
author={Yang, Yizhou and Liu, Xin},
journal={arXiv preprint arXiv:2202.10630},
year={2022}
}
- Add origin code for
CLAP
- Add Random Network Distillation (RND)
- Include figures of the training results
This implementation of the PPO algorithm is not intended for use in real-world penetration testing. It is only meant for use in a simulated environment, and should not be used to perform actual penetration testing on real networks.