Based on the superb repo https://github.com/suragnair/alpha-zero-general but with higher strength, optimized for CPU-only 25-100x speed improvement, and supporting 2+ players games. You can play with it on your browser https://github.com/cestpasphoto/cestpasphoto.github.io. Technical details about the improvements are listed in this page.
- Botanik
- Machi Koro (a.k.a. Minivilles)
- Santorini (with basic gods)
- Small World
- Splendor
- The Little Prince - Make me a planet
- Support of Splendor game with 2 players
- Support of 3-4 players (just change NUMBER_PLAYERS constant)
- Proper MCTS handling of "chance" factor when revealing new deck card
- Adding "universe exploration" feature for even better training
- Added pretrained models for 2-3-4 players
The AI engine doesn't know which cards will be drawn. There is one limitation: implemented logic doesn't allow you to both take gems from the bank and give back some (whereas allowed in real rules), you are limited to either take 1-2-3 gems or give back 1-2 gems.
- Quick implementation of Minivilles, with handful limitations
- Quick implementation
of The little prince, with
limitations. Main ones are:
- No support of 2 players, only 3-5 players are supported
- When market is empty, current player doesn't decide card type, it is randomly chosen.
- Grey sheeps are displayed on console using grey wolf emoji, and brown sheeps are displayed using a brown goat.
- Own implementation of Santorini, policy for initial status is user switchable (predefined, random or chosen by players)
- Support of goddess (basic ones only)
About 90+% winrate against Ai Ai and 95+% win rate against BoardSpace AI. See more details here
- Support of Botanik with 2 players
- Machine size is limited to 7x7, which should be enough in most cases
- Support of Small World with 2 to 5 players
- All types of people and power from the original game are supported. Each turn of a player may need several actions (see code for more details).
- Extra effort to improve pretrained model for 2 players
(Scores are on the 2 last rows after sc=, and number after # is number of next round).
Contrary to the repo this repo is based on, this one doesn't set up a Docker. Just launch the Python scripts from your PC, no special GPU support required.
pip3 install onnxruntime onnx numba tqdm colorama coloredlogs
pip3 install torch torchvision --extra-index-url https://download.pytorch.org/whl/cpu
Use main.py
with your parameters, e.g.:
python main.py splendor -m 800 -f 0.1 -l 0.0003 -D 0.3 -C ../results/mytest -V 74
Launch pit.py
with the name of the game and the name of the two players.
python /pit.py splendor splendor/pretrained_2players.pt splendor/pretrained_2players.pt -n 1 --display
or
python ./pit.py splendor splendor/pretrained_2players.pt human -n 1
You can run more than 1 game at a time and just display the game results Contrary to baseline version, pit.py automatically retrieves training settings and load them (numMCTSSims, num_channels, ...) although you can override if you want; you may even select 2 different architecture to compare them!
Compared to initial version, I target a smaller network but more MCTS simulations allowing to see further: this approach is less efficient on GPU, but similar on CPU and allow stronger AI.
python main.py splendor -m 800 -e 1000 -i 5 -F -c 2.5 -f 0.1 -T 10 -b 32 -l 0.0003 -p 1 -D 0.3 -C ../results/mytest -V 74
:
- Start by defining proper number of players in SplendorGame.py and disabling card reserve actions in first lines of splendor/SplendorLogicNumba.py
-c 2.5 -f 0.1
: MCTS options to tune, like cpuct value and FPU (first play urgency)- Initiate training with lower simulations number and less episodes per round
-b 32 -l 0.0003 -p 1 -D 0.3
: define batch size, learning rate, number of epochs and dropout. Larger number of epochs may degrade performance, same for larger batch sizes so you only need to tune roughly dropout value (0., 0.3 or 0.3).-V 74
: define NN architecture to use. The specified NN architecture must be listed in splendor/splendorNNet.py (look atforward()
function).
My baseline of training scenario is the following:
-m 100 -q 0. -l 0.003 -e 200 -i 2 -f 0.1
-m 200 -q 0.5 -l 0.001 -e 200 -i 4 -f 0.1
-m 400 -q 0.5 -l 0.0003 -e 500 -i 8 -F -f 0.1
-m 800 -q 1.0 -l 0.0003 -e 1500 -i 10 -F -f 0.1
Of course you need to tune parameters depending on the game, especially cpuct and FPU.
If you specify a previous checkpoint using a different architecture (-V
option), it will do its best by training such new architecture based on examples of previous checkpoint. It allows me starting first steps of training with small/fast networks and then experimenting larger networks.
To debug add NUMBA_DISABLE_JIT=1
as a prefix before main.py, and the option --debug
.
I also usually execute several trainings in parallel; you can evaluate the results obtained in the last 24 hours by
using this command (execute as many times as threads): ./pit.py -A 24 -T 8
The code also runs several games to benefit from faster batch inferences; note that games are not run simultaneously but one at a time, meaning it still uses 1 CPU core. The downside is the bigger memory footprint.
You can use Ray to run HPO (especially Population Based Training "PBT"). This is still new so it may change but currently process is to configure first rayConfig.py
and useRay.py
, then call the latest.