Captain Agent

These are the supplementary files for the paper "Adaptive In-conversation Team Building for Language Model Agents." They contain the code for running the experiments in the paper.

The codebase is built upon AutoGen; our implementations are located at autogen/agentchat/contrib/meta_agent.py and autogen/agentchat/contrib/meta_user_proxy_agent.py.
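For orientation, the sketch below shows AutoGen's standard two-agent pattern that these contrib agents build on. AssistantAgent, UserProxyAgent, and initiate_chat are core AutoGen APIs; that the meta_* classes extend these roles is an assumption based on the contrib naming convention, so check the two source files above for their exact constructor signatures. The model name and key are placeholders.

from autogen import AssistantAgent, UserProxyAgent

# Placeholder LLM config; substitute your own model and API key.
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "sk-..."}]}

# Standard AutoGen two-agent loop. The MetaAgent / MetaUserProxyAgent
# classes in autogen/agentchat/contrib/ are assumed to be drop-in
# variants of these roles -- verify against the source files.
assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)
user_proxy.initiate_chat(assistant, message="What is 2 + 2?")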

Instructions

We use autogenbench to test all scenarios in our benchmark. For detailed instructions on using autogenbench, please refer to the autogenbench documentation. We also provide brief instructions below.

Installation

The codebase is built upon autogenbench and autogen, so instead of installing them from PyPI, you should install pyautogen and autogenbench in editable mode:

cd /path/to/autogen
pip install -e .
cd /path/to/autogen/samples/autogenbench
pip install -e .
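As a quick sanity check (not part of the original steps), you can confirm that the editable install is picked up by verifying that the autogen module resolves to your local checkout rather than site-packages:

# Sanity check: autogen should resolve to your local checkout.
import autogen
print(autogen.__file__)  # expect a path under /path/to/autogen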

Then modify the first line of requirements.txt to point to the path of your autogen-autobuild-dev checkout.

Evaluations

This is the general procedure for running evaluations on the different scenarios. Use the following commands to run the benchmark for a scenario:

cd [SCENARIO FOLDER, e.g., /path/to/scenarios/MATH]
python Scripts/init_tasks.py  # initialize the tasks
autogenbench run Tasks/[TASK YOU WANT TO RUN].jsonl --native  # run the task; --native runs the scenario without Docker. If you have a Docker environment, you can remove it.
autogenbench tabulate Results/[TASK YOU WANT TO RUN]  # print the results as a table

If you want to debug, pass -s 1 to run on a single sample:

cd [SCENARIO FOLDER, e.g., /path/to/scenarios/MATH]
autogenbench run Tasks/[TASK YOU WANT TO RUN].jsonl -s 1

If you want to debug a specific problem, you can run Results/[YOUR TASK]/[PROBLEM ID]/0/scenario.py manually in debug mode.

Note that every time autogenbench run is invoked, it checks the Results folder and only runs problems that are not already there. If you want to rerun a task, delete the corresponding files in the Results folder.
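For example, a small helper along these lines (assuming the Results/[TASK]/[PROBLEM ID]/[INSTANCE]/ layout described above) clears one problem so that the next autogenbench run executes it again:

import shutil
from pathlib import Path

def clear_problem(task: str, problem_id: str) -> None:
    """Delete one problem's results so autogenbench reruns it."""
    target = Path("Results") / task / problem_id
    if target.exists():
        shutil.rmtree(target)

# e.g. clear_problem("MATH", "problem_0")  # hypothetical task/problem names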

Some templates require manual additions to Templates/scenarios.py; it is recommended to check the code and fill in the placeholders. For detailed instructions on running each benchmark, please refer to the respective README in each scenario folder.
