These are the supplementary files for the paper "Adaptive In-conversation Team Building for Language Model Agents." They contain the code for running the experiments in the paper.
The codebase is built upon AutoGen; our implementations are located in autogen/agentchat/contrib/meta_agent.py and autogen/agentchat/contrib/meta_user_proxy_agent.py.
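For orientation, here is a minimal usage sketch. It assumes the two modules expose MetaAgent and MetaUserProxyAgent classes that follow the usual AutoGen ConversableAgent interface; check the source files for the actual class names and constructor arguments.

# Minimal usage sketch (assumption: class names follow the file names;
# see meta_agent.py / meta_user_proxy_agent.py for the real signatures).
import autogen
from autogen.agentchat.contrib.meta_agent import MetaAgent
from autogen.agentchat.contrib.meta_user_proxy_agent import MetaUserProxyAgent

config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")

meta_agent = MetaAgent(
    name="meta_agent",
    llm_config={"config_list": config_list},
)
meta_user_proxy = MetaUserProxyAgent(
    name="meta_user_proxy",
)

# Start a conversation; the meta agent adaptively builds a team for the task.
meta_user_proxy.initiate_chat(meta_agent, message="Find the derivative of x**2 * sin(x).")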
We use autogenbench to run all scenarios in our benchmark. For detailed instructions on using autogenbench, please refer to its documentation. We also provide some brief instructions below.
The codebase is built upon autogenbench and autogen, so instead of installing them from PyPI, you should install pyautogen and autogenbench in editable mode:
cd /path/to/autogen
pip install -e .
cd /path/to/autogen/samples/autogenbench
pip install -e .
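To confirm that the editable install resolves to your local checkout rather than a PyPI release, a quick check (assuming the standard pyautogen package layout) is:

# Should print a path inside your local autogen checkout, not site-packages.
import autogen
print(autogen.__version__, autogen.__file__)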
Modify the first line in requirement.txt to point to the path of your autogen-autobuild-dev.
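For example, assuming the requirements file uses pip's local editable-path syntax, the first line might look like -e /path/to/autogen-autobuild-dev, where the path points to your own checkout.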
This is the general method for running evaluations on the different scenarios. Use the following commands to run the benchmark for each scenario:
cd [SCENARIO FOLDER. For example, /path/to/scenarios/MATH]
python Scripts/init_tasks.py  # initialize the tasks
autogenbench run Tasks/[TASK YOU WANT TO RUN].jsonl --native  # run the task; --native runs the scenario without Docker (remove it if you have a Docker environment)
autogenbench tabulate Results/[TASK YOU WANT TO RUN]  # print the results as a table
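If you prefer to drive this workflow from Python (for example, to loop over several tasks), a rough sketch is below; the scenario path and task name are hypothetical placeholders.

# Sketch: run init_tasks, the benchmark, and the results table from Python.
# "example_task" is hypothetical; use a .jsonl file produced by Scripts/init_tasks.py.
import subprocess
from pathlib import Path

scenario_dir = Path("/path/to/scenarios/MATH")  # adjust to your scenario folder
task = "example_task"

subprocess.run(["python", "Scripts/init_tasks.py"], cwd=scenario_dir, check=True)
subprocess.run(["autogenbench", "run", f"Tasks/{task}.jsonl", "--native"], cwd=scenario_dir, check=True)
subprocess.run(["autogenbench", "tabulate", f"Results/{task}"], cwd=scenario_dir, check=True)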
If you want to debug, set -s 1 to use a single data point for testing:
cd [SCENARIO FOLDER. For example, /path/to/scenarios/MATH]
autogenbench run Tasks/[TASK YOU WANT TO RUN].jsonl -s 1
If you want to debug a specific problem, you can manually run Results/[YOUR TASK]/[PROBLEM ID]/0/scenario.py in debug mode.
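For example, you can launch it under the Python debugger with python -m pdb scenario.py from that problem's working directory, or open the file in your IDE, set breakpoints, and run it directly.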
Note that every time autogenbench run TASK is executed, it checks the Results folder and only runs the problems that are not already in it. If you want to rerun some tasks, delete the corresponding files in the Results folder.
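A small Python sketch for clearing the cached results of a single task (the task name below is hypothetical) is:

# Remove a task's Results subfolder so autogenbench reruns all of its problems.
import shutil
from pathlib import Path

results_dir = Path("Results/example_task")  # hypothetical task name
if results_dir.exists():
    shutil.rmtree(results_dir)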
Some templates require manual additions to Templates/scenarios.py; it is recommended to check the code and fill out the placeholders. For detailed instructions on running each benchmark, please refer to the respective README in each scenario folder.