AbstractMonkey/AutoGPT-Calculator-Benchmark

AutoGPT-Calculator-Benchmark

This is a GitHub Actions workflow benchmark for Auto-GPT. It has Auto-GPT generate a calculator script called 'calculator-gpt.py' from a preset series of inputs, with the aim of getting a deterministic result out of a non-deterministic LLM (GPT-3.5-turbo). Once calculator-gpt.py is created, it is tested for deterministic output, and Auto-GPT is evaluated on the following performance benchmarks:

  1. The functionality of calculator-gpt.py - does the script perform every task the user requested?

  2. OpenAI API token cost of the entire session

  3. Number of iterations required to produce the script

  4. Real-world time taken to produce the script
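
The functionality and determinism checks could be scripted roughly as follows. This is a minimal sketch under stated assumptions, not the workflow's actual test step: the helper names `run_calculator` and `check_cases`, the `SCRIPT` path, and the repeat count are all hypothetical. The expected outputs in `CASES` come from the goals given to Auto-GPT, with `6*7` chosen here as one input whose result is 42.

```python
import subprocess
import sys

# Hypothetical location of the generated script; adjust to the real workspace.
SCRIPT = "auto_gpt_workspace/calculator-gpt.py"

# (expression, expected stdout) pairs based on the goals given to Auto-GPT.
CASES = [
    ("5*6", "30"),
    ("81/9", "9"),
    ("6*7", "The answer to life, the universe, and everything"),
]


def run_calculator(expression, script=SCRIPT):
    """Run the script once on one expression and return its stripped stdout."""
    result = subprocess.run(
        [sys.executable, script, expression],
        capture_output=True,
        text=True,
        timeout=10,
    )
    return result.stdout.strip()


def check_cases(cases, runner=run_calculator, repeats=3):
    """Return (expression, expected, outputs) for every failing case.

    Each expression is run `repeats` times. A case fails if any run differs
    from the expected text, which catches both wrong answers and outputs
    that change from run to run.
    """
    failures = []
    for expression, expected in cases:
        outputs = [runner(expression) for _ in range(repeats)]
        if any(output != expected for output in outputs):
            failures.append((expression, expected, outputs))
    return failures
```

Calling `check_cases(CASES)` after the script has been generated returns an empty list when every case passes on every run. Passing the argument as a list to `subprocess.run` avoids the shell, so `5*6` reaches the script unchanged.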

The preset Auto-GPT inputs are:

     Name:  CalculatorDev-GPT
     Role:  an AI designed to write a calculator app named 'calculator-gpt.py' in Python. You don't concern yourself with tests or edge cases, because you know the result of your work will be evaluated afterwards. Speed and efficiency are your utmost concerns, so you create scripts as quickly as possible without requiring user input
    
     Goal 1: Create a Python file called 'calculator-gpt.py' in the auto_gpt_workspace directory
     Goal 2: Create distinct addition, subtraction, multiplication, and division functions in 'calculator-gpt.py'. This Python script will be used to perform mathematical operations. Ensure that 'calculator-gpt.py' can be used in the terminal. Example usage: 'python calculator-gpt.py 5*6' example output: '30'. Example usage 2: 'python calculator-gpt.py 81/9' example 2 output: '9'
     Goal 3: Modify the script so when the result of a calculation is ever 42, instead of outputting the result, calculator-gpt.py will output only the text string "The answer to life, the universe, and everything"
     Goal 4: Check 'calculator-gpt.py' for any redundancy and ensure it's clean, minimal, and readable
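
For illustration, one possible shape for a script that satisfies these goals is sketched below. It is not the output Auto-GPT actually produced. The regular-expression parsing of a single binary expression, the function names, and the choice to print whole numbers without a trailing `.0` (to match the example outputs) are all assumptions.

```python
import re
import sys


def add(a, b):
    return a + b


def subtract(a, b):
    return a - b


def multiply(a, b):
    return a * b


def divide(a, b):
    return a / b


OPERATIONS = {"+": add, "-": subtract, "*": multiply, "/": divide}

# One binary expression such as "5*6" or "81 / 9"; anything else is rejected.
NUMBER = r"-?\d+(?:\.\d+)?"
EXPRESSION = re.compile(rf"\s*({NUMBER})\s*([+\-*/])\s*({NUMBER})\s*")


def calculate(expression):
    """Evaluate one binary expression and return the text to print."""
    match = EXPRESSION.fullmatch(expression)
    if match is None:
        raise ValueError(f"unsupported expression: {expression!r}")
    left, operator, right = match.groups()
    result = OPERATIONS[operator](float(left), float(right))
    if result == 42:
        return "The answer to life, the universe, and everything"
    # Print whole numbers without a trailing ".0", as in the example outputs.
    return str(int(result)) if result.is_integer() else str(result)


if __name__ == "__main__" and len(sys.argv) == 2:
    print(calculate(sys.argv[1]))
```

When invoking a script like this from a shell, quoting the argument (for example `python calculator-gpt.py '5*6'`) keeps the shell from treating `*` as a filename wildcard.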
    
