Merge branch 'main' of github.com:AntonOsika/gpt-engineer

* 'main' of github.com:AntonOsika/gpt-engineer:
  Mark test as failed because it requires OpenAI API access currently
  `black`
  Create test_ai.py
  fix to_files
  execute_workspace -> gen_entrypoint; execute_entrypoint
  Ignore my-new-project/
  Added CODE_OF_CONDUCT.md to the .github directory (#147)
  make pre commit pass in the whole codebase (#149)
  Create ci.yaml
  Fix linting
  Add support for directory paths in filenames and improve code splitting
  - Enforce an explicit markdown code block format
  - Add a token to split the output to clearly detect when the code blocks start
  - Save all non-code output to a `README.md` file
  - Update RegEx to extract and strip text more reliably and clean up the output
  - Update the identify prompts appropriately
  Enhance philosophy to include supporting documents
  - Create instructions for running/compiling the project
  - Create any package manager files
  Generate instructions for all platforms
  - Update prompt to create instructions for all 3 major OS platforms
  - Fix small typo
  Add support for directory creation and binary files
  - Use the `Path` module instead of `os`
  - Add ability to create any amount of missing directories for a given file
  - Add ability to save both text and binary files to save images (or other file types) later
  Add cleanup & move `projects` to their own directory
  - Add optional argument to clean and delete the working directories of the project before running the prompt
  - Add `.gitignore` entry to ignore all possible projects
  - Update readme
AntonOsika · Jun 18, 2023 · e90ac46
2 parents 4a212d9 + d3d1c9e
Showing 17 changed files with 352 additions and 114 deletions.
diff --git a/.github/CODE_OF_CONDUCT.md b/.github/CODE_OF_CONDUCT.md
@@ -0,0 +1,131 @@
+# Contributor Covenant Code of Conduct
+
+## Our Pledge
+
+We as members, contributors, and leaders pledge to make participation in our
+community a harassment-free experience for everyone, regardless of age, body
+size, visible or invisible disability, ethnicity, sex characteristics, gender
+identity or expression, level of experience, education, socio-economic status,
+nationality, personal appearance, race, caste, color, religion, or sexual
+identity and orientation.
+
+We pledge to act and interact in ways that contribute to an open, welcoming,
+diverse, inclusive, and healthy community.
+
+## Our Standards
+
+Examples of behavior that contributes to a positive environment for our
+community include:
+
+* Demonstrating empathy and kindness toward other people
+* Being respectful of differing opinions, viewpoints, and experiences
+* Giving and gracefully accepting constructive feedback
+* Accepting responsibility and apologizing to those affected by our mistakes,
+  and learning from the experience
+* Focusing on what is best not just for us as individuals, but for the overall
+  community
+
+Examples of unacceptable behavior include:
+
+* The use of sexualized language or imagery, and sexual attention or advances of
+  any kind
+* Trolling, insulting or derogatory comments, and personal or political attacks
+* Public or private harassment
+* Publishing others' private information, such as a physical or email address,
+  without their explicit permission
+* Other conduct which could reasonably be considered inappropriate in a
+  professional setting
+
+## Enforcement Responsibilities
+
+Community leaders are responsible for clarifying and enforcing our standards of
+acceptable behavior and will take appropriate and fair corrective action in
+response to any behavior that they deem inappropriate, threatening, offensive,
+or harmful.
+
+Community leaders have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are
+not aligned to this Code of Conduct, and will communicate reasons for moderation
+decisions when appropriate.
+
+## Scope
+
+This Code of Conduct applies within all community spaces, and also applies when
+an individual is officially representing the community in public spaces.
+Examples of representing our community include using an official e-mail address,
+posting using an official social media account, or acting as an appointed
+representative at an online or offline event.
+
+## Enforcement
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be
+reported to the community leaders responsible for enforcement at
+[email protected].
+All complaints will be reviewed and investigated promptly and fairly.
+
+All community leaders are obligated to respect the privacy and security of reporters of incidents.
+
+## Enforcement Guidelines
+
+Community leaders will follow these Community Impact Guidelines in determining
+the consequences for any action they deem in violation of this Code of Conduct:
+
+### 1. Correction
+
+**Community Impact**: Use of inappropriate language or other behavior deemed
+unprofessional or unwelcome in the community.
+
+**Consequence**: A private, written warning from community leaders, providing
+clarity around the nature of the violation and an explanation of why the
+behavior was inappropriate. A public apology may be requested.
+
+### 2. Warning
+
+**Community Impact**: A violation through a single incident or series of
+actions.
+
+**Consequence**: A warning with consequences for continued behavior. No
+interaction with the people involved, including unsolicited interaction with
+those enforcing the Code of Conduct, for a specified period of time. This
+includes avoiding interactions in community spaces as well as external channels
+like social media. Violating these terms may lead to a temporary or permanent
+ban.
+
+### 3. Temporary Ban
+
+**Community Impact**: A serious violation of community standards, including
+sustained inappropriate behavior.
+
+**Consequence**: A temporary ban from any sort of interaction or public
+communication with the community for a specified period of time. No public or
+private interaction with the people involved, including unsolicited interaction
+with those enforcing the Code of Conduct, is allowed during this period.
+Violating these terms may lead to a permanent ban.
+
+### 4. Permanent Ban
+
+**Community Impact**: Demonstrating a pattern of violation of community
+standards, including sustained inappropriate behavior, harassment of an
+individual, or aggression toward or disparagement of classes of individuals.
+
+**Consequence**: A permanent ban from any sort of public interaction within the
+community.
+
+## Attribution
+
+This Code of Conduct is adapted from the [Contributor Covenant][homepage],
+version 2.1, available at
+[https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
+
+Community Impact Guidelines were inspired by
+[Mozilla's code of conduct enforcement ladder][Mozilla CoC].
+
+For answers to common questions about this code of conduct, see the FAQ at
+[https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
+[https://www.contributor-covenant.org/translations][translations].
+
+[homepage]: https://www.contributor-covenant.org
+[v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
+[Mozilla CoC]: https://github.com/mozilla/diversity
+[FAQ]: https://www.contributor-covenant.org/faq
+[translations]: https://www.contributor-covenant.org/translations
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
@@ -0,0 +1,33 @@
+on:
+  pull_request:
+    branches:
+      - main
+  push:
+    branches:
+      - main
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version:
+          - "3.8"
+          - "3.9"
+          - "3.10"
+    steps:
+      - uses: actions/checkout@v3
+
+      - uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: pip
+
+      - name: Install package
+        run: pip install -e .
+
+      - name: Install test runner
+        run: pip install pytest pytest-cov
+
+      - name: Run unit tests
+        run: pytest --cov=gpt_engineer
diff --git a/.gitignore b/.gitignore
@@ -38,3 +38,7 @@ archive
 # any log file
 *log.txt
 todo
+
+# Ignore GPT Engineer files
+projects
+my-new-project/
diff --git a/README.md b/README.md
@@ -22,12 +22,13 @@ GPT Engineer is made to be easy to adapt, extend, and make your agent learn how
 - `export OPENAI_API_KEY=[your api key]` with a key that has GPT4 access
 
 **Run**:
-- Create a new empty folder with a `main_prompt` file (or copy the example folder `cp -r example/ my-new-project`)
+- Create a new empty folder with a `main_prompt` file in the `projects` folder (or copy the example folder `cp -r projects/example/ projects/my-new-project`)
 - Fill in the `main_prompt` in your new folder
 - Run `python -m gpt_engineer.main my-new-project`
+  - Optionally pass in `true` to delete the working files before running
 
 **Results**:
-- Check the generated files in my-new-project/workspace
+- Check the generated files in projects/my-new-project/workspace
 
 ### Limitations
 Implementing additional chain of thought prompting, e.g. [Reflexion](https://github.com/noahshinn024/reflexion), should be able to make it more reliable and not miss requested functionality in the main prompt.

diff --git a/gpt_engineer/ai.py b/gpt_engineer/ai.py
@@ -8,10 +8,12 @@ def __init__(self, **kwargs):
         try:
             openai.Model.retrieve("gpt-4")
         except openai.error.InvalidRequestError:
-            print("Model gpt-4 not available for provided api key reverting "
-                  "to gpt-3.5.turbo. Sign up for the gpt-4 wait list here: "
-                  "https://openai.com/waitlist/gpt-4-api")
-            self.kwargs['model'] = "gpt-3.5-turbo"
+            print(
+                "Model gpt-4 not available for provided api key reverting "
+                "to gpt-3.5.turbo. Sign up for the gpt-4 wait list here: "
+                "https://openai.com/waitlist/gpt-4-api"
+            )
+            self.kwargs["model"] = "gpt-3.5-turbo"
 
     def start(self, system, user):
         messages = [
@@ -26,10 +28,10 @@ def fsystem(self, msg):
 
     def fuser(self, msg):
         return {"role": "user", "content": msg}
+
     def fassistant(self, msg):
         return {"role": "assistant", "content": msg}
 
-
     def next(self, messages: list[dict[str, str]], prompt=None):
         if prompt:
             messages = messages + [{"role": "user", "content": prompt}]

diff --git a/gpt_engineer/chat_to_files.py b/gpt_engineer/chat_to_files.py
@@ -1,27 +1,42 @@
 import re
-from typing import List, Tuple
-from gpt_engineer.db import DB
 
 
-def parse_chat(chat) -> List[Tuple[str, str]]:
-    # Get all ``` blocks
-    regex = r"```(.*?)```"
+def parse_chat(chat):  # -> List[Tuple[str, str]]:
+    # Split the chat into sections by the "*CODEBLOCKSBELOW*" token
+    split_chat = chat.split("*CODEBLOCKSBELOW*")
 
-    matches = re.finditer(regex, chat, re.DOTALL)
+    # Check if the "*CODEBLOCKSBELOW*" token was found
+    is_token_found = len(split_chat) > 1
+
+    # If the "*CODEBLOCKSBELOW*" token is found, use the first part as README
+    # and second part as code blocks. Otherwise, treat README as optional and
+    # proceed with a placeholder README and the entire chat as code blocks
+    readme = split_chat[0].strip() if is_token_found else "No readme"
+    code_blocks = split_chat[1] if is_token_found else chat
+
+    # Get all ``` blocks and preceding filenames
+    regex = r"(\S+?)\n```\S+\n(.+?)```"
+    matches = re.finditer(regex, code_blocks, re.DOTALL)
 
     files = []
     for match in matches:
-        path = match.group(1).split("\n")[0]
+        # Strip characters that are not allowed in filenames
+        path = re.sub(r'[<>"|?*]', "", match.group(1))
+
         # Get the code
-        code = match.group(1).split("\n")[1:]
-        code = "\n".join(code)
+        code = match.group(2)
+
         # Add the file to the list
         files.append((path, code))
 
+    # Add README to the list
+    files.append(("README.txt", readme))
+
+    # Return the files
     return files
 
 
-def to_files(chat: str, workspace: DB):
+def to_files(chat, workspace):
     workspace["all_output.txt"] = chat
 
     files = parse_chat(chat)
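
The reworked `parse_chat` in this hunk can be exercised in isolation. The following is a minimal sketch that copies the added logic into a standalone function; the sample chat text and the `FENCE` and `TOKEN` helper constants are illustrative assumptions, not part of the commit.

```python
import re

FENCE = "`" * 3  # helper constant (assumption): a literal triple backtick
TOKEN = "*CODEBLOCKSBELOW*"


def parse_chat(chat):
    # Everything before the token is README text, everything after is code.
    split_chat = chat.split(TOKEN)
    is_token_found = len(split_chat) > 1
    readme = split_chat[0].strip() if is_token_found else "No readme"
    code_blocks = split_chat[1] if is_token_found else chat

    # Same pattern as in the diff: a filename line, then a fenced block
    # that carries a language tag.
    regex = r"(\S+?)\n" + FENCE + r"\S+\n(.+?)" + FENCE
    files = []
    for match in re.finditer(regex, code_blocks, re.DOTALL):
        # Drop characters that are not allowed in filenames.
        path = re.sub(r'[<>"|?*]', "", match.group(1))
        files.append((path, match.group(2)))

    files.append(("README.txt", readme))
    return files


# Hypothetical model output, used only to illustrate the parsing.
sample = (
    "A tiny demo project.\n"
    + TOKEN + "\n"
    + "src/hello.py\n"
    + FENCE + "python\n"
    + "print('hi')\n"
    + FENCE + "\n"
)
files = parse_chat(sample)

# A fence without a language tag, and no token at all.
untagged = parse_chat("a.py\n" + FENCE + "\nx = 1\n" + FENCE)
```

In this sketch `files` holds the code entry followed by the README entry, with the directory part of the path preserved. The `untagged` case yields only the placeholder README, because the pattern requires at least one non-whitespace character after the opening backticks.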

diff --git a/gpt_engineer/db.py b/gpt_engineer/db.py
@@ -1,33 +1,51 @@
 from dataclasses import dataclass
-import os
 from pathlib import Path
 
 
+# This class represents a simple database that stores its data as files in a directory.
+# It supports both text and binary files, and can handle directory structures.
 class DB:
-    """A simple key-value store, where keys are filenames and values are file contents."""
-
     def __init__(self, path):
+        # Convert the path string to a Path object and get its absolute path.
         self.path = Path(path).absolute()
-        os.makedirs(self.path, exist_ok=True)
+
+        # Create the directory if it doesn't exist.
+        self.path.mkdir(parents=True, exist_ok=True)
 
     def __getitem__(self, key):
-        with open(self.path / key, encoding='utf-8') as f:
-            return f.read()
+        # Combine the database directory with the provided file path.
+        full_path = self.path / key
+
+        # Check if the file exists before trying to open it.
+        if full_path.is_file():
+            # Open the file in text mode and return its content.
+            with full_path.open("r") as f:
+                return f.read()
+        else:
+            # If the file doesn't exist, raise an error.
+            raise FileNotFoundError(f"No such file: '{full_path}'")
 
     def __setitem__(self, key, val):
-        Path(self.path / key).absolute().parent.mkdir(parents=True, exist_ok=True)
+        # Combine the database directory with the provided file path.
+        full_path = self.path / key
 
-        with open(self.path / key, 'w', encoding='utf-8') as f:
-            f.write(val)
+        # Create the directory tree if it doesn't exist.
+        full_path.parent.mkdir(parents=True, exist_ok=True)
 
-    def __contains__(self, key):
-        return (self.path / key).exists()
+        # Write the data to the file. If val is a string, it's written as text.
+        # If val is bytes, it's written as binary data.
+        if isinstance(val, str):
+            full_path.write_text(val)
+        elif isinstance(val, bytes):
+            full_path.write_bytes(val)
+        else:
+            # If val is neither a string nor bytes, raise an error.
+            raise TypeError("val must be either a str or bytes")
 
 
+# dataclass for all dbs:
 @dataclass
 class DBs:
-    """A dataclass for all dbs"""
-
     memory: DB
     logs: DB
     identity: DB
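
To see how the rewritten `DB` behaves with nested keys and binary values, here is a minimal sketch that copies the added methods into a standalone class and runs them against a temporary directory. The scratch directory and the example keys are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


class DB:
    """Standalone copy of the post-change DB logic, for illustration only."""

    def __init__(self, path):
        self.path = Path(path).absolute()
        self.path.mkdir(parents=True, exist_ok=True)

    def __getitem__(self, key):
        full_path = self.path / key
        if full_path.is_file():
            # Files are always read back in text mode.
            with full_path.open("r") as f:
                return f.read()
        raise FileNotFoundError(f"No such file: '{full_path}'")

    def __setitem__(self, key, val):
        full_path = self.path / key
        # Create any missing parent directories for the key.
        full_path.parent.mkdir(parents=True, exist_ok=True)
        if isinstance(val, str):
            full_path.write_text(val)
        elif isinstance(val, bytes):
            full_path.write_bytes(val)
        else:
            raise TypeError("val must be either a str or bytes")


root = tempfile.mkdtemp()  # hypothetical scratch directory
db = DB(root)
db["src/app/main.py"] = "print('hi')"   # nested text file
db["assets/logo.bin"] = b"\x00\x01"      # nested binary file

text = db["src/app/main.py"]
raw = (Path(root) / "assets" / "logo.bin").read_bytes()
```

Note that `__getitem__` opens files in text mode only, so this sketch reads the binary file back through `Path.read_bytes` rather than through the `DB` itself.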

diff --git a/gpt_engineer/main.py b/gpt_engineer/main.py
@@ -1,32 +1,41 @@
-import os
 import json
+import os
 import pathlib
+import shutil
+
 import typer
 
-from gpt_engineer.chat_to_files import to_files
 from gpt_engineer.ai import AI
-from gpt_engineer.steps import STEPS
 from gpt_engineer.db import DB, DBs
-
+from gpt_engineer.steps import STEPS
 
 app = typer.Typer()
 
 
 @app.command()
 def chat(
-    project_path: str = typer.Argument(str(pathlib.Path(os.path.curdir) / "example"), help="path"),
+    project_path: str = typer.Argument("example", help="path"),
+    delete_existing: str = typer.Argument(None, help="delete existing files"),
     run_prefix: str = typer.Option(
         "",
-        help="run prefix, if you want to run multiple variants of the same project and later compare them",
+        help=(
+            "run prefix, if you want to run multiple variants of the same project and "
+            "later compare them"
+        ),
     ),
     model: str = "gpt-4",
     temperature: float = 0.1,
     steps_config: str = "default",
 ):
     app_dir = pathlib.Path(os.path.curdir)
-    input_path = project_path
-    memory_path = pathlib.Path(project_path) / (run_prefix + "memory")
-    workspace_path = pathlib.Path(project_path) / (run_prefix + "workspace")
+    input_path = pathlib.Path(app_dir / "projects" / project_path)
+    memory_path = input_path / (run_prefix + "memory")
+    workspace_path = input_path / (run_prefix + "workspace")
+
+    if delete_existing == "true":
+        # Delete files and subdirectories in paths
+        shutil.rmtree(memory_path, ignore_errors=True)
+        shutil.rmtree(workspace_path, ignore_errors=True)
 
     ai = AI(
         model=model,
@@ -45,5 +54,6 @@ def chat(
         messages = step(ai, dbs)
         dbs.logs[step.__name__] = json.dumps(messages)
 
+
 if __name__ == "__main__":
     app()
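
The path handling and the optional cleanup added to `chat` can be sketched outside of Typer. The helper names `resolve_paths` and `clean_if_requested` below are hypothetical and exist only for this illustration; the logic inside them mirrors the added lines, including the comparison against the literal string `"true"`.

```python
import pathlib
import shutil
import tempfile


def resolve_paths(app_dir, project_path, run_prefix=""):
    # Projects now live under <app_dir>/projects/<project_path>.
    input_path = pathlib.Path(app_dir) / "projects" / project_path
    memory_path = input_path / (run_prefix + "memory")
    workspace_path = input_path / (run_prefix + "workspace")
    return input_path, memory_path, workspace_path


def clean_if_requested(memory_path, workspace_path, delete_existing):
    # Only the exact string "true" triggers deletion, as in the diff.
    if delete_existing == "true":
        shutil.rmtree(memory_path, ignore_errors=True)
        shutil.rmtree(workspace_path, ignore_errors=True)
        return True
    return False


app_dir = tempfile.mkdtemp()  # hypothetical stand-in for the current directory
project, memory, workspace = resolve_paths(app_dir, "my-new-project", run_prefix="v2_")
workspace.mkdir(parents=True)
(workspace / "old.txt").write_text("stale")

skipped = clean_if_requested(memory, workspace, "True")  # wrong case: no cleanup
kept = workspace.exists()
cleaned = clean_if_requested(memory, workspace, "true")
removed = not workspace.exists()
```

Because `ignore_errors=True` is passed, a missing `memory` directory does not raise; in this sketch only the workspace exists before cleanup.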