UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 436: character maps to <undefined> #758

Closed
1 task done
RiccardoRomagnoli opened this issue Oct 3, 2023 · 7 comments
Labels
bug (Something isn't working) · sweep (Assigns Sweep to an issue or pull request)

Comments

@RiccardoRomagnoli

RiccardoRomagnoli commented Oct 3, 2023

Expected Behavior

The `gpt-engineer "path" -i` command to work properly.

Current Behavior

Error after "Press enter to proceed with modifications."

Steps to Reproduce

Windows
Python 3.9

Failure Logs

```
Traceback (most recent call last):
  File "C:\tools\Anaconda3\envs\gpteng\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\tools\Anaconda3\envs\gpteng\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\tools\Anaconda3\envs\gpteng\Scripts\gpt-engineer.exe\__main__.py", line 7, in <module>
    sys.exit(app())
  File "C:\tools\Anaconda3\envs\gpteng\lib\site-packages\gpt_engineer\main.py", line 96, in main
    messages = step(ai, dbs)
  File "C:\tools\Anaconda3\envs\gpteng\lib\site-packages\gpt_engineer\steps.py", line 360, in improve_existing_code
    files_info = get_code_strings(dbs.input)  # this only has file names not paths
  File "C:\tools\Anaconda3\envs\gpteng\lib\site-packages\gpt_engineer\chat_to_files.py", line 113, in get_code_strings
    file_data = file.read()
  File "C:\tools\Anaconda3\envs\gpteng\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 436: character maps to <undefined>
```
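
For context, byte 0x9d has no mapping in the cp1252 code page, which Python uses as the default text encoding on many Windows setups. A minimal reproduction sketch (the file name is hypothetical):

```python
# 0x9d is, e.g., the last byte of a UTF-8 right double quotation mark (e2 80 9d)
# and is unmapped in cp1252, so decoding it in text mode raises UnicodeDecodeError.
with open("sample.txt", "wb") as f:
    f.write(b"\x9d")

with open("sample.txt", "r", encoding="cp1252") as f:
    f.read()  # UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d ...
```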

Checklist
  • gpt_engineer/core/chat_to_files.py:get_code_strings ✅ Commit 83c9784
RiccardoRomagnoli added the bug (Something isn't working) and triage (Interesting but stale issue. Will be closed if inactive for 3 more days after label added.) labels Oct 3, 2023
@ATheorell
Collaborator

My impression is that this error is related to the specific file that is read. Would you be able to share it somehow, preferably on your GitHub?

ATheorell removed the triage (Interesting but stale issue. Will be closed if inactive for 3 more days after label added.) label Oct 6, 2023
@RiccardoRomagnoli
Author

It is working just fine in WSL.
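
The difference is likely the default locale encoding. A one-line sketch to check it, assuming a stock interpreter on each platform:

```python
import locale

# Typically prints 'cp1252' on Windows and 'UTF-8' on WSL/Linux,
# which is why the same file reads fine under WSL.
print(locale.getpreferredencoding())
```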

@ATheorell
Collaborator

That is great! But if we are to solve this problem without WSL, we need to be able to reproduce it. A UnicodeDecodeError sounds like it is related to the specific file. It would be great if you could try again after the recent refactor and repost the stack trace (the code has changed quite a lot).

@AntonOsika
Owner

This easily happens if file_list.txt points at a folder that contains non-text files, e.g. cache files.
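
Reading a typical binary cache file in text mode reproduces the error immediately. A rough guard could filter such files out before they ever reach `file.read()` (a sketch only; `is_probably_text` is a hypothetical helper, not part of gpt-engineer):

```python
def is_probably_text(path: str, sample_size: int = 1024) -> bool:
    """Heuristic: treat a file as text if a leading sample contains no null
    bytes and decodes as UTF-8. Most caches and other binaries fail this.
    (A multi-byte character cut off at the sample boundary can give a
    false negative; good enough for a skip-list heuristic.)"""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:
        return False
    try:
        sample.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```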

@AntonOsika
Owner

We should try to fix the error message!

Like "file xyz is not of text format, please make sure gpt-engineer doesn't read it"

ATheorell added the sweep (Assigns Sweep to an issue or pull request.) label Oct 8, 2023
@sweep-ai
Contributor

sweep-ai bot commented Oct 8, 2023


Step 1: 🔎 Searching

I found the following snippets in your repository. I will now analyze these snippets and come up with a plan.

Some code snippets I looked at. If some file is missing from here, you can mention the path in the ticket description.

```python
    This function initiates the process to determine which files from an existing
    codebase the AI should work with. By calling `ask_for_files()`, it prompts for
    and sets the specific files that should be considered, storing their full paths.

    Parameters:
    - ai (AI): An instance of the AI model. Although passed to this function, it is
      not used within the function scope and might be for consistency with other
      function signatures.
    - dbs (DBs): An instance containing the database configurations and project metadata,
      which is used to gather information about the existing codebase. Additionally,
      the 'input' is used to handle user interactions related to file selection.

    Returns:
    - list: Returns an empty list, which can be utilized for consistency in return
      types across related functions.

    Note:
    - The selected file paths are stored as a side-effect of calling `ask_for_files()`,
      and they aren't directly returned by this function.
    """
    """Sets the file list for files to work with in existing code mode."""
    ask_for_files(dbs.project_metadata, dbs.input)  # stores files as full paths.
    return []


def assert_files_ready(ai: AI, dbs: DBs):
    """
    Verify the presence of required files for headless 'improve code' execution.

    This function checks the existence of 'file_list.txt' in the project metadata
    and the presence of a 'prompt' in the input. If either of these checks fails,
    an assertion error is raised to alert the user of the missing requirements.

    Parameters:
    - ai (AI): An instance of the AI model. Although passed to this function, it is
      not used within the function scope and might be for consistency with other
      function signatures.
    - dbs (DBs): An instance containing the database configurations and project metadata,
      which is used to validate the required files' presence.

    Returns:
    - list: Returns an empty list, which can be utilized for consistency in return
      types across related functions.

    Raises:
    - AssertionError: If 'file_list.txt' is not present in the project metadata
      or if 'prompt' is not present in the input.

    Notes:
    - This function is typically used in 'auto_mode' scenarios to ensure that the
      necessary files are set up correctly before proceeding with the 'improve code'
      operation.
    """
    """Checks that the required files are present for headless
    improve code execution."""
    assert (
        "file_list.txt" in dbs.project_metadata
    ), "For auto_mode file_list.txt need to be in your .gpteng folder."
    assert "prompt" in dbs.input, "For auto_mode a prompt file must exist."
    return []


def get_improve_prompt(ai: AI, dbs: DBs):
    """
    Prompt the user to specify what they'd like to improve within the selected files.

    The function first checks if a prompt already exists in the input database (`dbs.input`).
    If no prompt is provided, the user is asked interactively. After the prompt is set,
    a confirmation string, listing the files selected for improvement and the provided
    prompt, is displayed to the user. The user is then prompted to press enter to proceed
    with the modifications.

    Parameters:
    - ai (AI): An instance of the AI model. While it's passed to this function,
      it is not utilized within the function's scope. This might be for consistency
      with other function signatures.
    - dbs (DBs): An instance containing the database configurations and project metadata.
      This is used to fetch the selected files for improvement and to set or get the
      improvement prompt.

    Returns:
    - list: Returns an empty list, presumably for consistency in return types across
      related functions.

    Notes:
    - The purpose of this function is mainly to prepare and confirm the setup before
      proceeding with actual code improvements based on user input.
    """
    """
    Asks the user what they would like to fix.
    """
    if not dbs.input.get("prompt"):
        dbs.input["prompt"] = input(
            "\nWhat do you need to improve with the selected files?\n"
        )
    confirm_str = "\n".join(
        [
            "-----------------------------",
            "The following files will be used in the improvement process:",
            f"{FILE_LIST_NAME}:",
            colored(str(dbs.project_metadata[FILE_LIST_NAME]), "green"),
            "",
            "The inserted prompt is the following:",
            colored(f"{dbs.input['prompt']}", "green"),
            "-----------------------------",
            "",
            "You can change these files in your project before proceeding.",
            "",
            "Press enter to proceed with modifications.",
            "",
        ]
    )
    input(confirm_str)
    return []


def improve_existing_code(ai: AI, dbs: DBs):
    """
    Process and improve the code from a specified set of existing files based on a user prompt.

    This function first retrieves the code from the designated files and then formats this
    code to be processed by the Language Learning Model (LLM). After setting up the system prompt
    for existing code improvements, the files' contents are sent to the LLM. Finally, the user's
    prompt detailing desired improvements is passed to the LLM, and the subsequent response
    from the LLM is used to overwrite the original files.

    Parameters:
    - ai (AI): An instance of the AI model that is responsible for processing and generating
      responses based on the provided system and user inputs.
    - dbs (DBs): An instance containing the database configurations, user prompts, and project metadata.
      It is used to fetch the selected files for improvement and the user's improvement prompt.

    Returns:
    - list[Message]: Returns a list of Message objects that record the interaction between the
      system, user, and the AI model. This includes both the input to and the response from the LLM.

    Notes:
    - Ensure that the user has correctly set up the desired files for improvement and provided an
      appropriate prompt before calling this function.
    - The function expects the files to be formatted in a specific way to be properly processed by the LLM.
    """
    """
    After the file list and prompt have been acquired, this function is called
    to send the formatted prompt to the LLM.
    """
    files_info = get_code_strings(
        dbs.input, dbs.project_metadata
    )  # this has file names relative to the workspace path

    messages = [
        ai.fsystem(setup_sys_prompt_existing_code(dbs)),
    ]
    # Add files as input
    for file_name, file_str in files_info.items():
        code_input = format_file_to_input(file_name, file_str)
        messages.append(ai.fuser(f"{code_input}"))

    messages.append(ai.fuser(f"Request: {dbs.input['prompt']}"))

    messages = ai.next(messages, step_name=curr_fn())

    overwrite_files(messages[-1].content.strip(), dbs)
    return messages


def human_review(ai: AI, dbs: DBs):
    """
    Collects human feedback on the code and stores it in memory.

    This function prompts the user for a review of the generated or improved code using the `human_review_input`
    function. If a valid review is provided, it's serialized to JSON format and stored within the database's
    memory under the "review" key.

    Parameters:
    - ai (AI): An instance of the AI model. Although not directly used within the function, it is kept as
      a parameter for consistency with other functions.
    - dbs (DBs): An instance containing the database configurations, user prompts, project metadata,
      and memory storage. This function specifically interacts with the memory storage to save the human review.

    Returns:
    - list: Returns an empty list, indicating that there's no subsequent interaction with the LLM
      or no further messages to be processed.

    Notes:
    - It's assumed that the `human_review_input` function handles all the interactions with the user to
      gather feedback and returns either the feedback or None if no feedback was provided.
    - Ensure that the database's memory has enough space or is set up correctly to store the serialized review data.
    """
    """Collects and stores human review of the code"""
    review = human_review_input()
    if review is not None:
        dbs.memory["review"] = review.to_json()  # type: ignore
    return []


class Config(str, Enum):
    """
    Enumeration representing different configuration modes for the code processing system.

    Members:
    - DEFAULT: Standard procedure for generating, executing, and reviewing code.
    - BENCHMARK: Used for benchmarking the system's performance without execution.
    - SIMPLE: A basic procedure involving generation, execution, and review.
    - LITE: A lightweight procedure for generating code without further processing.
    - CLARIFY: Process that starts with clarifying ambiguities before code generation.
    - EXECUTE_ONLY: Only executes the code without generation.
    - EVALUATE: Execute the code and then undergo a human review.
    - USE_FEEDBACK: Uses prior feedback for code generation and subsequent steps.
    - IMPROVE_CODE: Focuses on improving existing code based on a provided prompt.
    - EVAL_IMPROVE_CODE: Validates files and improves existing code.
    - EVAL_NEW_CODE: Evaluates newly generated code without further steps.

    Each configuration mode dictates the sequence and type of operations performed on the code.
    """

    DEFAULT = "default"
    BENCHMARK = "benchmark"
    SIMPLE = "simple"
    LITE = "lite"
    CLARIFY = "clarify"
    EXECUTE_ONLY = "execute_only"
    EVALUATE = "evaluate"
    USE_FEEDBACK = "use_feedback"
    IMPROVE_CODE = "improve_code"
    EVAL_IMPROVE_CODE = "eval_improve_code"
    EVAL_NEW_CODE = "eval_new_code"


STEPS = {
    Config.DEFAULT: [
        simple_gen,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.LITE: [
        lite_gen,
    ],
    Config.CLARIFY: [
        clarify,
        gen_clarified_code,
        gen_entrypoint,
        execute_entrypoint,
        human_review,
    ],
    Config.BENCHMARK: [
        simple_gen,
        gen_entrypoint,
    ],
    Config.SIMPLE: [
        simple_gen,
        gen_entrypoint,
        execute_entrypoint,
    ],
    Config.USE_FEEDBACK: [use_feedback, gen_entrypoint, execute_entrypoint, human_review],
    Config.EXECUTE_ONLY: [execute_entrypoint],
    Config.EVALUATE: [execute_entrypoint, human_review],
    Config.IMPROVE_CODE: [
        set_improve_filelist,
        get_improve_prompt,
        improve_existing_code,
    ],
    Config.EVAL_IMPROVE_CODE: [assert_files_ready, improve_existing_code],
    Config.EVAL_NEW_CODE: [simple_gen],
}
"""
A dictionary mapping Config modes to a list of associated processing steps.

The STEPS dictionary dictates the sequence of functions or operations to be
performed based on the selected configuration mode from the Config enumeration.
This enables a flexible system where the user can select the desired mode and
the system can execute the corresponding steps in sequence.

Examples:
- For Config.DEFAULT, the system will first generate the code using `simple_gen`,
  then generate the entry point with `gen_entrypoint`, execute the generated
  code using `execute_entrypoint`, and finally collect human review using `human_review`.
- For Config.LITE, the system will only use the `lite_gen` function to generate the code.

This setup allows for modularity and flexibility in handling different user requirements and scenarios.
"""

# Future steps that can be added:
# run_tests_and_fix_files
```

```python
- Using project's preprompts or default ones
- Verbosity level for logging
- Interact with AI, databases, and archive processes based on the user-defined parameters.

Notes:
- Ensure the .env file has the `OPENAI_API_KEY` or provide it in the working directory.
- The default project path is set to `projects/example`.
- For azure_endpoint, provide the endpoint for Azure OpenAI service.
"""

import logging
import os

from pathlib import Path

import openai
import typer

from dotenv import load_dotenv

from gpt_engineer.core.ai import AI
from gpt_engineer.core.db import DB, DBs, archive
from gpt_engineer.core.steps import STEPS, Config as StepsConfig
from gpt_engineer.cli.collect import collect_learnings
from gpt_engineer.cli.learning import collect_consent

app = typer.Typer()  # creates a CLI app


def load_env_if_needed():
    if os.getenv("OPENAI_API_KEY") is None:
        load_dotenv()
    if os.getenv("OPENAI_API_KEY") is None:
        # if there is no .env file, try to load from the current working directory
        load_dotenv(dotenv_path=os.path.join(os.getcwd(), ".env"))
    openai.api_key = os.getenv("OPENAI_API_KEY")


def preprompts_path(use_custom_preprompts: bool, input_path: Path = None) -> Path:
    original_preprompts_path = Path(__file__).parent.parent / "preprompts"
    if not use_custom_preprompts:
        return original_preprompts_path

    custom_preprompts_path = input_path / "preprompts"
    if not custom_preprompts_path.exists():
        custom_preprompts_path.mkdir()

    for file in original_preprompts_path.glob("*"):
        if not (custom_preprompts_path / file.name).exists():
            (custom_preprompts_path / file.name).write_text(file.read_text())
    return custom_preprompts_path


@app.command()
def main(
    project_path: str = typer.Argument("projects/example", help="path"),
    model: str = typer.Argument("gpt-4", help="model id string"),
    temperature: float = 0.1,
    steps_config: StepsConfig = typer.Option(
        StepsConfig.DEFAULT, "--steps", "-s", help="decide which steps to run"
    ),
    improve_mode: bool = typer.Option(
        False,
        "--improve",
        "-i",
        help="Improve code from existing project.",
    ),
    lite_mode: bool = typer.Option(
        False,
        "--lite",
        "-l",
        help="Lite mode - run only the main prompt.",
    ),
    azure_endpoint: str = typer.Option(
        "",
        "--azure",
        "-a",
        help="""Endpoint for your Azure OpenAI Service (https://xx.openai.azure.com).
            In that case, the given model is the deployment name chosen in the Azure AI Studio.""",
    ),
    use_custom_preprompts: bool = typer.Option(
        False,
        "--use-custom-preprompts",
        help="""Use your project's custom preprompts instead of the default ones.
            Copies all original preprompts to the project's workspace if they don't exist there.""",
    ),
    verbose: bool = typer.Option(False, "--verbose", "-v"),
):
    logging.basicConfig(level=logging.DEBUG if verbose else logging.INFO)

    if lite_mode:
        assert not improve_mode, "Lite mode cannot improve code"
        if steps_config == StepsConfig.DEFAULT:
            steps_config = StepsConfig.LITE

    if improve_mode:
        assert (
            steps_config == StepsConfig.DEFAULT
        ), "Improve mode not compatible with other step configs"
        steps_config = StepsConfig.IMPROVE_CODE

    load_env_if_needed()

    ai = AI(
        model_name=model,
        temperature=temperature,
        azure_endpoint=azure_endpoint,
    )

    input_path = Path(project_path).absolute()
    print("Running gpt-engineer in", input_path, "\n")

    workspace_path = input_path / "workspace"
    project_metadata_path = input_path / ".gpteng"
    memory_path = project_metadata_path / "memory"
    archive_path = project_metadata_path / "archive"

    dbs = DBs(
        memory=DB(memory_path),
        logs=DB(memory_path / "logs"),
        input=DB(input_path),
        workspace=DB(workspace_path),
        preprompts=DB(preprompts_path(use_custom_preprompts, input_path)),
        archive=DB(archive_path),
        project_metadata=DB(project_metadata_path),
    )

    if steps_config not in [
        StepsConfig.EXECUTE_ONLY,
        StepsConfig.USE_FEEDBACK,
        StepsConfig.EVALUATE,
        StepsConfig.IMPROVE_CODE,
    ]:
        archive(dbs)

    if not dbs.input.get("prompt"):
        dbs.input["prompt"] = input(
            "\nWhat application do you want gpt-engineer to generate?\n"
        )

    steps = STEPS[steps_config]
    for step in steps:
        messages = step(ai, dbs)
        dbs.logs[step.__name__] = AI.serialize_messages(messages)

    print("Total api cost: $ ", ai.usage_cost())

    if collect_consent():
        collect_learnings(model, temperature, steps, dbs)

    dbs.logs["token_usage"] = ai.format_token_usage_log()


if __name__ == "__main__":
```

"""
This module provides utilities to handle and process chat content, especially for extracting code blocks
and managing them within a specified GPT Engineer project ("workspace"). It offers functionalities like parsing chat messages to
retrieve code blocks, storing these blocks into a workspace, and overwriting workspace content based on
new chat messages. Moreover, it aids in formatting and reading file content for an AI agent's input.
Key Features:
- Parse and extract code blocks from chat messages.
- Store and overwrite files within a workspace based on chat content.
- Format files to be used as inputs for AI agents.
- Retrieve files and their content based on a provided list.
Dependencies:
- `os` and `pathlib`: For handling OS-level operations and path manipulations.
- `re`: For regex-based parsing of chat content.
- `gpt_engineer.core.db`: Database handling functionalities for the workspace.
- `gpt_engineer.file_selector`: Constants related to file selection.
Functions:
- parse_chat: Extracts code blocks from chat messages.
- to_files: Parses a chat and adds the extracted files to a workspace.
- overwrite_files: Parses a chat and overwrites files in the workspace.
- get_code_strings: Reads a file list and returns filenames and their content.
- format_file_to_input: Formats a file's content for input to an AI agent.
"""
import os
from pathlib import Path
import re
from typing import List, Tuple
from gpt_engineer.core.db import DB, DBs
from gpt_engineer.cli.file_selector import FILE_LIST_NAME
def parse_chat(chat) -> List[Tuple[str, str]]:
"""
Extracts all code blocks from a chat and returns them
as a list of (filename, codeblock) tuples.
Parameters
----------
chat : str
The chat to extract code blocks from.
Returns
-------
List[Tuple[str, str]]
A list of tuples, where each tuple contains a filename and a code block.
"""
# Get all ``` blocks and preceding filenames
regex = r"(\S+)\n\s*```[^\n]*\n(.+?)```"
matches = re.finditer(regex, chat, re.DOTALL)
files = []
for match in matches:
# Strip the filename of any non-allowed characters and convert / to \
path = re.sub(r'[\:<>"|?*]', "", match.group(1))
# Remove leading and trailing brackets
path = re.sub(r"^\[(.*)\]$", r"\1", path)
# Remove leading and trailing backticks
path = re.sub(r"^`(.*)`$", r"\1", path)
# Remove trailing ]
path = re.sub(r"[\]\:]$", "", path)
# Get the code
code = match.group(2)
# Add the file to the list
files.append((path, code))
# Get all the text before the first ``` block
readme = chat.split("```")[0]
files.append(("README.md", readme))
# Return the files
return files
def to_files(chat: str, workspace: DB):
"""
Parse the chat and add all extracted files to the workspace.
Parameters
----------
chat : str
The chat to parse.
workspace : DB
The workspace to add the files to.
"""
workspace["all_output.txt"] = chat # TODO store this in memory db instead
files = parse_chat(chat)
for file_name, file_content in files:
workspace[file_name] = file_content
def overwrite_files(chat: str, dbs: DBs) -> None:
"""
Parse the chat and overwrite all files in the workspace.
Parameters
----------
chat : str
The chat containing the AI files.
dbs : DBs
The database containing the workspace.
"""
dbs.memory["all_output_overwrite.txt"] = chat
files = parse_chat(chat)
for file_name, file_content in files:
if file_name == "README.md":
dbs.memory["LAST_MODIFICATION_README.md"] = file_content
else:
dbs.workspace[file_name] = file_content
def get_code_strings(workspace: DB, metadata_db: DB) -> dict[str, str]:
"""
Read file_list.txt and return file names and their content.
Parameters
----------
input : dict
A dictionary containing the file_list.txt.
Returns
-------
dict[str, str]
A dictionary mapping file names to their content.
"""
def get_all_files_in_dir(directory):
for root, dirs, files in os.walk(directory):
for file in files:
yield os.path.join(root, file)
for dir in dirs:
yield from get_all_files_in_dir(os.path.join(root, dir))
files_paths = metadata_db[FILE_LIST_NAME].strip().split("\n")
files = []
for full_file_path in files_paths:
if os.path.isdir(full_file_path):
for file_path in get_all_files_in_dir(full_file_path):
files.append(file_path)
else:
files.append(full_file_path)
files_dict = {}
for path in files:
assert os.path.commonpath([full_file_path, workspace.path]) == str(
workspace.path
), "Trying to edit files outside of the workspace"
file_name = os.path.relpath(path, workspace.path)
if file_name in workspace:
files_dict[file_name] = workspace[file_name]
return files_dict
def format_file_to_input(file_name: str, file_content: str) -> str:
"""
Format a file string to use as input to the AI agent.
Parameters
----------
file_name : str
The name of the file.
file_content : str
The content of the file.
Returns
-------
str
The formatted file string.
"""
file_str = f"""
{file_name}
```
{file_content}
```
"""

````python
import textwrap

from gpt_engineer.core.chat_to_files import to_files


def test_to_files():
    chat = textwrap.dedent(
        """
        This is a sample program.

        file1.py
        ```python
        print("Hello, World!")
        ```

        file2.py
        ```python
        def add(a, b):
            return a + b
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "file1.py": 'print("Hello, World!")\n',
        "file2.py": "def add(a, b):\n    return a + b\n",
        "README.md": "\nThis is a sample program.\n\nfile1.py\n",
    }

    for file_name, file_content in expected_files.items():
        assert workspace[file_name] == file_content


def test_to_files_with_square_brackets():
    chat = textwrap.dedent(
        """
        This is a sample program.

        [file1.py]
        ```python
        print("Hello, World!")
        ```

        [file2.py]
        ```python
        def add(a, b):
            return a + b
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "file1.py": 'print("Hello, World!")\n',
        "file2.py": "def add(a, b):\n    return a + b\n",
        "README.md": "\nThis is a sample program.\n\n[file1.py]\n",
    }

    for file_name, file_content in expected_files.items():
        assert workspace[file_name] == file_content


def test_files_with_brackets_in_name():
    chat = textwrap.dedent(
        """
        This is a sample program.

        [id].jsx
        ```javascript
        console.log("Hello, World!")
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "[id].jsx": 'console.log("Hello, World!")\n',
        "README.md": "\nThis is a sample program.\n\n[id].jsx\n",
    }

    for file_name, file_content in expected_files.items():
        assert workspace[file_name] == file_content


def test_files_with_file_colon():
    chat = textwrap.dedent(
        """
        This is a sample program.

        [FILE: file1.py]
        ```python
        print("Hello, World!")
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "file1.py": 'print("Hello, World!")\n',
        "README.md": "\nThis is a sample program.\n\n[FILE: file1.py]\n",
    }

    for file_name, file_content in expected_files.items():
        assert workspace[file_name] == file_content


def test_files_with_back_tick():
    chat = textwrap.dedent(
        """
        This is a sample program.

        `file1.py`
        ```python
        print("Hello, World!")
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "file1.py": 'print("Hello, World!")\n',
        "README.md": "\nThis is a sample program.\n\n`file1.py`\n",
    }

    for file_name, file_content in expected_files.items():
        assert workspace[file_name] == file_content


def test_files_with_newline_between():
    chat = textwrap.dedent(
        """
        This is a sample program.

        file1.py

        ```python
        print("Hello, World!")
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "file1.py": 'print("Hello, World!")\n',
        "README.md": "\nThis is a sample program.\n\nfile1.py\n\n",
    }

    for file_name, file_content in expected_files.items():
        assert workspace[file_name] == file_content


def test_files_with_newline_between_header():
    chat = textwrap.dedent(
        """
        This is a sample program.

        ## file1.py

        ```python
        print("Hello, World!")
        ```
        """
    )

    workspace = {}
    to_files(chat, workspace)

    assert workspace["all_output.txt"] == chat

    expected_files = {
        "file1.py": 'print("Hello, World!")\n',
        "README.md": "\nThis is a sample program.\n\n## file1.py\n\n",
    }

    for file_name, file_content in expected_files.items():
````

"""
This module provides annotations related to the steps workflow in GPT Engineer.
Imports:
- `Callable`, `List`, and `TypeVar` from `typing` module, which are used for type hinting.
- `AI` class from the `gpt_engineer.core.ai` module.
- `DBs` class from the `gpt_engineer.core.db` module.
Variables:
- `Step`: This is a generic type variable that represents a Callable or function type. The function
is expected to accept two parameters: an instance of `AI` and an instance of `DBs`. The function
is expected to return a list of dictionaries.
"""
from typing import Callable, List, TypeVar
from gpt_engineer.core.ai import AI
from gpt_engineer.core.db import DBs


Step 2: ⌨️ Coding

  • gpt_engineer/core/chat_to_files.py:get_code_strings ✅ Commit 83c9784
  • Import the `codecs` module at the top of the file.
  • In the `get_code_strings` function, replace the line `file_data = file.read()` with the following lines:

```python
import codecs

with codecs.open(file_path, 'r', encoding='utf-8', errors='ignore') as file:
    file_data = file.read()
```

This will open and read the file using the 'utf-8' encoding, which can handle a wider range of characters than 'cp1252'. The `errors='ignore'` argument will cause any decoding errors to be ignored, which should prevent the UnicodeDecodeError from being raised.
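
Note that `errors='ignore'` silently drops undecodable bytes. An alternative sketch (using the built-in `open`, which accepts the same `encoding`/`errors` parameters) keeps a visible marker instead, so dropped data is at least detectable:

```python
# errors="replace" substitutes U+FFFD for undecodable bytes instead of
# silently removing them, so the LLM still sees where content was lost.
with open(file_path, "r", encoding="utf-8", errors="replace") as file:
    file_data = file.read()
```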
Sandbox Execution Logs
pre-commit run --files `gpt_engineer/core/chat_to_files.py:get_code_strings` 1/4 ✓
[INFO] Initializing environment for https://github.com/psf/black.
[INFO] Initializing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Installing environment for https://github.com/psf/black.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
[INFO] Installing environment for https://github.com/pre-commit/pre-commit-hooks.
[INFO] Once installed this environment will be reused.
[INFO] This may take a few minutes...
black....................................................................Passed
check toml...........................................(no files to check)Skipped
check yaml...........................................(no files to check)Skipped
detect private key.......................................................Passed
fix end of files.........................................................Passed
trim trailing whitespace.................................................Passed
pre-commit run --files `gpt_engineer/core/chat_to_files.py:get_code_strings` 2/4 ❌ (`1`)
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted gpt_engineer/core/chat_to_files.py
All done! ✨ 🍰 ✨
1 file reformatted.
pre-commit run --files `gpt_engineer/core/chat_to_files.py:get_code_strings` 3/4 ❌ (`1`)
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted gpt_engineer/core/chat_to_files.py
All done! ✨ 🍰 ✨
1 file reformatted.
pre-commit run --files `gpt_engineer/core/chat_to_files.py:get_code_strings` 4/4 ❌ (`1`)
black....................................................................Failed
- hook id: black
- files were modified by this hook
reformatted gpt_engineer/core/chat_to_files.py
All done! ✨ 🍰 ✨
1 file reformatted.

Step 3: 🔁 Code Review

I have finished coding the issue. I am now reviewing it for completeness.



@TheoMcCabe
Collaborator

Fix is out here:

#801
