.py files are not being created. I just get all_output.txt that I manually have to create from. #35
Comments
I have the same issue. |
a hack for python def parse_chat(chat):# -> List[Tuple[str, str]]:
|
Here's my solution:
import re
import os

save_dir = "results/"
# Read the combined output; the context manager closes the file for us
with open("example/workspace/all_output.txt", "r") as f:
    s = f.read()
# Pick the pattern that matches how your output names each file
#pattern = re.compile(r'^\*\*(.*?\.py)\*\*\s+```python\s+.*?(^(?:.*\n)*?)^```\s*', re.MULTILINE) #Example **game.py**
#pattern = re.compile(r'^(.*?\.py):\s+```python\s+.*?(^(?:.*\n)*?)^```\s*', re.MULTILINE) #Example game.py:
pattern = re.compile(r'^.*?\((.*?\.py)\):\s+```python\s+.*?(^(?:.*\n)*?)^```\s*', re.MULTILINE) #Example Game File (game.py):
os.makedirs(save_dir, exist_ok=True)
for file_name, file_text in re.findall(pattern, s):
    # Write each captured code block to its own file
    with open(os.path.join(save_dir, file_name), "w") as write_file:
        write_file.write(file_text)
    print(file_name, "\n")
But sometimes you have to change the regex because the output isn't always the same. |
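Since the header style changes between runs, one option is to keep a list of known patterns and use the first one that matches. The following is only a sketch under that assumption: `PATTERNS` and `extract_files` are hypothetical names, the regexes are simplified rewrites of the three styles mentioned above, and none of this is part of gpt-engineer.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence

# Body shared by every pattern: fence line with optional language, then
# everything up to the next line that starts with a fence.
BLOCK = FENCE + r"\w*\n(.*?)^" + FENCE
FLAGS = re.MULTILINE | re.DOTALL

# One pattern per header style seen so far; extend as new styles appear.
PATTERNS = [
    re.compile(r"^\*\*(\S+\.py)\*\*[ \t]*\n" + BLOCK, FLAGS),      # **game.py**
    re.compile(r"^(\S+\.py):[ \t]*\n" + BLOCK, FLAGS),             # game.py:
    re.compile(r"^[^\n]*?\((\S+\.py)\):[ \t]*\n" + BLOCK, FLAGS),  # Game File (game.py):
]


def extract_files(text):
    """Return (file_name, code) pairs from the first pattern that matches."""
    for pattern in PATTERNS:
        matches = pattern.findall(text)
        if matches:
            return matches
    return []
```

The returned pairs can be fed to the same write loop as in the script above.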
where do I put this? |
+1 |
Create a file called 'create_files.py' inside the main repo directory, then run python create_files.py |
Is @offiub a bot account? |
it's kind of annoying and asking for private data. Please discontinue the requests of email and numbers or anything private. |
Hi, I ran your script in the main GPT-Engineer folder with the save_dir and open paths pointed at the correct all_output.txt file. The script runs and creates the results folder, but leaves it empty. The all_output.txt file has all of the code inside of it properly; the script just isn't building the .py files in /results for me. This is your code I'm using with my custom paths (in all_output.txt the file titles have the format: entrypoint.py, file_converter.py, gui.py): import re save_dir = "/home/ailocal/apps/gpt-engineer/results/" |
Hopefully the repo will figure out a way to keep the formatting of the all_output.txt consistent soon. I've had to make about a dozen different patterns to capture different formatting scenarios. Can you give me a copy of a chunk of your all_output.txt file where the first code file starts? I can make you a correct regex when I see it. |
It's probably going to be something like this: pattern = re.compile(r'^(.*?\.py)\s+```python\s+.*?(^(?:.*\n)*?)^```\s*', re.MULTILINE) #Example game.py But I can't be sure until I see a chunk of your output. |
I am trying to follow these steps but I don't see any game.py. Where should this file be? |
That's just an example. If your generated project doesn't have a game.py, that's fine: the regex captures {some_file_name}.py, then creates a new file and writes the text into it. The issue you will probably run into is with the format of all_output.txt, which determines which pattern you need to use. Since the repo's main branch hasn't been able to get a consistent output format from gpt-#, you have to figure out what to anchor on for each run of main.py. If you paste a chunk of your all_output.txt I can tell you which pattern to use, or create a new pattern appropriate for how your output looks if it's not one of the ones I have above. |
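To figure out what to anchor on for a given run, it can help to list the line that sits directly above each code block in all_output.txt. A minimal sketch, assuming blocks are delimited by lines starting with three backticks; `header_lines` is a hypothetical helper, not something from the repo:

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def header_lines(text):
    """Return the last non-empty line seen before each opening code fence."""
    headers = []
    previous = ""
    inside = False
    for line in text.splitlines():
        if line.startswith(FENCE):
            if not inside:
                # Opening fence: remember what was written just above it
                headers.append(previous)
            inside = not inside
        elif not inside and line.strip():
            previous = line.strip()
    return headers
```

Printing the result for one run shows at a glance whether the names look like `game.py:`, `**game.py**` or something else.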
Hi, thank you so much. That would be amazing if I could get that script working. Here is a chunk of my all_output.txt:
entrypoint.py
from typing import List
from tkinter import Tk, filedialog, messagebox
from file_converter import FileConverter
from gui import GUI
def main():
    root = Tk()
    root.withdraw()
    gui = GUI(root)
    root.mainloop()
if __name__ == "__main__":
    main()
file_converter.py
from typing import List
import openpyxl
import csv
import os
class FileConverter:
    def __init__(self, file_path: str):
        self.file_path = file_path
        self.workbook = openpyxl.load_workbook(filename=self.file_path)
gui.py
```python
from tkinter import Tk, Label, Button, filedialog, messagebox
from file_converter import FileConverter
class GUI:
    def __init__(self, master: Tk):
        self.master = master
        self.master.title("CSV/XLSX Converter")
        self.file_path = ""
        self.output_path = ""
        self.sheet_names = []
        self.active_sheet_name = ""
        self.create_widgets() |
I have improved the "parse_chat" function by treating "code discovery" and "filename discovery" as different tasks. I have created a new "extensions.txt" file in the root directory and replaced the entire chat_to_files.py file with:
chat_to_files.py
import re


def build_regex_from_file(filename: str = 'extensions.txt') -> str:
    '''
    Builds a regex from a file containing a list of extensions.
    File should be formatted as follows:
    ```extensions.txt
    py
    ts
    js
    html
    css
    ```
    '''
    with open(filename, 'r') as file:
        # Skip blank lines so they don't add an empty alternative
        exts = [line.strip() for line in file if line.strip()]
    # Pipe acts as an OR operator in regex; escape each extension
    extension_str = '|'.join(re.escape(ext) for ext in exts)
    return r"\b[\w\-.]+?\.(?:" + extension_str + r")\b"
def parse_chat(chat):  # -> List[Tuple[str, str]]:
    # Get all unique filenames
    filenames = re.findall(build_regex_from_file(), chat)
    # Drop duplicates in case they are mentioned multiple times
    filenames = list(dict.fromkeys(filenames))
    # Get all ``` (code) blocks
    code_matches = re.finditer(r"```(.*?)```", chat, re.DOTALL)
    files = []
    for i, match in enumerate(code_matches):
        # path = match.group(1).split("\n")[0]
        path = filenames[i]
        # Get the code
        code = match.group(1).split("\n")[1:]
        code = "\n".join(code)
        # Add the file to the list
        files.append((path, code))
    return files


def to_files(chat, workspace):
    workspace['all_output.txt'] = chat
    files = parse_chat(chat)
    for file_name, file_content in files:
        workspace[file_name] = file_content
extensions.txt
Hope this helps! |
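For anyone wondering what `workspace` is: `to_files` only needs an object that supports item assignment. A minimal sketch of such an object, assuming each key is a relative path under a root folder; `FileWorkspace` is a hypothetical stand-in for illustration and not necessarily how gpt-engineer implements it:

```python
import tempfile
from pathlib import Path


class FileWorkspace:
    """Dict-like wrapper that stores each key as a file under a root folder."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def __setitem__(self, key, value):
        path = self.root / key
        # Create sub-folders so keys like 'pkg/hello.py' also work
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(value)

    def __getitem__(self, key):
        return (self.root / key).read_text()


# Usage: write one file into a throwaway folder
demo_dir = tempfile.mkdtemp()
ws = FileWorkspace(demo_dir)
ws["pkg/hello.py"] = "print('hello')\n"
```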
That is a good idea. Would you feel comfortable creating a PR with your changes? If so, I would suggest using a constant for the file extensions, in a file FILE_EXTENSIONS = ['py', ..., 'css'] and then import it in Or maybe something even better: isn't there a way to avoid the extensions altogether? I get that the regex works well with this approach, but this solution scales badly. We would want to support all extensions, so maybe modifying the regex to just look for |
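The constant-based variant suggested here might look like the sketch below. The list contents are an assumption (taken from the sample extensions.txt in the docstring earlier in the thread), and `build_filename_regex` is a hypothetical name:

```python
import re

# Assumed contents; the suggestion itself only spells out 'py' and 'css'
FILE_EXTENSIONS = ["py", "ts", "js", "html", "css"]


def build_filename_regex(extensions=FILE_EXTENSIONS):
    """Build a regex that matches file names ending in a known extension."""
    # Pipe acts as an OR operator; escape in case an extension contains
    # regex metacharacters
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    return r"\b[\w\-.]+?\.(?:" + alternatives + r")\b"
```

The trade-off raised in the comment still applies: any extension missing from the list is silently ignored.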
Absolutely! Working on it. I'll drop the current extensions logic as my own extensions.txt has over 100 lines... |
@goncalomoita . You are amazing, that worked right away. Thank you so much, this really streamlines GPT-Engineer. Can the developer add this code into future builds? |
This is a huge improvement. I think we need to figure out a way to keep the formatting consistent on the GPT side of things if possible. We might be able to ask it to re-evaluate its output, check it against a standard format to validate that the format is correct, and correct the output if not. I think this is probably the best place to resolve the issue. The other option, of course, is to constantly monitor for quirks and variations as they come and try to compensate for them with additional complexity in the regex, which I am not a fan of, but if it needs to be that way then let it be. Perhaps we will discover there is a finite amount of variance in the formats written out. |
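The validation idea could start as a check that runs before parsing and flags any code block with no file name in the text above it, so the caller can decide whether to ask the model to correct its output. This is only an illustration: `find_unnamed_blocks` is a hypothetical helper, the file name regex is a generic guess, and the re-prompt step is left out.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
CODE_BLOCK = re.compile(FENCE + r"(.*?)" + FENCE, re.DOTALL)
FILENAME = re.compile(r"\b[\w-]+\.\w{1,6}\b")


def find_unnamed_blocks(chat):
    """Return the indexes of code blocks whose preceding text names no file."""
    unnamed = []
    prev_end = 0
    for index, match in enumerate(CODE_BLOCK.finditer(chat)):
        # Text between the previous block and this one
        preceding = chat[prev_end:match.start()]
        if not FILENAME.search(preceding):
            unnamed.append(index)
        prev_end = match.end()
    return unnamed
```

An empty result would mean every block has a candidate file name; anything else is a signal that the output deviates from the expected format.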
Also, it might be advantageous to pull the extensions from this gist: https://gist.github.com/ppisarczyk/43962d06686722d26d176fad46879d41 instead of having to add them manually. It could just download the JSON, parse it, and have an up-to-date list of programming language extensions. That means less maintenance in the future and makes it easier for anyone who uses gpt-engineer. |
Here is the change to use the link I mentioned above:
import re
from urllib.request import urlopen
import json


def build_regex_from_file(filename: str = 'extensions.txt') -> str:
    # Builds a regex from the JSON list of programming language file
    # extensions at the URL below. The filename argument is unused and
    # kept only so existing callers keep working.
    url = "https://gist.githubusercontent.com/ppisarczyk/43962d06686722d26d176fad46879d41/raw/211547723b4621a622fc56978d74aa416cbd1729/Programming_Languages_Extensions.json"
    # Fetch and decode the JSON response
    with urlopen(url) as response:
        data_json = json.loads(response.read())
    extensions = []
    for item in data_json:
        if "extensions" in item:
            for ext in item["extensions"]:
                # Drop any leading dot and escape regex metacharacters
                # (an extension such as c++ would otherwise break the regex)
                extensions.append(re.escape(ext.lstrip('.')))
    # Pipe acts as an OR operator in regex
    extension_str = '|'.join(extensions)
    return r"\b[\w\-.]+?\.(?:" + extension_str + r")\b"
def parse_chat(chat):  # -> List[Tuple[str, str]]:
    # Get all unique filenames
    filenames = re.findall(build_regex_from_file(), chat)
    # Drop duplicates in case they are mentioned multiple times
    filenames = list(dict.fromkeys(filenames))
    # Get all ``` (code) blocks
    code_matches = re.finditer(r"```(.*?)```", chat, re.DOTALL)
    files = []
    for i, match in enumerate(code_matches):
        # path = match.group(1).split("\n")[0]
        path = filenames[i]
        # Get the code
        code = match.group(1).split("\n")[1:]
        code = "\n".join(code)
        # Add the file to the list
        files.append((path, code))
    return files


def to_files(chat, workspace):
    workspace['all_output.txt'] = chat
    files = parse_chat(chat)
    for file_name, file_content in files:
        workspace[file_name] = file_content |
@patillacode read my above post with the solution you are looking for. |
@patillacode I think I've cracked this thing. @jebarpg I thought about the problem of ensuring a consistent format when deciding whether or not to improve this parser. My initial thought was "Does better parsing make sense? Can I fix this with a "self-review" call to a LLM? Should I call the LLM just to find what the filenames are?". It is working great and it is supporting every format I've seen, including:
Note: I will use this throughout the day to find possible issues. I'll do a PR later.
chat_to_files.py
import re
from typing import List, Tuple

# Amount of lines within the code block to consider for filename discovery
N_CODELINES_FOR_FILENAME_TA = 5
# Default path to use if no filename is found
DEFAULT_PATH = 'unknown.txt'


def parse_chat(chat, verbose = False) -> List[Tuple[str, str]]:
    '''
    Parses a chat message and returns a list of tuples containing
    the file path and the code content for each file.
    '''
    code_regex = r"```(.*?)```"
    filename_regex = r'\b[\w-]+\.[\w]{1,6}\b'
    # Get all ``` (code) blocks
    code_matches = re.finditer(code_regex, chat, re.DOTALL)
    prev_code_y_end = 0
    files = []
    for match in code_matches:
        lines = match.group(1).split('\n')
        code_y_start = match.start()
        code_y_end = match.end()
        # Now, we need to get the filename associated with this code block.
        # We will look for the filename somewhere near the code block start.
        #
        # This "somewhere near" is referred to as the "filename_ta", to
        # resemble a sort-of target area (ta).
        #
        # The target area includes the text preceding the code block that
        # does not belong to previous code blocks ("no_code").
        # Additionally, as sometimes the filename is defined within
        # the code block itself, we will also include the first few lines
        # of the code block in the filename_ta.
        #
        # Example:
        # ```python
        # # File: entrypoint.py
        # import pygame
        # ```
        #
        # The amount of lines to consider within the code block is set by
        # the constant 'N_CODELINES_FOR_FILENAME_TA'.
        #
        # Get the "preceding" text, which is located between codeblocks
        no_code = chat[prev_code_y_end:code_y_start].strip()
        within_code = '\n'.join(lines[:N_CODELINES_FOR_FILENAME_TA])
        filename_ta = no_code + '\n' + within_code
        # Visualize the filename_ta if verbose
        if verbose:
            print('-' * 20)
            print(filename_ta)
            print('-' * 20)
        # The path is the filename itself which we greedily match
        filename = re.search(filename_regex, filename_ta)
        path = filename.group(0) if filename else DEFAULT_PATH
        prev_code_y_end = code_y_end
        # Parse the entire code block
        code = lines[1:]
        code = "\n".join(code)
        # Add the file to the list
        files.append((path, code))
    return files


def to_files(chat, workspace):
    workspace['all_output.txt'] = chat
    files = parse_chat(chat)
    for file_name, file_content in files:
        workspace[file_name] = file_content |
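A quick way to see the target-area idea in action is to run a condensed version of the lookup on a small chat string. The sketch below reuses the same two regexes; `first_filename` is a hypothetical helper that only looks at the first code block and is not a replacement for the function above.

```python
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
CODE_REGEX = FENCE + r"(.*?)" + FENCE
FILENAME_REGEX = r"\b[\w-]+\.[\w]{1,6}\b"


def first_filename(chat, n_codelines=5, default="unknown.txt"):
    """Return the file name found near the first code block, or the default."""
    match = re.search(CODE_REGEX, chat, re.DOTALL)
    if match is None:
        return default
    lines = match.group(1).split("\n")
    # Target area: text before the block plus the block's first few lines
    target_area = chat[:match.start()].strip() + "\n" + "\n".join(lines[:n_codelines])
    found = re.search(FILENAME_REGEX, target_area)
    return found.group(0) if found else default
```

One thing to watch for: a dotted name in the first lines of a block, such as `os.path`, also matches this filename regex, so it may be picked up when no real file name appears before it.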
Ty @jebarpg for the alternatives, just to mention it, the way we want to add code would be through a proper PR. Let's see what @goncalomoita comes up with, if anything. @AntonOsika thoughts on this? |
Speak of the devil... 😂 OK @goncalomoita, this looks promising. We can do a proper review when the PR is up. |
Another format I have seen is **game.py** which this looks like it will work just fine for. |
@goncalomoita I definitely agree: if we don't have to run another inference then all the better. Your solution also captures all file types generically, so there is no need to pull from a list of extensions, which is one less thing to maintain. This is a great solution so far. Looking forward to seeing how your testing goes later today. |
Hi, I'm new here and perhaps this is a small thing: setting gpt-3.5-turbo-16k in main.py did not work for me either and also just created the all_output.txt file. I was able to fix it, though, by changing row 16 in the Scripts/rerun_edited_message_logs.py file. It also had a hard-coded reference to GPT-4, which caused the file creation to stop. Is the above code still required then?
and Setup is as:
|
Thanks everyone, these solutions went above and beyond. I have goncalomoita's first solution with the extensions.txt file working on the previous version of GPT-Engineer. The last script they posted didn't work on the same version of GPT-Engineer. I also tried downloading the newest GPT-Engineer build from 3 hours ago, where the .py files are now in a subfolder /gpt-engineer/gpt-engineer/, and neither of goncalomoita's scripts works there. If possible, could we get this working with the new folder structure? This is amazing though. I'm still running the older version from yesterday with the extensions.txt script and it's working perfectly. |
Updates: I found a code format that broke the function. It was:
For some reason "gpt-3.5-turbo-16k" really likes that one. It has since been solved.
def parse_chat(chat: str, verbose: bool = False) -> List[Tuple[str, str]]:
    '''
    Parses a chat message and returns a list of tuples containing
    the file path and the code content for each file.
    '''
    code_regex = r'```(.*?)```'
    filename_regex = r'\b[\w-]+\.[\w]{1,6}\b'
    # Get all ``` (code) blocks
    code_matches = re.finditer(code_regex, chat, re.DOTALL)
    prev_code_y_end = 0
    files = []
    for match in code_matches:
        lines = match.group(1).split('\n')
        code_y_start = match.start()
        code_y_end = match.end()
        # Now, we need to get the filename associated with this code block.
        # We will look for the filename somewhere near the code block start.
        #
        # This "somewhere near" is referred to as the "filename_ta", to
        # resemble a sort-of target area (ta).
        #
        # The target area includes the text preceding the code block that
        # does not belong to previous code blocks ("no_code").
        # Additionally, as sometimes the filename is defined within
        # the code block itself, we will also include the first few lines
        # of the code block in the filename_ta.
        #
        # Example:
        # ```python
        # # File: entrypoint.py
        # import pygame
        # ```
        #
        # The amount of lines to consider within the code block is set by
        # the constant 'N_CODELINES_FOR_FILENAME_TA'.
        #
        # Get the "preceding" text, which is located between codeblocks
        no_code = chat[prev_code_y_end:code_y_start].strip()
        within_code = '\n'.join(lines[:N_CODELINES_FOR_FILENAME_TA])
        filename_ta = no_code + '\n' + within_code
        # The path is the filename itself which we greedily match
        filename = re.search(filename_regex, filename_ta)
        path = filename.group(0) if filename else DEFAULT_PATH
        # Visualize the filename_ta if verbose
        if verbose:
            print('-' * 20)
            print(f'Path: {path}')
            print('-' * 20)
            print(filename_ta)
            print('-' * 20)
        # Check if its not a false positive
        #
        # For instance, the match with ```main.py``` should not be considered.
        # ```main.py```
        # ```python
        # ...
        # ```
        if not re.fullmatch(filename_regex, '\n'.join(lines)):
            # Update the previous code block end
            prev_code_y_end = code_y_end
            # File and code have been matched, add them to the list
            files.append((path, '\n'.join(lines[1:])))
    return files
@mindwellsolutions I'll check the new build now and start working on the PR. I also made a few customizations... May I suggest the creation of a global script (bash or something) to invoke gpt-engineer from any location in your OS. It's insane lmao! |
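The false-positive check relies on `re.fullmatch` succeeding only when the whole fenced content is nothing but a file name. A small sketch of that behaviour, using the same filename regex; `is_bare_filename` is a hypothetical name used only for this illustration:

```python
import re

FILENAME_REGEX = r"\b[\w-]+\.[\w]{1,6}\b"


def is_bare_filename(block_content):
    """True when the fenced content is only a file name, not actual code."""
    return re.fullmatch(FILENAME_REGEX, block_content) is not None
```

A block whose content is just `main.py` is treated as a label and skipped, while a block that starts with a language tag followed by code is kept.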
@goncalomoita Also I just found out that we might not even need this if we just change the model inside gpt-engineer/scripts/rerun_edited_message_logs.py |
The new code does not work. |
Can you elaborate? What issues are you encountering? In my fork I have this working for the pre-package build (branch:initial) and the post-package (branch:main). |
Addressed in #120 |
Hi, I absolutely love this script. This is the most accurate auto-GPT development script I have tried yet, it's so powerful!
In the demo video it shows the script automatically creating each of the development files (in my case .py files) within the workspace folder. My build isn't doing this: I just get an all_output.txt file with the code for all .py files in one place, and a single python file.
How do I ensure that GPT-Engineer automatically creates the .py files for me? Thanks