Vector Store initial implementation #830
Amazing job. I will take a look.
Hey, great work.
Let me summarize to make sure I understand:
- the `vi` option searches for code files whose vector embeddings match the user prompt and adds those files to the LLM prompt
- the vector DB is built by (1) parsing each code file into its AST, (2) traversing the AST, and (3) splitting each node into chunks (if a node exceeds the max length, it is split into multiple chunks)
- to find the relevant code chunks, we use LlamaIndex
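A minimal sketch of the AST-based chunking summarized above, assuming Python source files and using only the standard-library `ast` module. This is not the PR's code: `chunk_python_source`, `split_text` and the 500-character `max_len` default are hypothetical. It takes one chunk per top-level node and splits an oversized node on line boundaries.

```python
import ast


def split_text(text: str, max_len: int) -> list[str]:
    """Split text into pieces of at most max_len characters, breaking on line boundaries where possible."""
    chunks: list[str] = []
    current = ""
    for line in text.splitlines(keepends=True):
        # A single line longer than max_len is hard-split into fixed-size pieces.
        while len(line) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:max_len])
            line = line[max_len:]
        # Start a new chunk if adding this line would overflow the current one.
        if len(current) + len(line) > max_len:
            chunks.append(current)
            current = ""
        current += line
    if current:
        chunks.append(current)
    return chunks


def chunk_python_source(source: str, max_len: int = 500) -> list[str]:
    """Hypothetical helper: one chunk per top-level AST node, splitting nodes longer than max_len."""
    tree = ast.parse(source)
    chunks: list[str] = []
    for node in tree.body:
        # Source text of the node. For decorated functions/classes this starts at
        # the `def`/`class` line, so decorators are not included.
        segment = ast.get_source_segment(source, node)
        if segment is None:
            continue
        if len(segment) <= max_len:
            chunks.append(segment)
        else:
            chunks.extend(split_text(segment, max_len))
    return chunks


example = "import os\n\ndef f():\n    return 1\n\nclass A:\n    x = 1\n    y = 2\n"
print(chunk_python_source(example, max_len=15))
```

A real implementation would likely recurse into large classes so that each method becomes its own chunk; splitting on lines is simply the smallest thing that satisfies "if node > max len, split it into multiple chunks".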
Looks good to me!
What I'd change, though, is the directory-to-XML mapping, which seems overly complex to me. I think we can make it a lot simpler; see https://gist.github.com/UmerHA/a0845f17325f07c554a6dada9fc0cab2.
Sounds about right to me, yeah. Thanks for the review @UmerHA!
I've now updated it to leave out some contentious code that wasn't required for this work. I also added more tests.
Ah @UmerHA, I just saw the link at the end of your comment. Thanks a lot for the contribution! I didn't include it because I hadn't seen it, but I'm happy to use it instead of my list of paths if you think it's an improvement. Basically, I'm really not sure how we should approach the "big context" as well as the "small context", and this is just a start. It possibly adds no value today, but hopefully we can improve it iteratively. Aider's approach sends a map of method signatures and orders it so that only the most important methods are sent, which sounds super powerful. It seems to me that, without method signatures or the ability to delve deeper into files, providing the wider context of files is somewhat useless. Maybe it should be removed entirely; it's pretty much a placeholder right now. Anyway, the XML is gone for now, but there's lots of room to improve in future, so please do contribute!
Very good job with this; it looks good to me. I think it is a good starting point, and we can build on this initial work.
I'm happy to merge, as it's had a few reviews and it shouldn't have any impact on current functionality, but I will wait for @ATheorell and @captivus, who have said they want to take a look when they get time.
This is a first pass at using a vector store to automatically retrieve files from a code repository based on their relevance to the prompt.
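A toy sketch of the retrieval step, assuming a bag-of-words vector in place of a real embedding model so that it runs with the standard library alone. The PR itself uses LlamaIndex for this. `ToyVectorStore`, `embed` and `relevant_files` are hypothetical names, and each file is scored by its best-matching chunk.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lower-cased word counts. A stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Hypothetical in-memory store of (file path, chunk text, vector) entries."""

    def __init__(self) -> None:
        self._entries = []

    def add(self, path: str, chunk: str) -> None:
        self._entries.append((path, chunk, embed(chunk)))

    def relevant_files(self, prompt: str, top_k: int = 3) -> list[str]:
        """Return up to top_k file paths, ranked by their best chunk's similarity to the prompt."""
        query = embed(prompt)
        best = {}
        for path, _, vector in self._entries:
            best[path] = max(cosine(query, vector), best.get(path, 0.0))
        ranked = sorted(best, key=best.get, reverse=True)
        # Drop files with no overlap at all with the prompt.
        return [p for p in ranked if best[p] > 0][:top_k]


store = ToyVectorStore()
store.add("snake.py", "def move_snake(direction): update the snake position")
store.add("score.py", "def update_score(points): add points to the score board")
store.add("README.md", "installation instructions")
print(store.relevant_files("make the snake move faster"))
```

Scoring a file by its single best chunk, rather than an average over all chunks, keeps one highly relevant function from being diluted by the rest of a long file; that is one design choice among several.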
There's a lot more work to do in this area, but I think the scope of this work is enough to merge on its own, pending review etc. It works for me for at least small use cases like snake, so it seems to be a good place to hand over to others to have a play around with.
Some ideas of what to do next: