feature request - load config files from inside of UI #58
Comments
Thanks for the suggestion, I will try to have a go at it next week!
Hi, please check the initial implementation here: https://github.com/snexus/llm-search/tree/feature-webapp-load-configs. This feature is simple on the surface; however, extra care must be taken with GPU memory. Since we are not exiting the app between reloads, all objects holding GPU memory (from the old configuration) have to be cleared before the new configuration can be loaded. I've tested it with locally hosted models and it appears to be working. One caveat: GPU memory can spike temporarily when you load a new config, because Python's garbage collector may take a moment to get rid of the old objects.
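Roughly, the cleanup idea looks like this (a simplified sketch rather than the actual code in the branch; the state keys are placeholders):

```python
import gc
import torch

def release_llm(state: dict) -> None:
    """Drop references to the previously loaded model objects and free CUDA memory."""
    for key in ("model", "tokenizer", "chain"):  # placeholder keys for whatever holds the old config
        state.pop(key, None)
    gc.collect()                   # collect the now-unreferenced objects first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()   # return cached blocks to the driver so the new model can allocate
```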
Excellent, will do!
Once again, I applaud your work and how fast you turn things around. I did a good test by first loading a 13B model that fits easily in my 48 GB of VRAM. No problem. You can see it almost fills my memory completely.
Also, when I ran the same query (the same question) after switching from the local LLM to GPT-4, it just repeated the exact same answer from the local LLM. It didn't re-run the query using GPT-4 to get a new response.
Thanks for the extensive testing, really appreciate it. I can confirm the problem; as you mentioned, it can be clearly seen when loading an offline model followed by an OpenAI model, where the memory should drop significantly. Will need to investigate deeper.
Believe the memory issue is fixed, at least for the scenario above. If you repeat the same experiment again, please note that some GPU memory (around 1-2 GB) should still be in use by the embedding model.
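If you want to check from inside Python rather than via nvidia-smi, something like this gives a rough view of the residual footprint (it assumes a CUDA device and only counts allocations made through PyTorch in this process):

```python
import torch

# Rough check of the residual GPU footprint after a config reload.
print(f"allocated: {torch.cuda.memory_allocated() / 2**30:.2f} GiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```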
Thanks for confirming about the memory, you helped make this project better! I will fix the app crash caused by clicking the load button multiple times.
I've pushed a new version, please check if it works for you. The solution isn't as elegant as I hoped it would be, due to the tricky nature of Streamlit's event loop. If you are quick enough to press the button multiple times before the model starts to load, the loading process will stop; in that case you might need to press the button again to load the model. The app shouldn't crash, though, when pressing twice or more. As you probably noticed, there are some limitations with the Streamlit front-end. In the future I might rewrite it in a different framework, such as Gradio or NiceGUI. Or perhaps there is an option to offload model handling to a 3rd party, e.g. oobabooga via its API, so I can focus more on the RAG logic itself...
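For reference, the guard is conceptually along these lines (a simplified sketch rather than the exact code in the repo; `load_model_from_config` and the config names are placeholders):

```python
import time
import streamlit as st

def load_model_from_config(path: str) -> None:
    """Placeholder for the real loading routine, which may take a while."""
    time.sleep(5)

# Streamlit reruns the whole script on every interaction, so a flag kept in
# session_state is what survives between clicks and lets us ignore extra presses.
if "loading" not in st.session_state:
    st.session_state.loading = False

config_path = st.selectbox("Configuration", ["config_a.yaml", "config_b.yaml"])  # placeholder names

if st.button("Load model", disabled=st.session_state.loading):
    st.session_state.loading = True
    try:
        load_model_from_config(config_path)
    finally:
        st.session_state.loading = False
```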
Yeah, I do admit that offloading the handling of the models and just focusing on the RAG would probably be a good pivot to make at some point. I'm using llama.cpp right now because your template supports it well. I think you have other model loaders, but it's such a fast-moving environment that it's hard to keep up. I personally try to use exllama2 whenever I can (exl2 models), which I'm not sure you're currently supporting. Also, with the different models you have to take into account which prompt template to use (you use an alpaca template by default, while many models use other formats; see the sketch after this comment), so there are just so many factors to consider when you start dealing with local models. I see the RAG space is rapidly evolving too, so a singular focus there while, as you said, offloading the model handling to something else sounds logical.

But anyway, I don't want to sound negative. You fixed the bug from what I can see: I clicked the "load" button multiple times as a model was loading and it didn't crash. Your worst-case scenario of having to press "load" again sounds like a rare edge case, and even then it doesn't cause the program to crash.

Thank you again for implementing my requested features. This really does exactly what I need it to do now. In fact, I've found that with a 70B model (Synthia v1.5), I get better answers to my queries than I do using GPT-4! Quite amazing what a good local LLM enhanced with a good RAG implementation can achieve. Keep up the good work.
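To make the prompt-format point concrete, here is a small illustrative sketch; the template strings below are common community formats, not necessarily what the project ships:

```python
# Common prompt wrappers around the same instruction; which one a local model expects
# depends on how it was fine-tuned (illustrative, not an exhaustive list).
PROMPT_TEMPLATES = {
    "alpaca": "### Instruction:\n{instruction}\n\n### Response:\n",
    "llama2-chat": "[INST] {instruction} [/INST]",
    "chatml": "<|im_start|>user\n{instruction}<|im_end|>\n<|im_start|>assistant\n",
}

def build_prompt(family: str, instruction: str) -> str:
    return PROMPT_TEMPLATES[family].format(instruction=instruction)

print(build_prompt("alpaca", "Summarise the attached document."))
```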
Completely agree, focusing mostly on RAG will be the best way forward. I will investigate the best way to integrate with existing solutions for handling the local models.
Hmmm... I tested this today and, strangely, as I tried to load different models it kept reloading the same model over and over, despite choosing different configs. Can you please confirm whether you see the same behaviour?
Can you describe the circumstances under which it happens? I used it yesterday and it was reloading properly.
It happened by simply loading the app the same way I always have in the past. It's possible that when I called the issue resolved, I only quickly tested that it didn't crash when hitting the button multiple times, but never re-checked proper loading between config files.

I went as far as to delete my original version of the app, pull the latest version from git, rebuild the venv and try again. It still has the same issue: pressing the "load" button just reloads the same config file over again. I also did another test by reverting to the earlier version, and I still can't explain why I'm having different results. Ultimately we're using the same source files, so I can't see how it's only affecting me and not you.

If it's not too much to ask, can you try rebuilding from the latest like I did and see if it still works for you? In the meantime, I'll see if I can troubleshoot and find the problem myself, though I may only get to it tomorrow. But I can confirm that even after clearing the app completely and rebuilding it, I still have the same problem. Thanks again.
Thanks for the details, I will rebuild from scratch and test as you suggested.
Using a clean environment, I can confirm the bug. Please test the fix in https://github.com/snexus/llm-search/tree/fix-webapp-config-reloading; I will merge it into main if it works for you.
Yep, all good now! I spent extra time doing various tests: trying different prompts, loading different LLMs, and making sure I covered my normal use cases.
Hi @snexus, I'd like to re-open this one because I found an issue. It's one that we didn't really anticipate, but I really think it would be helpful to address. Thanks!
Good suggestion, the current approach is indeed not optimal from the UX perspective. Will try to implement it this or next week.
Please check the implementation here: https://github.com/snexus/llm-search/tree/webui-embeddings-generation. The current implementation exposes the embeddings generation only if embeddings aren't detected, in order not to clutter the UI with functionality that is used only once, at the beginning. If the user needs to recreate the index, they can simply delete the embedding folder.
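Conceptually, the gating works something like this (a simplified sketch, not the exact code in the branch; the paths and the `create_embeddings` helper are placeholders):

```python
from pathlib import Path
import streamlit as st

def create_embeddings(embeddings_dir: Path) -> None:
    """Placeholder for the real indexing pipeline."""
    embeddings_dir.mkdir(parents=True, exist_ok=True)

embeddings_dir = Path("/home/hisma/rag/embeddings/hydrogen")  # would come from the loaded config

# Only surface the generation control when no index exists yet, so the UI stays
# uncluttered once the one-off step has been done.
if not embeddings_dir.exists() or not any(embeddings_dir.iterdir()):
    st.warning("No embeddings found for this configuration.")
    if st.button("Generate embeddings"):
        create_embeddings(embeddings_dir)
else:
    st.caption("Embeddings detected. Delete the embeddings folder to regenerate the index.")
```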
Excellent! I will test it today. I actually found another bug with this implementation yesterday; see if you can recreate it. I know it's not easy to explain, so I hope I didn't confuse you, but in summary: changing configuration files inside the UI, while it does change the LLM correctly, doesn't respect changes in the document path. Can you look into this? Even though it was difficult to explain, I have a feeling this is probably an easy fix. In the meantime, I will also test the update you just made. Thanks for being open to my suggestions! I am now trying to deploy this for a research project, which is how I'm finding all these little issues.
Are you pressing "Load" when selecting the second ("c:\docs\protein") configuration? There is no way around it: just selecting it without pressing the button won't load it automatically (that was the initial approach, but it was quite fragile to implement in Streamlit). Configurations are considered independent, with their own settings, model, etc., so they have to be reloaded completely, not just pointed at a new embedding folder.
No worries at all. Thanks for finding and reporting bugs, and suggesting useful features.
Yes, of course, I am actually loading the configuration. As I said, it loads fine and it will change LLMs correctly, but if I query the app for something that should be in the protein folder, per the configuration, it gives me a result with a negative score, and all the sources it lists point to files from the 'hydrogen' folder, which was loaded in the previous configuration.
Let me see if I can quickly recreate it again...
Thanks for the detailed explanation, will try to reproduce it on my side. You definitely shouldn't need to update the embeddings after changing the doc path.
Unfortunately I couldn't reproduce the issue; I followed the exact steps with two document sets/configs. I have a hypothesis though: are you able to share the configs you used? Is it possible that both configurations point to the same embeddings path?
```yaml
cache_folder: /home/hisma/rag/cache  ## specify a cache folder for embeddings models, huggingface and sentence transformers

embeddings:
  embedding_model:   # Optional embedding model specification, default is e5-large-v2. Swap to a smaller model if out of CUDA memory
  splade_config:     # Optional batch size of sparse embeddings. Reduce if getting out-of-memory errors on CUDA.
  chunk_sizes:       # Specify one or more chunk sizes to split into (querying multi-chunk results will be slower)
  document_settings:

semantic_search:
  # Will ensure that context provided to LLM is less than max_char_size.
  # Useful for locally hosted models and limited hardware. Reduce if out of CUDA memory.
  max_char_size: 4096

persist_response_db_path: "/home/hisma/rag/responsedb/response.db"  # optional sqlite database filename. Allows saving responses offline to sqlite, for future analysis.

############ An example how to use OpenAI model, requires .env file with the OpenAI key
```
That's my standard config when using ChatGPT. Yes.
Yes, the embedding path should be unique per config file; that's where the vector database embeddings are generated for the specific document collection. Using the same path for different document collections will produce the behaviour you're seeing. Perhaps worth mentioning in the readme...
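For example, something like this (hypothetical fragments; the exact key name may differ in your config, but the point is that each configuration gets its own folder):

```yaml
# config_hydrogen.yaml
embeddings:
  embeddings_path: /home/hisma/rag/embeddings/hydrogen
---
# config_protein.yaml
embeddings:
  embeddings_path: /home/hisma/rag/embeddings/protein   # a different folder, not reused from the hydrogen config
```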
Ahh... that's very good to know! Yeah, definitely worth mentioning in the readme. I will update my configurations with unique embedding paths for each file and hopefully that works. I'll test it later and report the results, then we can close this again.
Tested. It worked as you suggested. And it makes total sense that you'd want different embeddings paths for different sub-topics. Please close this, and you can merge the change.
Thanks for confirming. By the way, in case you do need to have multiple sub-topics in the same config file (and the same embeddings path), it is already supported. You can specify multiple document paths in the config, pointing to different source folders and scan settings, as shown below. The advantage of doing that is that you can filter by label in the UI, rather than loading a new configuration file, though for large collections it might work slower.
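Something along these lines (a sketch; treat the key names as illustrative and check the sample config in the repo for the exact spelling):

```yaml
embeddings:
  embeddings_path: /home/hisma/rag/embeddings/all-topics   # one shared index for both collections
  document_settings:
    - doc_path: /home/hisma/rag/docs/hydrogen
      scan_extensions: ["pdf", "md"]
      label: hydrogen        # selectable as a filter in the web UI
    - doc_path: /home/hisma/rag/docs/protein
      scan_extensions: ["pdf", "md"]
      label: protein
```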
Wow, I did not know about that either! And that would make the "document collection" tag actually useful, since you can filter by tag. I always just had one per document, so I didn't think the feature had any practical use. Now it makes sense.
OK I have one more request I'm hoping you can implement for this wonderful app.
The way I use the app (and I imagine others do too) is to test various models and verify their performance to see which gives the best results.
To do this, I create a config.yaml file for each model I use, set up as needed.
I then created a bash script that runs the webapp with a different config file as an argument, depending on which model I want to test.
If this feature were available inside the UI, it would be extremely handy, and I can see it being something other users would appreciate too.
In summary, this is really the only "quality of life" feature the app is missing for me, since I have about four different config files that I switch between to test the various LLMs' performance against my questions.
Hope this is possible!