
Unable to get a simple example to work with Llamasharp and Semantic Kernel #186

Open
bancroftway opened this issue Sep 29, 2023 · 7 comments
Labels: question (Further information is requested)

@bancroftway

bancroftway commented Sep 29, 2023

I am using packages LLamaSharp 0.5.1, LLamaSharp.semantic-kernel 0.5.1, and Microsoft.SemanticKernel.Core 0.24.230918.1-preview. In a very simple example inspired by this video, I am unable to get any results. Please advise: what am I doing wrong?

using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using LLama;
using LLama.Common;
using LLamaSharp.SemanticKernel.TextEmbedding;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.AI.Embeddings;
using Microsoft.SemanticKernel.Memory;

static async Task Main(string[] args)
{
    var modelPath = @"E:\Projects\Scratchpad\HuggingFace\llama-2-7b-guanaco-qlora.Q2_K.gguf";

    var seed = 1337;
    // Load the weights into memory with embedding mode enabled.
    var parameters = new ModelParams(modelPath)
    {
        Seed = seed,
        EmbeddingMode = true
    };

    using var model = LLamaWeights.LoadFromFile(parameters);
    var embedding = new LLamaEmbedder(model, parameters);

    // Build a kernel with an in-memory vector store and the local llama embedder.
    var kernel = Kernel.Builder
        .WithMemoryStorage(new VolatileMemoryStore())
        .WithAIService<ITextEmbeddingGeneration>("local-llama-embed", new LLamaSharpEmbeddingGeneration(embedding), true)
        .Build();

    var memories = new Dictionary<string, string>
    {
        { "rec1", "My name is Andy" },
        { "rec2", "I currently work as a tour guide" },
        { "rec3", "I have been living in Seattle since 2005" },
        { "rec4", "I have visited France and Italy five times since 2015" },
        { "rec5", "My family is from New York" },
    };

    // Embed and store each record in the "aboutme" collection.
    foreach (var memory in memories)
    {
        await kernel.Memory.SaveInformationAsync(collection: "aboutme", id: memory.Key, text: memory.Value);
    }

    var query = "what is my name?";
    var results = kernel.Memory.SearchAsync("aboutme", query, limit: 2);
    await foreach (var result in results)
    {
        Console.WriteLine(result.Metadata.Text);
        Console.WriteLine(result.Relevance);
    }

    query = "what do I do for work?";
    results = kernel.Memory.SearchAsync("aboutme", query, limit: 2);
    await foreach (var result in results)
    {
        Console.WriteLine(result.Metadata.Text);
        Console.WriteLine(result.Relevance);
    }
}
@negatron99

Hi, using your code as a starting point, I had to lower the minRelevanceScore in SearchAsync to 0.1; the relevance it returns for the "My name is Andy" result is 0.203498270155788.
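
Concretely, the change against the snippet above looks like this (a minimal sketch; in this SK version, ISemanticTextMemory.SearchAsync's minRelevanceScore parameter defaults to 0.7, so results scoring around 0.2 are filtered out by default):

    // Lower the relevance threshold so weak matches are not silently dropped;
    // the default minRelevanceScore of 0.7 discards scores around 0.2.
    var results = kernel.Memory.SearchAsync(
        collection: "aboutme",
        query: "what is my name?",
        limit: 2,
        minRelevanceScore: 0.1);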

@bancroftway
Author

@negatron99 thanks for spotting this.

In your opinion, is the performance acceptable? For the second question, even setting minRelevanceScore to 0.1 does not result in the correct memory being recalled. Is this a model issue, or an issue related to the SK integration?

@negatron99

@bancroftway I got that too; the queries often returned the expected answer in 2nd place.

I'm running the model locally, and I think that has a severe impact because of the size of the model I'm using.

(Note: I'm new to this.)

@AsakusaRinne
Collaborator

@xbotter Could you please take a quick look at this issue to see whether it's a problem with the semantic-kernel integration or with LLamaSharp itself?

@xbotter
Collaborator

xbotter commented Nov 5, 2023

I think it is closely related to the model. I tried the following five models, and the similarity results are shown below.

───────────────────────────────────────────── llama-2-13b.Q5_K_S.gguf ─────────────────────────────────────────────
                                                  what is my name?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec4                 │ rec3                 │ rec1                 │ rec5                 │ rec2                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.6039783735565276   │ 0.5775696417915415   │ 0.5670870575312752   │ 0.5664254645461733   │ 0.5238228503507946   │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
                                               what do I do for work?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec4                 │ rec3                 │ rec1                 │ rec5                 │ rec2                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.6366171180078882   │ 0.621276874387576    │ 0.5992382407191169   │ 0.5972585634473778   │ 0.5503394661753722   │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
────────────────────────────────────────────── llama-2-13b.Q4_0.gguf ──────────────────────────────────────────────
                                                  what is my name?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec4                 │ rec3                 │ rec5                 │ rec1                 │ rec2                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.59911805798635     │ 0.5710988156718331   │ 0.5680493978057511   │ 0.562425853962513    │ 0.5194026572582072   │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
                                               what do I do for work?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec4                 │ rec3                 │ rec5                 │ rec1                 │ rec2                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.6437136017484283   │ 0.6216543106841886   │ 0.6038441415633206   │ 0.5915474821002391   │ 0.551950617957862    │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
────────────────────────────────────────────── llama-2-7b.Q6_K.gguf ───────────────────────────────────────────────
                                                  what is my name?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec1                 │ rec2                 │ rec3                 │ rec5                 │ rec4                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.18273758637271353  │ -0.25136137740550846 │ -0.3654693736230258  │ -0.3870037852883792  │ -0.399781303434732   │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
                                               what do I do for work?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec5                 │ rec3                 │ rec4                 │ rec2                 │ rec1                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.38100557980647803  │ 0.26378284664020213  │ 0.23677382416063356  │ 0.1560544298268359   │ 0.10642410942596608  │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
───────────────────────────────────────────── llama-2-7b.Q5_K_S.gguf ──────────────────────────────────────────────
                                                  what is my name?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec1                 │ rec2                 │ rec3                 │ rec5                 │ rec4                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.21707250428374697  │ -0.23404267716418634 │ -0.3586767363888217  │ -0.3804505754365624  │ -0.3960521133365741  │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
                                               what do I do for work?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec5                 │ rec3                 │ rec4                 │ rec2                 │ rec1                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.41390582241379037  │ 0.30484629116732204  │ 0.2816259194460438   │ 0.17762617968673447  │ 0.09091840786347069  │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
────────────────────────────────────────────── llama-2-7b.Q4_0.gguf ───────────────────────────────────────────────
                                                  what is my name?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec1                 │ rec2                 │ rec3                 │ rec5                 │ rec4                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.2579590203445245   │ -0.24737413358757243 │ -0.363608211420225   │ -0.3706191111105214  │ -0.39100814399201617 │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
                                               what do I do for work?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec5                 │ rec3                 │ rec4                 │ rec2                 │ rec1                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.35964302792416625  │ 0.24689123570060933  │ 0.2187642599023262   │ 0.13875857455840124  │ 0.09581554281466385  │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘

As a comparison, here are the results generated using OpenAI.

───────────────────────────────────────────── text-embedding-ada-002 ──────────────────────────────────────────────
                                                  what is my name?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec1                 │ rec2                 │ rec5                 │ rec3                 │ rec4                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.8472871956899114   │ 0.7746655664617795   │ 0.7705371666228756   │ 0.7447792280263698   │ 0.7109783032151401   │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘
                                               what do I do for work?
┌──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┬──────────────────────┐
│ rec2                 │ rec1                 │ rec3                 │ rec5                 │ rec4                 │
├──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┼──────────────────────┤
│ 0.8237805470923869   │ 0.7608428315511899   │ 0.7440328693464854   │ 0.738623944342763    │ 0.7086218716424992   │
└──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┴──────────────────────┘

I am using LLamaSharp.semantic-kernel 0.7.0 and Microsoft.SemanticKernel.Core 1.0.1-beta. Compared to version 0.5.1, there is not much change in terms of logic.
However, @AsakusaRinne, I found that the embeddings generated using GPU and CPU are different, but the difference is not significant.
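
For anyone who wants to reproduce these rankings outside Semantic Kernel, here is a rough sketch that embeds the query and each record directly and ranks by cosine similarity, which is also what VolatileMemoryStore uses for relevance. It reuses embedding and memories from the snippet above; the GetEmbeddings call is the 0.5.x-era LLamaEmbedder API, so treat the exact signature as an assumption:

    // Rank records by cosine similarity between the query embedding
    // and each record embedding (requires using System.Linq).
    static float Cosine(float[] a, float[] b)
    {
        float dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (MathF.Sqrt(na) * MathF.Sqrt(nb));
    }

    var queryVec = embedding.GetEmbeddings("what is my name?");
    var ranked = memories
        .Select(m => (m.Key, Score: Cosine(queryVec, embedding.GetEmbeddings(m.Value))))
        .OrderByDescending(x => x.Score);

    foreach (var (key, score) in ranked)
        Console.WriteLine($"{key}: {score}");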

@AsakusaRinne
Collaborator

> However, I found that the embeddings generated using GPU and CPU are different, but the difference is not significant.

There are different optimization strategies on CPU and CUDA, so a slight difference is expected.

@bancroftway @negatron99 Would you like to try it with v0.7.0? I remember there were some fixes in the semantic-kernel integration package after v0.5.1.
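
If you want to quantify the CPU/GPU difference yourself, a minimal sketch (ModelParams.GpuLayerCount and the LLamaEmbedder calls are the 0.5.x-era names; treat them as assumptions for other versions):

    // Embed the same text with GPU offloading disabled and enabled,
    // then report the largest elementwise deviation between the two vectors.
    var cpuParams = new ModelParams(modelPath) { EmbeddingMode = true, GpuLayerCount = 0 };
    var gpuParams = new ModelParams(modelPath) { EmbeddingMode = true, GpuLayerCount = 32 };

    using var cpuModel = LLamaWeights.LoadFromFile(cpuParams);
    using var gpuModel = LLamaWeights.LoadFromFile(gpuParams);
    using var cpuEmbedder = new LLamaEmbedder(cpuModel, cpuParams);
    using var gpuEmbedder = new LLamaEmbedder(gpuModel, gpuParams);

    var cpuVec = cpuEmbedder.GetEmbeddings("My name is Andy");
    var gpuVec = gpuEmbedder.GetEmbeddings("My name is Andy");

    float maxDiff = 0;
    for (int i = 0; i < cpuVec.Length; i++)
        maxDiff = MathF.Max(maxDiff, MathF.Abs(cpuVec[i] - gpuVec[i]));
    Console.WriteLine($"max elementwise diff: {maxDiff}");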

martindevans added the question (Further information is requested) label on Nov 6, 2023
@xbotter
Collaborator

xbotter commented Nov 7, 2023

This is another comparison result from the bge-large-en-v1.5 model.

                        what is my name?
┌───────────┬───────────┬────────────┬────────────┬────────────┐
│ rec1      │ rec5      │ rec2       │ rec3       │ rec4       │
├───────────┼───────────┼────────────┼────────────┼────────────┤
│ 0.6125676 │ 0.4444277 │ 0.44250283 │ 0.40801963 │ 0.31189275 │
└───────────┴───────────┴────────────┴────────────┴────────────┘
                     what do I do for work?
┌────────────┬───────────┬────────────┬────────────┬────────────┐
│ rec2       │ rec3      │ rec5       │ rec1       │ rec4       │
├────────────┼───────────┼────────────┼────────────┼────────────┤
│ 0.61269283 │ 0.4735341 │ 0.45722228 │ 0.42069164 │ 0.39852148 │
└────────────┴───────────┴────────────┴────────────┴────────────┘

The results look relatively better. But the model is BERT-based and cannot currently be converted into a format usable by llama.cpp, so I used bert.cpp for the processing.
According to ggerganov/llama.cpp#2872, llama.cpp has started working on BERT support. Looking forward to its completion.
