
Copilot Chat: Dotnet AzureTextEmbeddingGeneration is extremely slow #1952

Closed

hooyao opened this issue Jul 11, 2023 · 9 comments
Labels: kernel (Issues or pull requests impacting the core kernel), python (Pull requests for the Python Semantic Kernel)

hooyao commented Jul 11, 2023

Describe the bug
When experimenting with the sample copilot-chat-app, using MemoriesStore:Type:volatile and AIService:Type:AzureOpenAI with the embedding model text-embedding-ada-002, a single "hello" chat can take 100s to complete. When switched to Azure Cognitive Search, the same chat completes in 6s.

I was using an Azure OpenAI deployment for testing purposes; the model is clearly not being throttled.

To Reproduce
Following the notebook sample 06-memory-and-embeddings.ipynb, manually adding an index takes 45 seconds to embed and save 5 simple sentences to the memory store. Each embedding API call takes around 10s. I also tried the same embedding model from the same Azure OpenAI endpoint with LangChain; there, each request takes only 0.2s.

Expected behavior
Generating a single embedding should take less than 1s, not 10s.

Screenshots
[Screenshot: copilot-chat-app is super slow with Azure OpenAI embedding]

Platform

  • OS: Windows and devcontainer
  • IDE: VS Code
  • Language: C#
  • Source: NuGet package version 0.17.230704.3-preview


@shawncal shawncal added the python and triage labels Jul 11, 2023
@shawncal shawncal changed the title Dotnet AzureTextEmbeddingGeneration is extremely slow Python: Dotnet AzureTextEmbeddingGeneration is extremely slow Jul 11, 2023
@hooyao hooyao changed the title Python: Dotnet AzureTextEmbeddingGeneration is extremely slow C#: Dotnet AzureTextEmbeddingGeneration is extremely slow Jul 11, 2023
@shawncal shawncal changed the title C#: Dotnet AzureTextEmbeddingGeneration is extremely slow Copilot Chat: Dotnet AzureTextEmbeddingGeneration is extremely slow Jul 17, 2023
TaoChenOSU (Contributor) commented

Hello @hooyao,

Thank you for reporting the issue!

The Copilot Chat app can potentially make a few calls to the embedding model to generate multiple embeddings for each message. We are actively working on improving the performance.

However, 100s still sounds abnormal. Could you let me know how long the conversation had been going when you saw the increase in time?

@TaoChenOSU TaoChenOSU self-assigned this Jul 20, 2023
@TaoChenOSU TaoChenOSU removed the triage label Jul 20, 2023
@TaoChenOSU TaoChenOSU added the kernel and copilot chat labels Jul 20, 2023
glahaye (Contributor) commented Jul 26, 2023

@hooyao I was able to reproduce the problem. Indexing simple text files containing only a handful of sentences takes 10 to 15 seconds.

Investigating what could be the root cause.

glahaye (Contributor) commented Jul 26, 2023

So I eliminated other calls and network factors, and it seems the bulk of the time spent saving embeddings is spent within the call to kernel.Memory.SaveInformationAsync() (in this case using AzureTextEmbeddingGeneration + VolatileMemoryStore).

I then used Fiddler to confirm my hunch that most of that time we are busy establishing a TLS connection to the Azure OpenAI server (7.6 seconds in my case). Once the TLS tunnel is set up, the actual call itself takes minimal time (0.2 seconds in my case).

I believe we create a connection from scratch every time we connect to the service to create embeddings. That would cost us a lot of time, as opposed to maintaining the same connection to Azure OpenAI.

@awharrison-28 Any thoughts on this? Looks like we aren't re-using an HttpClient in SK when maybe we should!
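
[Editor's note] A minimal sketch of the effect described above (not SK code; the URL is a placeholder): a fresh HttpClient per request pays the TLS handshake every time, while a shared instance reuses the pooled connection.

using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

class TlsReuseDemo
{
    // Shared instance: the underlying handler pools connections, so the
    // TLS handshake is paid only once per host.
    private static readonly HttpClient Shared = new HttpClient();

    static async Task Main()
    {
        var url = "https://example.com/"; // stand-in for the Azure OpenAI endpoint

        for (int i = 0; i < 3; i++)
        {
            // Fresh client each time: new connection, new TLS handshake.
            var sw = Stopwatch.StartNew();
            using (var fresh = new HttpClient())
                await fresh.GetAsync(url);
            Console.WriteLine($"fresh client:  {sw.ElapsedMilliseconds} ms");

            // Shared client: after the first call, the pooled connection is reused.
            sw.Restart();
            await Shared.GetAsync(url);
            Console.WriteLine($"shared client: {sw.ElapsedMilliseconds} ms");
        }
    }
}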

vgpro54321 commented

I can confirm the slowness when using 06-memory-and-embeddings.ipynb from the semantic-kernel samples against Azure OpenAI.

const string MemoryCollectionName = "aboutMe";

await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info1", text: "My name is Andrea");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info2", text: "I currently work as a tourist operator");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info3", text: "I currently live in Seattle and have been living there since 2005");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info4", text: "I visited France and Italy five times since 2015");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info5", text: "My family is from New York");

The five calls above ran for 45s.

await Chat("Hello, I think we've met before, remember? my name is...");
This ran for 2m 25s.

I assume that the volatile memory store should be pretty much instant at saving embeddings, and that the time is most likely spent on embedding generation.
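
[Editor's note] A per-call timing harness (a sketch; it assumes a kernel already configured with AzureTextEmbeddingGeneration and VolatileMemoryStore, as in the notebook) makes it easy to see whether the ~10s cost is paid on every request or only on the first:

// Sketch: time each SaveInformationAsync call from the notebook individually.
// `kernel` is assumed to be set up as in 06-memory-and-embeddings.ipynb.
var sw = System.Diagnostics.Stopwatch.StartNew();
await kernel.Memory.SaveInformationAsync("aboutMe", id: "info1", text: "My name is Andrea");
Console.WriteLine($"info1: {sw.ElapsedMilliseconds} ms");

sw.Restart();
await kernel.Memory.SaveInformationAsync("aboutMe", id: "info2", text: "I currently work as a tourist operator");
Console.WriteLine($"info2: {sw.ElapsedMilliseconds} ms");

// A roughly constant ~10 s per call points to a per-request cost
// (connection setup or throttling back-off), not one-time initialization.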

glahaye (Contributor) commented Jul 31, 2023

I'm looking into how we could re-use HttpClient instances in order to minimize HTTPS tunnel creation:
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/http-requests?view=aspnetcore-7.0#consumption-patterns
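
[Editor's note] For reference, a minimal sketch of the named-client consumption pattern from that page (generic Microsoft.Extensions.DependencyInjection setup using the Microsoft.Extensions.Http package; the client name and endpoint are illustrative):

using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// The factory pools and recycles the underlying HttpMessageHandler,
// so connections (and their TLS sessions) are reused across clients.
services.AddHttpClient("azure-openai", client =>
{
    client.BaseAddress = new Uri("https://my-resource.openai.azure.com/"); // placeholder
});

var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

// HttpClient instances from the factory are cheap to create; the pooled
// handler behind them is what avoids repeated TLS setup.
HttpClient client = factory.CreateClient("azure-openai");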

hooyao (Author) commented Jul 31, 2023

Thank you @glahaye @vgpro54321 for helping reproduce the issue. My concern is that even if the HttpClient is not reused, the observed behavior is still not expected. I tried injecting a dedicated HttpClient into AzureTextEmbeddingGeneration; no luck.

And to be honest, the HttpClient parts of SK are full of hand-crafted code instead of following standard practice, for example DefaultHttpRetryHandler instead of an industry-grade retry policy built with Polly. Hand-rolled code like that wastes time, needs more test coverage, and is sometimes buggy. As an HttpClient best practice, I prefer injecting an IHttpClientFactory instead of an HttpClient instance.
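
[Editor's note] A sketch of the Polly approach mentioned above, using the Microsoft.Extensions.Http.Polly package (the client name and policy values are illustrative, not SK's actual configuration):

using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Extensions.Http;

var services = new ServiceCollection();

// Named client with a Polly retry policy instead of a hand-rolled handler.
services.AddHttpClient("azure-openai-embeddings")
    // HandleTransientHttpError covers 5xx, 408, and HttpRequestException;
    // additionally retry on 429 (throttling) with exponential backoff.
    .AddPolicyHandler(HttpPolicyExtensions
        .HandleTransientHttpError()
        .OrResult(r => (int)r.StatusCode == 429)
        .WaitAndRetryAsync(3, attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt))));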

glahaye (Contributor) commented Jul 31, 2023

@hooyao I was just about to try what you did. Are you saying that even with a dedicated (singleton?) HttpClient given to AzureTextEmbeddingGeneration, you experienced the TLS setup delay?

Indeed, injecting an HttpClient is not the ideal solution. I'll see if I can get some traction on IHttpClientFactory (which is usually the way to go)...

vgpro54321 commented

> I assume that the volatile memory store should be pretty much instant at saving embeddings, and that the time is most likely spent on embedding generation.

I noticed that the quota for the embedding model was very limited. Bumping the quota limits increased the speed, so now it takes 2.5s to get 10 embeddings in the notebooks, which is all right.

The framework retries accessing the endpoint after waiting a few seconds because it has a retry handler on the HttpClient; this also contributes to the slowness.
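
[Editor's note] One way to check whether throttling (rather than connection setup) is the bottleneck is to call the Azure OpenAI embeddings REST endpoint directly and watch the status codes; 429 responses mean the SDK's retry handler would be sleeping between attempts. The resource name, deployment, key variable, and api-version below are placeholders.

using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class ThrottleCheck
{
    static async Task Main()
    {
        // Placeholders: substitute your resource, deployment, and API version.
        var url = "https://my-resource.openai.azure.com/openai/deployments/" +
                  "text-embedding-ada-002/embeddings?api-version=2023-05-15";

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add(
            "api-key", Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY"));

        for (int i = 0; i < 10; i++)
        {
            var body = new StringContent(
                "{\"input\": \"hello\"}", Encoding.UTF8, "application/json");
            var response = await client.PostAsync(url, body);
            // 200 = served immediately; 429 = throttled by the quota.
            Console.WriteLine($"call {i}: {(int)response.StatusCode}");
        }
    }
}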

@github-project-automation github-project-automation bot moved this from In progress to Done in Apps & Services Semantic Kernel Sep 5, 2023
glahaye (Contributor) commented Sep 5, 2023

I noticed this was closed, so I opened a new item in the new chat-copilot repo to pursue this further:
microsoft/chat-copilot#302
