Copilot Chat: Dotnet AzureTextEmbeddingGeneration is extremely slow #1952
Comments
Hello @hooyao, Thank you for reporting the issue! The Copilot Chat app can potentially make a few calls to the embedding model to generate multiple embeddings for each message. We are actively working on improving the performance. However, 100 s still sounds abnormal. Could you let me know how long the conversation had been going when you saw the increase in time?
@hooyao I was able to reproduce the problem. Indexing simple text files containing only a handful of sentences takes 10 to 15 seconds. Investigating what could be the root cause.
So I eliminated other calls and network factors, and it seems the bulk of the time spent saving embeddings is spent within the call to kernel.Memory.SaveInformationAsync() (in this case using AzureTextEmbeddingGeneration + VolatileMemoryStore). I then used Fiddler to confirm my hunch that most of that time we are busy establishing a TLS connection to the Azure OpenAI server (7.6 seconds in my case). Once the TLS tunnel is set up, the actual call itself takes minimal time (0.2 seconds in my case). I believe we create a connection from scratch every time we connect to the service to create embeddings. That could cost us a lot of time as opposed to maintaining the same connection to Azure OpenAI. @awharrison-28 Any thoughts on this? Looks like we aren't re-using an HttpClient in the SK when maybe we should!
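For illustration, a minimal sketch of the reuse idea, assuming an AzureTextEmbeddingGeneration constructor overload that accepts an HttpClient (the namespace, parameter names, and endpoint below are assumptions; the exact signature depends on the SK version in use):

```csharp
using System;
using System.Net.Http;
// Namespace is an assumption; it may differ by SK version.
using Microsoft.SemanticKernel.Connectors.AI.OpenAI.TextEmbedding;

// Hypothetical wiring: pass one shared client into the embedding service
// instead of letting each request open a fresh connection.
var embeddingService = new AzureTextEmbeddingGeneration(
    modelId: "text-embedding-ada-002",
    endpoint: "https://contoso.openai.azure.com/",  // illustrative endpoint
    apiKey: Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY"),
    httpClient: SharedHttp.Client);

// A single long-lived HttpClient keeps its pooled TCP/TLS connections
// alive between requests, so only the first call pays the handshake cost.
public static class SharedHttp
{
    public static readonly HttpClient Client = new HttpClient();
}
```

With something like this, the ~7.6 s handshake would be paid once per process rather than once per embedding call.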
I confirm slowness when using 06-memory-and-embeddings.ipynb from the semantic-kernel samples against Azure OpenAI: it ran for 45 s. I assume that volatile memory should be pretty much instant when saving an embedding, so the time is most likely spent on embedding generation.
I'm looking into how we could re-use HttpClient instances in order to minimize HTTPS tunnel creation: |
Thank you @glahaye @vgpro54321 for helping to reproduce the issue. My concern is that even if the HttpClient is not reused, the observed behavior is not expected. I tried to inject a dedicated HttpClient into AzureTextEmbeddingGeneration.
@hooyao I was just about to try what you did. Are you saying that even with a dedicated (singleton?) HttpClient given to AzureTextEmbeddingGeneration, you experienced the HTTPS setup delay? Indeed, injecting HttpClient is not the ideal solution. I'll see if I can get some traction on IHttpClientFactory (which is usually the way to go)...
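For reference, a sketch of the IHttpClientFactory route using the standard Microsoft.Extensions.Http registration (the client name and endpoint here are illustrative, not the actual Copilot Chat wiring):

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;

var services = new ServiceCollection();

// IHttpClientFactory pools HttpMessageHandler instances behind the scenes:
// TLS connections are reused across clients, while handlers are recycled
// periodically so DNS changes are still picked up.
services.AddHttpClient("azure-openai", client =>
{
    client.BaseAddress = new Uri("https://contoso.openai.azure.com/"); // illustrative
});

using var provider = services.BuildServiceProvider();
var factory = provider.GetRequiredService<IHttpClientFactory>();

// Clients are cheap to create and share the pooled handler underneath.
HttpClient client = factory.CreateClient("azure-openai");
```

Registered this way, every call made through the named client reuses warm connections instead of paying the TLS handshake each time.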
I noticed that the quota for the embedding model was very limited. Bumping the quota limits increased the speed, so now it takes 2.5 s to get 10 embeddings in the notebooks, which is all right. The framework retries accessing the endpoint after a wait of a few seconds because it has a retry handler on the HttpClient; this also contributes to the slowness.
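To illustrate why a retry handler turns throttling into latency, here is a hand-rolled sketch (not the framework's actual handler): each 429 response forces a multi-second wait before the retry, so every throttled embedding call pays that delay.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Illustrative retry loop: a fresh request body is built per attempt
// because a sent HttpContent cannot be reused.
static async Task<HttpResponseMessage> PostWithRetryAsync(
    HttpClient client, string url, Func<HttpContent> makeContent,
    CancellationToken ct = default)
{
    for (var attempt = 1; ; attempt++)
    {
        var response = await client.PostAsync(url, makeContent(), ct);
        if (attempt >= 3 || response.StatusCode != HttpStatusCode.TooManyRequests)
            return response;

        // Throttled: honor Retry-After when present, otherwise back off a
        // few seconds -- this wait is where the extra seconds per call go.
        var delay = response.Headers.RetryAfter?.Delta ?? TimeSpan.FromSeconds(4);
        await Task.Delay(delay, ct);
    }
}
```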
I noticed this was closed so I opened a new item in the new chat-copilot repo to pursue this further: |
Describe the bug
When experimenting with the sample copilot-chat-app, using `MemoriesStore:Type` = `volatile` and the config `AIService:Type` = `AzureOpenAI` with embedding model `text-embedding-ada-002`, a single "hello" chat can take 100 s to complete. When switched to Azure Cognitive Search, the same chat could be done in 6 s. I was using an Azure OpenAI deployment for testing purposes; the model is clearly not throttled.
To Reproduce
Expected behavior
I tried manually adding an index following notebook sample 06-memory-and-embeddings.ipynb; it takes 45 seconds to index and save 5 simple sentences to the memory store, with each embedding API call taking around 10 s. I also tried the same embedding model from the same Azure OpenAI endpoint with LangChain; there each request takes only 0.2 s.
The expected behavior is that generating a single embedding should take less than 1s instead of 10s to complete.
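For context, the save loop in that notebook looks roughly like the sketch below (the collection name, ids, and texts are illustrative, `kernel` is assumed to be a configured Semantic Kernel instance with a registered memory store, and the SaveInformationAsync signature is per the SK version under discussion):

```csharp
// Each SaveInformationAsync call issues one embedding request, so five
// facts at ~10 s per request explains the ~45 s total reported above.
const string MemoryCollectionName = "aboutMe";

await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info1",
    text: "My name is Andrea");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info2",
    text: "I currently work as a tourist operator");
await kernel.Memory.SaveInformationAsync(MemoryCollectionName, id: "info3",
    text: "I've been living in Seattle since 2005");
// ...two more facts; with connection reuse each call should drop to ~0.2 s.
```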
Screenshots
[Screenshot attached to the original issue.]
copilot-chat-app is super slow with Azure OpenAI embedding

Platform
Additional context