Hi everyone!
I'm trying to create clusters via community_detection; in total I have more than 10M sentences.
The embedding/encoding generation is pretty fast and works like a charm.
The problem is the community_detection clustering step.
In this case, the process (even with a larger batch_size) is really slow.
I'm trying to figure out a way to speed up the process and make it fit in RAM.
Do you have any suggestions, guidelines, or solutions?
Clustering 10M embeddings is hard, as it computes 10M * 10M scores.
I would reduce the vector space, e.g. first run k-means (e.g. with faiss) to break the vector space into 10-100 smaller spaces. Then cluster each space individually.
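For illustration, a rough sketch of that two-stage approach could look like the following. It assumes the embeddings are already computed as a float32 numpy array; the number of partitions, the threshold, min_community_size, and batch sizes are placeholders you would need to tune for your data:

```python
import faiss
import numpy as np
import torch
from sentence_transformers import util

# embeddings: float32 numpy array of shape (n_sentences, dim), e.g. from model.encode(...)
dim = embeddings.shape[1]
n_partitions = 50  # somewhere in the suggested 10-100 range; placeholder

# Stage 1: coarse k-means with faiss to split the vector space into partitions
kmeans = faiss.Kmeans(dim, n_partitions, niter=20, verbose=True)
kmeans.train(embeddings)
_, assignments = kmeans.index.search(embeddings, 1)  # nearest centroid per vector
assignments = assignments.ravel()

# Stage 2: run community_detection inside each partition, then map indices back
all_communities = []
for p in range(n_partitions):
    idx = np.where(assignments == p)[0]
    if len(idx) < 2:
        continue
    communities = util.community_detection(
        torch.from_numpy(embeddings[idx]),
        threshold=0.75,          # placeholder, tune for your data
        min_community_size=25,   # placeholder, tune for your data
        batch_size=1024,
    )
    all_communities.extend([[int(idx[i]) for i in c] for c in communities])
```

Since each partition only holds a fraction of the 10M vectors, the pairwise score matrix inside each community_detection call stays much smaller and is far easier to fit in RAM.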
#2381 should improve the efficiency of calling community_detection on GPU. It will be included in the next release; hopefully that helps with your issue. Beyond that, Nils' suggestion is good as well, since even on GPU community_detection would still need to compute 10M * 10M scores, which will always be somewhat slow.
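As a minimal sketch of the GPU path (the model name, threshold, and batch sizes below are just placeholder assumptions), keeping the embeddings on the GPU lets community_detection compute the similarity scores there:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2", device="cuda")

# convert_to_tensor=True keeps the embeddings on the model's device (here: GPU)
embeddings = model.encode(
    sentences,
    batch_size=256,
    convert_to_tensor=True,
    show_progress_bar=True,
)

clusters = util.community_detection(
    embeddings,
    threshold=0.75,          # placeholder, tune for your data
    min_community_size=25,   # placeholder, tune for your data
    batch_size=1024,
)
```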