Consider increasing the dimension limit for vector fields. #40492
Comments
Pinging @elastic/es-search
Thanks @jtibshirani. I don't see a problem with increasing the number of dimensions. @jpountz, do you see any problems with this?
This sounds good to me. I care more about the fact that there is a reasonable limit than about the actual value of the limit.
Hi, @mayya-sharipova! For example, mobilenet_v2 produces 1280d vectors, as pointed out by @etudor in issue SthPhoenix/elastik-nearest-neighbors-extended#4.
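For reference, a minimal sketch of where the 1280d figure comes from, assuming TensorFlow/Keras is available; the random input is a placeholder for a real preprocessed image:

```python
# A minimal sketch, assuming TensorFlow/Keras: MobileNetV2 with global
# average pooling emits one 1280-d feature vector per image, already well
# above a 500-dim limit. The random batch stands in for a real image.
import numpy as np
from tensorflow.keras.applications import MobileNetV2

model = MobileNetV2(include_top=False, pooling="avg", weights="imagenet")
batch = np.random.rand(1, 224, 224, 3).astype("float32")  # placeholder image
features = model.predict(batch)
print(features.shape)  # (1, 1280)
```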
@SthPhoenix what should be a reasonable dims limit?
Actually I'm not sure; 1280d is the largest embedding I've seen so far for common models.
Thanks! Hope this will be enough :)
Hello guys, is it possible to increase to 3072 dims?
@gabrielcustodio We have not encountered models or use cases that require more than 2048 dims. Can you please describe your use case or the models that need this many dims?
I used this model. The output is actually 3 layers with 1024 dims each, i.e. 3×1024. Source: flairNLP/flair#886. I load this model using the flair library and then extract the embeddings.
Flair stacked embeddings (forward, backward, glove/flair) would produce vectors of more than 4096 dimensions.
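To illustrate, a minimal sketch assuming flair's StackedEmbeddings API; the particular embeddings chosen here are illustrative, not the exact stack from the comment above:

```python
# A minimal sketch, assuming the flair library. Stacked embeddings simply
# concatenate per token: glove (100d) + news-forward (2048d) +
# news-backward (2048d) gives 4196 dims, beyond a 4096-dim limit.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings, WordEmbeddings

stacked = StackedEmbeddings([
    WordEmbeddings("glove"),
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

sentence = Sentence("Dense vectors need more dimensions .")
stacked.embed(sentence)
print(sentence[0].embedding.shape)  # torch.Size([4196])
```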
OpenAI / GPT-3 offers several embedding sizes. Getting support for at least Curie's embeddings (4096 dimensions) would probably be a good idea. I understand that Davinci's are extremely large, but this technical limitation is probably going to force us to look for alternative solutions to Elastic Cloud if 4096 dimensions cannot be supported.
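For context, a minimal sketch assuming the pre-1.0 openai Python client of that era; the API key is a placeholder, and the engine name is the Curie similarity model referenced above:

```python
# A minimal sketch, assuming the pre-1.0 openai Python client.
# Curie similarity embeddings come back as 4096-d vectors, above the
# current dense_vector dims limit.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder
response = openai.Embedding.create(
    input="How many dimensions does this embedding have?",
    engine="text-similarity-curie-001",
)
vector = response["data"][0]["embedding"]
print(len(vector))  # 4096
```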
It's 2022, maybe it's time to increase the number of dimensions?
Also adding references to the Lucene discussions about increasing dims and why it may not be a good idea.
The `dense_vector` and `sparse_vector` fields place a hard limit of 500 on the number of dimensions per vector. However, many of the common pretrained text embeddings like BERT, ELMo, and Universal Sentence Encoder produce vectors of larger dimensions, typically ranging from 512 to 1024. Currently users must truncate the vectors, or perform an additional dimensionality reduction step. Perhaps we could make the dimension limit configurable, or at least increase it to a larger value?
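As a stopgap under the current limit, here is a minimal workaround sketch of the dimensionality-reduction route, assuming scikit-learn and the elasticsearch Python client; the index name, field name, and random corpus are hypothetical placeholders:

```python
# A minimal workaround sketch (not an official recommendation), assuming
# scikit-learn and the elasticsearch Python client; index/field names are
# hypothetical. 768-d BERT-style vectors are reduced to 500 dims with PCA
# so they fit the current dense_vector limit.
import numpy as np
from elasticsearch import Elasticsearch
from sklearn.decomposition import PCA

embeddings = np.random.rand(1000, 768).astype("float32")  # placeholder corpus

pca = PCA(n_components=500)
reduced = pca.fit_transform(embeddings)  # shape (1000, 500)

es = Elasticsearch()
es.indices.create(
    index="docs",
    body={"mappings": {"properties": {
        "text_vector": {"type": "dense_vector", "dims": 500}
    }}},
)
es.index(index="docs", body={"text_vector": reduced[0].tolist()})
```

Note that PCA must be fitted once on a representative sample and reused for all later documents and queries, so that every vector lives in the same reduced space.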