Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: EmbeddingsBuilder redesign #52

Closed
11 of 14 tasks
marieaurore123 opened this issue Oct 8, 2024 · 1 comment · Fixed by #120
Closed
11 of 14 tasks

feat: EmbeddingsBuilder redesign #52

marieaurore123 opened this issue Oct 8, 2024 · 1 comment · Fixed by #120
Assignees
Labels
design External API Design
Milestone

Comments

@marieaurore123
Copy link
Contributor

marieaurore123 commented Oct 8, 2024

Feature Request

Any object that implements the trait Embedabble can be added to the embeddings builder and therefore can be embedded.

Motivation

Currently, embeddings builder is tightly coupled with the DocumentEmbedding type (only objects of this type can be used by the builder). Downsides: user needs to type cast their types to DocumentEmbedding and the structure of DocumentEmbedding is predetermined so the user needs to design their embedding code around this.

Proposal

  1. Define the trait Embedabble which contains a method fn embedabble() which returns a list of strings (the strings that should be embedded).
  2. Change the EmbeddingsBuilder to accept any type that implements the trait Embedabble.
  3. Write a custom derive macro that can be used on any struct which derives the Embedabble trait automatically.

Linked issues:

#60
#61
#62

Todo:

  • Give trait and macro same name like serde

  • implement Embeddable trait for many primitive types

  • macro fails when there are no embed tags

  • macro fails when embed with is empty string

  • need to handle error in custom embed function

  • review borrow and owning eveywhere. done for embeddings.rs

  • update docstrings. done for embeddings.rs

  • clean up embedding builder definition with traits maybe?

  • should build() return Embedding object or just string? return Embedding object - document string used for debugging

  • in memory vector store, rethink type of document D --> move to another branch

  • why is vector search of in memory vector store so bad? bug found

  • in memory store - implement from trait from both returns types of build into add_document

  • tests

  • make macro as feature flag

@marieaurore123 marieaurore123 self-assigned this Oct 8, 2024
@marieaurore123 marieaurore123 added the design External API Design label Oct 8, 2024
@marieaurore123 marieaurore123 changed the title refactor: EmbeddingsBuilder feat: EmbeddingsBuilder redesign Oct 8, 2024
@cvauclair
Copy link
Contributor

@garance-buricatu can you add some formatting and a little bit more context to this issue? Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
design External API Design
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants