Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] deeplake retriever implementation #326

Merged
merged 6 commits into from
Feb 19, 2024

Conversation

AniLeo-01
Copy link
Contributor

No description provided.

@okhat
Copy link
Collaborator

okhat commented Feb 1, 2024

This looks good, but we usually like to have the import for external dependencies happen inside the class, so that it does not affect people who don’t need the library

@AniLeo-01
Copy link
Contributor Author

@okhat could you please take a look 👀. I've made the changes that you have requested. Also, sorry for the inconvenience.

@okhat
Copy link
Collaborator

okhat commented Feb 2, 2024

Thanks! I think that import code needs to be inside init for example

@AniLeo-01
Copy link
Contributor Author

But putting the import inside init would lead to an error in the type hinting of the deeplake_client which would be VectorStore.

def __init__(
        self,
        deeplake_vectorstore_name: str,
        deeplake_client: VectorStore, #this will throw: "VectorStore" is not defined
        k: int = 3,
    ):
        try:
          from deeplake import VectorStore
        except ImportError:
          raise ImportError(
              "The 'deeplake' extra is required to use DeepLakeRM. Install it with `pip install dspy-ai[deeplake]`"
          )
        self._deeplake_vectorstore_name = deeplake_vectorstore_name
        self._deeplake_client = deeplake_client

        super().__init__(k=k)

@AniLeo-01
Copy link
Contributor Author

@okhat Except for that import part, I have fixed the code and made it workable (it's working if the import check is made outside init). Should I create a notebook for the demonstration of the retriever?

def __init__(
self,
deeplake_vectorstore_name: str,
deeplake_client: VectorStore,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This type won't work right? The import is happening inside the function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(also if possible please conduct a test with and without the dependencies installed to make sure it will work in both cases)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a generic Type type hint would resolve the issue. This is useful if we have a function that expects an argument of type VectorStore, but we do not want to specify the exact subclass of VectorStore that is required.

Suggested change
deeplake_client: VectorStore,
deeplake_client: Type['VectorStore'],

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to remove the type hint, as it's not enforcing any type either way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the code with/without the dependencies, it's working now! Throwing the import error if the dependencies are not installed which is expected, and running fine if installation is done.

@insop
Copy link
Contributor

insop commented Feb 19, 2024

Could you also update this document?

  • docs/retrieval_models_client.md


super().__init__(k=k)

def embedding_function(self, texts, model="text-embedding-ada-002"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, deeplake doesn't provide any wrapper class for EmbeddingFunction, so we have to create it from scratch. I have implemented the openAI because it's the most popular and as a template as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supporting my above comment with this excerpt from the official documentation:

> embedding_function (Optional[Any], optional) – Function or class that converts the embeddable data into embeddings. Input to embedding_function is a list of data and output is a list of embeddings. Defaults to None.

Suggest me what options I would keep for embedding function, and I would definitely update them in the code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, looks good to me, please add the entry to the document that I mentioned.
Thank you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure!

@okhat okhat merged commit 7a6421a into stanfordnlp:main Feb 19, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants