Added support for the semantic_text field and semantic query type #1881

Merged 3 commits on Aug 19, 2024

Conversation

miguelgrinberg
Collaborator

@miguelgrinberg miguelgrinberg commented Aug 15, 2024

This change adds support for the `semantic_text` field type, introduced in Elasticsearch 8.15, plus an example application. Unfortunately, running this application requires a somewhat beefy ES instance, so I'm not going to add integration tests.


name: str
summary: str
content: Any = dsl.mapped_field(
Collaborator Author

I'm typing the semantic text field as `Any` because it has different types during ingest and search. On ingest it is a plain string, while on search it is returned as an object with the original string in the `text` attribute. The object also includes an `inference` attribute with the autogenerated chunks and their embeddings.
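
The two shapes described above can be handled with a small helper. This is a minimal sketch, not code from this PR: the helper name `semantic_text_as_str` is hypothetical, and the search-time shape (an object or mapping whose `text` entry holds the original string) is assumed from the description in this comment.

```python
from collections.abc import Mapping
from typing import Any


def semantic_text_as_str(value: Any) -> str:
    """Return the original string from a semantic text field value.

    Hypothetical helper: the accepted shapes are assumptions based on the
    review comment, not a documented contract of the library.
    """
    if isinstance(value, str):
        # Ingest-time shape: the field is a plain string.
        return value
    if isinstance(value, Mapping):
        # Search-time shape when the hit is still a raw mapping.
        return value["text"]
    # Search-time shape exposed through attribute access.
    return value.text
```

Code that reads the field after a search could call this helper instead of branching on the type at every use site.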

Member

@pquentin pquentin left a comment

Thanks! LGTM.



async def search(query: str) -> dsl.AsyncSearch[WorkplaceDoc]:
    return WorkplaceDoc.search()[:5].query(
Member

nit: Doing [:5] before the query call is a bit confusing IMO.


async def search(query: str) -> dsl.AsyncSearch[WorkplaceDoc]:
    return WorkplaceDoc.search()[:5].query(
        "semantic",
Member

Any reason not to use the class here? One advantage is that it avoids typos.

Collaborator Author

No reason at all, except that historically it has been preferred to use the names. I actually prefer the classes myself and started using them in my tests. I'll update this.

miguelgrinberg added the `backport 8.x` label on Aug 19, 2024
miguelgrinberg merged commit 7fa4f8c into elastic:main on Aug 19, 2024
17 checks passed
miguelgrinberg deleted the semantic-text-support branch on August 19, 2024, 10:44
github-actions bot pushed a commit that referenced this pull request Aug 19, 2024
…#1881)

* Added support for the `semantic_text` field and `semantic` query type

* Fix nltk code... again

* feedback

(cherry picked from commit 7fa4f8c)
miguelgrinberg added a commit that referenced this pull request Aug 19, 2024
…#1881) (#1882)

* Added support for the `semantic_text` field and `semantic` query type

* Fix nltk code... again

* feedback

(cherry picked from commit 7fa4f8c)

Co-authored-by: Miguel Grinberg <[email protected]>
miguelgrinberg added a commit to miguelgrinberg/elasticsearch-dsl-py that referenced this pull request Dec 9, 2024
…elastic#1881)

* Added support for the `semantic_text` field and `semantic` query type

* Fix nltk code... again

* feedback