[Community] [Text Splitters]: Fix TEI support and html section splitter #22595
[Community]: Fix HuggingFaceHubEmbeddings
Modified `langchain_community.embeddings.HuggingFaceHubEmbeddings`, where the API call function does not match the current Text Embeddings Inference API. For example, parameters in `_model_kwargs` are not passed properly in the latest version. By the way, the issue "why cause 413? #50" might be solved by this change.

[Text Splitters]: Fix HTMLSectionSplitter
Modified `langchain_text_splitters.HTMLSectionSplitter`. In the latest version, the function `split_html_by_headers` uses a `dict` to store the sections of an HTML document, with the header/section element names serving as the dict keys. This is a problem when a single HTML document contains duplicate header/section element names: later ones replace earlier ones with the same name, so some content can be lost after HTML text splitting. Using a list to store the sections should solve the problem.
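
The overwrite problem described above can be illustrated with a minimal sketch. It is not the actual `split_html_by_headers` code; the sample data and the helper names `collect_in_dict` and `collect_in_list` are hypothetical and only show how a dict keyed by header text drops a section when a header name repeats, while a list keeps every section.

```python
# Hypothetical (header, content) pairs, standing in for what a
# header-based HTML splitter might extract from one document.
extracted = [
    ("Introduction", "First intro text."),
    ("Details", "Some details."),
    ("Introduction", "Second intro text."),  # duplicate header name
]


def collect_in_dict(pairs):
    """Store sections keyed by header text (the problematic approach)."""
    sections = {}
    for header, content in pairs:
        # A repeated header silently overwrites the earlier entry.
        sections[header] = content
    return sections


def collect_in_list(pairs):
    """Store sections in a list, so duplicate headers are all kept."""
    return [{"header": header, "content": content} for header, content in pairs]


as_dict = collect_in_dict(extracted)
as_list = collect_in_list(extracted)

print(len(as_dict))  # 2: the first "Introduction" section was replaced
print(len(as_list))  # 3: all sections survive, in document order
```

A list also preserves document order explicitly, which is usually what downstream chunking expects.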