[Fix] Correct the _split_text/_split_sentence logic to give proper splits #164

Merged: 5 commits merged on Feb 6, 2025

bhavnicksm
Collaborator

This pull request changes the chunking functionality in three areas: delimiter handling, sentence splitting, and enforcing a minimum number of characters per chunk or sentence. The most important changes are the new include_delim parameter, the revised sentence-splitting logic, and updated test cases that cover the new functionality.
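To make the description above concrete, here is a minimal sketch of delimiter-aware splitting with a minimum-length merge. It is not the code from this PR: the function name split_sentences, the "prev"/"next"/None values for include_delim, and the min_chars parameter are assumptions chosen for illustration.

```python
from typing import List, Optional, Sequence


def split_sentences(
    text: str,
    delims: Sequence[str] = (".", "!", "?"),
    include_delim: Optional[str] = "prev",  # assumed values: "prev", "next", or None
    min_chars: int = 12,                    # assumed name for the minimum-length setting
) -> List[str]:
    """Hypothetical helper: split on delimiters, then merge short pieces."""
    pieces: List[str] = []
    current = ""
    i = 0
    while i < len(text):
        # Find the first delimiter (in the given order) that starts at position i.
        hit = next((d for d in delims if text.startswith(d, i)), None)
        if hit is None:
            current += text[i]
            i += 1
            continue
        if include_delim == "prev":
            pieces.append(current + hit)  # delimiter closes the piece before it
            current = ""
        elif include_delim == "next":
            pieces.append(current)
            current = hit                 # delimiter opens the piece after it
        else:
            pieces.append(current)        # delimiter is dropped
            current = ""
        i += len(hit)
    pieces.append(current)
    pieces = [p for p in pieces if p]     # discard empty pieces

    # Accumulate pieces until the buffer reaches min_chars, then emit it.
    merged: List[str] = []
    buffer = ""
    for piece in pieces:
        buffer += piece
        if len(buffer) >= min_chars:
            merged.append(buffer)
            buffer = ""
    if buffer:
        # A short trailing remainder joins the last emitted piece, if there is one.
        if merged:
            merged[-1] += buffer
        else:
            merged.append(buffer)
    return merged


text = "Hi. This is a longer sentence. Ok! And one more here."
print(split_sentences(text))
# ['Hi. This is a longer sentence.', ' Ok! And one more here.']
```

With include_delim set to "prev" or "next", joining the returned pieces reproduces the input exactly, which is a convenient property to assert in tests for this kind of splitter.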

Enhancements to chunking functionality:

Updates to tests:

bhavnicksm self-assigned this on Feb 6, 2025
bhavnicksm added the bug (Something isn't working) label on Feb 6, 2025
bhavnicksm added this to the v0.4.2 milestone on Feb 6, 2025

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 75.00000% with 14 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/chonkie/chunker/late.py | 63.15% | 5 Missing and 2 partials ⚠️ |
| src/chonkie/chunker/semantic.py | 75.00% | 3 Missing and 1 partial ⚠️ |
| src/chonkie/chunker/recursive.py | 75.00% | 2 Missing and 1 partial ⚠️ |

| Flag | Coverage Δ |
|---|---|
| python-3.10 | 66.22% <75.00%> (+0.64%) ⬆️ |
| python-3.11 | 66.22% <75.00%> (+0.64%) ⬆️ |
| python-3.12 | 66.22% <75.00%> (+0.64%) ⬆️ |
| python-3.13 | 66.22% <75.00%> (+0.64%) ⬆️ |
| python-3.9 | 66.20% <75.00%> (+0.70%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| src/chonkie/chunker/sentence.py | 72.93% <100.00%> (+1.83%) ⬆️ |
| src/chonkie/chunker/recursive.py | 67.91% <75.00%> (+0.98%) ⬆️ |
| src/chonkie/chunker/semantic.py | 59.01% <75.00%> (+0.46%) ⬆️ |
| src/chonkie/chunker/late.py | 70.11% <63.15%> (-3.06%) ⬇️ |

... and 1 file with indirect coverage changes

bhavnicksm merged commit 0810932 into main on Feb 6, 2025
6 checks passed