Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update fixed.py #1704

Merged
merged 2 commits into from
Jan 7, 2025
Merged

Update fixed.py #1704

merged 2 commits into from
Jan 7, 2025

Conversation

Gruhit13
Copy link
Contributor

@Gruhit13 Gruhit13 commented Jan 5, 2025

method chunk was not taking into consideration the overlap length when checking for the while loop and as a result it was running endlessly making the RAM go OOM. So added following functionality:

  1. If the length of current document is lesser than overlap then no need to chunk it just return it.
  2. Check if the start + overlap is lesser than content length to avoid endless chunking.

Fixes #1703

Type of change

Please check the options that are relevant:

  • Bug fix (non-breaking change which fixes an issue)
  • [] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Model update
  • Infrastructure change

Checklist

  • My code follows Phidata's style guidelines and best practices
  • I have performed a self-review of my code
  • I have added docstrings and comments for complex logic
  • My changes generate no new warnings or errors
  • I have added cookbook examples for my new addition (if needed)
  • I have updated requirements.txt/pyproject.toml (if needed)
  • I have verified my changes in a clean environment

method ```chunk``` was not taking into consideration the overlap length when checking for the ```while``` loop and as a result it was running endlessly making the RAM go OOM. So added following functionality:
1. If the length of current document is lesser than overlap then no need to chunk it just return it.
2. Check if the start + overlap is lesser than content length to avoid endless chunking.
Copy link
Contributor

@manthanguptaa manthanguptaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great catch @Gruhit13! Thanks for this fix!

@Gruhit13
Copy link
Contributor Author

Gruhit13 commented Jan 6, 2025

Hey @manthanguptaa I am not able to push to changes to main branch. Would you be able to merge my code. I have tested it does not collide with any exiting code so you would get issue merging it. I am do not have the authorization for merging.

@manthanguptaa
Copy link
Contributor

Hey @manthanguptaa I am not able to push to changes to main branch. Would you be able to merge my code. I have tested it does not collide with any exiting code so you would get issue merging it. I am do not have the authorization for merging.

Please be patient @Gruhit13 we will merge it in the next release :)

@Gruhit13
Copy link
Contributor Author

Gruhit13 commented Jan 6, 2025

okay 👍

@manthanguptaa manthanguptaa merged commit c583a95 into phidatahq:main Jan 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FixedSizeChunking runs endlessly when using overlap
2 participants