Make chardet optional or use charset-normalizer instead #222

Closed
nijel opened this issue May 22, 2024 · 2 comments · Fixed by #223

Comments

@nijel
Contributor

nijel commented May 22, 2024

While memory profiling Weblate, I noticed that more than 2 MB is consumed by the chardet module, which we don't depend on directly.

In our case, the only package pulling in chardet is gaupol. Everybody else seems to have switched to charset-normalizer, which is a maintained, faster alternative with a lower memory footprint.

I'm willing to contribute a pull request, but first I'd need to know which direction you prefer. Two approaches I can see:

  • Switch to charset-normalizer. It supports only Python 3.5 and newer, while aeidon supports older Python versions, so this would mean raising the minimum Python version. I don't think that should be an issue these days.
  • Remove chardet from required dependencies, moving it to extras.
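As a rough illustration of the second approach, the dependency could move from install_requires to extras_require. This is only a sketch: the extra name "detect" is hypothetical, and the dictionary stands in for the keyword arguments passed to setuptools.setup() rather than reproducing aeidon's actual setup script.

```python
# Hypothetical keyword arguments for setuptools.setup().
# This is NOT aeidon's real setup-aeidon.py; the extra name "detect" is made up.
setup_kwargs = {
    "name": "aeidon",
    # chardet is no longer a hard requirement.
    "install_requires": [],
    # Users opt in to encoding detection by requesting the extra.
    "extras_require": {"detect": ["chardet"]},
}
```

Under this sketch, pip install aeidon would skip chardet, while pip install aeidon[detect] would pull it in. Extras are opt-in only, so anyone who wants auto-detection has to ask for it explicitly.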
@otsaloma
Owner

chardet is not required in aeidon/gaupol. It's imported only inside the aeidon.encodings.detect function and guarded with if aeidon.util.chardet_available():. I think I put it in setup-aeidon.py's install_requires for convenience. Do I understand correctly that there are no opt-out type dependencies, i.e. ones that would be installed by default but that you could opt out of with some syntax?

I think I was a bit eager to make dependencies optional back when writing these. Encoding auto-detection is something that probably 95% of users want, since they're downloading random subtitle files from the internet and can't really know the encoding.
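For readers unfamiliar with the pattern, a guarded optional import of the kind described above might look roughly like the following. This is a minimal sketch, not aeidon's code: the function names detector_available and detect are hypothetical, and it uses charset-normalizer (the library proposed in this issue) as the detector.

```python
import importlib.util


def detector_available(module_name="charset_normalizer"):
    """Return True if the optional detection module is installed."""
    # find_spec() locates a top-level module without importing it.
    return importlib.util.find_spec(module_name) is not None


def detect(data):
    """Return a guessed encoding name for bytes, or None if detection is unavailable."""
    if not detector_available():
        return None
    # Imported lazily, so the module uses no memory unless detection is requested.
    import charset_normalizer
    best = charset_normalizer.from_bytes(data).best()
    return best.encoding if best is not None else None
```

Because the import happens inside the function, a process that never calls detect() never loads the detection library, which is the memory saving the issue is after.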

nijel added a commit to nijel/gaupol that referenced this issue May 23, 2024
nijel added a commit to nijel/gaupol that referenced this issue May 23, 2024
@nijel
Contributor Author

nijel commented May 23, 2024

If you do pip install aeidon you end up with chardet. You can manually uninstall it, but pip will then complain about unmet dependencies.

Anyway, I've created #223 to migrate to charset-normalizer. Please review it if this is something you want.
