Ephemetoot may die when encountering utf8 encoded toots #11

Closed
billalbertson opened this issue Jul 21, 2019 · 5 comments

Comments

@billalbertson

Setting PYTHONIOENCODING=utf-8 in the environment fixes the error. The failing run and the successful run with the UTF-8 setting are below.

bills-grimoire:ephemetoot> python3 ephemetoot.py --test                          
This is a test run...
Fetching account details...
Checking 1314 toots...
Traceback (most recent call last):
  File "ephemetoot.py", line 80, in checkToots
    + toot.created_at.strftime("%d %b %Y")
UnicodeEncodeError: 'ascii' codec can't encode character '\u274c' in position 0: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "ephemetoot.py", line 139, in <module>
    checkToots(timeline)
  File "ephemetoot.py", line 121, in checkToots
    checkToots(next_batch, deleted_count)
  File "ephemetoot.py", line 121, in checkToots
    checkToots(next_batch, deleted_count)
  File "ephemetoot.py", line 121, in checkToots
    checkToots(next_batch, deleted_count)
  [Previous line repeated 4 more times]
  File "ephemetoot.py", line 111, in checkToots
    print("\U0001f6d1 Unknown ERROR deleting toot - " + str(toot.id))
UnicodeEncodeError: 'ascii' codec can't encode character '\U0001f6d1' in position 0: ordinal not in range(128)

bills-grimoire:ephemetoot> PYTHONIOENCODING=utf-8 python3 ephemetoot.py --test 
This is a test run...
Fetching account details...
Checking 1314 toots...
<snip>
Test run. This would have removed 1044 toots.

@hughrun hughrun added the bug label Jul 22, 2019
@hughrun hughrun self-assigned this Jul 22, 2019
@hughrun
Owner

hughrun commented Jul 22, 2019

Hmm, weird, Python3 should read and write strings using utf-8 by default. Can you tell me more about your environment, @billalbertson? What OS and Python version are you running?

@billalbertson
Author

billalbertson commented Jul 22, 2019 via email

@hughrun
Owner

hughrun commented Jul 22, 2019

Hey @billalbertson, I had another look at this and it looks like it should be more or less resolved as of Python 3.7.

Essentially the problem is that Python likes to cooperate with other tools and libraries, so it takes its default text encoding from the locale settings in the host OS, and prior to v3.7 it always uses them as-is. Like several other *nix systems, OpenBSD uses US-ASCII as the default locale character encoding. As of Python 3.7 there is a setting called PYTHONCOERCECLOCALE, which is on by default: when Python finds itself in the plain C (POSIX) locale, it coerces it to a UTF-8 based locale instead. If you're interested in the background, there's a detailed explanation in PEP 538, which led to the change.
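
If anyone wants to check what their own environment is doing, something like this should show it. It's only a rough sketch: `describe_io_encoding` and `can_print` are names I made up for illustration, not anything in ephemetoot. The second function simulates a terminal with a given encoding using an in-memory stream, so it behaves the same regardless of the real locale.

```python
import io
import locale
import sys


def describe_io_encoding():
    """Return the encodings Python derived from the environment."""
    return {
        # May be None if stdout has been replaced by something unusual.
        "stdout": getattr(sys.stdout, "encoding", None),
        "preferred": locale.getpreferredencoding(False),
        "filesystem": sys.getfilesystemencoding(),
    }


def can_print(text, encoding):
    """Check whether `text` can be printed to a stream using `encoding`.

    Hypothetical helper: wraps an in-memory byte buffer in a text stream,
    which stands in for a terminal whose locale implies `encoding`.
    """
    stream = io.TextIOWrapper(io.BytesIO(), encoding=encoding)
    try:
        print(text, file=stream)
        stream.flush()
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(describe_io_encoding())
    # "\u274c" is the character from the first traceback above.
    print("ascii:", can_print("\u274c", "ascii"))
    print("utf-8:", can_print("\u274c", "utf-8"))
```

If the "stdout" entry reports an ASCII encoding (the exact name varies by platform), printing any toot that contains emoji will fail the same way as in the traceback.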

So I guess there are three possible solutions for anyone encountering this:

  1. Use the PYTHONIOENCODING=utf-8 env setting when running the script, as you have described
  2. Set the OS locale to UTF-8 in your ~/.profile file (e.g. export LC_CTYPE=en_US.UTF-8) - though this might have unintended consequences
  3. Upgrade to Python 3.7 or later

I probably should add something in the docs to this effect.
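
For what it's worth, the effect of option 1 can be checked without running ephemetoot at all. The sketch below starts a child Python process that prints the same character as in the traceback, once with PYTHONIOENCODING=ascii and once with PYTHONIOENCODING=utf-8. `run_with_io_encoding` is a hypothetical helper name, and it assumes `subprocess.run` is available (Python 3.5+).

```python
import os
import subprocess
import sys

# The child program is plain ASCII source; the escape is only expanded
# inside the child, so argument encoding is not an issue here.
CHILD_CODE = "print('\\u274c')"


def run_with_io_encoding(encoding):
    """Run a tiny Python child process with PYTHONIOENCODING set.

    Hypothetical helper for illustration: returns the CompletedProcess so
    the caller can inspect the exit code, stdout and stderr.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


if __name__ == "__main__":
    for encoding in ("ascii", "utf-8"):
        result = run_with_io_encoding(encoding)
        status = "ok" if result.returncode == 0 else "failed"
        print(encoding, status)
```

With `ascii` the child should exit with a UnicodeEncodeError on stderr, and with `utf-8` it should print the character normally, which matches what was reported at the top of this issue.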

@billalbertson
Author

billalbertson commented Jul 23, 2019 via email

@hughrun hughrun added the documentation label and removed the bug label Jul 23, 2019
@hughrun
Owner

hughrun commented Jul 23, 2019

Thanks for bringing it to my attention!

@hughrun hughrun closed this as completed Jul 23, 2019