_call.py - UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 32: invalid continuation byte #239

Open
lassekh opened this issue Jan 28, 2025 · 3 comments
Labels
bug Something isn't working

Comments

@lassekh

lassekh commented Jan 28, 2025

On my Windows PC, my user folder name contains the Danish letter 'æ', and this causes an error for me:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main       
  File "<frozen runpy>", line 88, in _run_code
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Scripts\pip-audit.exe\__main__.py", line 4, in <module>
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_cli.py", line 17, in <module>
    from pip_audit._audit import AuditOptions, Auditor
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_audit.py", line 11, in <module>
    from pip_audit._dependency_source import DependencySource   
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_dependency_source\__init__.py", line 5, in <module>
    from .interface import (
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_dependency_source\interface.py", line 11, in <module>
    from pip_audit._fix import ResolvedFixVersion
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_fix.py", line 13, in <module>
    from pip_audit._service import (
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_service\__init__.py", line 14, in <module>
    from .osv import OsvService
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_service\osv.py", line 15, in <module>     
    from pip_audit._cache import caching_session
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_audit\_cache.py", line 15, in <module>
    import pip_api
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_api\__init__.py", line 9, in <module>
    PIP_VERSION: Version = packaging_version.parse(version())  # type: ignore
                                                   ^^^^^^^^^    
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_api\_version.py", line 5, in version
    result = call("--version")
             ^^^^^^^^^^^^^^^^^
  File "c:\Users\LasseKjærHansen\P-Secure\P-Secure\.venv\Lib\site-packages\pip_api\_call.py", line 12, in call
    return result.decode()
           ^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 32: invalid continuation byte

It occurred while running pip-audit. I tried lots of different things to fix it, but the trick that finally got it to work was modifying the call function to this:

import os
import subprocess
import sys


def call(*args, cwd=None):
    python_location = os.environ.get("PIPAPI_PYTHON_LOCATION", sys.executable)
    env = {**os.environ, "PIP_YES": "true", "PIP_DISABLE_PIP_VERSION_CHECK": "true"}
    result = subprocess.check_output(
        [python_location, "-m", "pip"] + list(args), cwd=cwd, env=env
    )
    # Try Windows-1252 first, then UTF-8
    for encoding in ("cp1252", "utf-8"):
        try:
            return result.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a character, so this last resort cannot fail
    return result.decode("latin1")
@woodruffw
Collaborator

Thanks for the report @lassekh!

To make sure I understand: you ran pip-audit which then failed within pip-api, right?

From the sys.stdout docs, it looks like the default encoding for non-character-device output (such as a pipe) on Windows is the "ANSI" code page, which is really whatever the system locale is currently configured to use.
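The mismatch described here can be reproduced without Windows. The following is a minimal sketch, assuming the child process wrote its output as cp1252 (a common Western European ANSI code page) and the parent decoded it as UTF-8; `decode_failure` is a hypothetical helper for illustration, not part of pip-api.

```python
def decode_failure(text, written_as="cp1252", read_as="utf-8"):
    """Encode `text` with one codec and try to decode it with another.

    Returns the UnicodeDecodeError if decoding fails, otherwise None.
    """
    raw = text.encode(written_as)
    try:
        raw.decode(read_as)
    except UnicodeDecodeError as exc:
        return exc
    return None


if __name__ == "__main__":
    # "\u00e6" is the Danish letter ae; cp1252 stores it as the single byte 0xe6.
    err = decode_failure("c:\\Users\\LasseKj\u00e6rHansen")
    print(err)
```

In UTF-8, the byte 0xe6 announces a three-byte sequence, so the plain ASCII byte that follows it is rejected as an invalid continuation byte, which matches the message in the traceback.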

We probably can't handle every possible locale, so our best option here is to spawn pip with a controlled encoding on the Python side. In particular, I think setting PYTHONIOENCODING="utf8" in the environment will normalize things on Windows hosts.
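A rough sketch of that suggestion, assuming the child is a Python process that honors PYTHONIOENCODING: the parent sets the variable in the child's environment and then decodes the captured bytes as UTF-8. The helpers `utf8_env` and `run_python` are hypothetical names used only for illustration, not pip-api functions, and this is not necessarily the fix that was eventually merged.

```python
import os
import subprocess
import sys


def utf8_env(base=None):
    """Return a copy of the environment that asks a child Python for UTF-8 I/O."""
    env = dict(os.environ if base is None else base)
    env["PYTHONIOENCODING"] = "utf8"
    return env


def run_python(args, cwd=None):
    """Run the current interpreter with `args` and decode its stdout as UTF-8.

    Hypothetical helper for illustration; not part of pip-api.
    """
    result = subprocess.check_output(
        [sys.executable, *args], cwd=cwd, env=utf8_env()
    )
    return result.decode("utf-8")


if __name__ == "__main__":
    # Assumes pip is installed for the current interpreter.
    print(run_python(["-m", "pip", "--version"]))
```

Because the variable is set only in the environment passed to the subprocess, the parent process and the user's own shell settings are left untouched.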

@lassekh would you be able to give the above a try and see if it helps?

CC @di for thoughts as well 🙂

@woodruffw added the "bug" (Something isn't working) label Jan 28, 2025
@di
Owner

di commented Jan 28, 2025

I agree, I think that makes more sense than trying to enumerate every possible encoding, and since we spawn pip in a subprocess this shouldn't affect the user.

@woodruffw
Collaborator

Sounds good! I don't have a Windows machine to test on but I can try and reproduce in CI; I'll have a fix PR up shortly.
