Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

io: improve bytes handling #9059

Merged
merged 4 commits into from
Nov 2, 2022
Merged

io: improve bytes handling #9059

merged 4 commits into from
Nov 2, 2022

Conversation

sobolevn
Copy link
Member

@sobolevn sobolevn commented Nov 1, 2022

IncrementalNewlineDecoder.decode does not support str, but supports bytearray and memoryview:

>>> import codecs, io
>>> dec = codecs.getincrementaldecoder('utf8')()
>>> d = io.IncrementalNewlineDecoder(dec, translate=True)
>>> d.decode('str')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/sobolev/.pyenv/versions/3.10.0/lib/python3.10/codecs.py", line 321, in decode
    data = self.buffer + input
TypeError: can't concat str to bytes

Buffers:

>>> d.decode(bytearray(b'\r'))
'\n'
>>> d.decode(memoryview(b'\r'))
'\n'
>>> d.decode(b'\r')
'\n'

@github-actions

This comment has been minimized.

@srittau
Copy link
Collaborator

srittau commented Nov 1, 2022

It isn't that easy, unfortunately:

Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import io
>>> d = io.IncrementalNewlineDecoder(None, translate=True)
>>> d.decode('str')
'str'
>>> d.decode(b'bytes')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoder should return a string result, not 'bytes'

This is the implemenation: https://github.com/python/cpython/blob/934b25dcc492dcbca4da9d63d0d71dc940fc0375/Modules/_io/textio.c#L284

PyObject *
_PyIncrementalNewlineDecoder_decode(PyObject *myself,
                                    PyObject *input, int final)
{
    PyObject *output;
    Py_ssize_t output_len;
    nldecoder_object *self = (nldecoder_object *) myself;

    if (self->decoder == NULL) {
        PyErr_SetString(PyExc_ValueError,
                        "IncrementalNewlineDecoder.__init__ not called");
        return NULL;
    }

    /* decode input (with the eventual \r from a previous pass) */
    if (self->decoder != Py_None) {
        output = PyObject_CallMethodObjArgs(self->decoder,
            &_Py_ID(decode), input, final ? Py_True : Py_False, NULL);
    }
    else {
        output = input;
        Py_INCREF(output);
    }

    if (check_decoded(output) < 0)
        return NULL;

    /* ... */

So, there are code paths, where only str is allowed, as check_decoded() explicity checks for a str.

@sobolevn
Copy link
Member Author

sobolevn commented Nov 1, 2022

Ok, let's go with an easy solution. I don't have any other ideas :(

@github-actions

This comment has been minimized.

@JelleZijlstra
Copy link
Member

I'm not sure this is quite right either. The argument gets passed to codecs.IncrementalDecoder.decode, which is an abstract method. I left it as taking bytes when I reviewed codecs, because that's what BufferedIncrementalDecoder initializes its buffer to. There may be decoders that accept ReadableBuffer, but I'm not sure it's required. If we make the change in this PR, we should also change codecs.

@sobolevn
Copy link
Member Author

sobolevn commented Nov 2, 2022

Yes, you are correct. Here's the full sequence:

(abstract)codecs.IncrementalDecoder -> (abstract)codecs.BufferedIncrementalDecoder -> encodings.*.IncrementalDecoder

Then io.IncrementalNewlineDecoder uses some of the concrete types from encodings.*.
Types in encodings.* already have ReadableBuffer set in this PR: #9043

So, I will create a new PR to fix codecs. types in a moment.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 2, 2022

According to mypy_primer, this change has no effect on the checked open source code. 🤖🎉

@sobolevn
Copy link
Member Author

sobolevn commented Nov 2, 2022

There's an unrelated failure in the CI.

@JelleZijlstra JelleZijlstra merged commit f972bdf into python:main Nov 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants