
pep-0688.rst


PEP: 688
Title: Making the buffer protocol accessible in Python
Author: Jelle Zijlstra <[email protected]>
Discussions-To: https://discuss.python.org/t/15265
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 23-Apr-2022
Python-Version: 3.12
Post-History: 23-Apr-2022, 25-Apr-2022

Abstract

This PEP proposes a mechanism for Python code to inspect whether a type supports the C-level buffer protocol. This allows type checkers to evaluate whether objects implement the protocol.

Motivation

The CPython C API provides a versatile mechanism for accessing the underlying memory of an object—the buffer protocol introduced in PEP 3118. Functions that accept binary data are usually written to handle any object implementing the buffer protocol. For example, at the time of writing, there are around 130 functions in CPython using the Argument Clinic Py_buffer type, which accepts the buffer protocol.
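As a small illustration of such a function, the sketch below passes several different buffer objects to binascii.hexlify, which accepts any bytes-like object. The choice of hexlify and the sample payloads are ours, picked only for brevity.

```python
import array
import binascii

# Four different objects that all expose the same two bytes
# through the buffer protocol.
payloads = [
    b"\x01\x02",
    bytearray(b"\x01\x02"),
    memoryview(b"\x01\x02"),
    array.array("B", [1, 2]),
]

# hexlify reads the underlying memory and does not care about the
# concrete Python type of its argument.
results = [binascii.hexlify(p) for p in payloads]
```

Every entry in results is the same bytes object, even though the four arguments share no common Python base class other than object.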

Currently, there is no way for Python code to inspect whether an object supports the buffer protocol. Moreover, the static type system does not provide a type annotation to represent the protocol. This is a common problem when writing type annotations for code that accepts generic buffers.

Rationale

Current options

There are two current workarounds for annotating buffer types in the type system, but neither is adequate.

First, the current workaround for buffer types in typeshed is a type alias that lists well-known buffer types in the standard library, such as bytes, bytearray, memoryview, and array.array. This approach works for the standard library, but it does not extend to third-party buffer types.
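A minimal sketch of the alias approach, for illustration only: the names KnownBuffer, ThirdPartyBuffer, and in_alias are hypothetical, and the member list simply mirrors the types named above rather than reproducing the actual typeshed definition.

```python
import array
from typing import Union, get_args

# Hypothetical alias: a fixed list of well-known standard-library
# buffer types, in the style described above.
KnownBuffer = Union[bytes, bytearray, memoryview, array.array]


class ThirdPartyBuffer:
    """Hypothetical stand-in for an extension type that implements the
    buffer protocol in C. A fixed alias has no way to include it."""


def in_alias(obj: object) -> bool:
    # get_args() returns the tuple of member types of the Union.
    return isinstance(obj, get_args(KnownBuffer))
```

Here in_alias(ThirdPartyBuffer()) is False: every new buffer type would have to be added to the alias by hand, which is not possible for types defined outside the standard library.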

Second, the documentation for typing.ByteString currently states:

This type represents the types bytes, bytearray, and memoryview of byte sequences.

As a shorthand for this type, bytes can be used to annotate arguments of any of the types mentioned above.

Although this sentence has been in the documentation since 2015, the use of bytes to include these other types is not specified in any of the typing PEPs. Furthermore, this mechanism has a number of problems. It does not include all possible buffer types, and it makes the bytes type ambiguous in type annotations. After all, there are many operations that are valid on bytes objects, but not on memoryview objects, and it is perfectly possible for a function to accept bytes but not memoryview objects. A mypy user reports that this shortcut has caused significant problems for the psycopg project.
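The ambiguity can be seen in a short sketch. The function shout is a hypothetical example of code that genuinely needs a bytes object; a type checker that applies the shorthand would accept the memoryview call, which then fails at runtime.

```python
def shout(data: bytes) -> bytes:
    # upper() exists on bytes (and bytearray) but not on memoryview.
    return data.upper()


ok = shout(b"abc")

view = memoryview(b"abc")
try:
    # Allowed by a checker that treats bytes as shorthand for memoryview.
    shout(view)  # type: ignore[arg-type]
    failed = False
except AttributeError:
    # memoryview has no upper() method.
    failed = True
```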

Kinds of buffers

The C buffer protocol supports many options, affecting strides, contiguity, and support for writing to the buffer. Some of these options would be useful in the type system. For example, typeshed currently provides separate type aliases for writable and read-only buffers.

However, in the C buffer protocol, these options cannot be queried directly on the type object. The only way to figure out whether an object supports a writable buffer is to actually ask for the buffer. For some types, such as memoryview, whether the buffer is writable depends on the instance: some instances are read-only and others are not. As such, we propose to expose only whether a type implements the buffer protocol at all, not whether it supports more specific options such as writable buffers.
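The instance dependence is visible from Python through the readonly attribute of memoryview. The following sketch uses only existing standard-library behavior; the variable names are illustrative.

```python
# Two objects of the same type, backed by different exporters.
read_only = memoryview(b"abc")            # immutable bytes underneath
writable = memoryview(bytearray(b"abc"))  # mutable bytearray underneath

same_type = type(read_only) is type(writable)

# Writing through the writable view succeeds.
writable[0] = ord("x")

# Writing through the read-only view raises TypeError.
try:
    read_only[0] = ord("x")
    write_rejected = False
except TypeError:
    write_rejected = True
```

Both objects are memoryview instances, so no check on the type alone can tell them apart.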

Specification

types.Buffer

A new class, types.Buffer, will be added. It cannot be instantiated or subclassed at runtime, but supports the __instancecheck__ and __subclasscheck__ hooks. In CPython, these will check for the presence of the bf_getbuffer slot in the type object:

>>> from types import Buffer
>>> isinstance(b"xy", Buffer)
True
>>> issubclass(bytes, Buffer)
True
>>> issubclass(memoryview, Buffer)
True
>>> isinstance("xy", Buffer)
False
>>> issubclass(str, Buffer)
False

The new class can also be used in type annotations:

def need_buffer(b: Buffer) -> memoryview:
    return memoryview(b)

need_buffer(b"xy")  # ok
need_buffer("xy")  # rejected by static type checkers
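To show how the __instancecheck__ hook mentioned above behaves, here is a rough pure-Python approximation. It is not the proposed implementation: BufferLike and _BufferLikeMeta are hypothetical names, and probing with memoryview() is our stand-in for checking the bf_getbuffer slot, which Python code cannot see. The sketch covers only instance checks, because a subclass check has no instance to probe.

```python
class _BufferLikeMeta(type):
    """Metaclass for a hypothetical stand-in, not the proposed types.Buffer."""

    def __instancecheck__(cls, instance):
        # Assumption: requesting a buffer via memoryview() approximates
        # "the type has a bf_getbuffer slot". The with-block releases
        # the buffer again immediately.
        try:
            with memoryview(instance):
                return True
        except TypeError:
            return False


class BufferLike(metaclass=_BufferLikeMeta):
    """Cannot be instantiated; exists only for isinstance() checks."""

    def __new__(cls, *args, **kwargs):
        raise TypeError("BufferLike cannot be instantiated")
```

With this sketch, isinstance(b"xy", BufferLike) is true and isinstance("xy", BufferLike) is false. Unlike the slot check proposed here, the probe actually requests a buffer from the object on every call.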

Usage in stub files

For static typing purposes, types defined in C extensions usually require stub files, as described in the stub files section of PEP 484. In stub files, types.Buffer may be used as a base class to indicate that a class implements the buffer protocol.

For example, memoryview may be declared as follows in a stub:

class memoryview(types.Buffer, Sized, Sequence[int]):
    ...

The types.Buffer class does not require any special treatment by type checkers.

Equivalent for older Python versions

New typing features are usually backported to older Python versions in the typing_extensions package. Because the buffer protocol is accessible only in C, types.Buffer cannot be implemented in a pure-Python package like typing_extensions. As a temporary workaround, a typing_extensions.Buffer abstract base class will be provided for Python versions that do not have types.Buffer available.

For the benefit of static type checkers, typing_extensions.Buffer can be used as a base class in stubs to mark types as supporting the buffer protocol. For runtime uses, the ABC.register API can be used to register buffer classes with typing_extensions.Buffer.

When types.Buffer is available, typing_extensions should simply re-export it. Thus, users who register their buffer class manually with typing_extensions.Buffer.register should use a guard to make sure their code continues to work once types.Buffer is in the standard library.
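One possible shape for such a guard is sketched below. To keep the example self-contained, FallbackBuffer is a hypothetical local stand-in for the typing_extensions.Buffer ABC described above, and MyBuffer and register_buffer_class are illustrative names. The guard assumes that a native types.Buffer would offer no register method.

```python
import types
from abc import ABC


class FallbackBuffer(ABC):
    """Hypothetical stand-in for the typing_extensions.Buffer ABC."""


class MyBuffer:
    """Placeholder for a third-party class that implements the buffer
    protocol in C."""


def register_buffer_class(cls, fallback=FallbackBuffer, types_module=types):
    """Register cls with the fallback ABC only if types.Buffer is missing."""
    if hasattr(types_module, "Buffer"):
        # A native types.Buffer would recognize cls through its hooks,
        # so there is nothing to register.
        return False
    fallback.register(cls)
    return True


registered = register_buffer_class(MyBuffer)
```

The types_module parameter exists only so that the guard can be exercised against a fake module; real code would check the types module directly.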

No special meaning for bytes

The special case stating that bytes may be used as a shorthand for other ByteString types will be removed from the typing documentation. With types.Buffer available as an alternative, there will be no good reason to allow bytes as a shorthand. We suggest that type checkers currently implementing this behavior should deprecate and eventually remove it.

Backwards Compatibility

As the runtime changes in this PEP only add a new class, there are no backwards compatibility concerns.

However, the recommendation to remove the special behavior for bytes in type checkers does have a backwards compatibility impact on their users. An experiment with mypy shows that several major open source projects that use it for type checking will see new errors if the bytes promotion is removed. Many of these errors can be fixed by improving the stubs in typeshed, as has already been done for the builtins, binascii, pickle, and re modules. Overall, the change improves type safety and makes the type system more consistent, so we believe the migration cost is worth it.

How to Teach This

We will add notes pointing to types.Buffer in appropriate places in the documentation, such as typing.readthedocs.io and the mypy cheat sheet. Type checkers may provide additional pointers in their error messages. For example, when they encounter a buffer object being passed to a function that is annotated to only accept bytes, the error message could include a note suggesting the use of types.Buffer instead.

Reference Implementation

An implementation of types.Buffer is available in the author's fork.

Rejected Ideas

Buffer ABC

An earlier proposal suggested adding a collections.abc.Buffer abstract base class to represent buffer objects. This idea stalled because an ABC with no methods does not fit well into the collections.abc module. Furthermore, it required manual registration of buffer classes, including those in the standard library. This PEP's approach of using the __instancecheck__ hook is more natural and does not require explicit registration.

Nevertheless, the ABC proposal has the advantage that it does not require C changes. This PEP proposes to adopt a version of it in the third-party typing_extensions package for the benefit of users of older Python versions.

Keep bytearray compatible with bytes

One suggestion was to remove the special case that makes memoryview compatible with bytes but to keep it for bytearray, because bytes and bytearray have very similar interfaces. However, several standard library functions (e.g., re.compile and socket.getaddrinfo) accept bytes but not bytearray. In addition, bytearray is not a very common type in most codebases. We prefer to have users spell out accepted types explicitly (or use Protocol from PEP 544 if only a specific set of methods is required).
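As a sketch of the Protocol alternative, a function that only needs to decode its argument can say so directly. SupportsDecode and to_text are hypothetical names chosen for this example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsDecode(Protocol):
    """Any object with a decode() method, such as bytes or bytearray."""

    def decode(self, encoding: str = ..., errors: str = ...) -> str: ...


def to_text(data: SupportsDecode) -> str:
    # Only decode() is required, so bytes and bytearray both qualify
    # while memoryview, which has no decode(), does not.
    return data.decode("utf-8")
```

This states the real requirement of the function without relying on any implicit relationship between bytes and bytearray.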

Open Issues

Read-only and writable buffers

To avoid making changes to the buffer protocol itself, this PEP currently does not provide a way to distinguish between read-only and writable buffers. That's unfortunate, because some APIs require a writable buffer, and one of the most common buffer types (bytes) is always read-only. Should we add a new mechanism in C to declare that a type implementing the buffer protocol is potentially writable?
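For a concrete case of an API that requires a writable buffer, consider io.BytesIO.readinto, which fills a caller-supplied buffer. The sketch uses only existing standard-library behavior.

```python
import io

# A bytearray is writable, so readinto() can fill it.
target = bytearray(3)
count = io.BytesIO(b"abc").readinto(target)

# A bytes object supports the buffer protocol but is read-only,
# so the same call is rejected at runtime.
try:
    io.BytesIO(b"abc").readinto(b"\x00\x00\x00")
    rejected = False
except TypeError:
    rejected = True
```

Under this PEP as currently written, both arguments would satisfy an annotation of types.Buffer, so a static type checker could not flag the second call.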

Copyright

This document is placed in the public domain or under the CC0-1.0-Universal license, whichever is more permissive.