forked from python/peps
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpep-3112.txt
159 lines (113 loc) · 4.77 KB
/
pep-3112.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
PEP: 3112
Title: Bytes literals in Python 3000
Version: $Revision$
Last-Modified: $Date$
Author: Jason Orendorff <[email protected]>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Requires: 358
Created: 23-Feb-2007
Python-Version: 3.0
Post-History: 23-Feb-2007
Abstract
========
This PEP proposes a literal syntax for the ``bytes`` objects
introduced in :pep:`358`. The purpose is to provide a convenient way to
spell ASCII strings and arbitrary binary data.
Motivation
==========
Existing spellings of an ASCII string in Python 3000 include::
bytes('Hello world', 'ascii')
'Hello world'.encode('ascii')
The proposed syntax is::
b'Hello world'
Existing spellings of an 8-bit binary sequence in Python 3000 include::
bytes([0x7f, 0x45, 0x4c, 0x46, 0x01, 0x01, 0x01, 0x00])
bytes('\x7fELF\x01\x01\x01\0', 'latin-1')
'7f454c4601010100'.decode('hex')
The proposed syntax is::
b'\x7f\x45\x4c\x46\x01\x01\x01\x00'
b'\x7fELF\x01\x01\x01\0'
In both cases, the advantages of the new syntax are brevity, some
small efficiency gain, and the detection of encoding errors at compile
time rather than at runtime. The brevity benefit is especially felt
when using the string-like methods of bytes objects::
lines = bdata.split(bytes('\n', 'ascii')) # existing syntax
lines = bdata.split(b'\n') # proposed syntax
And when converting code from Python 2.x to Python 3000::
sok.send('EXIT\r\n') # Python 2.x
sok.send('EXIT\r\n'.encode('ascii')) # Python 3000 existing
sok.send(b'EXIT\r\n') # proposed
Grammar Changes
===============
The proposed syntax is an extension of the existing string
syntax [#stringliterals]_.
The new syntax for strings, including the new bytes literal, is::
stringliteral: [stringprefix] (shortstring | longstring)
stringprefix: "b" | "r" | "br" | "B" | "R" | "BR" | "Br" | "bR"
shortstring: "'" shortstringitem* "'" | '"' shortstringitem* '"'
longstring: "'''" longstringitem* "'''" | '"""' longstringitem* '"""'
shortstringitem: shortstringchar | escapeseq
longstringitem: longstringchar | escapeseq
shortstringchar:
<any source character except "\" or newline or the quote>
longstringchar: <any source character except "\">
escapeseq: "\" NL
| "\\" | "\'" | '\"'
| "\a" | "\b" | "\f" | "\n" | "\r" | "\t" | "\v"
| "\ooo" | "\xhh"
| "\uxxxx" | "\Uxxxxxxxx" | "\N{name}"
The following additional restrictions apply only to bytes literals
(``stringliteral`` tokens with ``b`` or ``B`` in the
``stringprefix``):
- Each ``shortstringchar`` or ``longstringchar`` must be a character
between 1 and 127 inclusive, regardless of any encoding
declaration [#encodings]_ in the source file.
- The Unicode-specific escape sequences ``\u``\ *xxxx*,
``\U``\ *xxxxxxxx*, and ``\N{``\ *name*\ ``}`` are unrecognized in
Python 2.x and forbidden in Python 3000.
Adjacent bytes literals are subject to the same concatenation rules as
adjacent string literals [#concat]_. A bytes literal adjacent to a
string literal is an error.
Semantics
=========
Each evaluation of a bytes literal produces a new ``bytes`` object.
The bytes in the new object are the bytes represented by the
``shortstringitem`` or ``longstringitem`` parts of the literal, in the
same order.
Rationale
=========
The proposed syntax provides a cleaner migration path from Python 2.x
to Python 3000 for most code involving 8-bit strings. Preserving the
old 8-bit meaning of a string literal is usually as simple as adding a
``b`` prefix. The one exception is Python 2.x strings containing
bytes >127, which must be rewritten using escape sequences.
Transcoding a source file from one encoding to another, and fixing up
the encoding declaration, should preserve the meaning of the program.
Python 2.x non-Unicode strings violate this principle; Python 3000
bytes literals shouldn't.
A string literal with a ``b`` in the prefix is always a syntax error
in Python 2.5, so this syntax can be introduced in Python 2.6, along
with the ``bytes`` type.
A bytes literal produces a new object each time it is evaluated, like
list displays and unlike string literals. This is necessary because
bytes literals, like lists and unlike strings, are
mutable [#eachnew]_.
Reference Implementation
========================
Thomas Wouters has checked an implementation into the Py3K branch,
r53872.
References
==========
.. [#stringliterals]
http://docs.python.org/reference/lexical_analysis.html#string-literals
.. [#encodings]
http://docs.python.org/reference/lexical_analysis.html#encoding-declarations
.. [#concat]
http://docs.python.org/reference/lexical_analysis.html#string-literal-concatenation
.. [#eachnew]
https://mail.python.org/pipermail/python-3000/2007-February/005779.html
Copyright
=========
This document has been placed in the public domain.