forked from python/peps
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpep-0455.txt
282 lines (205 loc) · 8.67 KB
/
pep-0455.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
PEP: 455
Title: Adding a key-transforming dictionary to collections
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <[email protected]>
BDFL-Delegate: Raymond Hettinger
Status: Rejected
Type: Standards Track
Content-Type: text/x-rst
Created: 13-Sep-2013
Python-Version: 3.5
Post-History:
Abstract
========
This PEP proposes a new data structure for the ``collections`` module,
called "TransformDict" in this PEP. This structure is a mutable mapping
which transforms the key using a given function when doing a lookup, but
retains the original key when reading.
Rejection
---------
See the rationale at
https://mail.python.org/pipermail/python-dev/2015-May/140003.html
and for an earlier partial review, see
https://mail.python.org/pipermail/python-dev/2013-October/129937.html .
Rationale
=========
Numerous specialized versions of this pattern exist. The most common
is a case-insensitive case-preserving dict, i.e. a dict-like container
which matches keys in a case-insensitive fashion but retains the original
casing. It is a very common need in network programming, as many
protocols feature some arrays of "key / value" properties in their
messages, where the keys are textual strings whose case is specified to
be ignored on receipt but by either specification or custom is to be
preserved or non-trivially canonicalized when retransmitted.
Another common request is an identity dict, where keys are matched
according to their respective id()s instead of normal matching.
Both are instances of a more general pattern, where a given transformation
function is applied to keys when looking them up: that function being
``str.lower`` or ``str.casefold`` in the former example and the built-in
``id`` function in the latter.
(It could be said that the pattern *projects* keys from the user-visible
set onto the internal lookup set.)
Semantics
=========
TransformDict is a ``MutableMapping`` implementation: it faithfully
implements the well-known API of mutable mappings, like ``dict`` itself
and other dict-like classes in the standard library. Therefore, this PEP
won't rehash the semantics of most TransformDict methods.
The transformation function needn't be bijective, it can be strictly
surjective as in the case-insensitive example (in other words, different
keys can lookup the same value)::
>>> d = TransformDict(str.casefold)
>>> d['SomeKey'] = 5
>>> d['somekey']
5
>>> d['SOMEKEY']
5
TransformDict retains the first key used when creating an entry::
>>> d = TransformDict(str.casefold)
>>> d['SomeKey'] = 1
>>> d['somekey'] = 2
>>> list(d.items())
[('SomeKey', 2)]
The original keys needn't be hashable, as long as the transformation
function returns a hashable one::
>>> d = TransformDict(id)
>>> l = [None]
>>> d[l] = 5
>>> l in d
True
Constructor
-----------
As shown in the examples above, creating a TransformDict requires passing
the key transformation function as the first argument (much like creating
a ``defaultdict`` requires passing the factory function as first argument).
The constructor also takes other optional arguments which can be used
to initialize the TransformDict with certain key-value pairs. Those
optional arguments are the same as in the ``dict`` and ``defaultdict``
constructors::
>>> d = TransformDict(str.casefold, [('Foo', 1)], Bar=2)
>>> sorted(d.items())
[('Bar', 2), ('Foo', 1)]
Getting the original key
------------------------
TransformDict also features a lookup method returning the stored key
together with the corresponding value::
>>> d = TransformDict(str.casefold, {'Foo': 1})
>>> d.getitem('FOO')
('Foo', 1)
>>> d.getitem('bar')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: 'bar'
The method name ``getitem()`` follows the standard ``popitem()`` method
on mutable mappings.
Getting the transformation function
-----------------------------------
TransformDict has a simple read-only property ``transform_func`` which
gives back the transformation function.
Alternative proposals and questions
===================================
Retaining the last original key
-------------------------------
Most python-dev respondents found retaining the first user-supplied key
more intuitive than retaining the last. Also, it matches the dict
object's own behaviour when using different but equal keys::
>>> d = {}
>>> d[1] = 'hello'
>>> d[1.0] = 'world'
>>> d
{1: 'world'}
Furthermore, explicitly retaining the last key in a first-key-retaining
scheme is still possible using the following approach::
d.pop(key, None)
d[key] = value
while the converse (retaining the first key in a last-key-retaining
scheme) doesn't look possible without rewriting part of the container's
code.
Using an encoder / decoder pair
-------------------------------
Using a function pair isn't necessary, since the original key is retained
by the container. Moreover, an encoder / decoder pair would require the
transformation to be bijective, which prevents important use cases
like case-insensitive matching.
Providing a transformation function for values
----------------------------------------------
Dictionary values are not used for lookup, their semantics are totally
irrelevant to the container's operation. Therefore, there is no point in
having both an "original" and a "transformed" value: the transformed
value wouldn't be used for anything.
Providing a specialized container, not generic
----------------------------------------------
It was asked why we would provide the generic TransformDict construct
rather than a specialized case-insensitive dict variant. The answer
is that it's nearly as cheap (code-wise and performance-wise) to provide
the generic construct, and it can fill more use cases.
Even case-insensitive dicts can actually elicit different transformation
functions: ``str.lower``, ``str.casefold`` or in some cases ``bytes.lower``
when working with text encoded in an ASCII-compatible encoding.
Other constructor patterns
--------------------------
Two other constructor patterns were proposed by Serhiy Storchaka:
* A type factory scheme::
d = TransformDict(str.casefold)(Foo=1)
* A subclassing scheme::
class CaseInsensitiveDict(TransformDict):
__transform__ = str.casefold
d = CaseInsensitiveDict(Foo=1)
While both approaches can be defended, they don't follow established
practices in the standard library, and therefore were rejected.
Implementation
==============
A patch for the collections module is tracked on the bug tracker at
http://bugs.python.org/issue18986.
Existing work
=============
Case-insensitive dicts are a popular request:
* http://twistedmatrix.com/documents/current/api/twisted.python.util.InsensitiveDict.html
* https://mail.python.org/pipermail/python-list/2013-May/647243.html
* https://mail.python.org/pipermail/python-list/2005-April/296208.html
* https://mail.python.org/pipermail/python-list/2004-June/241748.html
* http://bugs.python.org/msg197376
* http://stackoverflow.com/a/2082169
* http://stackoverflow.com/a/3296782
* http://code.activestate.com/recipes/66315-case-insensitive-dictionary/
* https://gist.github.com/babakness/3901174
* http://www.wikier.org/blog/key-insensitive-dictionary-in-python
* http://en.sharejs.com/python/14534
* http://www.voidspace.org.uk/python/archive.shtml#caseless
Identity dicts have been requested too:
* https://mail.python.org/pipermail/python-ideas/2010-May/007235.html
* http://www.gossamer-threads.com/lists/python/python/209527
Several modules in the standard library use identity lookups for object
memoization, for example ``pickle``, ``json``, ``copy``, ``cProfile``,
``doctest`` and ``_threading_local``.
Other languages
---------------
C# / .Net
^^^^^^^^^
.Net has a generic ``Dictionary`` class where you can specify a custom
``IEqualityComparer``: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
Using it is the recommended way to write case-insensitive dictionaries:
http://stackoverflow.com/questions/13230414/case-insensitive-access-for-generic-dictionary
Java
^^^^
Java has a specialized ``CaseInsensitiveMap``:
http://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/map/CaseInsensitiveMap.html
It also has a separate ``IdentityHashMap``:
http://docs.oracle.com/javase/6/docs/api/java/util/IdentityHashMap.html
C++
^^^
The C++ Standard Template Library features an ``unordered_map``
with customizable hash and equality functions:
http://www.cplusplus.com/reference/unordered_map/unordered_map/
Copyright
=========
This document has been placed in the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8
End: