forked from python/peps
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathpep-0428.txt
724 lines (522 loc) · 21.4 KB
/
pep-0428.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
PEP: 428
Title: The pathlib module -- object-oriented filesystem paths
Version: $Revision$
Last-Modified: $Date$
Author: Antoine Pitrou <[email protected]>
Status: Final
Type: Standards Track
Content-Type: text/x-rst
Created: 30-Jul-2012
Python-Version: 3.4
Post-History: `05-Oct-2012 <https://mail.python.org/pipermail/python-ideas/2012-October/016338.html>`__
Resolution: https://mail.python.org/pipermail/python-dev/2013-November/130424.html
Abstract
========
This PEP proposes the inclusion of a third-party module, `pathlib`_, in
the standard library. The inclusion is proposed under the provisional
label, as described in :pep:`411`. Therefore, API changes can be done,
either as part of the PEP process, or after acceptance in the standard
library (and until the provisional label is removed).
The aim of this library is to provide a simple hierarchy of classes to
handle filesystem paths and the common operations users do over them.
.. _`pathlib`: http://pypi.python.org/pypi/pathlib/
Related work
============
An object-oriented API for filesystem paths has already been proposed
and rejected in :pep:`355`. Several third-party implementations of the
idea of object-oriented filesystem paths exist in the wild:
* The historical `path.py module`_ by Jason Orendorff, Jason R. Coombs
and others, which provides a ``str``-subclassing ``Path`` class;
* Twisted's slightly specialized `FilePath class`_;
* An `AlternativePathClass proposal`_, subclassing ``tuple`` rather than
``str``;
* `Unipath`_, a variation on the str-subclassing approach with two public
classes, an ``AbstractPath`` class for operations which don't do I/O and a
``Path`` class for all common operations.
This proposal attempts to learn from these previous attempts and the
rejection of :pep:`355`.
.. _`path.py module`: https://github.com/jaraco/path.py
.. _`FilePath class`: http://twistedmatrix.com/documents/current/api/twisted.python.filepath.FilePath.html
.. _`AlternativePathClass proposal`: http://wiki.python.org/moin/AlternativePathClass
.. _`Unipath`: https://bitbucket.org/sluggo/unipath/overview
Implementation
==============
The implementation of this proposal is tracked in the ``pep428`` branch
of pathlib's `Mercurial repository`_.
.. _`Mercurial repository`: https://bitbucket.org/pitrou/pathlib/
Why an object-oriented API
==========================
The rationale to represent filesystem paths using dedicated classes is the
same as for other kinds of stateless objects, such as dates, times or IP
addresses. Python has been slowly moving away from strictly replicating
the C language's APIs to providing better, more helpful abstractions around
all kinds of common functionality. Even if this PEP isn't accepted, it is
likely that another form of filesystem handling abstraction will be adopted
one day into the standard library.
Indeed, many people will prefer handling dates and times using the high-level
objects provided by the ``datetime`` module, rather than using numeric
timestamps and the ``time`` module API. Moreover, using a dedicated class
allows to enable desirable behaviours by default, for example the case
insensitivity of Windows paths.
Proposal
========
Class hierarchy
---------------
The `pathlib`_ module implements a simple hierarchy of classes::
+----------+
| |
---------| PurePath |--------
| | | |
| +----------+ |
| | |
| | |
v | v
+---------------+ | +-----------------+
| | | | |
| PurePosixPath | | | PureWindowsPath |
| | | | |
+---------------+ | +-----------------+
| v |
| +------+ |
| | | |
| -------| Path |------ |
| | | | | |
| | +------+ | |
| | | |
| | | |
v v v v
+-----------+ +-------------+
| | | |
| PosixPath | | WindowsPath |
| | | |
+-----------+ +-------------+
This hierarchy divides path classes along two dimensions:
* a path class can be either pure or concrete: pure classes support only
operations that don't need to do any actual I/O, which are most path
manipulation operations; concrete classes support all the operations
of pure classes, plus operations that do I/O.
* a path class is of a given flavour according to the kind of operating
system paths it represents. `pathlib`_ implements two flavours: Windows
paths for the filesystem semantics embodied in Windows systems, POSIX
paths for other systems.
Any pure class can be instantiated on any system: for example, you can
manipulate ``PurePosixPath`` objects under Windows, ``PureWindowsPath``
objects under Unix, and so on. However, concrete classes can only be
instantiated on a matching system: indeed, it would be error-prone to start
doing I/O with ``WindowsPath`` objects under Unix, or vice-versa.
Furthermore, there are two base classes which also act as system-dependent
factories: ``PurePath`` will instantiate either a ``PurePosixPath`` or a
``PureWindowsPath`` depending on the operating system. Similarly, ``Path``
will instantiate either a ``PosixPath`` or a ``WindowsPath``.
It is expected that, in most uses, using the ``Path`` class is adequate,
which is why it has the shortest name of all.
No confusion with builtins
--------------------------
In this proposal, the path classes do not derive from a builtin type. This
contrasts with some other Path class proposals which were derived from
``str``. They also do not pretend to implement the sequence protocol:
if you want a path to act as a sequence, you have to lookup a dedicated
attribute (the ``parts`` attribute).
The key reasoning behind not inheriting from ``str`` is to prevent accidentally
performing operations with a string representing a path and a string that
doesn't, e.g. ``path + an_accident``. Since operations with a string will not
necessarily lead to a valid or expected file system path, "explicit is better
than implicit" by avoiding accidental operations with strings by not
subclassing it. A `blog post`_ by a Python core developer goes into more detail
on the reasons behind this specific design decision.
.. _blog post: http://www.snarky.ca/why-pathlib-path-doesn-t-inherit-from-str
Immutability
------------
Path objects are immutable, which makes them hashable and also prevents a
class of programming errors.
Sane behaviour
--------------
Little of the functionality from os.path is reused. Many os.path functions
are tied by backwards compatibility to confusing or plain wrong behaviour
(for example, the fact that ``os.path.abspath()`` simplifies ".." path
components without resolving symlinks first).
Comparisons
-----------
Paths of the same flavour are comparable and orderable, whether pure or not::
>>> PurePosixPath('a') == PurePosixPath('b')
False
>>> PurePosixPath('a') < PurePosixPath('b')
True
>>> PurePosixPath('a') == PosixPath('a')
True
Comparing and ordering Windows path objects is case-insensitive::
>>> PureWindowsPath('a') == PureWindowsPath('A')
True
Paths of different flavours always compare unequal, and cannot be ordered::
>>> PurePosixPath('a') == PureWindowsPath('a')
False
>>> PurePosixPath('a') < PureWindowsPath('a')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unorderable types: PurePosixPath() < PureWindowsPath()
Paths compare unequal to, and are not orderable with instances of builtin
types (such as ``str``) and any other types.
Useful notations
----------------
The API tries to provide useful notations all the while avoiding magic.
Some examples::
>>> p = Path('/home/antoine/pathlib/setup.py')
>>> p.name
'setup.py'
>>> p.suffix
'.py'
>>> p.root
'/'
>>> p.parts
('/', 'home', 'antoine', 'pathlib', 'setup.py')
>>> p.relative_to('/home/antoine')
PosixPath('pathlib/setup.py')
>>> p.exists()
True
Pure paths API
==============
The philosophy of the ``PurePath`` API is to provide a consistent array of
useful path manipulation operations, without exposing a hodge-podge of
functions like ``os.path`` does.
Definitions
-----------
First a couple of conventions:
* All paths can have a drive and a root. For POSIX paths, the drive is
always empty.
* A relative path has neither drive nor root.
* A POSIX path is absolute if it has a root. A Windows path is absolute if
it has both a drive *and* a root. A Windows UNC path (e.g.
``\\host\share\myfile.txt``) always has a drive and a root
(here, ``\\host\share`` and ``\``, respectively).
* A path which has either a drive *or* a root is said to be anchored.
Its anchor is the concatenation of the drive and root. Under POSIX,
"anchored" is the same as "absolute".
Construction
------------
We will present construction and joining together since they expose
similar semantics.
The simplest way to construct a path is to pass it its string representation::
>>> PurePath('setup.py')
PurePosixPath('setup.py')
Extraneous path separators and ``"."`` components are eliminated::
>>> PurePath('a///b/c/./d/')
PurePosixPath('a/b/c/d')
If you pass several arguments, they will be automatically joined::
>>> PurePath('docs', 'Makefile')
PurePosixPath('docs/Makefile')
Joining semantics are similar to os.path.join, in that anchored paths ignore
the information from the previously joined components::
>>> PurePath('/etc', '/usr', 'bin')
PurePosixPath('/usr/bin')
However, with Windows paths, the drive is retained as necessary::
>>> PureWindowsPath('c:/foo', '/Windows')
PureWindowsPath('c:/Windows')
>>> PureWindowsPath('c:/foo', 'd:')
PureWindowsPath('d:')
Also, path separators are normalized to the platform default::
>>> PureWindowsPath('a/b') == PureWindowsPath('a\\b')
True
Extraneous path separators and ``"."`` components are eliminated, but not
``".."`` components::
>>> PurePosixPath('a//b/./c/')
PurePosixPath('a/b/c')
>>> PurePosixPath('a/../b')
PurePosixPath('a/../b')
Multiple leading slashes are treated differently depending on the path
flavour. They are always retained on Windows paths (because of the UNC
notation)::
>>> PureWindowsPath('//some/path')
PureWindowsPath('//some/path/')
On POSIX, they are collapsed except if there are exactly two leading slashes,
which is a special case in the POSIX specification on `pathname resolution`_
(this is also necessary for Cygwin compatibility)::
>>> PurePosixPath('///some/path')
PurePosixPath('/some/path')
>>> PurePosixPath('//some/path')
PurePosixPath('//some/path')
Calling the constructor without any argument creates a path object pointing
to the logical "current directory" (without looking up its absolute path,
which is the job of the ``cwd()`` classmethod on concrete paths)::
>>> PurePosixPath()
PurePosixPath('.')
.. _pathname resolution: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap04.html#tag_04_11
Representing
------------
To represent a path (e.g. to pass it to third-party libraries), just call
``str()`` on it::
>>> p = PurePath('/home/antoine/pathlib/setup.py')
>>> str(p)
'/home/antoine/pathlib/setup.py'
>>> p = PureWindowsPath('c:/windows')
>>> str(p)
'c:\\windows'
To force the string representation with forward slashes, use the ``as_posix()``
method::
>>> p.as_posix()
'c:/windows'
To get the bytes representation (which might be useful under Unix systems),
call ``bytes()`` on it, which internally uses ``os.fsencode()``::
>>> bytes(p)
b'/home/antoine/pathlib/setup.py'
To represent the path as a ``file:`` URI, call the ``as_uri()`` method::
>>> p = PurePosixPath('/etc/passwd')
>>> p.as_uri()
'file:///etc/passwd'
>>> p = PureWindowsPath('c:/Windows')
>>> p.as_uri()
'file:///c:/Windows'
The repr() of a path always uses forward slashes, even under Windows, for
readability and to remind users that forward slashes are ok::
>>> p = PureWindowsPath('c:/Windows')
>>> p
PureWindowsPath('c:/Windows')
Properties
----------
Several simple properties are provided on every path (each can be empty)::
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
>>> p.drive
'c:'
>>> p.root
'\\'
>>> p.anchor
'c:\\'
>>> p.name
'pathlib.tar.gz'
>>> p.stem
'pathlib.tar'
>>> p.suffix
'.gz'
>>> p.suffixes
['.tar', '.gz']
Deriving new paths
------------------
Joining
^^^^^^^
A path can be joined with another using the ``/`` operator::
>>> p = PurePosixPath('foo')
>>> p / 'bar'
PurePosixPath('foo/bar')
>>> p / PurePosixPath('bar')
PurePosixPath('foo/bar')
>>> 'bar' / p
PurePosixPath('bar/foo')
As with the constructor, multiple path components can be specified, either
collapsed or separately::
>>> p / 'bar/xyzzy'
PurePosixPath('foo/bar/xyzzy')
>>> p / 'bar' / 'xyzzy'
PurePosixPath('foo/bar/xyzzy')
A joinpath() method is also provided, with the same behaviour::
>>> p.joinpath('Python')
PurePosixPath('foo/Python')
Changing the path's final component
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ``with_name()`` method returns a new path, with the name changed::
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
>>> p.with_name('setup.py')
PureWindowsPath('c:/Downloads/setup.py')
It fails with a ``ValueError`` if the path doesn't have an actual name::
>>> p = PureWindowsPath('c:/')
>>> p.with_name('setup.py')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pathlib.py", line 875, in with_name
raise ValueError("%r has an empty name" % (self,))
ValueError: PureWindowsPath('c:/') has an empty name
>>> p.name
''
The ``with_suffix()`` method returns a new path with the suffix changed.
However, if the path has no suffix, the new suffix is added::
>>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz')
>>> p.with_suffix('.bz2')
PureWindowsPath('c:/Downloads/pathlib.tar.bz2')
>>> p = PureWindowsPath('README')
>>> p.with_suffix('.bz2')
PureWindowsPath('README.bz2')
Making the path relative
^^^^^^^^^^^^^^^^^^^^^^^^
The ``relative_to()`` method computes the relative difference of a path to
another::
>>> PurePosixPath('/usr/bin/python').relative_to('/usr')
PurePosixPath('bin/python')
ValueError is raised if the method cannot return a meaningful value::
>>> PurePosixPath('/usr/bin/python').relative_to('/etc')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pathlib.py", line 926, in relative_to
.format(str(self), str(formatted)))
ValueError: '/usr/bin/python' does not start with '/etc'
Sequence-like access
--------------------
The ``parts`` property returns a tuple providing read-only sequence access
to a path's components::
>>> p = PurePosixPath('/etc/init.d')
>>> p.parts
('/', 'etc', 'init.d')
Windows paths handle the drive and the root as a single path component::
>>> p = PureWindowsPath('c:/setup.py')
>>> p.parts
('c:\\', 'setup.py')
(separating them would be wrong, since ``C:`` is not the parent of ``C:\\``).
The ``parent`` property returns the logical parent of the path::
>>> p = PureWindowsPath('c:/python33/bin/python.exe')
>>> p.parent
PureWindowsPath('c:/python33/bin')
The ``parents`` property returns an immutable sequence of the path's
logical ancestors::
>>> p = PureWindowsPath('c:/python33/bin/python.exe')
>>> len(p.parents)
3
>>> p.parents[0]
PureWindowsPath('c:/python33/bin')
>>> p.parents[1]
PureWindowsPath('c:/python33')
>>> p.parents[2]
PureWindowsPath('c:/')
Querying
--------
``is_relative()`` returns True if the path is relative (see definition
above), False otherwise.
``is_reserved()`` returns True if a Windows path is a reserved path such
as ``CON`` or ``NUL``. It always returns False for POSIX paths.
``match()`` matches the path against a glob pattern. It operates on
individual parts and matches from the right:
>>> p = PurePosixPath('/usr/bin')
>>> p.match('/usr/b*')
True
>>> p.match('usr/b*')
True
>>> p.match('b*')
True
>>> p.match('/u*')
False
This behaviour respects the following expectations:
- A simple pattern such as "\*.py" matches arbitrarily long paths as long
as the last part matches, e.g. "/usr/foo/bar.py".
- Longer patterns can be used as well for more complex matching, e.g.
"/usr/foo/\*.py" matches "/usr/foo/bar.py".
Concrete paths API
==================
In addition to the operations of the pure API, concrete paths provide
additional methods which actually access the filesystem to query or mutate
information.
Constructing
------------
The classmethod ``cwd()`` creates a path object pointing to the current
working directory in absolute form::
>>> Path.cwd()
PosixPath('/home/antoine/pathlib')
File metadata
-------------
The ``stat()`` returns the file's stat() result; similarly, ``lstat()``
returns the file's lstat() result (which is different iff the file is a
symbolic link)::
>>> p.stat()
posix.stat_result(st_mode=33277, st_ino=7483155, st_dev=2053, st_nlink=1, st_uid=500, st_gid=500, st_size=928, st_atime=1343597970, st_mtime=1328287308, st_ctime=1343597964)
Higher-level methods help examine the kind of the file::
>>> p.exists()
True
>>> p.is_file()
True
>>> p.is_dir()
False
>>> p.is_symlink()
False
>>> p.is_socket()
False
>>> p.is_fifo()
False
>>> p.is_block_device()
False
>>> p.is_char_device()
False
The file owner and group names (rather than numeric ids) are queried
through corresponding methods::
>>> p = Path('/etc/shadow')
>>> p.owner()
'root'
>>> p.group()
'shadow'
Path resolution
---------------
The ``resolve()`` method makes a path absolute, resolving any symlink on
the way (like the POSIX realpath() call). It is the only operation which
will remove "``..``" path components. On Windows, this method will also
take care to return the canonical path (with the right casing).
Directory walking
-----------------
Simple (non-recursive) directory access is done by calling the iterdir()
method, which returns an iterator over the child paths::
>>> p = Path('docs')
>>> for child in p.iterdir(): child
...
PosixPath('docs/conf.py')
PosixPath('docs/_templates')
PosixPath('docs/make.bat')
PosixPath('docs/index.rst')
PosixPath('docs/_build')
PosixPath('docs/_static')
PosixPath('docs/Makefile')
This allows simple filtering through list comprehensions::
>>> p = Path('.')
>>> [child for child in p.iterdir() if child.is_dir()]
[PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')]
Simple and recursive globbing is also provided::
>>> for child in p.glob('**/*.py'): child
...
PosixPath('test_pathlib.py')
PosixPath('setup.py')
PosixPath('pathlib.py')
PosixPath('docs/conf.py')
PosixPath('build/lib/pathlib.py')
File opening
------------
The ``open()`` method provides a file opening API similar to the builtin
``open()`` method::
>>> p = Path('setup.py')
>>> with p.open() as f: f.readline()
...
'#!/usr/bin/env python3\n'
Filesystem modification
-----------------------
Several common filesystem operations are provided as methods: ``touch()``,
``mkdir()``, ``rename()``, ``replace()``, ``unlink()``, ``rmdir()``,
``chmod()``, ``lchmod()``, ``symlink_to()``. More operations could be
provided, for example some of the functionality of the shutil module.
Detailed documentation of the proposed API can be found at the `pathlib
docs`_.
.. _pathlib docs: https://pathlib.readthedocs.org/en/pep428/
Discussion
==========
Division operator
-----------------
The division operator came out first in a `poll`_ about the path joining
operator. Initial versions of `pathlib`_ used square brackets
(i.e. ``__getitem__``) instead.
.. _poll: https://mail.python.org/pipermail/python-ideas/2012-October/016544.html
joinpath()
----------
The joinpath() method was initially called join(), but several people
objected that it could be confused with str.join() which has different
semantics. Therefore, it was renamed to joinpath().
Case-sensitivity
----------------
Windows users consider filesystem paths to be case-insensitive and expect
path objects to observe that characteristic, even though in some rare
situations some foreign filesystem mounts may be case-sensitive under
Windows.
In the words of one commenter,
"If glob("\*.py") failed to find SETUP.PY on Windows, that would be a
usability disaster".
-- Paul Moore in
https://mail.python.org/pipermail/python-dev/2013-April/125254.html
Copyright
=========
This document has been placed into the public domain.
..
Local Variables:
mode: indented-text
indent-tabs-mode: nil
sentence-end-double-space: t
fill-column: 70
coding: utf-8