-
-
Notifications
You must be signed in to change notification settings - Fork 31.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-107801: Improve io.*.seek docs and docstrings #107899
Conversation
- Name positional parameters consistently: seek(offset, whence, /) - Add param docstrings to _io._IOBase.seek, so the various implementations interit them. - Use io.SEEK_*, not os.SEEK_* - Override docstrings for subclasses where 'offset' must be zero for SEEK_CUR and SEEK_END.
FTR, this won't backport cleanly. |
the *whence* argument. | ||
A *whence* value of :data:`io.SEEK_SET` measures from the beginning of the file, | ||
:data:`io.SEEK_CUR` uses the current file position, | ||
and :data:`io.SEEK_END` uses the end of the file as the reference point. | ||
*whence* can be omitted and defaults to :data:`!io.SEEK_SET`, | ||
using the beginning of the file as the reference point. :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do find it strange that the tutorial delves into weird low-level APIs like seek
. I can't even remember the last time I needed seek()
for a real use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's reasonable to show that it's possible, especially after covering tell
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, if we introduce tell
, surely we must also talk about seek
. But I think an "how to do advanced I/O" is more fitting for these APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, conceptually it's not really advanced nor is it difficult to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept is easy enough to grasp. The APIs are a pain, though.
I'm not sure we should backport the clinic changes; perhaps we should only backport the Doc/ changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For an innocuous method, seek
really is a can of worms... I've made some suggestions on wording, and potentially lifting a few items out of docstrings into the documentation. Thanks for doing the hard work here!
A
|
||
.. seealso:: | ||
|
||
:data:`os.SEEK_SET`, :data:`os.SEEK_CUR`, and :data:`os.SEEK_END`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems the os
constants were added in 2.5 for the os.lseek
function (and io
's in 2.7 & 3.1 (#48822)). Given the values are the same there's little difference, but should we provide which module's constants you're 'meant' to use in which cases?
Edit: Note that fnctl.lockf
, mmap.seek
and sqlite3.Blob.seek
use the os
constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As os
constants were added earlier, I think we can just refer to them from here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no opinion on which constants to use, but my gut reaction would be to use the constants from io
for io
stuff.
* :data:`SEEK_CUR` -- seek from current stream position; | ||
*offset* may be negative | ||
* :data:`SEEK_END` -- seek from the end of the stream; | ||
*offset* is usually negative | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the pitfalls of seeking with files in text-mode, perhaps IOBase.seek
should link to TextIOBase.seek
, which has more specific information on the limitations here:
.. note:: | |
When working with a text stream :py:meth:`!seek` has limitations | |
beyond the general interface described here, | |
which are described in more detail in :py:meth:`TextIOBase.seek`. |
* :data:`SEEK_SET` -- seek from the start of the stream (the default); | ||
*offset* must either be a number returned by | ||
:meth:`TextIOBase.tell`, or zero. | ||
Any other *offset* value produces undefined behaviour. | ||
* :data:`SEEK_CUR` -- seek from current stream position; | ||
*offset* must be zero, which is a no-operation. | ||
All other values are unsupported. | ||
* :data:`SEEK_END` -- seek from the end of the stream; | ||
*offset* must be zero. | ||
All other values are unsupported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* :data:`SEEK_SET` -- seek from the start of the stream (the default); | |
*offset* must either be a number returned by | |
:meth:`TextIOBase.tell`, or zero. | |
Any other *offset* value produces undefined behaviour. | |
* :data:`SEEK_CUR` -- seek from current stream position; | |
*offset* must be zero, which is a no-operation. | |
All other values are unsupported. | |
* :data:`SEEK_END` -- seek from the end of the stream; | |
*offset* must be zero. | |
All other values are unsupported. | |
* :data:`SEEK_SET` -- seek from the start of the stream (the default); | |
*offset* must either be a number returned by | |
:meth:`TextIOBase.tell`, or zero. | |
Any other *offset* value produces undefined behaviour. | |
* :data:`SEEK_CUR` -- seek from the current stream position; | |
*offset* must be zero, which does not change the stream position. | |
All other values are unsupported, and will raise an exception. | |
* :data:`SEEK_END` -- seek from the end of the stream; | |
*offset* must be zero, which does not change the stream position. | |
All other values are unsupported, and will raise an exception. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is better:
* :data:`SEEK_SET` -- seek from the start of the stream (the default); | |
*offset* must either be a number returned by | |
:meth:`TextIOBase.tell`, or zero. | |
Any other *offset* value produces undefined behaviour. | |
* :data:`SEEK_CUR` -- seek from current stream position; | |
*offset* must be zero, which is a no-operation. | |
All other values are unsupported. | |
* :data:`SEEK_END` -- seek from the end of the stream; | |
*offset* must be zero. | |
All other values are unsupported. | |
* :data:`SEEK_SET` -- seek from the start of the stream (the default); | |
*offset* must either be a number returned by | |
:meth:`TextIOBase.tell`, or zero. | |
Any other *offset* value produces undefined behaviour. | |
* :data:`SEEK_CUR` -- seek from the current stream position; | |
*offset* must be zero, leaving the stream position unchanged. | |
All other values are unsupported, and will raise an exception. | |
* :data:`SEEK_END` -- seek from the end of the stream; | |
*offset* must be zero, leaving the stream position at the end of the stream. | |
All other values are unsupported, and will raise an exception. |
|
||
Return the new absolute position. | ||
Note that not all file objects are seekable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should include this admonition in the documentation for IOBase.seek
(with reference to seekable()
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me.
|
||
Change the file position and return the new absolute position. | ||
|
||
Offsets relative to the start of the file are usually zero or positive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For TextIOWrapper
only the opaque int returned by tell()
is valid -- would that ever be negative?
Returns the new absolute position. | ||
Offsets relative to the start of the stream are usually zero or positive. | ||
Offsets relative to the current and end of file positions must zero; | ||
any other values are unsupported, and will raise OSError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this to the documentation for StringIO
, as it is different from the TextIOBase
behaviour (raising OSError
in the stead of UnsupportedOperation
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that that it may be easier to remove the mention of exception type. Not all should be included in docstrings. For more details users can look in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just started working on it. Good thing I didn't advance far before your request for review.
But I noticed few things which you did not change in this PR, so I included them in comments.
@@ -168,6 +168,19 @@ High-level Module Interface | |||
:func:`os.stat`) if possible. | |||
|
|||
|
|||
.. data:: SEEK_SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These constants are defined in os
and io
modules. Instead of duplicating the description, I think that it is better to document them in one place and refer to them from other module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are exposed API in both modules, so in my opinion they should be documented in both io
and os
docs. I think .. seealso::
for cross linking is fine.
|
||
.. seealso:: | ||
|
||
:data:`os.SEEK_SET`, :data:`os.SEEK_CUR`, and :data:`os.SEEK_END`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As os
constants were added earlier, I think we can just refer to them from here.
|
||
* :data:`SEEK_SET` or ``0`` -- start of the stream (the default); | ||
* :data:`SEEK_SET` -- seek from the start of the stream (the default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep the literal values. They are standard and widely used. Removing the literal values will make the reader confused when they see file.seek(0, 2)
in someone's other code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The text above contains: "offset is interpreted relative to the position indicated by whence." The text in this paragraph describes that position. So if you consider it as a continuation of the former phrase, "seek from" may be not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep the literal values. They are standard and widely used. Removing the literal values will make the reader confused when they see
file.seek(0, 2)
in someone's other code.
Good call. I agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also keep the literal values.
* :data:`SEEK_CUR` -- seek from current stream position; | ||
*offset* may be negative | ||
* :data:`SEEK_END` -- seek from the end of the stream; | ||
*offset* is usually negative |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*offset* is usually negative | |
*offset* is usually zero or negative |
*offset* must be zero (all other values are unsupported). | ||
|
||
Return the new absolute position as an opaque number. | ||
Change the stream position to the given byte *offset*, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not a byte offset.
I suggest also to change the name of the first argument to something other. For example to "cookie" as it is use in the code. Or to "pos" as it was in older documentation. But "offset" is misleading.
Removing references to the name further in the documentation and using "the first argument" instead of the name if it is impossible can also help, because it is hard to came with good name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The existing docs/docstrings use offset, cookie, and pos. I don't understand what's meant by cookie. I understand what's meant by offset or position. If there is no need to use the same name, there is no need for this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think offset is fine, as it is still an offset into the file even if not an integer byte offset. I agree we should stay away from cookie et al.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Cookie" means an opaque value. No, it is not an offset. It is several values (position of the buffer in the file, position in the buffer, state of the encoder and decoder, packed in a 257-bit integer. See code of _pack_cookie()
and _unpack_cookie()
and very complex code of tell()
and seek()
in _pyio.py
.
This PR is needed because semantic of tell()
and seek()
in binary and text files is so different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ouch, yes TextIOWrapper is a different thing. Very unfortunate.
Returns the new absolute position. | ||
Offsets relative to the start of the stream are usually zero or positive. | ||
Offsets relative to the current and end of file positions must zero; | ||
any other values are unsupported, and will raise OSError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that that it may be easier to remove the mention of exception type. Not all should be included in docstrings. For more details users can look in the documentation.
offset as pos: Py_ssize_t | ||
whence: int(c_default='0') = io.SEEK_SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also copy parameter docstrings. Otherwise the function docstring will look incomplete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The param docstrings are rendered to be a part of the docstring during clinic input parsing. It will be part of the whole docstring after make clinic
. The subclasses will inherit this, unless they override the docstring explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. Subclasses "inherit" docstrings only if they do not have their own docstrings.
I put "inherit" in quotes because actually they do not do this. It is pydoc who falls back to looking up docstrings in the parent classes.
Look at the Argument Clinic generated docstrings and you will see that they are incomplete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
docstrings as of this PR:
Seek docs for <class '_io.BufferedRWPair'>:
BufferedRWPair.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
BufferedRWPair.__doc__:
Change the stream position and return the new absolute position.
offset
Offset as byte count.
Relative to the position given by 'whence'.
whence
Relative position, used by 'offset'. Valid values are:
* io.SEEK_SET -- Start of stream (the default)
* io.SEEK_CUR -- Current position
* io.SEEK_END -- End of stream
Offsets relative to the start of the stream are usually zero or
positive, offsets relative to the current stream position may be
zero, positive or negative, and offsets relative to the end of the
stream are usually zero or negative.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
Seek docs for <class '_io.BufferedRandom'>:
BufferedRandom.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
BufferedRandom.__doc__:
None
Seek docs for <class '_io.BufferedReader'>:
BufferedReader.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
BufferedReader.__doc__:
None
Seek docs for <class '_io.BufferedWriter'>:
BufferedWriter.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
BufferedWriter.__doc__:
None
Seek docs for <class '_io.BytesIO'>:
BytesIO.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
BytesIO.__doc__:
Change stream position and return the new absolute position.
Set the byte offset 'offset', relative to position indicated by 'whence':
Seek docs for <class '_io.FileIO'>:
FileIO.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
FileIO.__doc__:
None
Seek docs for <class '_io.StringIO'>:
StringIO.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
StringIO.__doc__:
Change the stream position and return the new absolute position.
Offsets relative to the start of the stream are usually zero or positive.
Offsets relative to the current and end of file positions must zero;
any other values are unsupported, and will raise OSError.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
Seek docs for <class '_io.TextIOWrapper'>:
TextIOWrapper.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
TextIOWrapper.__doc__:
Change the file position and return the new absolute position.
Offsets relative to the start of the file are usually zero or positive.
Offsets relative to the current and end of file positions must zero;
any other values are unsupported, and will raise the
io.UnsupportedOperation exception.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
Seek docs for <class '_io._BufferedIOBase'>:
_BufferedIOBase.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
_BufferedIOBase.__doc__:
Change the stream position and return the new absolute position.
offset
Offset as byte count.
Relative to the position given by 'whence'.
whence
Relative position, used by 'offset'. Valid values are:
* io.SEEK_SET -- Start of stream (the default)
* io.SEEK_CUR -- Current position
* io.SEEK_END -- End of stream
Offsets relative to the start of the stream are usually zero or
positive, offsets relative to the current stream position may be
zero, positive or negative, and offsets relative to the end of the
stream are usually zero or negative.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
Seek docs for <class '_io._IOBase'>:
_IOBase.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
_IOBase.__doc__:
Change the stream position and return the new absolute position.
offset
Offset as byte count.
Relative to the position given by 'whence'.
whence
Relative position, used by 'offset'. Valid values are:
* io.SEEK_SET -- Start of stream (the default)
* io.SEEK_CUR -- Current position
* io.SEEK_END -- End of stream
Offsets relative to the start of the stream are usually zero or
positive, offsets relative to the current stream position may be
zero, positive or negative, and offsets relative to the end of the
stream are usually zero or negative.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
Seek docs for <class '_io._RawIOBase'>:
_RawIOBase.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
_RawIOBase.__doc__:
Change the stream position and return the new absolute position.
offset
Offset as byte count.
Relative to the position given by 'whence'.
whence
Relative position, used by 'offset'. Valid values are:
* io.SEEK_SET -- Start of stream (the default)
* io.SEEK_CUR -- Current position
* io.SEEK_END -- End of stream
Offsets relative to the start of the stream are usually zero or
positive, offsets relative to the current stream position may be
zero, positive or negative, and offsets relative to the end of the
stream are usually zero or negative.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
Seek docs for <class '_io._TextIOBase'>:
_TextIOBase.__text_signature__:
($self, offset, whence=io.SEEK_SET, /)
_TextIOBase.__doc__:
Change the stream position and return the new absolute position.
offset
Offset as byte count.
Relative to the position given by 'whence'.
whence
Relative position, used by 'offset'. Valid values are:
* io.SEEK_SET -- Start of stream (the default)
* io.SEEK_CUR -- Current position
* io.SEEK_END -- End of stream
Offsets relative to the start of the stream are usually zero or
positive, offsets relative to the current stream position may be
zero, positive or negative, and offsets relative to the end of the
stream are usually zero or negative.
Some platforms allow seeking beyond the end of a file.
Note that not all file objects are seekable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the Argument Clinic generated docstrings and you will see that they are incomplete.
Of course. Only the _IOBase
docstring will be generated completely with param docstrings inlined, as the PR stands now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps param docstrings are not a good fit for these functions, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All docstrings should be complete. If possible, make them empty, then pydoc will show the docstring from the parent class. Perhaps you only need two docstrings: for binary and text streams.
Offsets relative to the current and end of file positions must zero; | ||
any other values are unsupported, and will raise OSError. | ||
|
||
Some platforms allow seeking beyond the end of a file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the behavior of StringIO platform-depending?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is a copy and paste error. Thanks for noticing.
|
||
Some platforms allow seeking beyond the end of a file. | ||
|
||
Note that not all file objects are seekable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is StringIO.
any other values are unsupported, and will raise the | ||
io.UnsupportedOperation exception. | ||
|
||
Some platforms allow seeking beyond the end of a file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not related to TextIOWrapper.
There may be an open issue about unification of parameter names of |
I'll try to find it. I'll also mark this as a draft PR; there is a lot of discussions to land before proceeding with this work. |
1 Current position - pos may be negative; | ||
2 End of stream - pos usually negative. | ||
Returns the new absolute position. | ||
Set the byte offset 'offset', relative to position indicated by 'whence': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to keep a consistent terminology instead of alternating between "offset" and "position". "position" is probably more easily understood IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Position is used to describe the whence param. Would it not be confusing if 'position' is used to describe both parameters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the summary string does say "Change stream position and return the new absolute position". In the next line "stream position" suddenly becomes "byte offset".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this could also be "Set the stream position to the byte 'offset', relative to ...".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Set the byte offset 'offset', relative to position indicated by 'whence': | |
Set the stream position to the byte offset 'offset', relative to the position indicated by 'whence': |
I think this PR tries to do too much. There will be too many scattered discussions. I'll open smaller and more focused PRs; I'll tag you for review. I'll start with the low-hanging fruit. |
implementations interit them.
Useio.SEEK_*
, notos.SEEK_*
or magical numbers.io.SEEK_*
SEEK_CUR and SEEK_END.
📚 Documentation preview 📚: https://cpython-previews--107899.org.readthedocs.build/