Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raises section parsing of Google-style docstrings fails with multiple exceptions #17

Closed
thebigmunch opened this issue Apr 7, 2020 · 12 comments
Labels
bug Something isn't working docstrings Docstrings parsing docstrings-format:google

Comments

@thebigmunch
Copy link

d6561f8 stopped build errors in regards to this. But the results of a raises section with multiple exceptions (like this) end up like:

High-Level_API_-audio-metadata-_Google_Chrome_2020-04-07_19-02-10

@pawamoy
Copy link
Member

pawamoy commented Apr 8, 2020

Oops 😨 I guess some tests are missing. I'll fix this asap and add tests. Thanks for the report!

@pawamoy
Copy link
Member

pawamoy commented Apr 8, 2020

It's working correctly on my end. What version of mkdocstrings and pytkdocs are you using?

@thebigmunch
Copy link
Author

It's working correctly on my end. What version of mkdocstrings and pytkdocs are you using?

Newest releases of each, pytkdocs 0.2.1 and mkdocstrings 0.10.2.

@pawamoy
Copy link
Member

pawamoy commented Apr 8, 2020

Alright, found what's causing the issue.

We get the docstrings with inspect.getdoc, which cleans the docstring with inspect.cleandoc, which in turn expands tabs to spaces with str.expandtabs() with the default value of 8 spaces:

expandtabs(self, /, tabsize=8)
    Return a copy where all tab characters are expanded using spaces.
    
    If tabsize is not given, a tab size of 8 characters is assumed.

Then when the parser processes the docstring, since lines start with 6 or more spaces, it thinks it's a continuation line from the previous item / exception.

I was hoping str.expandtabs would use an environment variable for the tab width, unfortunately there's nothing like that in the docs: https://docs.python.org/3/library/stdtypes.html#str.expandtabs.

A temporary solution to this would be to use spaces instead of tabs in your code, though I know it's a delicate subject 😅

Of course pytkdocs code must be fixed to support these expanded tabs, though I'm not sure how right now. Maybe we could simply stop using inspect.getdoc, pick the docstring through the __doc__ attribute, and simply dedent it, leaving tabs untouched, and update the parser code to handle tabs.

@thebigmunch
Copy link
Author

Then when the parser processes the docstring, since lines start with 6 or more spaces, it thinks it's a continuation line from the previous item / exception.

This seems like a poor way to determine line continuation. Using relative indentation levels between lines would be the proper way to solve this. Indeed, this seems to be how other docstring parsers tend to do it. pytkdocs also seems to assume indentation levels of 4, which is not necessarily the case.

@pawamoy
Copy link
Member

pawamoy commented Apr 9, 2020

Could you explain a bit more how these relative indentation levels between lines would be implemented? I'm all ears for better code but I'm not sure how to tackle this. Maybe you would even like to contribute a PR?

@thebigmunch
Copy link
Author

So, after running through inspect.cleandoc, you'll end up with this:

"""
Load audio metadata from a filepath or file-like object.

Parameters:
    f (str, os.PathLike, or file-like object):
        A filepath, path-like object or file-like object of an audio file.

Returns:
    Format: An audio format object of the appropriate type.

Raises:
    FormatError: If the audio file is not valid.
    UnsupportedFormat: If the audio file is not of a supported format.
    ValueError: If ``f`` is not a valid str, path-like object,
        file-like object, or is unreadable.
"""

It's easy to get the section header through pattern. It has 0 indentation. The next line tells you the indentation level for all items in that section. In this case it's 4, but could be 2 or 6 or whatever indentation increment someone is using. Any indentation less than the section item indentation (empty line, unindented) marks the end of the section. Any line that has the section item indentation level is an item in that section. Any line that has an indentation greater than than the section item indentation is a continuation line.

@pawamoy
Copy link
Member

pawamoy commented Apr 9, 2020

OK thanks, great explanation 🙂

Seems way better indeed. Let's do this!

@pawamoy
Copy link
Member

pawamoy commented Apr 9, 2020

I wonder if we should enforce the same extra indentation for continuation lines:

Allowed:

"""
Raises:
    ValueError: If ``f`` is not a valid str, path-like object,
        file-like object, or is unreadable.
"""

Not allowed, or generates an error:

"""
Raises:
    ValueError: If ``f`` is not a valid str, path-like object,
      file-like object, or is unreadable.
"""

The use-case is that if we want to support more complex markup in items description (instead of concatenating as one line), we know we can simply join lines with \n by removing the first indent * 2 spaces:

current_item_lines.append(line[indent * 2:])

Or we could do a second pass on continuation lines, getting the initial continuation indent level, and remove as much from continuation lines:

current_item_lines.append(line[indent + cont_indent:])

@pawamoy
Copy link
Member

pawamoy commented Apr 9, 2020

Looking at the Google style example from Sphinx's Napoleon extension's documentation, I think I'll go with enforcing the same indentation for continuation lines: 4-space indentation means 8-space continuation indentation. Between 4 and 8 spaces will generate a parsing error but still append the line with leading spaces removed.

What do you think?

@thebigmunch
Copy link
Author

What do you think?

Doesn't bother me for the docstrings I write. But I don't know what others might do/think. So long as it is done relatively, not absolutely, in code and in documentation. That is, not 4 and 8 but section_item_indent and section_item_indent * 2.

@pawamoy
Copy link
Member

pawamoy commented Apr 9, 2020

Of course, 4 and 8 was just an example. For others (me included since I was using 2 spaces for continuation indent until now), it will work exactly the same, but warnings will appear in mkdocs output, asking users to use the same indentation level for continuation lines.

@pawamoy pawamoy added bug Something isn't working docstrings Docstrings parsing labels Apr 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working docstrings Docstrings parsing docstrings-format:google
Projects
None yet
Development

No branches or pull requests

2 participants