Bring back the warnings when the docstring could not be parsed #196

Closed
Gabriel-p opened this issue Dec 30, 2024 · 8 comments

Comments

@Gabriel-p
Contributor

This issue originated in #122 (comment), which led to this (now closed) issue #139, which we circled back to in #169 (comment).

Before June 2024, pydoclint used to alert me when it couldn't parse a docstring using the selected style. This was very useful to me because it served as a reminder of functions that did not contain a valid (or any) docstring.

Then the package was updated and these warnings disappeared.

I do not need pydoclint to automatically recognize the style (as requested in #169); I'd just like those warnings back.

@jsh9
Owner

jsh9 commented Jan 10, 2025

Hi @Gabriel-p, it was probably a bug in pydoclint that allowed it to tell you it couldn't parse the docstring. After I fixed that bug, this "ability" disappeared.

What you are asking for is quite valid and would be a good feature, but the difficulty is that the definition of "could not parse" is vague.

For example, if you have a docstring written in the Google style and you use the numpy docstring parser to parse it, you won't end up with nothing. Instead, you often end up with a non-empty Doc object, which leads you to think that the docstring is indeed written in the numpy style.
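To make the point concrete, here is a deliberately tiny numpy-style parser. `ToyDoc` and `parse_numpy_toy` are hypothetical illustration names, not pydoclint or docstring_parser_fork APIs. Fed a Google-style docstring, it still returns a non-empty object, because the summary line looks the same in every style; only the parameter list comes back empty.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class ToyDoc:
    """Hypothetical stand-in for a parsed docstring object."""

    short_description: str = ""
    params: list[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        return not self.short_description and not self.params


def parse_numpy_toy(docstring: str) -> ToyDoc:
    """Rough numpy-style parser: summary line plus the "Parameters" section."""
    lines = inspect.cleandoc(docstring).splitlines()
    doc = ToyDoc()
    # Every style starts with a summary line, so this is always "found".
    if lines:
        doc.short_description = lines[0].strip()
    # A numpy section header must be underlined with dashes.
    for i in range(len(lines) - 1):
        if lines[i].strip() == "Parameters" and set(lines[i + 1].strip()) == {"-"}:
            for entry in lines[i + 2:]:
                if not entry.strip():
                    break  # a blank line ends the section
                # Entries are unindented "name : type" lines; descriptions are indented.
                if not entry.startswith(" ") and " : " in entry:
                    doc.params.append(entry.split(" : ")[0].strip())
            break
    return doc


google_doc = """Add two numbers.

    Args:
        a (int): First number.
        b (int): Second number.
    """

numpy_doc = """Add two numbers.

    Parameters
    ----------
    a : int
        First number.
    b : int
        Second number.
    """

for name, text in [("google", google_doc), ("numpy", numpy_doc)]:
    doc = parse_numpy_toy(text)
    print(name, doc.short_description, doc.params, doc.is_empty())
# google Add two numbers. [] False
# numpy Add two numbers. ['a', 'b'] False
```

Both results are "non-empty", so emptiness alone cannot tell a wrong style from a right one.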

@Gabriel-p
Contributor Author

Gabriel-p commented Jan 11, 2025

I guess the only way to address this without some complicated method is to identify the minimum pattern expected for a given style and raise a warning if a docstring does not match it.

For example, what does pydoclint do if a function has input arguments but their descriptions and/or type hints are missing from the docstring? As far as I can tell, it does nothing. Should this be a warning? Something like "you have undocumented arguments"?
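The "minimum pattern" idea can be sketched for the sphinx style with the standard library alone. The rule assumed here (it is not pydoclint's actual rule) is that every argument other than `self`/`cls` must appear in a `:param ...:` field. `find_undocumented_args` is a hypothetical helper that reports the arguments that don't, including those of functions with no docstring at all.

```python
import ast
import re

# Matches ":param x:" as well as typed forms such as ":param int x:".
PARAM_RE = re.compile(r":param\s+(?:[^:]*\s)?(\w+)\s*:")


def find_undocumented_args(source: str) -> dict[str, list[str]]:
    """Map each function name to the arguments its docstring never mentions."""
    report: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        spec = node.args
        names = [a.arg for a in spec.posonlyargs + spec.args + spec.kwonlyargs]
        if spec.vararg:
            names.append(spec.vararg.arg)
        if spec.kwarg:
            names.append(spec.kwarg.arg)
        # "self" and "cls" are conventionally left undocumented.
        names = [n for n in names if n not in ("self", "cls")]
        # A missing docstring is treated as an empty one.
        documented = set(PARAM_RE.findall(ast.get_docstring(node) or ""))
        missing = [n for n in names if n not in documented]
        if missing:
            report[node.name] = missing
    return report


sample = '''
def documented(a, b):
    """Add.

    :param a: first
    :param int b: second
    """
    return a + b


def bare(x, y):
    return x * y


class K:
    def method(self, z):
        """No fields at all."""
        return z
'''

print(find_undocumented_args(sample))
# {'bare': ['x', 'y'], 'method': ['z']}
```

A check like this does not need to recognize the style at all; it only needs the one pattern the chosen style guarantees.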

@jsh9
Owner

jsh9 commented Jan 11, 2025

Hi @Gabriel-p, what you described is true.

This is because if a docstring is written in the numpy style and we ask the docstring parser to parse it in another style (such as Google), it won't be able to extract the arg list, the return section, etc.

I think the task of detecting which style a docstring is written in falls a bit outside the scope of pydoclint. I think we need another linter to detect the style and to throw a violation when the detected style differs from the intended style.

There are two ways to potentially design this new linter:

  1. Use a machine learning model to predict the style given a docstring. It will be a 4-class classifier: numpy, Google, Sphinx, unknown
  2. Use docstring_parser_fork to attempt parsing the same docstring with 3 different styles: numpy, Google, Sphinx. Whichever gives the "largest" parsed Docstring object is the style.
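Option No. 2 can be illustrated without any third-party code. In this sketch, three hypothetical "sizer" functions stand in for parsing with docstring_parser_fork in each style (they are not its real API). Each counts how many parameter/return/raise entries its style's layout rules can find. The style with the highest count wins, and a docstring that yields nothing in any style is labelled "unknown".

```python
import inspect
import re
from typing import Callable

NUMPY_HEADERS = {"Parameters", "Other Parameters", "Returns", "Yields", "Raises"}
GOOGLE_HEADERS = {"Args:", "Arguments:", "Returns:", "Yields:", "Raises:"}
SPHINX_FIELD = re.compile(r":(?:param|returns?|raises?)\b")


def numpy_size(lines: list[str]) -> int:
    """Count unindented entries under dash-underlined numpy headers."""
    count = 0
    for i in range(len(lines) - 1):
        if lines[i].strip() in NUMPY_HEADERS and set(lines[i + 1].strip()) == {"-"}:
            for entry in lines[i + 2:]:
                if not entry.strip():
                    break
                if not entry.startswith(" "):
                    count += 1
    return count


def google_size(lines: list[str]) -> int:
    """Count entries at the first indentation level under "Args:"-style headers."""
    count = 0
    for i, line in enumerate(lines):
        if line.strip() not in GOOGLE_HEADERS or line.startswith(" "):
            continue
        first_indent = None
        for entry in lines[i + 1:]:
            indent = len(entry) - len(entry.lstrip())
            if not entry.strip() or indent == 0:
                break  # blank or unindented line ends the section
            if first_indent is None:
                first_indent = indent
            if indent == first_indent:
                count += 1
    return count


def sphinx_size(lines: list[str]) -> int:
    """Count :param / :returns / :raises fields."""
    return sum(1 for line in lines if SPHINX_FIELD.match(line.strip()))


SIZERS: dict[str, Callable[[list[str]], int]] = {
    "numpy": numpy_size,
    "google": google_size,
    "sphinx": sphinx_size,
}


def guess_style(docstring: str) -> str:
    """Pick the style whose rules extract the most entries, else "unknown"."""
    lines = inspect.cleandoc(docstring).splitlines()
    sizes = {style: sizer(lines) for style, sizer in SIZERS.items()}
    best = max(sizes, key=sizes.__getitem__)
    return best if sizes[best] > 0 else "unknown"


GOOGLE_DOC = """Add two numbers.

    Args:
        a (int): First number.
        b (int): Second number.

    Returns:
        int: The sum.
    """

NUMPY_DOC = """Add two numbers.

    Parameters
    ----------
    a : int
        First number.

    Returns
    -------
    int
        The sum.
    """

SPHINX_DOC = """Add two numbers.

    :param a: First number.
    :returns: The sum.
    """

for text in (GOOGLE_DOC, NUMPY_DOC, SPHINX_DOC, "Add two numbers."):
    print(guess_style(text))
# google, numpy, sphinx, unknown (one per line)
```

Counting extracted entries, rather than checking whether the result is non-empty, sidesteps the problem that a summary line parses under every style.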

@jsh9
Owner

jsh9 commented Jan 11, 2025

It's not difficult to do No.2 above, and I can create a new project and publish it quite quickly.

No.1 would require more work:

  • Setting up the machine learning model (a TF-IDF-based random forest or gradient-boosted trees, or a TextCNN; no need for anything fancier)
  • Scraping GitHub to collect docstrings of different styles to use as training data

I'm not inclined to do No.1 simply because it's a bit time-consuming.

Also, I'm not sure how to name this new linter. Note that the name "pydocstyle" is already taken.

@jsh9
Owner

jsh9 commented Jan 13, 2025

Hi @Gabriel-p, I've published a new version of pydoclint (0.6.0) that adds this check. (I didn't end up creating another linter.)

You can read more about this new feature here: https://jsh9.github.io/pydoclint/style_mismatch.html

jsh9 closed this as completed Jan 13, 2025
@Gabriel-p
Copy link
Contributor Author

Upgraded to the latest version and unfortunately it doesn't address the issue. I have many functions that do not follow my specified format (sphinx) and pydoclint gives no warning:

```
$ pydoclint --style=sphinx .
Skipping files that match this pattern: \.git|\.tox
asteca/__init__.py
asteca/cluster.py
asteca/isochrones.py
asteca/likelihood.py
asteca/membership.py
asteca/modules/__init__.py
asteca/modules/bayesian_da.py
asteca/modules/cluster_priv.py
asteca/modules/fastmp.py
asteca/modules/imfs.py
asteca/modules/isochrones_priv.py
asteca/modules/likelihood_priv.py
asteca/modules/mass_binary.py
asteca/modules/nmembers.py
asteca/modules/synth_cluster_priv.py
asteca/plot.py
asteca/synthetic.py
docs/_build/html/_static/IMF_plot.py
docs/_build/html/_static/asteca_icon.py
docs/_build/html/_static/binary_distr.py
docs/_build/html/_static/q_distr_plot.py
docs/_build/html/_static/triple_systems.py
docs/_static/IMF_plot.py
docs/_static/asteca_icon.py
docs/_static/binary_distr.py
docs/_static/q_distr_plot.py
docs/_static/triple_systems.py
docs/conf.py
tests/__init__.py
🎉 No violations 🎉
```

All the modules whose names end in `_priv` contain functions that do not follow the sphinx format (or any format, actually). The old version (before the bug you mentioned was fixed) used to flag these.

@jsh9
Owner

jsh9 commented Jan 14, 2025

Hi @Gabriel-p, you need to turn this feature on manually: `pydoclint --style=sphinx --check-style-mismatch=True .`

And I ended up with:

```
Skipping files that match this pattern: \.git|\.tox
asteca/__init__.py
asteca/cluster.py
asteca/isochrones.py
asteca/likelihood.py
asteca/membership.py
asteca/modules/__init__.py
asteca/modules/bayesian_da.py
<unknown>:152: SyntaxWarning: invalid escape sequence '\s'
asteca/modules/cluster_priv.py
asteca/modules/fastmp.py
asteca/modules/imfs.py
asteca/modules/isochrones_priv.py
asteca/modules/likelihood_priv.py
asteca/modules/mass_binary.py
asteca/modules/nmembers.py
asteca/modules/synth_cluster_priv.py
asteca/plot.py
asteca/synthetic.py
docs/_static/IMF_plot.py
docs/_static/asteca_icon.py
docs/_static/binary_distr.py
docs/_static/q_distr_plot.py
docs/_static/triple_systems.py
docs/conf.py

asteca/membership.py
    150: DOC503: Method `Membership.fastmp` exceptions in the "Raises" section in the docstring do not match those in the function body. Raised exceptions in the docstring: ['ValueError']. Raised exceptions in the body: ['AttributeError', 'ValueError'].

asteca/modules/isochrones_priv.py
    20: DOC003: Function/method `load`: Docstring style mismatch. (Please read more at https://jsh9.github.io/pydoclint/style_mismatch.html ). You specified "sphinx" style, but the docstring is likely not written in this style.

asteca/modules/nmembers.py
    150: DOC003: Function/method `rkfunc`: Docstring style mismatch. (Please read more at https://jsh9.github.io/pydoclint/style_mismatch.html ). You specified "sphinx" style, but the docstring is likely not written in this style.

asteca/modules/synth_cluster_priv.py
    193: DOC003: Function/method `sample_imf`: Docstring style mismatch. (Please read more at https://jsh9.github.io/pydoclint/style_mismatch.html ). You specified "sphinx" style, but the docstring is likely not written in this style.
    365: DOC003: Function/method `properModel`: Docstring style mismatch. (Please read more at https://jsh9.github.io/pydoclint/style_mismatch.html ). You specified "sphinx" style, but the docstring is likely not written in this style.

asteca/synthetic.py
    146: DOC503: Method `Synthetic.calibrate` exceptions in the "Raises" section in the docstring do not match those in the function body. Raised exceptions in the docstring: ['ValueError', 'ValueError']. Raised exceptions in the body: ['ValueError'].
```

@Gabriel-p
Contributor Author

Oh I didn't realise that I had to turn the option on! It works great, thank you very much for the hard work @jsh9!
