-
-
Notifications
You must be signed in to change notification settings - Fork 4.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use autoapi for the internal reference documentation #22105
Comments
I think we might run into issues with docstrings that are not yet in Sphinx and will fail the build. For testing, we should try to do this for just one file or submodule just to see what it looks like so we avoid issues with trying to pull in the entire codebase. |
I opened a PR that exercises one submodule see #22214 |
Another option is to use sphinx.ext.autosummary with the recursive option. That is what is described in one of the stackoverflow answers. I created a file with
and it generated a file that looks fine to me. It also didn't have any build errors. Furthermore, based on the SO answer, it looks like autosummary is easier to configure, since you can modify the template. And it doesn't involve running any commands to generate rst files. It is itself just an rst directive. So if we need to do something special for some specific module for some reason, it will be easy to do. |
See sphinx-doc/sphinx#7912, and the template in the SO question. We have to modify the template to get it to actually generate files for each function. Otherwise it will just link to the existing docstrings in the public section, which we don't want. |
Maybe we can copy the template files from https://github.com/JamesALeedham/Sphinx-Autosummary-Recursion |
It turns out you need to use .. autosummary::
:toctree: _autosummary
:recursive:
sympy.calculus
to get it to generate the separate pages. And if you want it to use the template, you should use .. autosummary::
:toctree: _autosummary
:template: custom-module-template.rst
:recursive:
sympy.calculus Without using the template from that GitHub repo, I get build errors. I'm not exactly sure what all that template is doing, so we should take a closer look at it. |
I'm curious what will be labeled as "private". SymPy doesn't really have much private code, i.e. most all of SymPy has always been importable with no notion of privateness. If things are now going to be called private, I think it is most appropriate to have a fairly long deprecation cycles to transform the names of private classes, methods, variables, functions by preprending underscores. Leading underscores are the only softly accepted method of signalling to users not to rely on that functionally. Suddenly calling some things private in the docs and thus allowing people to change those things without deprecation is only going to cause downstream frustration. |
"Private" may not be the best name for this anymore. We originally wanted a way to include internal functions in Sphinx, but ended up with a design that includes everything, without discrimination. There is also "public" which we want to make only include things that are to be considered public API going forward. Having better documentation on what is and isn't private is definitely something we want to improve. I don't think we can rely solely on underscores. At any rate, this issue isn't really about deciding which is which. It's just about creating the sections in the documentation so that we can start to organize things. I thought there was another issue about actually determining what is and isn't public, but I can't find it right now. It will be a very nontrivial process because as you say, quite a few things that probably ought to be internal are being used by users. |
Ok, good to know. I'd avoid the name "private" for now. Having a definitive statement about what is and isn't private going forward needs more discussion and careful attention. |
The idea is that we should be able to document everything without it implicitly seeming public. Whether or not to document the function is a separate decision from whether or not the function is private. Right now it isn't easy to make that distinction in the docs. With this PR absolutely everything will be documented in the "private" section of the docs but those will show up separately from the public API section. Maybe "internal" is a better word than "private". I do think though that by default everything should not be public unless an explicit decision is made to make something public. |
We should actually indicate somehow that the functions documented here might be private. Otherwise we will just worsen the current situation, where people will find some internal function via Google and assume it is public. Maybe each page in this section should have a header explaining that the functions documented there might be internal and not guaranteed to have a public stable API. If we ever make a cleaner delineation between public and private we can make the header more specific. |
That's why I like putting everything under a "private" directory so that the word private is clearly in the URL. Also we can add a header to the top of every page in the private API docs like "this is internal documentation and includes things that are mostly not public API". The private docs include everything though so they would also include some things that are public so maybe that's confusing. Perhaps something like "internal" is better than "private". Also as you say there should be something somewhere near the top of the docs hierarchy that explains in general terms how to distinguish public from private and what that means in terms of backwards compatibility guarantees. Whatever happens we need to have a much clearer distinction between something that happens to have documentation and something that is public so that we can document internal code but also have a clear explanation for contributors of how to manage backwards compatibility when making changes in the codebase. |
It would be helpful to have it in the URL, but that isn't sufficient. Most people won't even look at the URL. Some browsers such as mobile browsers hide the URL by default. Another thing we can do is use a completely different sphinx theme, so that it looks different from the docs site. I've seen this used effectively in other projects, for example, the h5py public documentation vs. the h5py low-level API reference. Since it looks like we are going to want to make this a separate sphinx build anyway, this should be straightforward to do. |
Yeah, maybe we can make the public docs look shiny and polished and have the internal docs look all screwy so that you know just by looking at it that you're possibly in the wrong place :) |
My understanding of public and private API is this: public: modules, variables, functions/methods, and classes that users and downstream users make use of that should be stable and follow deprecation procedures if it needs to be changed/removed in backwards compatible ways private: modules, variables, functions/methods, and classes that users and downstream libraries should generally not make use of, but they can use if they are willing to deal with non-deprecated changes For the 15+ years of SymPy, most every variable, function/method, and class has tacitly fallen under "public" as there are few to no explicit indicators of privateness otherwise. We do have these existing soft indicators of privateness:
I don't see how suddenly labeling anything that has tacitly been public for 15+ years as private is a good idea. If code fits one of the four items in the above list, then it seems reasonable to explicitly label as private right now with no deprecation. But otherwise, we should deprecate everything that the community wants to label as private so that downstream users and libraries get a fair warning on this change. Secondly, SymPy's notion of private/public needs to be defined. That should be the first thing done in a new documentation page regarding this. |
You consider that most things are tacitly public. I consider that most things are tacitly private. This is the problem with not being clear about these things. One of your indicators of publicness is if the docstring has been included in the sphinx docs but it should be possible to include something in the docs without implicitly marking it as public: many internal functions are documented and that is a good thing. That is the problem that this PR aims to solve by having a place in the docs that is not implicitly public. |
I consider them tacitly public because they are public and there is nothing that has warned to prevent people from using these in their downstream code. You can't really consider them private if people have been using them in a public way for 15+ years. It isn't just my opinion, it's that we have hundreds of thousands of users that have used these things.
I have nothing against this. Sure, we can setup a new documentation paradigm that clearly labels things as private and public (with definitions of private/public) and use that moving forward. But we can't undo that people have treated almost all of SymPy as public over the course of its history. |
That's why we propose to begin fixing that by making it as clear as possible now in the docs which things should be considered public. For the most part this is clear from docs already if we take "documented" as meaning "public unless otherwise stated" and "undocumented" as meaning "private". The difficulty is being clear about what "otherwise stated" means and it's a problem that nothing anywhere explains unambiguously how a user (or contributor) should understand what is public and private. The other problem is that some things that are or should be public are under-documented. Let's take some examples to make this more explicit: These are the ode module docs: Another example: In both of those cases what you can also see is that we would end up having much better public and private docs by separating them. The public docs themselves would be easier to navigate for users if they only list the small number of public APIs. The private docs would be automatically generated with the same structure as the codebase making them easier for contributors to navigate. |
I'm not sure how you can declare that anything on those two pages are private. I think we have a different understanding of this, maybe it's tied to user vs. developer view. If there was nothing to warn people from using a SymPy function/module/class in the past, then I see that as public. Why? Because people use those things and they are out in user-land and downstream library code and SymPy has never made explicit declarations of public/private code. I am not against having public and private defined and organized in clear ways. My point is that I think that we should deprecate any existing API that we might want to label as "private" moving forward. Users should at least be warned that API they rely on can no longer be relied on. Scientific Python codes are not strict enough on stable API and this is extremely frustrating for users. I'd like SymPy to make a strong stance here and be strict about our API stability. Labeling a large chunk of existing public API as private with no warning to users is not best practice. |
This is the warning: it is the clarification in the docs about what is public and what is private. Clarifying that in the docs does not mean that any downstream code suddenly breaks. |
Jason I think we are agreeing with each other. SymPy should take a strong stance against breaking API changes, and we've made an effort to do that with deprecations. Clarifying what is and isn't public API helps this goal. Currently, some functions are ambiguous whether they are public or private. That creates a tension for development. Developers like you who consider things as public by default will not want to make breaking changes to them. Developers like Oscar who consider things private might feel a breaking change is fine. Both situations have disadvantages. If a function is treated as public, its API cannot be changed. That can make the code harder to expand and refactor. And obviously on the flip side, if a function is treated private and changed, but is actually used by users, their code will break. I think we should be clear about a few things
|
This serves primarily as a demonstration of autosummary. If we like the way this looks, in both the RST and final HTML, we can look at using it everywhere. Some notes here: - You have to manually add :toctree: ref to each autosummary to tell it where to put the autogenerated files. You should always use the name "ref" for this for consistency, but you can use something like :toctree: ../ref to put it on level up. - The classes I have documented here really need to be organized in a better way. However, I have not done that in this PR. An advantage of autosummary is that we can move where a class is documented in the organization without breaking the URL to it, since it lives on a separate page. - This will break any existing links to specific anchors on this page. - The class template hardcodes :members:. However, it would be easy to use other combinations by adding additional templates, although each autosummary list can only use one template at a time. - The docs for classes still list each method on the same page. This is because autosummary is not really capable of doing recursive autosummary on class members in a way that only includes the members we want to include. See the commit message of the previous commit. - Autosummary is very particular about how you spell the name of the function so that it can find it. I recommend avoiding `.. module::` or `.. currentmodule::` and just spelling everything explicitly, using `~` before the name to truncate the module name for public names (eventually, we will have *only* public names in this part of documentation, see sympy#22105, but this is a separate issue to clean this up). - When building, be aware that files for the autosummary are generated automatically, but are only cleaned when you do a `make clean`. Thus if you add something to an autosummary and then delete it, you will get warnings in the build from the generated file that no longer is referenced anywhere.
Need to investigate https://pypi.org/project/sphinxcontrib-apidoc/ |
@nanjekyejoannah didn't mention it, but we will have a section "internal reference" section which will document all functions in SymPy, both public and private, and will be organized completely based on the code structure. We can use a tool like autoapi and autosummary to generate this automatically (see https://stackoverflow.com/questions/2701998/sphinx-autodoc-is-not-automatic-enough/).
The text was updated successfully, but these errors were encountered: