Tensorflow Type Stubs #7144
Comments
This seems like a good idea to me. I think there are tools that automatically merge type annotations from
Yes, it is acceptable :)
We usually auto-generate the stubs (see CONTRIBUTING.md for instructions). They are initially very incomplete and full of
Awesome :)
Trying stubgen on tensorflow, you have to raise the timeout, as the default timeout is too short given the library's size. With a high timeout, stubgen crashes,
I think that file is valid Python, given that importing it works fine. So it looks like a mypy bug (mypy also crashes on that file); I'll open a bug report for mypy later. After tweaking the file a bit, I was able to get stubs.

I'd recommend not using generated stubs, from past experience using generated tf stubs (in the past, pyright ones). The generated stubs look better today, but still have some key issues I encountered before. Tensorflow does a lot of import/method patching magic that, even with Any used, still produces many false positive errors. Most tensorflow types are not defined where they are publicly exported; instead they're defined in private/internal files. For example, tensorflow's Layer class is defined in keras.engine.base_layer, and keras.engine is not a public part of the api. Instead it's defined like,

```python
@keras_export('keras.layers.Layer')
class Layer(tf.Module, version_utils.LayerVersionSelector):
    ...
```

and the decorator (keras_export or tf_export) dynamically exports the class to a different public location.
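To make that mechanism concrete, here's a rough, hypothetical sketch of what such an export decorator does; the real tf_export/keras_export implementations are considerably more involved:

```python
# Hypothetical, heavily simplified sketch of a keras_export/tf_export-style
# decorator: the class is *defined* in a private module, and the decorator
# registers it under a public path at import time, so the public module
# layout only exists dynamically, not in the source tree.
_PUBLIC_REGISTRY: dict[str, type] = {}

def keras_export(public_path: str):
    def decorator(cls: type) -> type:
        # e.g. maps "keras.layers.Layer" -> the class actually defined
        # in keras.engine.base_layer
        _PUBLIC_REGISTRY[public_path] = cls
        return cls
    return decorator
```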
Most tensorflow objects rely on a decorator to modify how imports work. If you generate stubs, the stubs will document the private api matching the source code, but will not describe the public api/documentation properly, and most classes will be defined in an internal location instead. The worst case is that occasionally an object is re-exported to a module that doesn't even exist in the source code; the module is only dynamically defined. An example is this class: at runtime it is available as tf.initializers.Zero, but generated stubs lack that definition, since it's only there from decorator magic.

This also raises the question of whether stubs should document the internal api at all. I would prefer not to, as there are no backwards compatibility expectations for importing undocumented paths. Instead I plan to have the stubs match the public documentation. Almost all of the generated stubs are internal files. A secondary challenge with stub generation is that tensorflow uses some dynamic method definition. Several basic methods are not defined in the class normally, but monkey patched elsewhere, including basic ones like

A couple design questions before I start. These cover the main things I've experienced so far working on these stubs,
2-4 are about type hinting tensors. If you're unfamiliar with tensorflow: tensors in tensorflow are very similar to arrays in numpy, and I expect numpy has the same challenges here.
There are a couple more design questions I've encountered, but they're smaller design issues; I think prs can discuss them. A couple of people that may be interested in this, or whose advice would be helpful: @BvB93, @mdanatg, @acvander, @shuttle1987, and @pradeep90. I've included people I know have worked on tensor types before, even if not on tensorflow (assuming numpy/pytorch challenges will overlap).
It has been quite some time since I looked at the type stubs situation with tensorflow, but I clearly remember the issues with the imports being a major challenge. It is highly likely others have more up to date knowledge of this than me, but I'll share what I did before in case it's helpful. What I had considered doing for the project I was working on was to make a code generation script that extracted the symbols created by the decorators like
Instead of generating new stubs, it would be better to move existing stubs to typeshed, if that's possible. And again, it doesn't matter if the stubs are outdated or incomplete at first :)
Internally, we generated some type stubs with pytype; I don't know if those are exported (they might be for Keras, though). Making Tensor a generic class is blocked by the C++ implementation of EagerTensor. I think it'd unlock reasonable dtype-generic types, at least for the core APIs. Shapes are more complicated; in many instances they'd quickly collapse to Shape[Any], I think.
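For illustration only, a dtype-generic Tensor as described here might look roughly like the following sketch; the names and signatures are assumptions, not TensorFlow's actual API:

```python
from typing import Any, Generic, TypeVar

import numpy as np

# Illustrative: a dtype type parameter on Tensor, roughly analogous to
# numpy's np.ndarray[Any, np.dtype[...]] parameterization.
DTypeT = TypeVar("DTypeT", bound=np.generic)

class Tensor(Generic[DTypeT]):
    def numpy(self) -> np.ndarray: ...

# A hypothetical dtype-preserving signature such generics would enable:
def cast(x: Tensor[Any], dtype: type[DTypeT]) -> Tensor[DTypeT]: ...
```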
There are some limitations, but mypy does currently support recursive protocols as outlined in python/mypy#731 (comment). I'd say that this approach has worked quite satisfactorily in numpy thus far.
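A minimal sketch of the recursive-protocol pattern referred to here, loosely modeled on numpy's private _NestedSequence protocol (names are illustrative):

```python
from __future__ import annotations

from typing import Protocol, TypeVar

_T_co = TypeVar("_T_co", covariant=True)

# A protocol that refers to itself in a method signature: this is what
# lets a single alias describe arbitrarily nested sequences.
class NestedSequence(Protocol[_T_co]):
    def __len__(self) -> int: ...
    def __getitem__(self, index: int) -> _T_co | NestedSequence[_T_co]: ...
```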
I would recommend this approach as well. While the likes of dtype (and, to a lesser extent, shape) typing support are highly valuable, I can tell from experience that it adds a notable layer of additional complexity (i.e. overloads). There is also the practical issue that many type-checkers currently do not deal well with combinations of
It could be helpful here to define (and potentially expose) a common alias for array-like objects as accepted by tensorflow, similar to what was done with
There is https://www.tensorflow.org/api_docs/python/tf/types/experimental/TensorLike. It probably needs a few extra items.
TensorLike there is an incomplete type alias: it does not specify the element types of list/tuple, so that type is effectively list[Any] and tuple[Any, ...]. Specifying the element types is also problematic, from experience. list[X] is a very inconvenient type because of covariance/invariance rules: list[float | str | int] is incompatible with list[int], and this invariance makes list types a pain to work with (a small example follows after this comment). My current leaning is that the type should be Sequence[TensorLike]. That is broader than the truth, as not all Sequence[X] are valid in tensorflow, but list[X] is also a bad choice when tensorflow functions act covariantly on X and there's no way to tell the type system that. The ideal solution would be to adjust the convert_to_tensor function to work with any sequence (converting non-list/tuple to list) so the types match the implementation. I currently plan to use Sequence[X] for the stubs to avoid invariance problems. The rest of that alias I have been using successfully.

Looking at persephone-tools/tensorflow-stubs, they are smaller than my current stubs and they also document the tensorflow v1 api. tensorflow made a large backwards-incompatible change in v2, so a fair number of those stubs are no longer true. I will read over them more closely to see if I can re-use parts.

On imports, I lean towards manual being good enough for a first attempt. So far, manually reading the aliases on each docs page has been working for me. We can improve them later if needed. If we did generate imports, I'd likely generate them by parsing the public docs.
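To make the invariance issue above concrete, here is a small self-contained example (the alias name is made up for illustration):

```python
from typing import Sequence, Union

# Hypothetical element type, standing in for TensorLike's scalar members.
TensorCompatible = Union[int, float, str]

def f_list(xs: list[TensorCompatible]) -> None: ...
def f_seq(xs: Sequence[TensorCompatible]) -> None: ...

ints: list[int] = [1, 2, 3]

f_list(ints)  # rejected by type checkers: list is invariant, so
              # list[int] is not a list[int | float | str]
f_seq(ints)   # accepted: Sequence is covariant in its element type
```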
I think this is blocked by this issue, #5768. tensorflow has a heavy type dependency on numpy, and it looks like it's not yet possible for stubs here to depend on non-stub packages.
While I very much appreciate you working on #5952, typing all of tensorflow is already a lot to chew on. If you ever end up feeling like you're going down a rabbit hole, just wanted to say that typeshed maintainers are usually happy to help with stubs that live outside of typeshed, so hacking on some stubs in another repo and moving them here later is a viable option :-)
The main value of typeshed is, I think, visibility. I'm happy to make a public repo with my draft tensorflow stubs; I'm just worried that if I did, few people would ever make a pr to them to assist. Here I expect to work on the initial prs myself, but I'm hoping that after getting basic types usable, others will also contribute. I'll probably continue to do occasional prs, but most of my motivation comes from my full-time job working on a library built heavily on top of tensorflow. It's small relative to tensorflow, but still O(10k) lines, and most of our type issues are because of tensorflow. So I would consider it a victory once my team's codebase passes --strict, and I'd be happy to leave the less common parts of tensorflow to other contributors. There's a secondary bonus: I can rely on the CI typeshed already has for checking and uploading new stubs, rather than copying that over. So for now I'm thinking I'll try and see what progress I can make on #5952. It's probably a good idea to upload my existing tf stubs somewhere public too, in case anyone wants to use them as rough draft stubs.
cc @mrahtz who had created some stubs for tensorflow here and might have useful advice for non-variadic stubs (since you said you don't want to use PEP 646 shape types just yet).
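For reference, PEP 646-style shape typing looks roughly like the following; this is adapted from the PEP's own examples, not something the current stubs use:

```python
from typing import Generic, NewType
from typing_extensions import TypeVarTuple, Unpack

# A variadic type variable lets Tensor carry an arbitrary number of
# dimension type parameters.
Shape = TypeVarTuple("Shape")

class Tensor(Generic[Unpack[Shape]]): ...

Height = NewType("Height", int)
Width = NewType("Width", int)

# Shape-aware signature: the checker tracks that axes are swapped.
def transpose(t: Tensor[Height, Width]) -> Tensor[Width, Height]: ...
```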
Pyre supports recursive aliases.
I see there is a start here now with #8974. With pylance v2023.2.20, the first tensorflow stubs from here are included, and after that almost no autocompletion works anymore. I don't know if this is a typeshed or a pylance problem, so I wanted to ask here if someone can point me in the right direction: would an issue here or at pylance be useful?
The typeshed tensorflow stubs are very incomplete. Pylance by default infers types from the library implementation if there are no stubs; if there are stubs, it uses them. So the existence of some stubs is causing pylance not to inspect the rest of the implementation. The main trade-off with inferring types from the implementation is that it will be less accurate than actual stubs. For autocompletion the errors don't matter, but if you enable type errors with pyright, then you will get false positives with tensorflow when relying on inferred types. I think this is more an issue for pylance: maybe it should merge types inferred from the implementation with the stubs that are available. Over time the stubs should get more filled out here and that will help, but I'd be very surprised if the stubs covered the majority of common methods/classes within a month. If there are specific methods/classes you commonly use and want to write stubs for, I'd be happy to review that pr here.
I have understood that so far, I think. My question was aimed at the fact that pyright no longer looks at the implementation as soon as stubs are present. Unfortunately, I know too little about the details of typeshed so far. With the pandas stubs there was something like partial, where I think existing stubs were used and whatever was missing came from the actual code, but that may also have been a https://github.com/microsoft/python-type-stubs quirk. I use pyright strict typing in my projects and currently have a lot of pyright: ignore comments for tensorflow. I think it's very good that there is progress in this area. It's just restrictive, because autocompletion, and at least seeing the untyped parameters, is an essential tool for me to work with. So I will create an issue at pylance indicating that this is an issue during the transition, and hopefully it will be resolved before there are issues at Tensorflow itself saying they broke autocompletion again (like tensorflow/tensorflow#56231). I would like to support this, but unfortunately I am completely occupied in the near future. Currently I mainly use tf.keras... and tf.data....
That's helpful. I'll prioritize the tf.keras/tf.data folders. For tf.keras I'll cover the main common base classes first (like here, for defining Layer/Initializer/etc.), and it should be easier to fill in any specific subclasses you need. Stub-wise, often for a specific layer (dense, batch norm, etc.) or optimizer, the only thing needed is the constructor signature.
idk if we have a way to mark stubs as partial in typeshed (I mean adding the partial marker to py.typed). Also make sure that you have

Finally, any stubs found on https://github.com/microsoft/python-type-stubs will give you different results between Pylance and pyright, as they are included in the former but not the latter. (tensorflow isn't one of them, so that doesn't apply here.)
Stub packages currently do not have a py.typed file, as they don't require one. Checking the pylance issue, it also looks like it wouldn't make a difference if they had partial. So while adjusting the stub uploader to add a partial py.typed sounds fine, I don't think it would help with the autocompletion issue.
If I understand PEP 561 correctly, typeshed is the last fallback and the PEP does not anticipate partial stubs on typeshed.
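(For reference, the partial marker discussed here is defined by PEP 561: a partial stub package ships a py.typed file whose contents are the single line below, telling type checkers to fall back to other sources for anything the stubs don't cover:)

```
partial
```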
We now have tensorflow stubs in typeshed, thanks to @hmc-cs-mdrissi's diligent work! I'm going to close this as completed for now. If anybody has any specific issues with our tensorflow stubs, please feel free to open a followup issue.
I'd prefer moving the discussion to a new issue. That being said, I wonder if pyright is picking up
I would be interested in contributing and working on type stubs for tensorflow. It does have a fairly massive api though, and I am unsure whether it would be reasonable to have the stubs start at typeshed and then eventually move back to tensorflow one day, when they decide to resolve this issue.
So the first question is: would working on them here be acceptable? If yes, what would be the recommended way to start them? Start with a partial stub and just work on getting the directory structure reasonable, with a lot of
```python
def __getattr__(name: str) -> Any: ...
```
? I think the other main thing would be common type aliases that will be used for most functions. I have already been working on stubs for tensorflow for work usage and could start by cutting that up into small prs.
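For illustration, a partial module stub using that fallback pattern might look like the fragment below; the module path and stubbed class are hypothetical, not the actual typeshed content:

```python
# Illustrative fragment of a partial stub, e.g. tensorflow/__init__.pyi
from typing import Any

class Tensor: ...  # a few key classes stubbed properly

# Fallback so every other attribute resolves to Any instead of raising
# attribute errors while the stubs are still incomplete.
def __getattr__(name: str) -> Any: ...
```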