[Feature Request] Provide an API to get the type of the expressions in the AST #4868

Open
sk- opened this issue Apr 6, 2018 · 25 comments

@sk-

sk- commented Apr 6, 2018

Knowing the type of the expressions in the AST is useful for IDEs (and IDE plugins) to support autocompletion, for static analyzers and for refactoring tools. As a matter of fact, both jedi and pylint have their own heuristics for type inference.

In my specific use case I would like to use the type information to write a safe refactoring tool. The idea is to be able to say that you want to refactor specific methods. For example one could want to refactor string.find into string.index, as:

pos = expr.find(x)
if pos >= 0:
  # do something with pos
else:
  # do something else

where expr is any string expression, like 'string', string_var, (var + 'foo'), string_var.replace(' ', '-'), etc.

into

try:
  pos = expr.index(x)
  # do something with pos
except ValueError:
  # do something else

To safely do this refactoring, one needs to be able to query the type of subexpressions, given their location in the source file. Otherwise, given that find is a common method name present in many different classes, the refactoring would blindly be applied to all of them.
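
To illustrate why syntax alone is not enough, here is a minimal sketch using only the standard `ast` module (Python 3.9+ for `ast.unparse`). It collects every call of the form `<receiver>.find(...)`; the helper name `find_method_calls` and the sample source are made up for this example. Nothing in the tree says whether a receiver is a `str`, which is exactly the missing information.

```python
import ast

# Three `.find` calls; only type information could tell which receivers are strings.
SOURCE = '''
pos = title.find("x")
node = tree.find("x")
idx = ("a" + suffix).find("x")
'''


def find_method_calls(source: str, method: str) -> list[tuple[int, str]]:
    """Return (line, receiver source) for every `<receiver>.<method>(...)` call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == method
        ):
            hits.append((node.lineno, ast.unparse(node.func.value)))
    return sorted(hits)


calls = find_method_calls(SOURCE, "find")
for line, receiver in calls:
    print(f"line {line}: {receiver}.find(...)")
```

All three calls are reported, although `tree.find` may well belong to an XML element rather than a string.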

Note: this is the feature request version of issue #4713.

@kamahen
Contributor

kamahen commented Apr 8, 2018

If you're doing refactoring, wouldn't it be better to use lib2to3's AST? I've got something that resolves the non-dynamic names in lib2to3's AST and am (slowly) working on resolving the dynamic names (e.g., y in x = MyClass(); x.y) and imported names (just a "small matter of programming").

@danthedaniel

danthedaniel commented Jun 10, 2018

I'm interested in this - particularly for convenient inspection of source code in an IDE. Most other typed languages can support "type reveal on hover" for expressions.

@gvanrossum
Member

gvanrossum commented Jun 11, 2018 via email

@kamahen
Contributor

kamahen commented Jul 11, 2018

Do you need an annotated AST, or is it sufficient to have the tokens in the file mapped to fully qualified names (FQNs) and types? I have some code for mapping all the tokens to FQNs and partial code for their type information (the main thing missing is for imports, which I'm working on but it's summer and I'm doing it in my spare time).

@sk-
Author

sk- commented Jul 16, 2018

@kamahen What do you mean by tokens? I'm guessing you mean names/bindings, or do you mean the tokens as output by the tokenize module?

That would still be very helpful, as it would make it easy to refactor calls to a module without having to make assumptions about how it is imported.

@kamahen
Contributor

kamahen commented Jul 16, 2018

@sk- Yes, it's more-or-less the tokens as output by the tokenize module, although I use lib2to3, which has its own tokenizer. My output is a simplified AST with the tokens and fully qualified names. It would probably be easy to merge this information back into the original AST by doing a simple tree traversal of the AST while progressing through the list of tokens with their FQNs.

Eventually, I hope to have inferred types with all the tokens, but that code is currently missing some features, such as proper handling of import.

If you want to play around with my code, I can give you my latest version (which isn't yet on github).

Are you planning on using the AST in lib2to3, which has the source location information in its AST, or something else?
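
The merge step described above could look roughly like the sketch below. It uses the standard `ast` module instead of lib2to3, and it looks names up by position rather than stepping through a token list in parallel. The `(line, column) -> FQN` table is a made-up stand-in for the resolver's output, so treat the format, the `fqn` attribute, and the `attach_fqns` helper as assumptions rather than the actual pykythe interface.

```python
import ast

SOURCE = "import os.path as osp\nresult = osp.join('a', 'b')\n"

# Hypothetical resolver output: (line, column) of a name token -> fully qualified name.
FQN_TABLE = {
    (2, 0): "example.result",
    (2, 9): "os.path",
}


def attach_fqns(tree: ast.AST, table: dict[tuple[int, int], str]) -> ast.AST:
    """Walk the tree once and store the FQN (or None) on each Name node."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            node.fqn = table.get((node.lineno, node.col_offset))
    return tree


tree = attach_fqns(ast.parse(SOURCE), FQN_TABLE)
names = {n.id: n.fqn for n in ast.walk(tree) if isinstance(n, ast.Name)}
print(names)
```

With the FQN attached to the `osp` node, a refactoring tool can match calls to `os.path.join` regardless of the alias used in the import.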

@sk-
Author

sk- commented Jul 18, 2018

@kamahen That'd be perfect, as I'm also using lib2to3.

I'd be happy to play with the code you have so far.

@kamahen
Contributor

kamahen commented Jul 18, 2018

OK, let me get things into a slightly better shape, then I'll send it to you (or put it on github if it's not too awful). This week is rather busy, but hopefully some time next week.

There are two parts of the code -- the first produces a simplified AST with fully qualified names (currently, it outputs in JSON); the second takes that simplified AST and figures out how to resolve . operations (which is what you want for your smarter refactoring). Most of the resolving logic is done, except for handling imports. The output is also JSON, in a somewhat unfriendly format. But that can be easily changed.

(BTW, the most expensive part of the code seems to be the JSON marshaling/unmarshaling).

@kamahen
Contributor

kamahen commented Aug 1, 2018

I've pushed an interim version of my code to https://github.com/kamahen/pykythe

Its outputs will take some explaining, so (assuming you want to play with it), I suggest you follow the setup instructions and run the test (make all_tests all_test2). At that point, I can tell you what to look at and how to interpret the outputs (and, if you wish, I can probably produce the outputs in a different and easier to use format ... for example, if you only want to use the fully-qualified name outputs, there's a simpler way of running the code).

The main things that are missing:

  • import is partially implemented and still has some bugs
  • builtins aren't yet handled
  • no caching of imports, so processing will be slow when I finish implementing imports and builtins
  • incomplete resolution of "." operation for imports

@JukkaL
Collaborator

JukkaL commented Jan 29, 2020

I'm closing this since there is no concrete proposal and there doesn't seem to be much active interest in this issue.

@sk-
Author

sk- commented Aug 27, 2020

Just wanted to mention that [LibCST](https://github.com/Instagram/LibCST) has a TypeInferenceProvider which uses Pyre's query functionality. See https://pyre-check.org/docs/querying-pyre.html and https://libcst.readthedocs.io/en/latest/metadata.html#type-inference-metadata.

It'd be great if we could use Mypy instead of Pyre.

@gvanrossum
Member

Hm, dmypy (mypy's daemon mode) has much of the same information available, there's just no API for it yet. We do have an experimental API that suggests the signature for an unannotated function based on how it's called (dmypy suggest).

Maybe we should develop something similar to Pyre's API? Maybe we could even just copy the same API style, to make it easier for clients to switch.
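
Until such an API exists, one workaround is to place `reveal_type(...)` calls in the checked file and parse the notes mypy prints. The sketch below assumes mypy's usual note format (`path:line: note: Revealed type is "..."`, with an optional column); `revealed_types` and `check_file` are hypothetical helper names, and `mypy.api` is imported lazily so the parser can be tried without mypy installed.

```python
import re

# Matches e.g.  ex.py:4: note: Revealed type is "builtins.int"
NOTE_RE = re.compile(
    r"""^(?P<path>[^:]+):(?P<line>\d+):(?:\d+:)? note: Revealed type is ["'](?P<type>.*)["']$"""
)


def revealed_types(output: str) -> dict[tuple[str, int], str]:
    """Map (path, line) to the type mypy reported for reveal_type() on that line."""
    found = {}
    for raw in output.splitlines():
        match = NOTE_RE.match(raw.strip())
        if match:
            found[(match["path"], int(match["line"]))] = match["type"]
    return found


def check_file(path: str) -> dict[tuple[str, int], str]:
    """Run mypy in-process on `path` and collect its reveal_type() notes."""
    from mypy import api  # requires mypy to be installed

    stdout, _stderr, _status = api.run([path])
    return revealed_types(stdout)


sample = (
    'ex.py:4: note: Revealed type is "builtins.int"\n'
    "Success: no issues found in 1 source file"
)
print(revealed_types(sample))
```

This only reports one type per line and needs the source to be edited first, which is why a proper query API would be preferable.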

@JukkaL
Collaborator

JukkaL commented Aug 27, 2020

Okay, let's open this since there is renewed interest.

> Maybe we should develop something similar to Pyre's API? Maybe we could even just copy the same API style, to make it easier for clients to switch.

This would be a reasonable thing to have. At least most of the Pyre API features should be easy enough to implement on top of dmypy.

The core team doesn't have a lot of spare cycles, but if somebody wants to look into this, I'm happy to give some help.

@sobolevn
Member

sobolevn commented Mar 4, 2021

I have made a tool to enhance ast with metadata from mypy:

import sys

a = 1
b = 2
print(a is b)

Output:

» typed-linter ex.py
Original AST:
Module(body=[Import(names=[alias(name='sys')]), Assign(targets=[Name(id='a', ctx=Store())], value=Constant(value=1)), Assign(targets=[Name(id='b', ctx=Store())], value=Constant(value=2)), Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Compare(left=Name(id='a', ctx=Load()), ops=[Is()], comparators=[Name(id='b', ctx=Load())])], keywords=[]))], type_ignores=[])

Format:
-- ast.Node mypy.Node
metdata

-- <class 'ast.Module'> <class 'mypy.nodes.MypyFile'>
{'fullname': 'ex', 'is_stub': False, 'path': 'ex.py', 'is_partial_stub_package': False, 'is_package_init_file': False, 'names': {'__builtins__': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'builtins', 'type': None}, '__name__': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'ex.__name__', 'type': 'builtins.str'}, '__doc__': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'ex.__doc__', 'type': 'builtins.str'}, '__file__': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'ex.__file__', 'type': 'builtins.str'}, '__package__': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'ex.__package__', 'type': 'builtins.str'}, 'sys': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'module_hidden': True, 'module_public': False, 'cross_ref': 'sys', 'type': None}, 'a': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'ex.a', 'type': 'builtins.int'}, 'b': {'.class': 'SymbolTableNode', 'kind': 'Gdef', 'cross_ref': 'ex.b', 'type': 'builtins.int'}}, 'imports': [{'is_unreachable': False, 'is_top_level': True, 'is_mypy_only': False, 'assignments': [], 'class_name': 'Import', 'ids': [{'imported': 'sys', 'alias': None}]}]}
-- <class 'ast.Import'> <class 'mypy.nodes.Import'>
{'is_unreachable': False, 'is_top_level': True, 'is_mypy_only': False, 'assignments': [], 'class_name': 'Import', 'ids': [{'imported': 'sys', 'alias': None}]}
-- <class 'ast.Assign'> <class 'mypy.nodes.AssignmentStmt'>
{'type': builtins.int, 'unanalyzed_type': None, 'new_syntax': False, 'is_alias_def': False, 'is_final_def': False}
-- <class 'ast.Name'> <class 'mypy.nodes.NameExpr'>
{'name': 'a', 'fullname': 'ex.a', 'kind': 1, 'is_new_def': True, 'is_special_form': False, 'is_inferred_def': False, 'is_alias_rvalue': False, 'type': builtins.int}
-- <class 'ast.Constant'> <class 'mypy.nodes.IntExpr'>
{'value': 1, 'type': Literal[1]?}
-- <class 'ast.Assign'> <class 'mypy.nodes.AssignmentStmt'>
{'type': builtins.int, 'unanalyzed_type': None, 'new_syntax': False, 'is_alias_def': False, 'is_final_def': False}
-- <class 'ast.Name'> <class 'mypy.nodes.NameExpr'>
{'name': 'b', 'fullname': 'ex.b', 'kind': 1, 'is_new_def': True, 'is_special_form': False, 'is_inferred_def': False, 'is_alias_rvalue': False, 'type': builtins.int}
-- <class 'ast.Constant'> <class 'mypy.nodes.IntExpr'>
{'value': 2, 'type': Literal[2]?}
-- <class 'ast.Expr'> <class 'mypy.nodes.ExpressionStmt'>
{}
-- <class 'ast.Call'> <class 'mypy.nodes.CallExpr'>
{'arg_kinds': [0], 'arg_names': [None], 'is_analyzed': False}
-- <class 'ast.Name'> <class 'mypy.nodes.NameExpr'>
{'name': 'print', 'fullname': 'builtins.print', 'kind': 1, 'is_new_def': False, 'is_special_form': False, 'is_inferred_def': False, 'is_alias_rvalue': False, 'type': def (*values: builtins.object, *, sep: Union[builtins.str, None] =, end: Union[builtins.str, None] =, file: Union[_typeshed.SupportsWrite[builtins.str], None] =, flush: builtins.bool =)}
-- <class 'ast.Compare'> <class 'mypy.nodes.ComparisonExpr'>
{'operators': ['is'], 'method_types': [None, None, None], 'type': builtins.bool}
-- <class 'ast.Name'> <class 'mypy.nodes.NameExpr'>
{'name': 'a', 'fullname': 'ex.a', 'kind': 1, 'is_new_def': False, 'is_special_form': False, 'is_inferred_def': False, 'is_alias_rvalue': False, 'type': builtins.int}
-- <class 'ast.Name'> <class 'mypy.nodes.NameExpr'>
{'name': 'b', 'fullname': 'ex.b', 'kind': 1, 'is_new_def': False, 'is_special_form': False, 'is_inferred_def': False, 'is_alias_rvalue': False, 'type': builtins.int}

I am going to release it soon.

@jjlee

jjlee commented Mar 17, 2022

@sobolevn did you release that?

@devmessias
Contributor

devmessias commented May 30, 2022

Hi @sobolevn
That sounds good! Is it already available? I'm looking for a way to annotate the AST with mypy information for my project.

https://github.com/pyastrx/pyastrx/

@devmessias
Contributor

> Hm, dmypy (mypy's daemon mode) has much of the same information available, there's just no API for it yet. We do have an experimental API that suggests the signature for an unannotated function based on how it's called (dmypy suggest).
>
> Maybe we should develop something similar to Pyre's API? Maybe we could even just copy the same API style, to make it easier for clients to switch.

I think the most relevant Pyre query is the one that lists the types in one or more files. The output is quite simple: just a list containing the annotation for each token.

There are many more queries in Pyre, but being able to get a list of types directly from dmypy or mypy would be enough.
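
As a rough sketch of how such a list could be consumed, the code below looks up the annotation covering a given source position. The record layout (a `location` with `start`/`stop` positions plus an `annotation` string) is modelled on Pyre's `types` query as I understand it; the exact field names, the exclusive `stop` column, the sample data, and the `type_at` helper are all assumptions.

```python
from typing import Optional

# Hypothetical per-file output: one record per expression, with its inferred type.
TYPES = [
    {"location": {"start": {"line": 3, "column": 0}, "stop": {"line": 3, "column": 1}},
     "annotation": "int"},
    {"location": {"start": {"line": 5, "column": 6}, "stop": {"line": 5, "column": 12}},
     "annotation": "bool"},
]


def type_at(records: list[dict], line: int, column: int) -> Optional[str]:
    """Return the annotation of the smallest record whose span contains (line, column)."""
    best = None
    best_size = None
    for record in records:
        start, stop = record["location"]["start"], record["location"]["stop"]
        begin = (start["line"], start["column"])
        end = (stop["line"], stop["column"])
        if begin <= (line, column) < end:
            size = (end[0] - begin[0], end[1] - begin[1])
            if best_size is None or size < best_size:
                best, best_size = record["annotation"], size
    return best


print(type_at(TYPES, 5, 8))
```

An editor plugin or refactoring tool would only need this kind of position lookup to implement "type reveal on hover" or to filter method calls by receiver type.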

@devmessias
Contributor

> @sobolevn did you release that?

I'm working on something to address that:
pyastrx/pyastrx#44

But this is just a workaround; later on, I'll try to create a better and faster way to do this.

mypyq file1.py file2.py

@juanchoflorez

Hi @sobolevn, have you released it? It would be really useful for one of my projects.
I am inclined to use LibCST to get type annotations in my AST, but setting up a working Pyre environment is cumbersome, so I would prefer a mypy-based tool for that.

@devmessias
Contributor

I already have something that does this, but I forgot about this issue. I can work on it and send a PR.

rominf pushed a commit to rominf/LibCST that referenced this issue Dec 7, 2022
This change is an RFC (please read the whole change message).

Add `MypyTypeInferenceProvider` as an alternative to
`TypeInferenceProvider`. The provider infers types using mypy as a
library. The only requirement is to have the latest mypy installed.
The inferred types are mypy types: mypy's type system is well designed,
and using it directly avoids a conversion and keeps things simple. For
compatibility and extensibility reasons, these types are stored in a
separate field, `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py` which we want
to inspect:
```python
x = [42]

s = set()

from enum import Enum

class E(Enum):
    f = "f"

e = E.f
```

Then, to play with mypy types, one can use code like:
```python
import libcst as cst

from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)

mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target

print(mypy_type[x_name_node])
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].fullname)
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].mypy_type.type.fullname)
 # prints: builtins.list

print(mypy_type[x_name_node].mypy_type.args)
 # prints: (builtins.int,)

print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
 # prints: typing.MutableSequence

print(mypy_type[set_call_node])
 # prints: builtins.set

print("issuperset" in mypy_type[set_call_node].mypy_type.names)
 # prints: True

print(mypy_type[set_call_node.func])
 # prints: typing.Type[builtins.set]

print(mypy_type[e_name_node].mypy_type.type.is_enum)
 # prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be
   installed. mypy is more popular than pyre. If the organization uses
   mypy already (which is almost always the case), it may be difficult
   to convince colleagues (including the security team) that "we need
   yet another type checker". `MypyTypeInferenceProvider` requires only
   the latest mypy.
2. Even though it is possible to run pyre without watchman installation,
   this is not advertised. watchman installation is not always possible
   because of system requirements, or because of the security
   requirements like "we install only our favorite GNU/Linux
   distribution packages".
3. `TypeInferenceProvider` usage requires `pyre start` command to be run
   before the execution, and `pyre stop` - after the execution. This may
   be inconvenient, especially for the cases when pyre was not used
   before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings.
   For example, it is not easy to infer that some variable is an enum
   instance. `MypyTypeInferenceProvider` makes it easy, see the code
   above.

Drawbacks:

1. Speed. mypy is slower than pyre, and so is
   `MypyTypeInferenceProvider` compared to `TypeInferenceProvider`.
   How to partially solve this:
   1. Implement AST caching in mypy. It may be difficult, but it
      would speed up all the projects that use this functionality.
   2. Implement inferred-type caching inside LibCST. As far as I know,
      no caching at all is implemented inside LibCST, which is the
      prerequisite for inferred-type caching, so the task is big.
   3. Implement a LibCST CST to mypy AST conversion. I am not sure if
      this is possible at all. Even if it is, the task is huge.
2. LibCST would contain two providers that do similar things. This
   could lead to a situation where two typecheckers have to be
   installed to run all the codemods from the library.
   Alternatives considered:
   1. Put `MypyTypeInferenceProvider` inside a separate library (say,
      LibCST-mypy, or `libcst-mypy` on PyPI). This would explicitly
      separate `MypyTypeInferenceProvider` from the rest of LibCST.
      Drawbacks:
      1. The need to maintain a separate library.
      2. Limited visibility (people need to know that the library exists).
      3. Since some codemods cannot be implemented easily without the
         library, for example an `if-elif-else` to `match` converter
         (it needs powerful type inference), they are doomed not to be
         shipped with LibCST, which makes the latter less attractive for
         end users.
   2. Implement a base class for inferred types, which inherits from
      `str` (to keep compatibility with the existing codebase), and a
      mechanism for dynamically selecting the `TypeInferenceProvider`
      typechecker (mypy or pyre; the user can do this via an environment
      variable). If the code inside LibCST requires just shallow type
      information (so plain `str` is enough), then the code can run with
      any typechecker. The remaining code (such as the `if-elif-else` to
      `match` converter) will still require mypy.
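
A minimal sketch of alternative 2, assuming nothing about LibCST's real
classes: `InferredType`, its `rich` attribute, `selected_typechecker`, and
the `LIBCST_TYPECHECKER` variable are hypothetical names invented here.

```python
import os
from typing import Any, Optional


class InferredType(str):
    """A type name that still behaves like `str` but can carry a richer object."""

    rich: Optional[Any]

    def __new__(cls, name: str, rich: Optional[Any] = None) -> "InferredType":
        obj = super().__new__(cls, name)
        obj.rich = rich  # e.g. a mypy type object; None for string-only backends
        return obj


def selected_typechecker(default: str = "pyre") -> str:
    """Pick the backend from a (hypothetical) environment variable."""
    return os.environ.get("LIBCST_TYPECHECKER", default).lower()


plain = InferredType("builtins.int")
rich = InferredType("builtins.list[builtins.int]", rich={"args": ["builtins.int"]})

print(plain == "builtins.int", plain.rich, rich.rich)
print(selected_typechecker())
```

Code that only compares type names keeps working with either backend,
while code that needs structure can check whether `rich` is set.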

Misc:

Code does not lint in my env: for some reason `pyre check` cannot find
the `mypy` library.

Related to:

* Instagram#451
* pyastrx/pyastrx#40
* python/mypy#12513
* python/mypy#4868
rominf pushed a commit to rominf/LibCST that referenced this issue Dec 7, 2022
This change is RFC (please read whole change message).

Add `MypyTypeInferenceProvider` as an alternative for
`TypeInferenceProvider`. The provider infers types using mypy as
library. The only requirement for the usage is to have the latest mypy
installed. Types inferred are mypy types, since mypy type system is well
designed, to avoid the conversion, and also to keep it simple. For
compatibility and extensibility reasons, these types are stored in
separate field `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py` which we want
to inspect:
```python
x = [42]

s = set()

from enum import Enum

class E(Enum):
    f = "f"

e = E.f
```

Then to get play with mypy types one should use the code like:
```python
import libcst as cst

from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)

mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target

print(mypy_type[x_name_node])
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].fullname)
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].mypy_type.type.fullname)
 # prints: builtins.list

print(mypy_type[x_name_node].mypy_type.args)
 # prints: (builtins.int,)

print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
 # prints: typing.MutableSequence

print(mypy_type[set_call_node])
 # prints: builtins.set

print("issuperset" in mypy_type[set_call_node].mypy_type.names)
 # prints: True

print(mypy_type[set_call_node.func])
 # prints: typing.Type[builtins.set]

print(mypy_type[e_name_node].mypy_type.type.is_enum)
 # prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be
   installed. mypy is more popular than pyre. If the organization uses
   mypy already (which is almost always the case), it may be difficult
   to assure collegues (including security team) that "we need yet
   another type checker". `MypyTypeInferenceProvider` requires the
   latest mypy only.
2. Even though it is possible to run pyre without watchman installation,
   this is not advertised. watchman installation is not always possible
   because of system requirements, or because of the security
   requirements like "we install only our favorite GNU/Linux
   distribution packages".
3. `TypeInferenceProvider` usage requires `pyre start` command to be run
   before the execution, and `pyre stop` - after the execution. This may
   be inconvenient, especially for the cases when pyre was not used
   before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings.
   For example, it's not easily possible to infer that some variable is
   enum instance. `MypyTypeInferenceProvider` makes it easy, see the
   code above.

Drawback:

1. Speed. mypy is slower than pyre, so is `MypyTypeInferenceProvider`
   comparing to `TypeInferenceProvider`.
   How to partially solve this:
   1. Implement AST tree caching in mypy. It may be difficult, however
      this will lead to speed improvements for all the projects that use
      this functionality.
   2. Implement inferred types caching inside LibCST. As far as I know,
      no caching at all is implemented inside LibCST, which is the
      prerequisite for inferred types caching, so the task is big.
   3. Implement LibCST CST to mypy AST. I am not sure if this possible
      at all. Even if it is possible, the task is huge.
2. Two providers are doing similar things in LibCST will be present,
   this can potentially lead to the situation when there is a need
   install two typecheckers to get all codemods from the library
   running.
   Alternatives considered:
   1. Put `MypyTypeInferenceProvider` inside separate library (say,
       LibCST-mypy or `libcst-mypy` on PyPI). This will explicitly
       separate `MypyTypeInferenceProvider` from the rest of LibCST.
      Drawbacks:
      1. The need to maintain separate library.
      2. Limited fame (people need to know that the library exists).
      3. Since some codemods cannot be implemented easily without the
         library, for example, `if-elif-else` to `match` converter
	 (it needs powerful type inference), they are doomed to not be
	 shipped with LibCST, which makes the latter less attractive for
	 end users.
   2. Implement base class for inferred type, which inherits from `str`
      (to keep the compatibility with the existing codebase) and
      the mechanism for dynamically selecting `TypeInferenceProvider`
      typechecker (mypy or pyre; user can do this via enviromental
      variable). If the code inside LibCST requires just shallow type
      information (so, just `str` is enough), then the code can run with
      any typechecker. Ther remaining code (such as `if-elif-else` to
      `match` converter) will still require mypy.
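
The idea of caching inferred types (item 1.2 above) could look roughly
like the sketch below. It is only an illustration under stated
assumptions: `InferredTypeCache`, `TypeMap` and `fake_infer` are
hypothetical names, not part of LibCST or mypy, and the cache key is
simply a hash of the file content.

```python
import hashlib
from typing import Callable, Dict

# Hypothetical shape: maps "line:column" positions to inferred type names.
TypeMap = Dict[str, str]


class InferredTypeCache:
    """In-memory cache of per-file inference results, keyed by content hash."""

    def __init__(self, infer: Callable[[str], TypeMap]) -> None:
        self._infer = infer  # the expensive call, e.g. a type checker run
        self._entries: Dict[str, TypeMap] = {}
        self.misses = 0

    def get(self, source: str) -> TypeMap:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._infer(source)
        return self._entries[key]


def fake_infer(source: str) -> TypeMap:
    # Stand-in for a real type checker run; not how mypy reports types.
    return {"1:0": "builtins.list[builtins.int]"} if "[42]" in source else {}


cache = InferredTypeCache(fake_infer)
cache.get("x = [42]\n")
cache.get("x = [42]\n")  # same content: served from the cache
cache.get("x = [43]\n")  # content changed: inference runs again
print(cache.misses)
```

A real cache would also have to account for changes in imported
modules, which a per-file content hash does not capture.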

Misc:

The code does not lint in my environment: for some reason `pyre check`
cannot find the `mypy` library.

Related to:

* Instagram#451
* pyastrx/pyastrx#40
* python/mypy#12513
* python/mypy#4868
rominf pushed a commit to rominf/LibCST that referenced this issue Dec 7, 2022
This change is an RFC (please read the whole change message).

Add `MypyTypeInferenceProvider` as an alternative to
`TypeInferenceProvider`. The provider infers types using mypy as a
library. The only requirement is to have the latest mypy installed.
The inferred types are mypy types: the mypy type system is well
designed, and using it directly avoids a conversion and keeps things
simple. For compatibility and extensibility reasons, these types are
stored in a separate field, `MypyType.mypy_type`.

Let's assume we have the following code in the file `x.py` which we want
to inspect:
```python
x = [42]

s = set()

from enum import Enum

class E(Enum):
    f = "f"

e = E.f
```

Then, to play with mypy types, one can use code like:
```python
import libcst as cst

from libcst.metadata import MypyTypeInferenceProvider

filename = "x.py"
module = cst.parse_module(open(filename).read())
cache = MypyTypeInferenceProvider.gen_cache(".", [filename])[filename]
wrapper = cst.MetadataWrapper(
    module,
    cache={MypyTypeInferenceProvider: cache},
)

mypy_type = wrapper.resolve(MypyTypeInferenceProvider)
x_name_node = wrapper.module.body[0].body[0].targets[0].target
set_call_node = wrapper.module.body[1].body[0].value
e_name_node = wrapper.module.body[-1].body[0].targets[0].target

print(mypy_type[x_name_node])
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].fullname)
 # prints: builtins.list[builtins.int]

print(mypy_type[x_name_node].mypy_type.type.fullname)
 # prints: builtins.list

print(mypy_type[x_name_node].mypy_type.args)
 # prints: (builtins.int,)

print(mypy_type[x_name_node].mypy_type.type.bases[0].type.fullname)
 # prints: typing.MutableSequence

print(mypy_type[set_call_node])
 # prints: builtins.set

print("issuperset" in mypy_type[set_call_node].mypy_type.names)
 # prints: True

print(mypy_type[set_call_node.func])
 # prints: typing.Type[builtins.set]

print(mypy_type[e_name_node].mypy_type.type.is_enum)
 # prints: True
```

Why?

1. `TypeInferenceProvider` requires pyre (`pyre-check` on PyPI) to be
   installed. mypy is more popular than pyre. If the organization
   already uses mypy (which is almost always the case), it may be
   difficult to convince colleagues (including the security team) that
   "we need yet another type checker". `MypyTypeInferenceProvider`
   requires only the latest mypy.
2. Even though it is possible to run pyre without installing watchman,
   this is not advertised. Installing watchman is not always possible
   because of system requirements, or because of security
   requirements like "we install only our favorite GNU/Linux
   distribution packages".
3. Using `TypeInferenceProvider` requires running `pyre start` before
   the execution and `pyre stop` after it. This may be inconvenient,
   especially when pyre was not used before.
4. Types produced by pyre in `TypeInferenceProvider` are just strings.
   For example, it is not easy to tell whether a variable is an
   enum instance. `MypyTypeInferenceProvider` makes it easy; see the
   code above.
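
To illustrate item 4: when a type is only a string, a consumer has to
re-parse the text to get at the base type and its arguments, and
properties such as "is an enum" are not recoverable at all. The sketch
below is a rough stdlib-only illustration; `split_type_string` is a
hypothetical helper that handles only the `base[arg, ...]` shape and
is not part of LibCST, pyre or mypy.

```python
from typing import List, Tuple


def split_type_string(type_str: str) -> Tuple[str, List[str]]:
    """Split 'base[arg1, arg2]' into ('base', ['arg1', 'arg2'])."""
    base, bracket, rest = type_str.partition("[")
    if not bracket:
        return type_str.strip(), []
    if not rest.endswith("]"):
        raise ValueError(f"unbalanced brackets in {type_str!r}")
    inner = rest[:-1]
    args: List[str] = []
    depth = 0
    current = ""
    for ch in inner:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        # Only commas outside nested brackets separate top-level arguments.
        if ch == "," and depth == 0:
            args.append(current.strip())
            current = ""
        else:
            current += ch
    if current.strip():
        args.append(current.strip())
    return base.strip(), args


print(split_type_string("builtins.list[builtins.int]"))
print(split_type_string("typing.Dict[builtins.str, typing.List[builtins.int]]"))
# A bare name such as "x.E" carries no hint that E is an enum.
print(split_type_string("x.E"))
```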

Drawbacks:

1. Speed. mypy is slower than pyre, and so is `MypyTypeInferenceProvider`
   compared to `TypeInferenceProvider`.
   How to partially solve this:
   1. Implement AST caching in mypy. It may be difficult, but it
      would speed up every project that uses this functionality.
   2. Implement caching of inferred types inside LibCST. As far as I
      know, LibCST implements no caching at all, and that is a
      prerequisite for caching inferred types, so the task is big.
   3. Implement conversion from LibCST CST to mypy AST. I am not sure
      if this is possible at all. Even if it is, the task is huge.
2. LibCST will contain two providers that do similar things. This can
   potentially lead to a situation where two typecheckers need to be
   installed to get all codemods from the library running.
   Alternatives considered:
   1. Put `MypyTypeInferenceProvider` in a separate library (say,
      LibCST-mypy, or `libcst-mypy` on PyPI). This will explicitly
      separate `MypyTypeInferenceProvider` from the rest of LibCST.
      Drawbacks:
      1. The need to maintain a separate library.
      2. Limited visibility (people need to know that the library
         exists).
      3. Since some codemods cannot be implemented easily without the
         library, for example, an `if-elif-else` to `match` converter
         (it needs powerful type inference), they are doomed to not be
         shipped with LibCST, which makes the latter less attractive for
         end users.
   2. Implement a base class for the inferred type, which inherits from
      `str` (to keep compatibility with the existing codebase), and
      a mechanism for dynamically selecting the `TypeInferenceProvider`
      typechecker (mypy or pyre; the user can choose via an environment
      variable). If the code inside LibCST requires just shallow type
      information (so, just `str` is enough), then the code can run with
      any typechecker. The remaining code (such as `if-elif-else` to
      `match` converter) will still require mypy.
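
Alternative 2 above could be sketched roughly as follows. This is a
minimal illustration, not the proposed implementation: `InferredType`,
`backend_type`, `select_backend` and the `LIBCST_TYPECHECKER` variable
name are all hypothetical and do not exist in LibCST.

```python
import os
from typing import Any, Optional


class InferredType(str):
    """A type name that still behaves like the plain `str` used today."""

    # Hypothetical field: rich type object from the backend, if it has one.
    backend_type: Optional[Any]

    def __new__(
        cls, fullname: str, backend_type: Optional[Any] = None
    ) -> "InferredType":
        obj = super().__new__(cls, fullname)
        obj.backend_type = backend_type
        return obj

    @property
    def has_rich_info(self) -> bool:
        return self.backend_type is not None


def select_backend(default: str = "pyre") -> str:
    # LIBCST_TYPECHECKER is an invented variable name used for illustration.
    choice = os.environ.get("LIBCST_TYPECHECKER", default).lower()
    if choice not in ("mypy", "pyre"):
        raise ValueError(f"unsupported type checker: {choice!r}")
    return choice


shallow = InferredType("builtins.list[builtins.int]")
rich = InferredType("x.E", backend_type={"is_enum": True})  # placeholder object

# Existing code that compares against plain strings keeps working.
print(shallow == "builtins.list[builtins.int]")
print(shallow.has_rich_info, rich.has_rich_info)
```

Code that needs only the string form can ignore `backend_type`, while
code that needs deep type information can check `has_rich_info` and
refuse to run when the selected backend does not provide it.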

Misc:

The code does not lint in my environment: for some reason `pyre check`
cannot find the `mypy` library.

Related to:

* Instagram#451
* pyastrx/pyastrx#40
* python/mypy#12513
* python/mypy#4868
@GideonBear

> I have made a tool to enhance ast with metadata from mypy:
> I am going to release it soon.

Hi @sobolevn, this would be extremely useful for me and probably others. Have you released this? If not, can you release the (partial) source code? Thanks!

@sobolevn
Member

@GideonBear

> Source is here: https://github.com/wemake-services/typed-linter/tree/master/typed_linter/contrib/mypy

@sobolevn Is it private? https://github.com/wemake-services/typed-linter is a 404 for me.

@devmessias
Contributor

devmessias commented Dec 24, 2022

Maybe this can help you, @GideonBear: https://github.com/pyastrx/pyastrx/tree/main/pyastrx/inference .

You can also use this after installing pyastrx:

mypyq -f test.py

@JeroenSchmidt

@devmessias what is your recommended approach to get the inferred type information within the AST?

Is there a viable solution now? I saw that you've been busy getting related PRs approved across various projects.
