-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and record implicit dependencies in incremental mode #2041
Merged
gvanrossum
merged 18 commits into
python:master
from
Michael0x2a:incremental-interface-3
Sep 2, 2016
Merged
Detect and record implicit dependencies in incremental mode #2041
gvanrossum
merged 18 commits into
python:master
from
Michael0x2a:incremental-interface-3
Sep 2, 2016
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2a8c180
to
8ebcdf2
Compare
9ad052f
to
8d3d327
Compare
69e5d31
to
cc90602
Compare
@@ -70,11 +102,16 @@ def __init__(self, | |||
self.msg = msg | |||
self.strfrm_checker = StringFormatterChecker(self, self.chk, self.msg) | |||
|
|||
def _visit_typeinfo(self, info: nodes.TypeInfo) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unused?
Thanks for the extensive write-up! I've only found some nits; after those are addressed I'll re-review and merge. |
This commit modifies the incremental tests (and in particular, the 'rechecked' and 'stale' annotations) to more accurately describe what the expected behavior should be after making interfaces changes propagate only one step out.
This commit modifies TypeInfo so it records the name of the module in which the class (or NewType or namedtuple) was defined in. This does not modify the behavior of mypy, but will make it easier to perform introspection and analysis at later stages. This information is also a part of the TypeInfo's fullname, but is difficult to extract in the case of nested classes. For example, if we have a fully qualified name of a.b.c.d, it's not always obvious whether or not c is a module name or is a class name, and so forth.
This commit modifies the typechecking stage to record all module references that a particular module makes. This information will be used in a future commit. The motivation for this change is that it turns out that the set of all module references is often larger then the set of explicitly imported modules, and having this information would allow us to make incremental mode more aggressive.
This commit adds a new visitor to extract all module references from arbitrary types. This visitor should be considered provisional -- it will most likely be merged in with the typechecker at a later stage to make it a less expensive call.
This commit modifies the build process to record all implicit module references a given file makes when writing cache data. Previously, the cache data would only save all explicit dependencies (i.e. modules used within a direct import). This posed a problem because it turned out it's trivially easy to construct a series of interlinked files wherein changing the interface of one module produces or fixes an error in a seemingly unrelated file (which did NOT import anything from the first module). This made optimizing incremental mode harder since an interface change could have unknown effects on the codebase as a whole. This change modifies how dependencies are recorded so BOTH explicit (directly imported) and implicit (indirectly reliant) dependences are recorded for each file. This means that checking to see if any of our dependents' interfaces have changed requires simply checking every module in the list of dependencies (moving one "step") rather then having to potentially crawl the entire graph. In order to avoid accidentally perturbing the import graph in unpredictable ways, all implicit dependences will be given a lower priority then regular imports (lower then PRI_LOW).
Previously, the indirect dependency patcher would always add in indirect dependencies, even if they were silenced or ignored. This commit makes the patcher more discriminating.
This commit modifies Vars and adds an extra field to indicate if they refer to imports or not. Normally, modules are not represented as Var nodes, except in the case where that particular module was silenced or suppressed in some manner. This commit will make it easier to perform analysis on the AST during the typechecking phase.
Previously, the RefExpr module analyzer did not correctly handle the case where a silenced import was used inside of a class. For example, suppose we have a file named `mod1.py`: ```python class Foo: def bar(self) -> None: import unrelated # this is suppressed self.val = unrelated.test() ``` ...the analyzer would end up reporting that this file indirectly relied on module `mod1.Foo`, which is clearly incorrect. This commit attempts to handle this case.
The `# options: ...` directive was replaced with the `# flag: ...` directive.
This commit adds an (ad-hoc) check to the indirect dependencies patcher to filter out computed dependencies that are not, in fact, actually modules. Ideally, this ad-hoc check shouldn't be necessary but there are enough weird edge cases that it seems like a useful thing to have to prevent unexpected behavior.
This commit adds several more tests to try and check if mypy behaves correctly under a variety silent import related scenarios, and if it behaves correctly starting from an initial error.
It seems like the `info` attribute inside NameExpr is not actually set or used anywhere. This commit removes it altogether.
This commit contains mainly more fixups and refactoring in response to the first wave[let] of feedback.
e904af4
to
fcbdd4b
Compare
@gvanrossum -- I'm ready for the re-review now! |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This pull request modifies the typechecking phase and adds a new visitor to check types used within a particular module to find indirect module dependencies. These indirect module dependencies are then added to the set of a module's dependencies with a lowered priority so that they may be easily referenced when attempting to check if a module's dependencies have a stale interface or not. Now, instead of having to search through the import graph to see if a module's interface is stale or not, I only need to check the set of dependencies for that module, which is a potential optimization gain.
I've done some informal testing, and it seems like this optimization will decrease the time it takes for mypy to recheck a project after changing the interface of a single file by roughly 20% to 50%, depending on exactly which file is rechecked/the size of the repo. (If a project happens to have very few indirect dependencies, the percentage will naturally trend closer to 0%.)
However, this optimization does slow down how long it takes to run mypy with either a completely cold cache or a completely warm cache by approximately 5% to 10%. Figuring out how to speed up the changes in these cases is something I will most likely try solving in a separate pull request, unless I'm hit by a sudden burst of inspiration.
More details:
I define the term "indirect dependency" as a module that is not explicitly imported but still affects the types of the current module. For example, consider the following program:
In this test case,
mod1
is an direct dependency ofmain
, since it is directly imported. Modulemod3.py
is an indirect dependency ofmain
, since it was not directly imported, but does affect the type of theval
global variable ofmain
and therefore its public interface.Note that
mod2
is neither a direct or indirect dependency ofmain
, sincemain
does not in any way use types or symbols that were defined withinmod2
.However,
mod2
is an direct dependency ofunrelated
. Note thatmod3
is neither a direct nor indirect dependency ofunrelated
.This test case forms the following import graph (omitting the
builtins
module and all associated machinery):The solid lines indicate direct dependencies, and the dotted lines indicate indirect dependencies. (The dependees point to the dependers).
The problem here is that if
mod3
's interface is changed, we would also need to checkmod2
and all of its dependencies recursively since we have no idea what modules later in the import graph indirectly rely on us. So, in the current version of incremental mode, an interface change tomod3
forces us to recheck the entire graph, including theunrelated
module.Proposal:
My proposal is to perform an operation similar to a transitive closure and explicitly record the indirect dependencies within the
dependencies
field of theBuildState
class, and assign each indirect dependency a lowered priority compared to any import.I then propose to remove the current requirement that an interface change in a module automatically forces an interface change in anything that imports it.
Instead, since we explicitly keep track of both direct and indirect dependencies, it should be sufficient to just check those enumerated files. So that means in this case, an interface change in
mod3
would trigger a recheck inmod2
,mod1
, andmain
, but NOT inunrelated
. Sincemod2
's interface would not be changed by an interface change inmod2
(unless we delete or rename theB
class, theunrelated
module would not need to be rechecked in any way in most cases.More visually, to determine if we need to recheck a module, we follow the arrows backwards exactly once.
This doesn't seem like a huge win here since we get to save rechecking only a single file, but if the
unrelated
module is subsequently imported by many different modules, this change allows us to avoid having to recheck a potentially large, unconnected SCC.Potential concerns:
The main danger of this change is that it works by changing the import graph AFTER the files are fully typechecked. (The reason why we can only record these changes after the typechecking phase is because we need to know what the finalized types in the AST are before we can reliably determine what our indirect dependencies are).
This means in incremental mode, the import graph will look different on the second run of mypy compared to the first (though all subsequent runs should have the same import graph).
For example, if we take the above test case, the import graph would look like:
...at the start of the first run (when incremental mode is enabled), but would look like:
...at the start of all subsequent runs. This ends up breaking the invariant that incremental mode does not change the import graph and that the import graph looks the same between incremental mode and non-incremental mode. This may have subtle implications in complex graphs involving many cycles.
Sketch of correctness:
However, I believe this change is a safe one. If we have an indirect dependency between modules A and B, that means there must have been a direct chain of dependencies of the form
A -> mod1 -> mod2 -> ... -> modN -> B
. This means that module A would have been naturally loaded before B anyways, and adding a low-priority arrow connecting the two does not change the results. (That is, the existing SCC generation code produces a partial ordering of the modules by performing a modified topographical sort, and adding more arrows in the manner described above shouldn't seem like it should change that partial ordering).Since the indirect dependencies are also given a lowered priority then any other module, they also shouldn't drastically affect how the SCC generation/sorting code works, since every other kind of import will take priority.
Rejected idea:
The original plan was to simply just outlaw indirect dependencies altogether, which worked beautifully well in the following case:
...but proved impossible in cases similar to the first test case listed above.
It also turned out that regular code contained far too many indirect dependencies of the first form in regular code, making outlawing them a non-starter.