GetModIfaceFromDisk has bad first call complexity #2304

pepeiborra · 2021-10-25T08:50:46Z

The current implementation of GetModIface is O(# of transitive import statements), i.e. it does an amount of work proportional to the number of imports in the project. To see why, notice that to evaluate GetModIface M, that is, to load the interface for a module M, we first need to load the interfaces of all its D direct imports, which involves D recursive calls to GetModIface. Transitively, this makes one GetModIface call for every import in the module graph of M. This leads to bad startup performance in projects with a dense module graph since:

O(# of import statements) = O(edges in module graph), which is O(modules^2) in the worst case.
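To make the edge count concrete, here is a toy model in Python rather than ghcide's actual code. It assumes a memoised recursive rule (the function names and the `dense_graph` helper are hypothetical) and counts how many dependency requests are issued: each module is evaluated once, but every one of its imports still costs a request.

```python
from typing import Dict, List


def count_dependency_requests(graph: Dict[str, List[str]], root: str) -> int:
    """Count the requests a memoised recursive rule issues when loading root.

    graph maps each module name to the list of its direct imports.
    """
    done = set()
    requests = 0

    def get_mod_iface(module: str) -> None:
        nonlocal requests
        if module in done:
            return  # already built: the request was cheap, but it was still made
        done.add(module)
        for dep in graph[module]:
            requests += 1  # one request per import statement
            get_mod_iface(dep)

    get_mod_iface(root)
    return requests


def dense_graph(n: int) -> Dict[str, List[str]]:
    """Hypothetical worst case: module i imports every module j < i."""
    return {f"M{i}": [f"M{j}" for j in range(i)] for i in range(n)}


print(count_dependency_requests(dense_graph(50), "M49"))  # 1225 = 50 * 49 / 2
```

With 50 modules the dense graph already needs 1225 requests, while a 50-module chain would need only 49.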

Once GetModIface M has been called at least once the ongoing cost is O(direct imports), thanks to early cutoff, which is very good.

How can we improve the bad performance of the first call while preserving the good performance of the second call?

pepeiborra · 2021-10-25T08:58:14Z

An alternative implementation of GetModIface M with better first call complexity would go like this:

1. Find the graph G of all the transitive imports
2. Topologically sort G into a list of M modules (the transitive imports)
3. Load all the M interfaces sequentially on a single HscEnv

In step 3 we avoid making any recursive calls to GetModIface and instead load all the interfaces "in place". This leads to O(transitive dependencies) complexity which is optimal and much better than the current O(# of transitive import statements).
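The three steps can be sketched as follows. This is only an illustrative model in Python (3.9+ for `graphlib`), not the proposed ghcide implementation: a plain dict stands in for the single HscEnv, the string `iface(...)` stands in for a loaded interface, and all function names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def reachable_subgraph(graph: Dict[str, List[str]], root: str) -> Dict[str, List[str]]:
    """Step 1: collect root and all its transitive imports."""
    seen = {root}
    stack = [root]
    while stack:
        module = stack.pop()
        for dep in graph[module]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return {m: graph[m] for m in seen}


def load_order(graph: Dict[str, List[str]], root: str) -> List[str]:
    """Step 2: topologically sort so that dependencies come first."""
    # TopologicalSorter expects node -> predecessors, which is exactly
    # module -> direct imports.
    return list(TopologicalSorter(reachable_subgraph(graph, root)).static_order())


def load_interfaces(graph: Dict[str, List[str]], root: str) -> Dict[str, str]:
    """Step 3: load every interface once, in order, into one environment."""
    env: Dict[str, str] = {}  # stand-in for a single HscEnv
    for module in load_order(graph, root):
        env[module] = f"iface({module})"  # no recursive rule calls here
    return env


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": ["A"]}
print(load_order(graph, "A"))
print(sorted(load_interfaces(graph, "A")))  # ['A', 'B', 'C', 'D']
```

Each reachable module is loaded exactly once, so the loading work in step 3 grows with the number of transitive dependencies rather than with the number of import statements.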

The only problem is that, if we wanted to load all the modules in the project, the total cost would be O(modules^2), which is worse than the current O(imports). Put another way, the rebuild complexity would go from O(direct imports) to O(transitive imports).
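A small worked example of this tradeoff, again as a hypothetical Python model rather than real ghcide code: in a chain of modules where each one imports only the previous one, the number of import statements is linear, but loading every module "in place" reloads its whole transitive closure each time.

```python
from typing import Dict, List


def transitive_count(graph: Dict[str, List[str]], root: str) -> int:
    """Number of interfaces loaded in place for root (root plus transitive imports)."""
    seen = {root}
    stack = [root]
    while stack:
        module = stack.pop()
        for dep in graph[module]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return len(seen)


def chain_graph(n: int) -> Dict[str, List[str]]:
    """Hypothetical sparse project: M1 imports M0, M2 imports M1, and so on."""
    return {f"M{i}": ([f"M{i - 1}"] if i > 0 else []) for i in range(n)}


graph = chain_graph(100)

# Cost of the current scheme for the whole project: one request per import.
total_imports = sum(len(deps) for deps in graph.values())

# Cost of the in-place scheme when every module is loaded on its own.
total_in_place = sum(transitive_count(graph, module) for module in graph)

print(total_imports)   # 99
print(total_in_place)  # 5050 = 100 * 101 / 2
```

For a single module the in-place scheme is cheaper, but across the whole project it repeats work that the recursive rule would have shared.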

Tradeoffs!

EDIT: The recursive calls to GetModIface would need to be replaced by calls to a hypothetical RefreshInterface build rule that blocks until the interface file on disk is fresh, regenerating it if needed. The tricky part, of course, is that in order to check whether the interface file is fresh with checkOldIface, we first need to load all the dependencies. So we end up in the same place where we started.

jneira added the type: enhancement, performance, and type: refactor labels Oct 25, 2021

pepeiborra mentioned this issue Nov 2, 2021

Improve the performance of GetModIfaceFromDisk in large repos and delete GetDependencies #2323

Merged

mergify bot closed this as completed in #2323 Nov 12, 2021
