-
-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Process dependent files #6256
Process dependent files #6256
Conversation
@@ -24,6 +24,9 @@ | |||
$output = str_replace(__DIR__, '.', $output); | |||
|
|||
$expectedDiff = 'expected-output.diff'; | |||
if (isset($argv[1]) && $argv[1] === '--diff') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We allow setting which output/diff file should be used so that we can test different outputs in our new end 2 end test
@@ -114,6 +114,7 @@ parameters: | |||
- rules/DeadCode/NodeManipulator/LivingCodeManipulator.php | |||
- rules/EarlyReturn/Rector/If_/ChangeAndIfToEarlyReturnRector.php | |||
- src/PhpParser/NodeTransformer.php | |||
- src/FileSystem/FilesFinder.php |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to add this exception here because now the findInDirectoriesAndFiles
returns an array by unpacking two arrays and this is incorrectly seen as wrong
) { | ||
} | ||
|
||
public function run(Configuration $configuration, InputInterface $input): ProcessResult | ||
{ | ||
$filePaths = $this->filesFinder->findFilesInPaths($configuration->getPaths(), $configuration); | ||
$allFiles = $this->filesFinder->findFilesInPaths($configuration->getPaths(), $configuration); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We split this function in two, first we find all possible files to process and then we filter them using the cache
$allFiles = $this->filesFinder->findFilesInPaths($configuration->getPaths(), $configuration); | ||
$filePaths = $this->filesFinder->filterUnchangedFiles($allFiles); | ||
|
||
$filePaths = [...$filePaths, ...$this->findDependentFiles($allFiles, $filePaths)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add any dependent files to the list of files that we want to analyse
@@ -94,9 +100,16 @@ public function run(Configuration $configuration, InputInterface $input): Proces | |||
} | |||
|
|||
if ($configuration->isParallel()) { | |||
$this->fileDependenciesCache->cacheAllFiles($allFiles); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are running in parallel, we save the list of all possible files in the cache. We do this because calculating this list is an expensive operation and calculating it for each worker would slow the process. Not sure this is the best option to pass this array to the workers, if you can think of a better option, please let me know
@@ -139,6 +139,7 @@ private function runWorker( | |||
$processResult = $this->applicationFileProcessor->processFiles( | |||
$filePaths, | |||
$configuration, | |||
null, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When processing files from the worker, we set the allFiles parameter to null so that this list of files is loaded from the cache
@@ -326,6 +327,7 @@ final class LazyContainerFactory | |||
NodeScopeResolver::class, | |||
ReflectionProvider::class, | |||
CachingVisitor::class, | |||
DependencyResolver::class, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We make sure we can get this service from PHPStan
@@ -70,9 +70,16 @@ function (string $file): bool { | |||
// filtering files in directories collection | |||
$directories = $this->fileAndDirectoryFilter->filterDirectories($filesAndDirectories); | |||
$filteredFilePathsInDirectories = $this->findInDirectories($directories, $suffixes, $sortByName); | |||
return [...$filteredFilePaths, ...$filteredFilePathsInDirectories]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function now returns a list of all possible files to analyse, without filtering out unchanged files
* @param string[] $allFiles | ||
* @return string[] | ||
*/ | ||
public function filterUnchangedFiles(array $allFiles): array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we add a new functions that filters out the unchanged files
@@ -40,6 +40,18 @@ public function hashFiles(array $files): string | |||
return $configHash; | |||
} | |||
|
|||
public function resolvePath(string $filePath): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved here so that it could be used both by the file changed detector and the dependency cache
Thanks for the PR. This is rather complex to review. Is there a way to use more from PHPStan to delegate the low-lever work? Could you share few performance comparison before/after? @staabm This is rather outside my area of focus, so I'd like to ask you for advisory opinion on this one 🙏 |
the caching on this level is a super-complex topic, therefore I would suggest to make it a opt-in feature before it stabilized. in addition its super frustrating when the cache doesn't work properly.
I can only second this. we need to be aware how much benefit we get for the added complexity. AFAIR rector already has some kind of cache which we need to make sure are somehow synchronized (so we should make sure that one cache does not depend on already and maybe expired other cached stuff) we should also make sure we have a common understand what rector already caches and what this PR will add on top. let me write down a few corner stones from the top of my head, which are fundamental for the PHPStan caching.
very likely this list is not complete. Implementation details can be found in PHPStan |
Regarding performance, this PR does not aim to increase performance, in fact it will decrease it a little since it needs to calculate the dependencies for each file that it processes and then we need to use this data to calculate the dependent files for the files that need to be processed. I tried to measure the loss in performance when doing a full analysis with an empty cache and was not able to measure any real difference when using parallel processing, the uncertainty in the parallel processing seems to be bigger than the loss of performance that we are trying to measure. Not using parallel I got these results: |
Regarding the Rector cache, right now it is a very simple cache, much simpler than the one used by PHPStan, it just saves the hash of any file processed so that this can be used later to find out if a file has changed. PHPStan saves a lot more data, including the dependencies for each file. |
Regarding using more of PHPStan, we cannot use its full caching capabilities because their cache is much more complex than the one used by Rector. The only possible piece of code that we could have maybe reused is the one that finds the dependent files from the saved dependencies but PHPStan has not extracted this code in a class or function that can be reused, it is embedded in the result cache manager code |
I'll be prompt to avoid investing more time on your side. I share similar view as @staabm . Also, depedent files do not solve the need for re-run Rector in case of adding types. E.g. when new I do re-runs for PHPStan and coding standard tools as well as they have the very same design 👍 Saying that, I want to keep current state of balanced equilibrium. Thank you @carlos-granados 🙏 |
@TomasVotruba no problem, just wanted to point out something. You said |
If anyone else is interested in this option, it is available in this fork: https://github.com/carlos-granados/rector |
When using the cache, process dependent files of modified files to make sure we have covered all possible needed changes. See rectorphp/rector#8801
This is implemented by using PHPStan's
DependencyResolver
. When we analyse a file, we save all the dependencies of this file in the cache. When we do a later analysis using the cache, we use these lists of dependencies to find the dependent files for any analysed file.The implementation is quite simple and quite performant, I have not been able to measure a noticeable degradation in performance. Seems to cover all the cases that I could think of, probably needs more eyes on it and more testing to make sure everything works as supposed.
Includes a new end 2 end test for this functionality
See the PR comments for more information