-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adopt Microsoft/tolerant-php-parser #357
Conversation
- add diagnostics for old parser - include maxRecursion levels - include option to run functions multiple times to help profile
Cool, I assumed there were some extra steps needed to make the submodules work in Travis, but looks fine! |
At the moment, the submodules would only be used for the Performance test. ValidationTest only runs on the individual testcases in the cases/ folder (previously, it ran on the submodules comparing the result to the old parser). |
I will file a whole bunch of issues shortly. Are you waiting for that before merging? Or also the latest comment #357 (comment)? |
I was going to fix #357 (comment) before merging, but if you want to do it quick feel free to do so |
Those conditions are a little gnarly... it's not pretty but it will work the same now |
Simplified the condition further :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LETS DO THIS
Performance.php
Outdated
} | ||
|
||
|
||
$parserKinds = [ParserKind::PHP_PARSER, ParserKind::TOLERANT_PHP_PARSER]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ParserKind still exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damnit, one sec :)
IT'S DONE 🎉 |
Thanks!... is that even a word? :/ |
@felixfbecker It might be useful to have the history of this PR, since so much happened, instead of squashing? I understand wanting a clean history, but |
I agree with @roblourens |
@roblourens I didn't want to have all the cleanup commits in there, but doing a manual rebase sounded like a lot of work... You are right but I don't think it's worth force pushing master now |
I usually only rebase+merge or merge with merge commit if every commit passes tests on its own and could be reverted on its own, which would have been hard to ensure for all the commits in this PR. |
fix #323
As discussed in microsoft/tolerant-php-parser#36 (comment), key benefits include:
This is a pretty big PR, so in order to make it easier to review - a little explanation of the current status and deep dive into the changes is in order. Additionally, this braindump was a good way to wrap my head around the remaining issues 😃. Happy to answer any additional Qs, and stay tuned for further updates!
What's ready for review?
TolerantDefinitionResolver.php
andTolerantSymbolInformation.php
. There are still a few TODOS, Formatting and doc comments need some clean up, but otherwise pretty reasonable.I'll walk through this in two stages:
General Status
Working branch:
Currently implemented Features
Testing and Validation
In order to validate the changes, I thoroughly compared the language server results (fqns, definitions, references) of the two parsers on a number of frameworks (~15,000 files in total), and worked to resolve any discrepencies.
Notable Improvements
Notable Regressions
Other differences
hello(/*cursor*/)
, go-to-definition at the denoted cursor position will not yield a result (but will yield a result if the function call name is selected)Performance Comparison
At this point in time, the performance is not nearly as good as it could be (e.g. every call to getResolvedName does a tree walk rather than performing any caching, does not include the impacts of incremental parsing, some of the API or API calls are innefficient and need to be improved).
That said, even without these changes - some preliminary results of the impact from the most recent run - this includes all lang. server operations, not just are below (see Performance.php for details - only ran this on my machine - PHP 7.1 x64 on Windows, kept an eye on CPU to ensure it remained constant throughout, units are in seconds) - still need to run more benchmarks and analayze results (unfortunately XDebug profiler has been seriously failing me for this use case):
The memory improvements are not nearly as significant as they could be because at this point in time, only the ASTs for the open files are held in memory, and definitions/references representation is the primary concern. However as demonstrated earlier - the ASTs themselves are significantly lighter, so in the future, we could hold many more in memory rather than throwing them all away.
Next steps
File Changes
Changes described below, with corresponding TODOs
GENERAL TODO
declare(strict_types = 1);
to all filescomposer.json [modified]
Modified to add a reference Microsoft/tolerant-php-parser. Note that the referenced version is currently being pulled from a private branch until these changes make their way to master
TODO
lang-server
rather thandev-lang-server
branchLanguageServer.php [modified]
The language server entrypoint is at
LanguageServer::initialize()
. Theinitialize
method is responsible for setting up the indexes, server capabilities, etc.The changes here are pretty minimal. The first change is that the $definitionResolver property is now of type
DefinitionResolverInterface
, rather thanDefinitionResolver
.This enables us to hotswap parsers -
DefinitionResolver
(the definition resolver corresponding to the old parser) andTolerantDefinitionResolver
(the definition resolver corresponding to the new parser) both implementDefinitionResolverInterface
DefinitionResolverInterface.php [added], DefinitionResolver.php [modified], TolerantDefinitionResolver.php [modified]
In order to make it as easy as possible to transition from the old parser to the new parser and validate these changes, they both retain the same core interface as before:
The signatures no longer include type hints for
PhpParser\Node
- this is because the new Node objects of are typeMicrosoft\PhpParser\Node
(note thatMicrosoft\PhpParser
is aliased asTolerant
everywhere)The
TolerantDefinitionResolver
logic has been completely rewritten otherwise to support the new AST structure and node / token properties.Initially, I considered creating some more granular abstractions in the hopes of creating a shared interface between the two parsers, but the differences were too significant to be worth the effort (e.g. method calls are significantly more expensive than property accesses, code became more difficult to debug, etc.)
TODO
DefinitionResolver
ParserResourceFactory.php [added], ParserKind.php [added]
The ParserResourceFactory class enables us to manage the resources between the two parser kinds (defined in ParserKind). This includes instantiating the parser, definition resolver, and tree analyzers.
The parser kind is set with a static property, and there are two primary parser kinds
PHP_PARSER
TOLERANT_PHP_PARSER
The current 'default' is
TOLERANT_PHP PARSER
.In addition, there are diagnostic modes which can be accessed by specifying
DIAGNOSTIC_PHP_PARSER
, orDIAGNOSTIC_TOLERANT_PHP_PARSER
. The main difference is that when these values are specified, they will instantiate aLoggedDefinitionResolver
orLoggedTolerantDefinitionResolver
instead ofDefinitionResolver
orTolerantDefinitionResolver
TODO
LoggedDefinitionResolverTrait.php [added], LoggedDefinitionResolver.php [added], LoggedTolerantDefinitionResolver.php [added]
LoggedDefinitionResolver
andLoggedTolerantDefinitionResolver
are basic wrappers aroundDefinitionResolver
andTolerantDefinitionResolver
to make it easier to debug the application without explicitly attaching a debugger, and the actual logging logic is defined inLoggedDefinitionResolverTrait
. Currently, logs are simply printed out to the console, but in the future we could implement some more advanced logging.The logs themselves are pretty basic - they intercept the primary method calls, and log their corresponding signatures, input parameters, and returned results. In addition, they are indented according to their current level of recursion.
In the future, it might be helpful to include more granular logging, but this has been sufficient for now (note that there would also be a performance impact to keeping this logging on all the time).
There are also a few logging options worth mentioning (the way they are implemented is currently pretty hacky and specific to a particular problem I was investigating, but this can be revisited).
maxRecursion
keeps track of the max recursion level (an interesting statistic for diagnosing issues - could also consider using a formal cap on this to prevent CPU spinning out of control).repeat
number of times a method should be repeated - I used this to diagnose some issues when I realized that the XDebug profiler results were absolutely useless for this application (by repeating a method call many times, I was able to get more accurate time stamps when analyzing perf. issues).times
keeps track of the timing profilesTODO
Logged
->Diagnostic
maxRecursion
,repeat
, andtimes
propertiesReferencesCollector.php [modified], DefinitionCollector.php [modified]
These files have been updated to correspond with the new DefinitionResolver interfaces
TODO
FqnUtilities.php [added]
For holding some common logic between the two definition resolvers.
TODO
Location.php [modified], Range.php [modified]
Modified to support creating locations and ranges from both node types.
TODO
TolerantSymbolInformation.php [added]
TODO
TolerantParserHelpers.php [added]
A temporary class to assist in answering simple questions about the AST. Eventually these operations will be rendered redundant by the parser.
TODO
tryGetPropertyDeclarationFromVariable
)TreeAnalyzerInterface.php [added], TreeAnalyzer.php [added], TolerantTreeAnlayzer.php [added]
Central interface for performing definition / reference collection. One thing that's worth mentioning is that - for performance reasons, I didn't end up taking advantage of the built-in Node API of the tolerant parser, but will revisit this decision.
TODO
TextDocument.php [modified]
Responsible for handling TextDocument requests from client and server (e.g. definition, references). Updated logic to properly handle new node types and definition resolver interface
TODO
Tests/* [modified]
Many of the tests have been updated to account for updated node locations. Unfortunatley this breaks tests for the first parser.
TODO
ValidationTest.php [added], broken/*.php [added]
Validation tests used to validate the parser against the old parser. broken/*.php are cases that were at some point an issue during the validation run (but not during the main run)
TODO