-
Notifications
You must be signed in to change notification settings - Fork 354
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement goto definition with low CPU and memory overhead #332
Conversation
This commit removes pretty much everything in Metals, including the test suites. The current implementation severals from several problems - it is tied to Scalameta v2.1.7 and upgrading to v4.0.0 is difficult - it consumes a lot of memory, up to ~2-3GB for medium sized codebases. - it is slow at indexing dependency sources, importing a medium/large sized project can take several minutes. - it uses complex persistent caching to make up for slow indexing, storing data on the users disk. This implementation detail leaked into the user experience through the "Clear index cache" command. - tests are difficult to write, read and update to changing requirements. There are some good parts that are removed in this commit but may get revived in the future: - token edit distance works great for stale SemanticDB files - sbt server integration works and has an integration test - MetalsServices + Main + Configuration
- upgrade scalameta - remove unnecessary projects - use resource generators to feed input project classpath/sourceDirectories/sourceJars into test suite.
Reformat with Prettier.
We're going to implement a parser with the LegacyScanner (tokenizer) and this basic utilty will come in handy.
These methods were spread out through different objects and now they are in one place only.
This commit implements a high-performance Scala parser that only exctracts top-level symbols from Scala source files. This parser needs to be fast because it is on the critical path when importing a new project and we index the depenency sources.
This commit introduces a new module mtags that implements goto definition with a new design that achieves the following goals: - source indexing is fast, Scala sources are indexed at ~800k loc/s and Java sources don't require any indexing altogether. - the in-memory index uses little ram, an index for ~1.5 million lines of Scala takes ~15MB. The new symbol indexer is able to handle different kinds of sources: source files, sources.jar and SemanticDB files. Source and SemanticDB can complement each other, the indexer falls back source indexing when the SemanticDB file is either missing or is incomplete due to for example macro annotations. An important difference from the old indexer is that the new indexer is not responsible for computing the range position inside the source file. The final range position is left for a separate module that handles other concerns such as stale SemanticDBs using edit distance.
These are copy-pasted from the Scalameta test suite.
- BaseSuite: replaces utest DSL with ScalaTest FunSuite syntax. - BaseExpectSuite: replaces utest DSL with ScalaTest FunSuite syntax. - SingleFileExpectSuite: expect test with a single output file - DirectoryExpectSuite: expect test with a directory containing multiple expected output files. - InputProperties: metadata about the input project.
Test cases are plain Scala code, no magic assertions. This set of files stress a range of interesting Scala and Java language features.
This test suite is the most advanced of the current test suites, it tests a range of functionality: - synthetic symbols from case classes fallback to non-synthetic symbols - source + semanticdb indexing can be combined to support navigation to macro annotations.
We care equally about memory useage and CPU time to index sources. Rough summary of the results: - indexing 7 million loc Java and 1.5 million loc Scala takes ~15M ram - Scala topelevel parser handles ~800k loc/s The numbers are so good that I'm not too stressed about micro-optimization as the moment. We run the full indexer for a medium-sized classpath on every unit test without annoying me, total runtime of `unit/test` is ~4 seconds.
``` > bench/jmh:run -i 10 -wi 10 -f1 -t1 [info] Benchmark Mode Cnt Score Error Units [info] MetalsBench.indexSources ss 10 0.620 ± 0.058 s/op [info] MetalsBench.javaMtags ss 10 7.233 ± 0.017 s/op [info] MetalsBench.scalaMtags ss 10 4.617 ± 0.034 s/op [info] MetalsBench.scalaToplevels ss 10 0.361 ± 0.005 s/op ```
- Only deal with definitions, no references. - No SemanticDB indexing, only mtags. - Document OnDemandSymbolIndex.
We needed to add the enclosing source directory to the API to handle this case. Added detailed documentation to the methods.
Updated the PR with more docs and simplified the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass of comments, I haven't got to the bottom of this yet.
loadFromSourceJars(trivialPaths(toplevel)).foreach(addSemanticdb) | ||
} | ||
} | ||
if (!definitions.contains(symbol)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this repeated because the above can mutate definitions?
build.sbt
Outdated
"org.scalameta" %% "testkit" % V.scalameta % Test, | ||
"org.scalameta" %% "scalameta" % V.scalameta, | ||
"org.scalameta" %% "symtab" % V.scalameta, | ||
"org.scalameta" % "interactive" % V.scalameta cross CrossVersion.full, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this still in use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, ls4ps is not strictly used either because the metals
project has 2 source files at the moment. I will clean up the metals
project once we add language server stuff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant the interactive part specifically, but there's no rush to clean this up now, I agree
docs/installation-contributors.md
Outdated
2.12.6 are not supported. | ||
* Java 8. Note that Java 9 or higher has not been tested. | ||
* macOS or Linux. Note that there have been reported issues on Windows. | ||
- Scala 2.12.7, down to the exact PATCH version. Note that Scala versions 2.11.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make the Scala version dynamic using V.scala212
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs don't build actually at the moment because the Configuration
object is now gone. We'll need to update the docs before there is a release.
/** | ||
* Utility to load relative paths from a classpath. | ||
* | ||
* Provides similar functionality as URLClassLoader but does no caching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the no-caching a feature or a limitation? Maybe it's worth explaining why this class is needed over URLClassLoader
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most important bit is that it uses AbsolutePath
and Option[T]
instead of java.net.URL
and nulls. Updated docstring to reflect that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lack of caching was actually out-dated, the underlying implementation uses URLClassLoader at the moment.
mtags/src/main/scala/scala/meta/internal/mtags/DefinitionAlternatives.scala
Show resolved
Hide resolved
|
||
} | ||
|
||
case class SymbolDefinition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe SymbolDefinitionPath
or SymbolDefinitionFile
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ending with Path
makes the querySymbol/definitionSymbol
fields a bit confusing IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the same time, this is not really a symbol definition. It's the file where the symbol is defined, so that was confusing at a first read
import scala.meta.internal.mtags.Enrichments._ | ||
|
||
/** | ||
* Utility to generate method symbol disambiguators according to SemanticDB spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double space between symbol
and disambiguators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed, only if we had a formatting tool that could reformat our comments 🤔
// java.lang.NullPointerException: null | ||
// at com.thoughtworks.qdox.builder.impl.ModelBuilder.createTypeVariable(ModelBuilder.java:503) | ||
// at com.thoughtworks.qdox.builder.impl.ModelBuilder.endMethod(ModelBuilder.java:470) | ||
// TODO(olafur) report bug to qdox. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this been reported? 😇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, didn't manage to minimize the error. Replaced the comment with explanation why we ignore parse errors.
While removing the unused `DefinitionAlternatives.methodToVal` I realized that DefinitionSuite didn't properly document which cases relied on fallbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed review, a lot of great questions @gabro !
build.sbt
Outdated
"org.scalameta" %% "testkit" % V.scalameta % Test, | ||
"org.scalameta" %% "scalameta" % V.scalameta, | ||
"org.scalameta" %% "symtab" % V.scalameta, | ||
"org.scalameta" % "interactive" % V.scalameta cross CrossVersion.full, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, ls4ps is not strictly used either because the metals
project has 2 source files at the moment. I will clean up the metals
project once we add language server stuff.
docs/installation-contributors.md
Outdated
2.12.6 are not supported. | ||
* Java 8. Note that Java 9 or higher has not been tested. | ||
* macOS or Linux. Note that there have been reported issues on Windows. | ||
- Scala 2.12.7, down to the exact PATCH version. Note that Scala versions 2.11.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, done.
docs/installation-contributors.md
Outdated
2.12.6 are not supported. | ||
* Java 8. Note that Java 9 or higher has not been tested. | ||
* macOS or Linux. Note that there have been reported issues on Windows. | ||
- Scala 2.12.7, down to the exact PATCH version. Note that Scala versions 2.11.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs don't build actually at the moment because the Configuration
object is now gone. We'll need to update the docs before there is a release.
/** | ||
* Utility to load relative paths from a classpath. | ||
* | ||
* Provides similar functionality as URLClassLoader but does no caching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most important bit is that it uses AbsolutePath
and Option[T]
instead of java.net.URL
and nulls. Updated docstring to reflect that.
/** | ||
* Utility to load relative paths from a classpath. | ||
* | ||
* Provides similar functionality as URLClassLoader but does no caching. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lack of caching was actually out-dated, the underlying implementation uses URLClassLoader at the moment.
} | ||
|
||
/** Fallback to the val term for a def with multiple params */ | ||
private def methodToVal(symbol: String): Option[String] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used, removed.
* | ||
* Returns the path of the file that defines the symbol but does not include | ||
* the exact position of the definition. Computing the range position of the | ||
* definition is left for the client and can be done using the mtags module. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarified with
the exact position of the definition. Computing the range position of the
definition is not handled by this method, it is left for the user and can
be done using the mtags module.
def addSourceFile( | ||
file: AbsolutePath, | ||
sourceDirectory: Option[AbsolutePath] | ||
): Unit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently this throws an exception for errors like tokenization failures. I'm not sure yet what the best error handling method so I'd prefer to refine it down the road once we have a clearer picture of how it will be used in the language server. Maybe the underlying implementation can accept a Reporter
🤔
|
||
} | ||
|
||
case class SymbolDefinition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ending with Path
makes the querySymbol/definitionSymbol
fields a bit confusing IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added two other minor comments
* | ||
* Fast indexing is enabled by ScalaToplevelMtags, a custom parser that extracts | ||
* only toplevel symbols from a Scala source file. Java source files don't need indexing | ||
* because their file location can be inferred from the symbol itself. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a reference to the note from above, where you list the limitations of this approach.
) extends GlobalSymbolIndex { | ||
val mtags = new Mtags | ||
private val sourceJars = new ClasspathLoader(Classpath(Nil)) | ||
var indexedSources = 0L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be accessed concurrently? Maybe using an AtomicInteger
would be safer.
An overall comment that I have is about the pervasive use of It feels a bit brittle and I suspect it makes it difficult to refactor some of the code. Especially the use of |
Biggest change was wrapping all strings in a `Symbol` class. There's currently no standard symbol class in scalameta that we can reuse so we copy-paste what exists in scalafix and adjust to our needs.
That's a good point. I've updated the PR to use a wrapper Thank you @gabro for the review, a lot of great comments 🙏 I will merge this now after CI is green, next step is to setup a bsp client 💪 |
Implement goto definition with low CPU and memory overhead
This PR replaces the existing navigation solution with a new implementation that is fast and uses little memory. To avoid making this PR larger, only goto definition is implemented.
The fast indexing is enabled by a new Scala parser that extract only toplevel classes from Scala sources at ~800k loc/s according to JMH "single shot" average time. This parser is implemented on top of the
LegacyScanner
, which is a fork of the Scala compiler scanner and is used by the Scalameta tokenizer. Java source files don't need indexing since their toplevel classes are inferred from their filename and can be computed on-demand.The low-memory is achieved by storing only "non-trivial" toplevel classes in the index, that is toplevel members that don't match their enclosing file path. An index of the combined sources for akka+spark+finch+scalameta+linkerd+kafka+grpc uses ~17MB of memory, at ~1.4MB per 100k lines of Scala code. The index uses Scala stdlib
Map
withString
keys so it might be possible to compress this further with a smarter data structure like prefix trees.This PR removes pretty much everything from the original Metals codebase, including functional parts like the
Main
entrypoint, LSP functionality, sbt server integration and all tests suites. Some of those parts may get revived in the future but I think first we should first write more comprehensive tests for those components. In particular, the old implementation suffered from lack of e2e integration tests making it easy to silently introduce regressions.This PR introduces a new testing infrastructure based on expect tests that optimizes ease of writing tests, code reviewing tests and updating existing tests once the expected behavior changes. Some test like
DefinitionSuite
stress a range of important high-level properties like "every identifier token should resolve to a definition and every non-identifier token should not resolve to a definition". These property check have so far been very effective at catching bugs.