-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for sharing state across analyzer instances #6324
Comments
Tagging people who might be interested in this feature/design discussion: @sharwell @tmeschter @srivatsn @JohnHamby @heejaechang @pharring |
Can we do better in terms of the safety of the concurrency model? |
If we are going to require analyzers that share state to be in the same assembly, can't any analyzers that want to share state be written as a single analyzer with multiple actions? |
We would prefer to avoid that, as it complicates the implementation of single analyzers. For example, it's currently easy to review the behavior of SA1018 (NullableTypeSymbolsMustNotBePrecededBySpace) against the documentation for that rule. It would not be nearly so easy if the analysis behavior for 150+ other violations was included in the same analyzer. |
📝 Another thing to consider here. In DotNetAnalyzers/StyleCopAnalyzers#1683 and DotNetAnalyzers/StyleCopAnalyzers#1689, we dramatically reduced the number of allocations required to register our analyzers with a new compilation. However, due to our use of shared state we are currently unable to remove the closure created here, which results in an allocation for every call we make to An ideal solution to state management would allow us to reduce or eliminate these per-compilation allocations. |
@mavasani why limiting sharing state per assembly make threading model more safe? I thought we only serialize analyzer execution not assembly? are we saying if an analyzer from a assembly register shared state, we will serialize all analyzer executions from that assembly regardless whether a particular analyzer from the assembly uses shared state or not? |
In my opinion, we don't need this goal. The intended outcome has questionable value at best, but the impact could result in very real limitations for analyzer developers. For example, in the future we may want to provide a common API for detecting generated code in a StyleCop-compliant manner, which many analyzers then use.
We've found that simply having a singly-initialized value associated with a compilation would be sufficient. For example, all we need for generated files support is to associate a single (initially empty) instance of I believe the priorities for the API should be:
More specifically, after the data is initialized we need access to this data to be lock-free. The implementation of CWT does not allow us to meet this requirement. |
@mavasani ah, I misinterpret the meaning of thread model. sorry. ignore my previous comment. |
The original post says threat, which I interpreted as a discussion of security, as opposed to thread which would be a discussion of concurrency. |
@mavasani so this reduce repeated work per each analyzer but analyzer need to repeat the same work per compilation, right? I think that is good goal. we should clear the shared state as soon as all analysis of a compilation has finished. otherwise we could get into memory leak. |
@heejaechang @sharwell We need some kind of threat model/security to ensure unrelated analyzers cannot view/modify state or worse throw exceptions based on the assumption about the data type of the state. I was proposing we start with drawing a boundary around the assembly in which analyzers exists, and can extend the API later to support sharing data across assemblies. However, we do need some measures to prevent unrelated analyzers from accidentally/purposefully corrupt shared data. |
@mavasani @sharwell 2 questions.
|
We've actually been successful in accomplishing this, but it's been a challenge with many pitfalls. We've completely changed the way we manage per-compilation state twice as a result of performance problems that snuck in. The primary benefit of a common API for us will be the elimination of many additional allocations (several hundred at least) which occur during the registration of our analyzers.
If you change the key to be a |
If you use "object" as the key instead of "string", then you can leave it up to the analyzer authors to determine "sharability". If you need private state, then use a "private static readonly object" key. If you need something visible to other analyzers in the same assembly, then use "internal", etc. #5542 had that |
@pharring that is surely a better approach, let me update the original comment accordingly. |
@mavasani can we remove this API. can one always use GetOrCreate ? that will remove people to assume there is any dependency we guarantee. |
@mavasani also, can we remove "State" from the API and rather put "Cache" in them. so that people doesn't use it as a state but use as a "Cache". implication is, "state" can break functionality but cache can't. this also remove assumption on dependency and also, we don't need to re-think our statefull and stateless analyzer story. |
Yes, probably but API design probably want to decide that. If the analyzer wants to have
I thought we explicitly want it to be strongly referenced state, that lives the lifetime of the compilation. We don't want to create conditional weak table or weakly referenced property bags or any kind of cache, that has already proven to be bad for IDE perf. Am I misunderstanding your comment? |
I would prefer this API be typesafe, for example as in #5708. |
"Yes, probably but API design probably want to decide that. If the analyzer wants to have null as an allowable value for the data for exceptional circumstances, then a reader invoking GetOrCreate needs to do an additional null check. Just a minor API preference.." user can still return null from GetOrCreateXXX if they want to. "I thought we explicitly want it to be strongly referenced state, that lives the lifetime of the compilation. We don't want to create conditional weak table or weakly referenced property bags or any kind of cache, that has already proven to be bad for IDE perf. Am I misunderstanding your comment?" this is not what I meant. implementation-wise, leave it exactly as you suggested. what I meant was just remove "State" from API and rather put "Catch" in them. and tell them you can't depends on the data for your functional correctness. so stateless or statefull analyzer both still can use this API without making them stateful analyzer. conjunction with the removing of "TryXXXX" API, all user of the API, which uses GetOrCreateXXX should be able to create the cache if it doesn't exist. |
I like the simplicity of a single For this to work, it is essential that the fast path (case where the item already exists) be very efficient in a concurrent environment, because it could be invoked in each of the analysis callbacks (potentially dozens of times per
We could get this by replacing the |
@gafter I have updated the proposed API to take a
Why would we do so? The fact that you want per-compilation shared state makes you a stateful analyzer, and we should be able to guarantee functional correctness by keeping this object alive as long as the underlying compilation lives. We can just store the state on CompilationWithAnalyzers instance, which has strong reference to the underlying compilation on which analyzers are executed. Analyzers can safely assume that we will initialize this state only once per compilation. |
Define a class public sealed class Key<T> { } Then update the API to require the key's type match the created value's type: TState GetOrCreateSharedState<TState>(Key<TState> key, Func<object, TState> createSharedState)
where TState: class |
@sharwell Yes, almost like that. get rid of the object: TState GetOrCreateSharedState<TState>(Key<TState> key, Func<Key<TState>, TState> createSharedState)
where TState : class |
@mavasani first analyzer become stateful analyzer. you get one benefit and you lost one - concurrency since state basically implies dependency. since it is state that they relies on for functionality, the order of state mutation must be correct and observable from all callers. making it not state is much simpler and stateless analyzer can use it as well. |
Here's a proposal to cover reading an external resource from multiple analyzers. It does not cover sharing mutable state between analyzers. A significant aspect of the issues discussed here is a need for multiple analyzers to read configuration, options, or directive context from external resources. Reading and parsing a resource can be expensive, motivating a desire to read a given resource once per compilation or per analysis session rather than once per analyzer per compilation. Here is a proposed API with the following goals: To be added to: We propose: Semantics: Caching can cross contexts. For example, if there is a call to AnalysisContext.TryGetResource in one analyzer and an equivalent call to CompilationStartAnalysisContext.TryGetResource in another, the resource will likely be read only once. The lifetime of a cached result is arbitrary, but is expected to be as long as practical without introducing excessive memory pressure. Expectations on clients: Violating these expectations can produce difficult-to-explain behavioral inconsistencies. |
I don't understand the need for either of these limitations. Can you clarify how this benefits the implementation and the reason it needs to be different from the behavior described by your other expectations? |
Violating these restrictions might produce a surprise. If the resource changes between invocations, the result you get back will also change. If the resource has not changed and you supply an equivalent reader delegate, you might get back an identical object from two invocations, or you might get different (but equivalent) results. If you supply a different delegate to two invocations with the same key, the second invocation might return the results of the first and not call the delegate supplied with the second invocation. The implementation won't enforce these restrictions. It also won't do anything to guarantee predictable behavior if they are violated. |
Based on the description in the previous comment, none of these behaviors is unexpected. I think what you currently have documented as an expectation on clients would be better worded as a recommendation that care be taken, along with clear examples like you provided in the follow-up. With the expectation in place, clients will not be able to leverage this functionality even if the developers understand the special considerations and know that they would not negatively impact their analyzers. |
Thanks @JohnHamby for the proposal. |
Overall, yes. Some questions remain which may or may not be in scope of this:
|
Additional files: http://source.roslyn.io/Microsoft.CodeAnalysis/R/b33eb1a9beed263a.html
Seems reasonable to me. I'll file a separate issue for it. |
…ross analyzer actions and also across different analyzer instances. Fixes dotnet#6324
Currently, our diagnostic analyzer API provides no support for sharing any state between diagnostic analyzers. This causes extreme pain or performance issues for our customers when they have some expensive computation and/or large data to share across analyzer instances. For example:
An ideal solution here is to provide an analyzer API to share per-compilation and per-additional file data across analyzer instances, so that analyzer authors don't have to explicitly deal with lifetime management for such state. I am going to propose an initial API for this feature, and we can iterate over the best design for it.
Principles:
Proposed API (per-compilation state):
Add the below API to each DiagnosticAnalysisContext parameter type for analyzer callbacks:
where
public sealed class Key<T> { }
is a separate definition.The text was updated successfully, but these errors were encountered: