-
Notifications
You must be signed in to change notification settings - Fork 140
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor settings management for plugin #1393
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #1393 +/- ##
============================================
+ Coverage 84.92% 84.97% +0.05%
+ Complexity 1274 1273 -1
============================================
Files 167 167
Lines 5186 5172 -14
Branches 491 487 -4
============================================
- Hits 4404 4395 -9
+ Misses 573 570 -3
+ Partials 209 207 -2 ☔ View full report in Codecov by Sentry. |
What does 'separate out the different functionality of the plugin' mean? When do we need that? I think having every KNN setting in a single place itself does not have any issue unless you have to set up unnecessary resources to make the KNNSettings work, even if all you need is a simple setting value. I don't think that is the case here. An advantage of having KNNSettings is that it is easy to find what settings are available for the KNN plugin in a single place. I am not in favor of moving all settings to their relevant class. |
In the current implementation, we have utility functions for different functionality spanning across the entire plugin all in one class - KNNSettings.java. For instance, we are setting update consumers for the NativeMemoryCacheManager here: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNSettings.java#L305., but also a method that will get the value for the filtered setting for KNNQueries: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNSettings.java#L379. Thus, for testing something with KNNQuery filter threshold, you may have to worry about the NativeMemoryCacheManager. Overall, this makes it harder to test. Therefore, I think it makes sense for specific classes to own their settings. This is similar to what OpenSearch does: https://github.com/search?q=repo%3Aopensearch-project%2FOpenSearch%20Setting%3C&type=code.
All of the settings are available in the KNNPlugin.getSettings() method where they are registered. Its more so that a setting is defined in the component that owns the setting. |
Why does NativeMemoryCacheManager matter when testing the KNNQuery filter threshold, considering that setting update consumers do not touch anything regarding the KNNQuery filter threshold value? It is a fair point is that we know what settings are available for KNN by looking at the KNNPlugin class. However, by having a separate setting class, you also get to know which part of the code is setting-related and which part is not. In that sense, how about having multiple setting classes instead of adding setting codes inside each class? For example, NativeMemorySetting, FilteringSetting, and so on. |
It shouldnt matter, but currently in
We could do this. Before I make the change, though, Im curious what others might think? @VijayanB @martin-gaievski |
IMO, getting tangled is not big concern as of now. I looked at the test code and I haven't seen any improvement on the test code itself. Please point me out if I missed something. One other thing is, I see you define |
final String percentAsString = sValue.substring(0, sValue.length() - 1); | ||
try { | ||
final double percent = Double.parseDouble(percentAsString); | ||
if (percent < 0 || percent > 100) { | ||
throw new OpenSearchParseException("percentage should be in [0-100], got [{}]", percentAsString); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you refactor this to method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
); | ||
} else { | ||
logger.info( | ||
"[KNN] Cache capacity below 75% of the circuit breaker limit for all nodes." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it always 75%?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. I will update to different messaging here.
knnStatsRequest.addStat(StatNames.CACHE_CAPACITY_REACHED.getName()); | ||
knnStatsRequest.timeout(new TimeValue(1000 * 10)); // 10 second timeout | ||
|
||
KNNStatsResponse knnStatsResponse = client.execute(KNNStatsAction.INSTANCE, knnStatsRequest).get(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since initialize is outside object creation, can you add non-null check for client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
Initial commit of refactoring change around KNNSettings. The goal of the refactor is to tie setting definitions with related resources and make utility class lighter weight. In future commits, I will clean up the KNNSettingsUtil interface and move settings to the classes that best define them. Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Creates a new class called KNNCircuitBreakUtil in order to perform updating of circuit break actions. Unable to move directly into KNNCircuitBreaker due to race conditions that would be produced with the NativeMemoryCacheManager. In the future, CB logic will be refactored as well. Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
8ed22d5
to
4e5b546
Compare
@heemin32 I think my concern overall is that we have developed a lot of singleton classes, which create strange testing behavior where several static classes need to be initialize before use. For a couple examples:
My thinking is that by removing Additionally, I dont think having all of the settings defined in a single place is good practice. For instance, we are adding settings into this class from other package, (i.e. why is
Good point. The problem with this setting is that it is ambiguous and potentially not used properly. By the name,
With this, it probably makes sense to deprecate the setting and come up with a stronger convention for setting threads. But this is most likely out of scope. That being said, I think the setting should be declared with its intended use. This would be in either the |
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
I believe this pull request primarily addresses the preference for shaping our codebase. It may be challenging to
The current state is option 1, and this PR aligns with option 3. Personally, I prefer option 2 as I believe it will simplify unit testing. I'm interested in hearing others' opinions as well. |
That makes sense thanks @heemin32. I still lean towards 3, but would be good to hear from other folks @navneet1v @VijayanB |
I prefer option 3 too |
@jmazanec15 , @heemin32 , @VijayanB Putting settings in the classes like KNNWeight.class, CircuitBreaker.class breaks the single responsibility principle, as the class lets say CircuitBreaker.class has more than 1 responsibility which is holding the logic to define, set, unset the CB settings and then also the logic to trigger CB. Same goes for other classes too. Hence in my opinion we should have separate settings classes. Now what should be those classes we can discuss on that but in terms of functionality/behavior of a settings class it should be:
having said all this, I am with option 2. But with a small tweak which is I am not stressing on settings class per feature as of now. We can do some other grouping but settings should be in separate classes for sure and should be coupled with other classes like CircuitBreaker, KNNWeight.class |
Thanks @navneet1v
I would say the CircuitBreaker is an outlier. I see settings as a structured way for users to tune the functionality of the system. We implement the circuit breaker as a setting - but I believe this is a mistake and should be done (if at all) via cluster metadata as opposed to cluster settings. I havent seen other cases in the OpenSearch ecosystem that operate similarly. Additionally (and curious your thoughts on this) I think that it is best practice for settings to limit functionality impacted to a specific component. For instance, with the circuit breaker limit - the value provided by this setting should be used for one thing - setting the cache size. In practice, this results in the value being read directly in one component.
I would say for (2), we should not make a habit of the plugin altering settings directly. I see a setting as the way a user, not the plugin, can tune functionality. For (3), I think it would be better to use a class like KNNClusterUtil, that takes a setting and can get the value. So, something like:
I like this approach because it reduces the duplicated code (fewer classes depending on ClusterService) and it makes it clear that the caller is taking a dependency on a setting, which is sometimes unclear when using static utility methods. One other point of functionality that needs to be implemented somewhere is registering settings update consumers. For the KNNCache related settings, these trigger things like cache rebuilding. So, for these, would it be better to have them registered in the settings class or in the KNNCacheClass. My feeling is registering in the KNNCache class would prevent adding extra dependencies in a generic settings class. |
|
Description
This PR refactors how the k-NN plugin handles settings - it does not change any functionality. Up until now, we had a class called
KNNSettings
that contained all setting definitions as well as different utility functions for interacting with the settings. TheKNNSettings
class implements the singleton pattern.One problem with the
KNNSettings
class is that it creates a tight coupling of different components of the plugin. For instance, a setting for the model cache is declared in the same place as a setting for filter query. This coupling makes it harder to separate out the different functionality of plugin. This change resolves this by declaring settings in the class they belong to based on functionality. Then, they are registered in theKNNPlugin
class.Here are where the settings are moved to:
Along with this, another problem is that the
KNNSettings
has a set of utility functions that rely on singleton pattern. The singleton pattern makes testing more complex and we are trying to get away from this - this is one step towards that. This change moves those utility functions to the corresponding class and then moves the setting lookup functionality toKNNClusterUtil
which is a utility class with the purpose of interacting with the cluster. In the future, I would like to avoid initializing a bunch of classes statically with clusterService and instead depend onKNNClusterUtil
- but this is for another review.One important thing to note is the addition of
KNNCircuitBreakerUtil
, a static utility class for interacting with the circuit breaker. While these methods could have been moved into theKNNCircuitBreaker
directly, it creates somewhat of a circular dependency withNativeMemoryCacheManager
. To avoid this, the utility class was created. In the future, this will change when we refactor the KNNCircuitBreaker functionality.Issues Resolved
#1017
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.