Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor settings management for plugin #1393

Closed
wants to merge 18 commits into from

Conversation

jmazanec15
Copy link
Member

@jmazanec15 jmazanec15 commented Jan 18, 2024

Description

This PR refactors how the k-NN plugin handles settings - it does not change any functionality. Up until now, we had a class called KNNSettings that contained all setting definitions as well as different utility functions for interacting with the settings. The KNNSettings class implements the singleton pattern.

One problem with the KNNSettings class is that it creates a tight coupling of different components of the plugin. For instance, a setting for the model cache is declared in the same place as a setting for filter query. This coupling makes it harder to separate out the different functionality of plugin. This change resolves this by declaring settings in the class they belong to based on functionality. Then, they are registered in the KNNPlugin class.

Here are where the settings are moved to:

  1. KNN_SPACE_TYPE = "index.knn.space_type" -> LegacyFieldMapper.java
  2. KNN_ALGO_PARAM_M = "index.knn.algo_param.m" -> LegacyFieldMapper.java
  3. KNN_ALGO_PARAM_EF_CONSTRUCTION = "index.knn.algo_param.ef_construction" -> LegacyFieldMapper.java
  4. KNN_ALGO_PARAM_EF_SEARCH = "index.knn.algo_param.ef_search" -> LegacyFieldMapper.java
  5. KNN_ALGO_PARAM_INDEX_THREAD_QTY = "knn.algo_param.index_thread_qty" -> JNIService.java
  6. KNN_MEMORY_CIRCUIT_BREAKER_ENABLED = "knn.memory.circuit_breaker.enabled" -> KNNCircuitBreaker.java
  7. KNN_MEMORY_CIRCUIT_BREAKER_LIMIT = "knn.memory.circuit_breaker.limit" -> KNNCircuitBreaker.java
  8. KNN_CIRCUIT_BREAKER_TRIGGERED = "knn.circuit_breaker.triggered" -> KNNCircuitBreaker.java
  9. KNN_CIRCUIT_BREAKER_UNSET_PERCENTAGE = "knn.circuit_breaker.unset.percentage" -> KNNCircuitBreaker.java
  10. KNN_CACHE_ITEM_EXPIRY_ENABLED = "knn.cache.item.expiry.enabled" -> NativeMemoryCacheManager.java
  11. KNN_CACHE_ITEM_EXPIRY_TIME_MINUTES = "knn.cache.item.expiry.minutes" -> NativeMemoryCacheManager.java
  12. KNN_PLUGIN_ENABLED = "knn.plugin.enabled" -> KNNPlugin.java
  13. KNN_INDEX = "index.knn" -> KNNCodecService.java
  14. MODEL_INDEX_NUMBER_OF_SHARDS = "knn.model.index.number_of_shards" -> ModelDao.java
  15. MODEL_INDEX_NUMBER_OF_REPLICAS = "knn.model.index.number_of_replicas" -> ModelDao.java
  16. MODEL_CACHE_SIZE_LIMIT = "knn.model.cache.size.limit" -> ModelCache.java
  17. ADVANCED_FILTERED_EXACT_SEARCH_THRESHOLD = "index.knn.advanced.filtered_exact_search_threshold" -> KNNWeight.java

Along with this, another problem is that the KNNSettings has a set of utility functions that rely on singleton pattern. The singleton pattern makes testing more complex and we are trying to get away from this - this is one step towards that. This change moves those utility functions to the corresponding class and then moves the setting lookup functionality to KNNClusterUtil which is a utility class with the purpose of interacting with the cluster. In the future, I would like to avoid initializing a bunch of classes statically with clusterService and instead depend on KNNClusterUtil - but this is for another review.

One important thing to note is the addition of KNNCircuitBreakerUtil, a static utility class for interacting with the circuit breaker. While these methods could have been moved into the KNNCircuitBreaker directly, it creates somewhat of a circular dependency with NativeMemoryCacheManager. To avoid this, the utility class was created. In the future, this will change when we refactor the KNNCircuitBreaker functionality.

Issues Resolved

#1017

Check List

  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Jan 18, 2024

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Comparison is base (89fc267) 84.92% compared to head (cb5c451) 84.97%.

Files Patch % Lines
...va/org/opensearch/knn/index/KNNCircuitBreaker.java 52.94% 11 Missing and 5 partials ⚠️
...rg/opensearch/knn/index/KNNCircuitBreakerUtil.java 78.78% 6 Missing and 1 partial ⚠️
...in/java/org/opensearch/knn/indices/ModelCache.java 69.23% 3 Missing and 1 partial ⚠️
...main/java/org/opensearch/knn/index/util/Faiss.java 0.00% 0 Missing and 2 partials ⚠️
...ain/java/org/opensearch/knn/index/util/Lucene.java 0.00% 0 Missing and 1 partial ⚠️
...ain/java/org/opensearch/knn/index/util/Nmslib.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1393      +/-   ##
============================================
+ Coverage     84.92%   84.97%   +0.05%     
+ Complexity     1274     1273       -1     
============================================
  Files           167      167              
  Lines          5186     5172      -14     
  Branches        491      487       -4     
============================================
- Hits           4404     4395       -9     
+ Misses          573      570       -3     
+ Partials        209      207       -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jmazanec15 jmazanec15 added Refactoring Improve the design, structure, and implementation while preserving its functionality backport 2.x labels Jan 18, 2024
@jmazanec15 jmazanec15 marked this pull request as ready for review January 18, 2024 20:25
@heemin32
Copy link
Collaborator

One problem with the KNNSettings class is that it creates a tight coupling of different components of the plugin. For instance, a setting for the model cache is declared in the same place as a setting for filter query. This coupling makes it harder to separate out the different functionality of plugin.

What does 'separate out the different functionality of the plugin' mean? When do we need that? I think having every KNN setting in a single place itself does not have any issue unless you have to set up unnecessary resources to make the KNNSettings work, even if all you need is a simple setting value. I don't think that is the case here.

An advantage of having KNNSettings is that it is easy to find what settings are available for the KNN plugin in a single place. I am not in favor of moving all settings to their relevant class.

@jmazanec15
Copy link
Member Author

jmazanec15 commented Jan 18, 2024

@heemin32

What does 'separate out the different functionality of the plugin' mean? When do we need that? I think having every KNN setting in a single place itself does not have any issue unless you have to set up unnecessary resources to make the KNNSettings work, even if all you need is a simple setting value. I don't think that is the case here.

In the current implementation, we have utility functions for different functionality spanning across the entire plugin all in one class - KNNSettings.java. For instance, we are setting update consumers for the NativeMemoryCacheManager here: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNSettings.java#L305., but also a method that will get the value for the filtered setting for KNNQueries: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNSettings.java#L379. Thus, for testing something with KNNQuery filter threshold, you may have to worry about the NativeMemoryCacheManager. Overall, this makes it harder to test. Therefore, I think it makes sense for specific classes to own their settings. This is similar to what OpenSearch does: https://github.com/search?q=repo%3Aopensearch-project%2FOpenSearch%20Setting%3C&type=code.

An advantage of having KNNSettings is that it is easy to find what settings are available for the KNN plugin in a single place. I am not in favor of moving all settings to their relevant class.

All of the settings are available in the KNNPlugin.getSettings() method where they are registered. Its more so that a setting is defined in the component that owns the setting.

@heemin32
Copy link
Collaborator

Why does NativeMemoryCacheManager matter when testing the KNNQuery filter threshold, considering that setting update consumers do not touch anything regarding the KNNQuery filter threshold value?

It is a fair point is that we know what settings are available for KNN by looking at the KNNPlugin class. However, by having a separate setting class, you also get to know which part of the code is setting-related and which part is not. In that sense, how about having multiple setting classes instead of adding setting codes inside each class? For example, NativeMemorySetting, FilteringSetting, and so on.

@jmazanec15
Copy link
Member Author

Why does NativeMemoryCacheManager matter when testing the KNNQuery filter threshold, considering that setting update consumers do not touch anything regarding the KNNQuery filter threshold value?

It shouldnt matter, but currently in KNNSettings.getFilteredExactSearchThreshold, we need clusterService to execute the call. To set clusterService, we need to call initialize which sets update consumers for nativememorycachemanager: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNSettings.java#L387. So it ends up getting tangled.

However, by having a separate setting class, you also get to know which part of the code is setting-related and which part is not. In that sense, how about having multiple setting classes instead of adding setting codes inside each class? For example, NativeMemorySetting, FilteringSetting, and so on.

We could do this. Before I make the change, though, Im curious what others might think? @VijayanB @martin-gaievski

@heemin32
Copy link
Collaborator

It shouldnt matter, but currently in KNNSettings.getFilteredExactSearchThreshold, we need clusterService to execute the call. To set clusterService, we need to call initialize which sets update consumers for nativememorycachemanager: https://github.com/opensearch-project/k-NN/blob/main/src/main/java/org/opensearch/knn/index/KNNSettings.java#L387. So it ends up getting tangled.

IMO, getting tangled is not big concern as of now. I looked at the test code and I haven't seen any improvement on the test code itself. Please point me out if I missed something.

One other thing is, I see you define KNN_ALGO_PARAM_INDEX_THREAD_QTY_SETTING setting in JNIService class and it is being used in TraningJob and KNN80DocValuesConsumer classes. Why JNIService should have it? Wouldn't it be better to have a separate setting class to hold it?

Comment on lines 46 to 51
final String percentAsString = sValue.substring(0, sValue.length() - 1);
try {
final double percent = Double.parseDouble(percentAsString);
if (percent < 0 || percent > 100) {
throw new OpenSearchParseException("percentage should be in [0-100], got [{}]", percentAsString);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you refactor this to method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

);
} else {
logger.info(
"[KNN] Cache capacity below 75% of the circuit breaker limit for all nodes."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it always 75%?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I will update to different messaging here.

knnStatsRequest.addStat(StatNames.CACHE_CAPACITY_REACHED.getName());
knnStatsRequest.timeout(new TimeValue(1000 * 10)); // 10 second timeout

KNNStatsResponse knnStatsResponse = client.execute(KNNStatsAction.INSTANCE, knnStatsRequest).get();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since initialize is outside object creation, can you add non-null check for client?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Initial commit of refactoring change around KNNSettings. The goal of the
refactor is to tie setting definitions with related resources and make
utility class lighter weight.

In future commits, I will clean up the KNNSettingsUtil interface and
move settings to the classes that best define them.

Signed-off-by: John Mazanec <[email protected]>
Creates a new class called KNNCircuitBreakUtil in order to perform
updating of circuit break actions. Unable to move directly into
KNNCircuitBreaker due to race conditions that would be produced with the
NativeMemoryCacheManager. In the future, CB logic will be refactored as
well.

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
@jmazanec15
Copy link
Member Author

jmazanec15 commented Jan 24, 2024

IMO, getting tangled is not big concern as of now. I looked at the test code and I haven't seen any improvement on the test code itself. Please point me out if I missed something.

@heemin32 I think my concern overall is that we have developed a lot of singleton classes, which create strange testing behavior where several static classes need to be initialize before use. For a couple examples:

  1. After each test, we need to update the singleton cache manager class to invalidate entries - https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/KNNTestCase.java#L64-L65, https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/KNNSingleNodeTestCase.java#L69-L73
  2. After each test, we need to reset counters to avoid flakiness - https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/KNNTestCase.java#L59
  3. For codec service, we need to mock all of these settings classes again - https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/index/codec/KNNCodecTestCase.java#L101-L113
  4. For KNNWeight, we have an elaborate pre-test-class setup due to frequent static classes: https://github.com/opensearch-project/k-NN/blob/main/src/test/java/org/opensearch/knn/index/query/KNNWeightTests.java#L97-L126

My thinking is that by removing KNNSettings class, we can start to move away from this singleton pattern and simplfy testing in the future. Next, we can refactor the CircuitBreaker and then the NativeMemoryCacheManager or ModelDao. However, because these all depend on KNNSettings it makes it difficult to do so.

Additionally, I dont think having all of the settings defined in a single place is good practice. For instance, we are adding settings into this class from other package, (i.e. why is MODEL_INDEX_NUMBER_OF_REPLICAS defined in org.opensearch.knn.index when the model index functionality is in org.opensearch.knn.indices).

One other thing is, I see you define KNN_ALGO_PARAM_INDEX_THREAD_QTY_SETTING setting in JNIService class and it is being used in TraningJob and KNN80DocValuesConsumer classes. Why JNIService should have it? Wouldn't it be better to have a separate setting class to hold it?

Good point. The problem with this setting is that it is ambiguous and potentially not used properly. By the name, KNN_ALGO_PARAM_INDEX_THREAD_QTY_SETTING implies that it is the number of threads used during indexing. However, this is not exactly true:

  1. As you pointed out, it is also used in TrainingJob. This is a bug. There should be a separate setting for TrainingJob
  2. For Lucene, it is not used at all.

With this, it probably makes sense to deprecate the setting and come up with a stronger convention for setting threads. But this is most likely out of scope. That being said, I think the setting should be declared with its intended use. This would be in either the KNN80DocValuesConsumer class or in the KNNCodecService class.

@jmazanec15 jmazanec15 requested a review from VijayanB January 24, 2024 16:37
@heemin32
Copy link
Collaborator

heemin32 commented Jan 24, 2024

  1. Despite the refactoring, those four examples still remain. It is due to the use of static methods, not because of a centralized Setting class.
  2. The Singleton class pattern itself is not inherently bad; the issue lies in how we implement the singleton class, specifically using static variables and methods.
  3. While I agree that refactoring CircuitBreaker, NativeMemoryCacheManager, and ModelDao is necessary for better testability, I propose achieving this by introducing their own setting class if needed.

I believe this pull request primarily addresses the preference for shaping our codebase. It may be challenging to incrementally remove static methods if we stick with current state though.

  1. Single setting class for the entire knn plugin
  2. Multiple setting classes for each feature
  3. No setting class; define settings inside each individual class.
  4. Define settings inside each individual class; if a setting is used in multiple places, create a Settings class.

The current state is option 1, and this PR aligns with option 3. Personally, I prefer option 2 as I believe it will simplify unit testing. I'm interested in hearing others' opinions as well.

@jmazanec15
Copy link
Member Author

That makes sense thanks @heemin32. I still lean towards 3, but would be good to hear from other folks @navneet1v @VijayanB

@VijayanB
Copy link
Member

That makes sense thanks @heemin32. I still lean towards 3, but would be good to hear from other folks @navneet1v @VijayanB

I prefer option 3 too

@navneet1v
Copy link
Collaborator

navneet1v commented Jan 29, 2024

@jmazanec15 , @heemin32 , @VijayanB
First thing this is a great cleanup/refactoring @jmazanec15 you have picked up. I would like to think this in terms of single responsibility principle from SOLID Principles. Ref: https://www.baeldung.com/solid-principles

Putting settings in the classes like KNNWeight.class, CircuitBreaker.class breaks the single responsibility principle, as the class lets say CircuitBreaker.class has more than 1 responsibility which is holding the logic to define, set, unset the CB settings and then also the logic to trigger CB. Same goes for other classes too. Hence in my opinion we should have separate settings classes. Now what should be those classes we can discuss on that but in terms of functionality/behavior of a settings class it should be:

  1. Provide a way for KNNPlugin.class.getSettings() method to get all the settings which are defined by that class.
  2. Provide a way to update the settings.
  3. Provide a way to read a setting(cluster level or index level).
  4. Easy to mock i.e. we should not be mocking a bunch of static functions. It should be a clean mock. (This is optional, but I want to stress on this as one of the reason why we are doing this refactoring was to provide an easy way to mock settings class.)
  1. Single setting class for the entire knn plugin
  2. Multiple setting classes for each feature
  3. No setting class; define settings inside each individual class.
  4. Define settings inside each individual class; if a setting is used in multiple places, create a Settings class.

The current state is option 1, and this PR aligns with option 3. Personally, I prefer option 2 as I believe it will simplify unit testing. I'm interested in hearing others' opinions as well.

having said all this, I am with option 2. But with a small tweak which is I am not stressing on settings class per feature as of now. We can do some other grouping but settings should be in separate classes for sure and should be coupled with other classes like CircuitBreaker, KNNWeight.class

@jmazanec15
Copy link
Member Author

Thanks @navneet1v

Putting settings in the classes like KNNWeight.class, CircuitBreaker.class breaks the single responsibility principle, as the class lets say CircuitBreaker.class has more than 1 responsibility which is holding the logic to define, set, unset the CB settings and then also the logic to trigger CB.

I would say the CircuitBreaker is an outlier. I see settings as a structured way for users to tune the functionality of the system. We implement the circuit breaker as a setting - but I believe this is a mistake and should be done (if at all) via cluster metadata as opposed to cluster settings. I havent seen other cases in the OpenSearch ecosystem that operate similarly.

Additionally (and curious your thoughts on this) I think that it is best practice for settings to limit functionality impacted to a specific component. For instance, with the circuit breaker limit - the value provided by this setting should be used for one thing - setting the cache size. In practice, this results in the value being read directly in one component.

Now what should be those classes we can discuss on that but in terms of functionality/behavior of a settings class it should be:
1. Provide a way for KNNPlugin.class.getSettings() method to get all the settings which are defined by that class.
2. Provide a way to update the settings.
3. Provide a way to read a setting(cluster level or index level).
4. Easy to mock i.e. we should not be mocking a bunch of static functions. It should be a clean mock. (This is optional, but I want to stress on this as one of the reason why we are doing this refactoring was to provide an easy way to mock settings class.)

I would say for (2), we should not make a habit of the plugin altering settings directly. I see a setting as the way a user, not the plugin, can tune functionality.

For (3), I think it would be better to use a class like KNNClusterUtil, that takes a setting and can get the value. So, something like:

KNNClusterUtil
    /**
     * Get setting value for the cluster. Return default if not set.
     *
     * @param <T> Setting type
     * @return T     setting value or default
     */
    public <T> T getClusterSetting(Setting<T> setting) {
        return clusterService.getClusterSettings().get(setting);
    }

I like this approach because it reduces the duplicated code (fewer classes depending on ClusterService) and it makes it clear that the caller is taking a dependency on a setting, which is sometimes unclear when using static utility methods.

One other point of functionality that needs to be implemented somewhere is registering settings update consumers. For the KNNCache related settings, these trigger things like cache rebuilding. So, for these, would it be better to have them registered in the settings class or in the KNNCacheClass. My feeling is registering in the KNNCache class would prevent adding extra dependencies in a generic settings class.

@jmazanec15
Copy link
Member Author

@jmazanec15 jmazanec15 closed this Mar 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Refactoring Improve the design, structure, and implementation while preserving its functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants