Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add activation feature for CPU/Memory scaler #6231

Closed
wants to merge 1 commit into from

Conversation

kunwooy
Copy link
Contributor

@kunwooy kunwooy commented Oct 11, 2024

Currently, the cpu & memory scaler lacks the activation feature because it delegates the scaling responsibilities to the built-in Kubernetes HPA controller. As a result, even if the scale target is currently scaled-out by the cpu/memory metric being above the threshold value, if some other Keda scalers which use External Metrics are used in conjunction with the cpu/memory scaler, it will be deactivated (and thus scaled to zero) when all other scalers using External Metrics are deactivated.

Hence, my proposal is to introduce a way to check the activation of the cpu/memory scaler. Since the scaling behavior will be handled by the HPA controller, cpu/memory scaler only needs to feed in the activation value to the scaled object controller in its GetMetricsAndActivity() method. Moreover to enable such feature, I introduce activationValue field in cpu/memory trigger's metadata.

(I have accidentally closed the last pull request: #6174)

Checklist

Fixes #6057

Relates to #

@kunwooy kunwooy requested a review from a team as a code owner October 11, 2024 01:34
@kunwooy
Copy link
Contributor Author

kunwooy commented Oct 11, 2024

@JorTurFer You suggested that I query the metrics server directly instead of querying HPA. I have modified the code (#6174 (comment)).

If approved, modification to the helm chart is needed since the operator's service account needs permission on pods.metrics.k8s.io APIs

@kunwooy kunwooy force-pushed the main branch 3 times, most recently from c8f6e16 to 0e3f923 Compare October 11, 2024 08:18
@kunwooy kunwooy force-pushed the main branch 9 times, most recently from 18293de to 943fce1 Compare October 14, 2024 03:56
@SpiritZhou
Copy link
Contributor

Do you mind updating the CHANGELOG.md as well?

@kunwooy
Copy link
Contributor Author

kunwooy commented Oct 16, 2024

@SpiritZhou I updated the CHANGELOG.md too.

@kunwooy kunwooy force-pushed the main branch 3 times, most recently from 2fb091a to df7e6f6 Compare October 23, 2024 05:49
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I quickly checked the code, I haven't read full the getAverage... functions.

But I would like to see more e2e tests (covering cases like scale 0->1, 1->0, activation with just a single cpu/mem scaler, etc ...to cover all possible cases).

I also think that this feature should be released first as and experimental one.

pkg/k8s/metricsclient.go Outdated Show resolved Hide resolved
err := config.TypedConfig(&meta)
func getScaleTarget(scalableObjectName, scalableObjectNamespace string, kubeClient client.Client) (string, string, error) {
scaledObject := &kedav1alpha1.ScaledObject{}
err := kubeClient.Get(context.Background(), types.NamespacedName{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a specific reason to use context.Background()? If not, then we should pass ctx from the top level here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used context.Background() because neither of its top level functions parseResourceMetadata() and NewCPUMemoryScaler has context. If there is any other way, I'd be happy to know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to add the param there, you can see that ctx is also added to a few other scalers

pkg/scalers/cpu_memory_scaler.go Show resolved Hide resolved
func parseResourceMetadata(config *scalersconfig.ScalerConfig, logger logr.Logger) (cpuMemoryMetadata, error) {
meta := cpuMemoryMetadata{}
err := config.TypedConfig(&meta)
func getScaleTarget(scalableObjectName, scalableObjectNamespace string, kubeClient client.Client) (string, string, error) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment explaining this function

default:
return meta, fmt.Errorf("unknown metric type: %s, allowed values are 'Utilization' or 'AverageValue'", string(meta.MetricType))
}

if config.ScalableObjectType == "ScaledObject" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should fail for other types imho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting that I do not check for config.ScalableObjectType in the current code, and instead return an error for other types? If that's so, should I parse the error string to see if the error is related to type? Because config.ScalableObjectType not being ScaledObject should not result in the error of the parseResourceMetadata() function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically add else and if the type is different fire an error mentioning that the type is not supported

pkg/scalers/cpu_memory_scaler.go Show resolved Hide resolved
pkg/scalers/cpu_memory_scaler.go Show resolved Hide resolved

labelSelector = labels.SelectorFromSet(statefulSet.Spec.Selector.MatchLabels)
default:
return nil, nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about other types?

See if this can be reused:

gvk := obj.Status.ScaleTargetGVKR.GroupVersionKind()

Copy link
Contributor Author

@kunwooy kunwooy Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added GVK as the means to reference the scale target. However in the default case, there is still no way to fetch the MatchLabels field which is required to select the corresponding PodMetrics

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can try to check if we can create a duck type that would desribe the resource, similar as it is done in the referenced code for podspec or here https://github.com/kedacore/keda/blob/main/apis/keda/v1alpha1/withtriggers_types.go

If we are not able to do that, then we should check supported scaletargets when we are creating the scaler and fail in that case

@kunwooy
Copy link
Contributor Author

kunwooy commented Nov 5, 2024

Thanks, I quickly checked the code, I haven't read full the getAverage... functions.

But I would like to see more e2e tests (covering cases like scale 0->1, 1->0, activation with just a single cpu/mem scaler, etc ...to cover all possible cases).

I also think that this feature should be released first as and experimental one.

@zroubalik I have added comments and made some changes you pointed out.

However about the e2e test, you should note that this feature does not enable CPU/Memory scalers to activate itself and scale from zero because it is impossible to collect CPU/Memory data when there is no pod running. So the only difference this commit brings is the case where a CPU/Memory scaler is used with other external trigger scaler, and that external trigger is deactivated while the cpu/memory trigger is activated. When that happens, the deployment/statefulset should not scale to zero (while previously there were no activation feature for cpu/memory trigger and thus was scaled to zero). You can check out the e2e test for the above case here:

https://github.com/kunwooy/keda/blob/d5ad0944617d9a4e1efac5685dbcb3ed9cf61507/tests/scalers/cpu/cpu_test.go?plain=1#L255

Also, how do I move the feature to an experimental feature?

@kunwooy kunwooy force-pushed the main branch 2 times, most recently from d5ad094 to 19d9a2b Compare November 6, 2024 07:57
@kunwooy kunwooy force-pushed the main branch 6 times, most recently from 8d0d995 to 87f5419 Compare November 7, 2024 04:55
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ad experimental feature - this is just a matter of documentation and we should also pring info log message when a scaler with this feature is created

default:
return meta, fmt.Errorf("unknown metric type: %s, allowed values are 'Utilization' or 'AverageValue'", string(meta.MetricType))
}

if config.ScalableObjectType == "ScaledObject" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically add else and if the type is different fire an error mentioning that the type is not supported


labelSelector = labels.SelectorFromSet(statefulSet.Spec.Selector.MatchLabels)
default:
return nil, nil, nil
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can try to check if we can create a duck type that would desribe the resource, similar as it is done in the referenced code for podspec or here https://github.com/kedacore/keda/blob/main/apis/keda/v1alpha1/withtriggers_types.go

If we are not able to do that, then we should check supported scaletargets when we are creating the scaler and fail in that case

@kunwooy
Copy link
Contributor Author

kunwooy commented Nov 10, 2024

@zroubalik I'm currently unavailable right now. I'll make the changes and let you know within next week.

Copy link

stale bot commented Jan 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Jan 9, 2025
Copy link

stale bot commented Jan 18, 2025

This issue has been automatically closed due to inactivity.

@stale stale bot closed this Jan 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale All issues that are marked as stale due to inactivity
Projects
None yet
3 participants