
Move to config file for configuration #19

Closed
jsturtevant opened this issue Aug 20, 2018 · 9 comments

@jsturtevant
Collaborator

Currently, all of the metadata required to pull metrics from Azure is defined on the HPA. For example, on the external metrics HPA:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
 name: consumer-scaler
spec:
 scaleTargetRef:
   apiVersion: extensions/v1beta1
   kind: Deployment
   name: consumer
 minReplicas: 1
 maxReplicas: 10
 metrics:
  - type: External
    external:
      metricName: queuemessages
      metricSelector:
        matchLabels:
          metricName: Messages
          resourceGroup: sb-external-example
          resourceName: sb-external-ns
          resourceProviderNamespace: Microsoft.Servicebus
          resourceType: namespaces
          aggregation: Total
          filter: EntityName_eq_externalq
      targetValue: 30

This creates overhead for the end user, especially if they were to try to map OData queries to label selectors as I previously suggested.

Another interesting result of this conversion shows up with metric names. With custom metrics, the end user must convert the Application Insights metric name from performanceCounters/requestsPerSecond to performanceCounters-requestsPerSecond, replacing the / with a -.

With external metrics the "metric name" is not used at all and is instead pulled from the selector. This is because the metric name in the URL is lower-cased and Azure Monitor is case sensitive about metric names; for example, the Service Bus metric Messages cannot be requested as messages, which is what happens when the metric name is passed via the URL from the HPA.
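For illustration, a custom-metric HPA entry today ends up referencing the dash-converted name, roughly like the sketch below (the pods-metric form and target value are assumptions for illustration, not taken from the adapter docs):

metrics:
 - type: Pods
   pods:
     # dash-converted form of performanceCounters/requestsPerSecond
     metricName: performanceCounters-requestsPerSecond
     targetAverageValue: 10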

Proposal

To create a better experience for the developer, I propose we use a config file, loaded via a ConfigMap, to hold this metadata. The Prometheus Adapter uses the idea of a config file, though I believe we don't need it to be as complex (yet). This would allow a few scenarios:

  • reduces the cognitive overhead for the developer when converting Azure metadata
  • a cluster administrator could restrict which metrics are available through the configuration file (if a metric is not listed in the file, it cannot be retrieved)
  • allows for more complex configuration when writing queries

To help even further, it would be possible to provide tooling that auto-generates this config file based on the service principal that has access to the Azure resources.

Example config

#optional sub id that will be used by all external metrics (unless overridden on metric definition)
subscriptionId: 12345678-1234-1234-1234-123456789012
#optional app id that will be used by all custom metrics (unless overridden on metric definition)
applicationId: 12345678-1234-1234-1234-123456789012

#list of all external metrics; values are obtained from https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-supported-metrics
external:
  - metric: queuemessages #this is the name that will be referenced by the hpa
    metricName: Messages #azure name - case sensitive
    resourceGroup: sb-external-example
    resourceName: sb-external-ns
    resourceProviderNamespace: Microsoft.Servicebus #this can contain slashes (/)
    resourceType: namespaces
    aggregation: Total
    filter: EntityName eq 'externalq'  #any valid OData filter
    subscriptionId: 12345678-1234-1234-1234-123456789012 #optional (override global)

#list of all custom metrics
custom:
  - metric: rps # this is the name that will be referenced by the hpa
    metricName: performanceCounters/requestsPerSecond # azure name, which contains slashes
    applicationId: 12345678-1234-1234-1234-123456789012 #optional (override global)
  # use with ai queries
  - metric: requests # this is the name that will be referenced by the hpa
    # AI query as defined at https://docs.loganalytics.io/docs/Language-Reference/Query-statements
    query: requests | where timestamp > ago(30d) and client_City == "Redmond" | summarize clients = dcount(client_IP) by tod_UTC=bin(timestamp % 1d, 1h), resultCode | extend local_hour = (tod_UTC - 8h) % 24h
    applicationId: 12345678-1234-1234-1234-123456789012 #optional (override global)
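Since the idea is to load this file via a ConfigMap, it could be shipped roughly like the sketch below (the ConfigMap name, namespace, and data key are assumptions for illustration only):

apiVersion: v1
kind: ConfigMap
metadata:
  name: azure-metrics-adapter-config   # hypothetical name
  namespace: custom-metrics            # hypothetical namespace
data:
  config.yaml: |
    subscriptionId: 12345678-1234-1234-1234-123456789012
    external:
      - metric: queuemessages
        metricName: Messages
        resourceGroup: sb-external-example
        resourceName: sb-external-ns
        resourceProviderNamespace: Microsoft.Servicebus
        resourceType: namespaces
        aggregation: Total
        filter: EntityName eq 'externalq'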

Feedback

Please provide any feedback you might have on the above proposal. Thanks!

@jsturtevant
Collaborator Author

PTAL @tomkerkhove @marc-sensenich

@tomkerkhove
Member

I get your concern and I think this is a good approach, but I have a few concerns:

  • Given there will only be one adapter installed per cluster, this can be tricky:
    • One bad change can impact all HPAs across all namespaces
    • We're losing flexibility because one party will manage this adapter while another might want to add their own config, which can take a while or be denied

As an example, we are currently sharing one cluster across different teams, where this could be a problem in terms of governance.

That said, I like the idea of having certain parts configured in the core such as default subscription and default metrics.

Extended proposal

Personally I think your idea is perfect, but it needs to be taken a step further so that you can deploy the adapter with its configuration, where that configuration is a simple pointer to a more detailed metrics definition for a specific namespace or "metric group".

This might sound like a bit of overkill, but it provides more flexibility for the consumer and less overhead for the admins.

HPA Configuration

Simplified down to the name of the metric on which you'd like to scale and the metrics group it belongs to:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
 name: consumer-scaler
spec:
 scaleTargetRef:
   apiVersion: extensions/v1beta1
   kind: Deployment
   name: consumer
 minReplicas: 1
 maxReplicas: 10
 metrics:
  - type: External
    external:
      metricName: queuemessages
      metricSelector:
        matchLabels:
          metricName: order-queue-size
          metricsGroup: order-service-metrics
      targetValue: 30

Adapter Configuration

The adapter then gets the correct configuration based on the following config:

# List of all metric groups which are defined in the cluster
# Allows you to configure one or more metric groups from the same namespace but with a different name
metricGroups:
 - name: order-service-metrics
   namespace: order-service
   config-map: order-metrics-declaration
 - name: order-service-custom-metrics
   namespace: order-service
   config-map: custom-order-metrics-declaration
 - name: warehouse-monitor-metrics
   namespace: warehouse-service
   config-map: warehouse-monitor-metrics

Caveat - This way of linking to a config-map is probably not ideal, but it's just an indication of the approach.

Metric Group Configuration

The configuration map in the mentioned namespace is then the spec that you've proposed:

#list of all external metrics; values are obtained from https://docs.microsoft.com/en-us/azure/monitoring-and-diagnostics/monitoring-supported-metrics
external:
  - metric: queuemessages #this is the name that will be referenced by the hpa
    metricName: Messages #azure name - case sensitive
    resourceGroup: sb-external-example
    resourceName: sb-external-ns
    resourceProviderNamespace: Microsoft.Servicebus #this can contain slashes (/)
    resourceType: namespaces
    aggregation: Total
    filter: EntityName eq 'externalq'  #any valid OData filter
    subscriptionId: 12345678-1234-1234-1234-123456789012 #optional (override global)

#list of all custom metrics
custom:
  - metric: rps # this is the name that will be referenced by the hpa
    metricName: performanceCounters/requestsPerSecond # azure name, which contains slashes
    applicationId: 12345678-1234-1234-1234-123456789012 #optional (override global)
  # use with ai queries
  - metric: requests # this is the name that will be referenced by the hpa
    # AI query as defined at https://docs.loganalytics.io/docs/Language-Reference/Query-statements
    query: requests | where timestamp > ago(30d) and client_City == "Redmond" | summarize clients = dcount(client_IP) by tod_UTC=bin(timestamp % 1d, 1h), resultCode | extend local_hour = (tod_UTC - 8h) % 24h
    applicationId: 12345678-1234-1234-1234-123456789012 #optional (override global)

Note - I think this can be improved a bit in terms of readability, but it looks good. The only thing I would do is provide a bit more structure in what is related to what, a bit similar to how I currently do it for Promitor.

Security

Going this route also means that we can take security a step further and configure the authentication in the "metrics group" configuration so that multiple service principals can be used for authentication.

Visual Flow

(diagram: azure-adapter-config)

@tomkerkhove
Member

I was thinking about my proposal again and I'm not really sure if it's even possible to get those referenced ConfigMaps - do you know?

Instead of using the name, it would probably be better to use a label selector - either with user-specified labels or built-in ones that we enforce (which is less nice).
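For example, a metric-group ConfigMap could carry labels the adapter selects on instead of being referenced by name (the label key and data key below are made up for this sketch):

apiVersion: v1
kind: ConfigMap
metadata:
  name: order-metrics-declaration
  namespace: order-service
  labels:
    # hypothetical label key the adapter could use as a selector
    azure-metrics-adapter/metric-group: order-service-metrics
data:
  metrics.yaml: |
    external:
      - metric: queuemessages
        metricName: Messages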

@jsturtevant
Collaborator Author

It would be possible to pull the ConfigMaps for those referenced files via the Kubernetes API. The challenge is that it would add overhead in terms of configuring RBAC on those resources (see the sketch after the list below), plus the extra overhead of configuration. I do like the added benefits (not having one bad actor fail the entire adapter, and management separation of configs).

Another thing to consider is how to restrict access to metrics across the different namespaces. There are two scenarios to consider:

  • External metrics - ensuring access to the correct external metric. If a request comes from one team's namespace, how do we ensure they can't get access to another team's metrics?
  • Custom metrics - finding and accessing the metrics across namespaces. This is a bit more straightforward: we can use resource types and namespace labels to ensure we only grab metrics from the namespace where the HPA originated. One of the challenges here is that the Application Insights SDK isn't Kubernetes-aware currently (it doesn't attach metadata to the metrics being sent, so that namespace label is not there unless you report the metric yourself).
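On the RBAC point above, a minimal sketch of what the adapter would need in order to read team-owned ConfigMaps might look like this (names, namespaces, and the service account are illustrative, not part of the proposal):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-metric-configmaps       # hypothetical name
  namespace: order-service
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: adapter-read-metric-configmaps
  namespace: order-service
subjects:
  - kind: ServiceAccount
    name: azure-metrics-adapter      # hypothetical service account
    namespace: custom-metrics
roleRef:
  kind: Role
  name: read-metric-configmaps
  apiGroup: rbac.authorization.k8s.io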

@tomkerkhove
Member

An idea is that we could handle this via the "Adapter Configuration".

If my HPA mentioned these selectors:

      metricSelector:
        matchLabels:
          metricName: order-queue-size
          metricsGroup: order-service-metrics

You can see to which namespace it's assigned and check if it matches:

metricGroups:
 - name: order-service-metrics
   namespace: order-service
   config-map: order-metrics-declaration
 - name: order-service-custom-metrics
   namespace: order-service
   config-map: custom-order-metrics-declaration
 - name: warehouse-monitor-metrics
   namespace: warehouse-service
   config-map: warehouse-monitor-metrics

@jsturtevant added the enhancement, help wanted, and question labels on Aug 23, 2018
@jsturtevant
Collaborator Author

That would partially solve the issue. If we look at two teams working in the same cluster in different namespaces, and even across different Azure subscriptions, we want to make sure team 1 couldn't access metrics from team 2, which would still be possible if we only had the check above.

External Metrics

I've been thinking about how to do this, especially in the case of external metrics. In a metrics system where you can tag metrics, you would be able to use the namespace value and ensure that a request from an HPA within a namespace would only get values tagged with that namespace. In the case of Azure external metrics coming from Azure Monitor, we don't have the ability to tag the metric, but we can tag the resource in Azure. If we tag the resource, then the adapter can use the namespace and check that the resource is assigned to the "namespace" before returning the value. Since each team could effectively own each resource (team 1 doesn't have access to team 2's subscription), this resolves the access issue.

Tags can easily be applied to a resource:

# capture the resource's existing tags as JSON
jsonrtag=$(az resource show -g examplegroup -n examplevnet --resource-type "Microsoft.Network/virtualNetworks" --query tags)
# convert the JSON into space-separated key=value pairs
rt=$(echo $jsonrtag | tr -d '"{},' | sed 's/: /=/g')
# re-apply the existing tags along with the new namespace tag (az resource tag replaces the tag set)
az resource tag --tags $rt namespace=team2 -g examplegroup -n examplevnet --resource-type "Microsoft.Network/virtualNetworks"

Custom Metrics

In Application Insights, we can have properties on each metric that can be filtered on, so we can add Kubernetes metadata to the properties and filter based on namespace. The challenge is that, among existing AI SDKs, only the C# library currently has an add-on to attach that metadata. For now, we can instruct users to add the metadata via property initializers. It would be a required step to use the adapter.

Root scoped metrics

Enabling these checks will help a customer set up a secure environment by default. Using root-scoped metrics and relaxing the default policy on the HPA (which will be scoped to namespaced metrics) will allow for accessing the metrics without namespaces if needed/desired. This will require the cluster admin to make an explicit decision.

@jsturtevant
Collaborator Author

jsturtevant commented Aug 25, 2018

Metric Groups

Based on the conversation so far, I think the configuration file will have the following format:

# optional
metricGroups:
 - name: order-service-metrics
   namespace: order-service
   config-map: order-metrics-declaration #contains definitions for external/custom metrics as defined below
# optional
external:
  - metric: queuemessages #this is the name that will be referenced by hpa 
    # The metric properties described above
# optional
custom:
  - metric: rps # this is the name that will be referenced by the hpa
    # metric properties described above

This will allow quick configuration for small teams yet give the option of metric groups for teams that need the flexibility. We will not dynamically load the ConfigMap through an API, but will instead rely on the deployment being configured with the additional ConfigMap (see the illustrative deployment fragment below). The adapter will periodically look for updates and re-read the ConfigMap; if the new ConfigMap is invalid, it will keep the older configuration and will not fail the entire adapter.
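For illustration, relying on the deployment could look roughly like the sketch below, with the config mounted as a volume (image name, mount path, and ConfigMap names are assumptions, not the adapter's actual manifest):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: azure-metrics-adapter          # hypothetical name
  namespace: custom-metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: azure-metrics-adapter
  template:
    metadata:
      labels:
        app: azure-metrics-adapter
    spec:
      containers:
        - name: adapter
          image: azure-metrics-adapter:latest    # placeholder image
          volumeMounts:
            - name: adapter-config
              mountPath: /etc/adapter             # adapter reads its config file from here
      volumes:
        - name: adapter-config
          configMap:
            name: azure-metrics-adapter-config    # the ConfigMap holding the config file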

@tomkerkhove
Member

I like the last suggestion for the configuration file!

However, I'm a bit concerned about tagging resources and using the custom properties in Application Insights for filtering metrics, as this requires (significant) setup just to get metrics.

Is there a reason why we can't restrict metric groups for all namespaces except the mentioned one? We know where the HPA is running and to which namespace the metrics group is restricted, so in theory we know if they are allowed to scrape it, no?

If this is about the adapter scraping subscriptions for a group it is not allowed to access, we might consider moving the authentication information to the namespace as well and introducing authentication on the group itself, which points to a metric?

Sample:

# optional
metricGroups:
 - name: order-service-metrics
   namespace: order-service
   config-map: order-metrics-declaration #contains definitions for external/custom metrics as defined below
   authentication: order-metrics-auth-secret #contains authentication secrets for accessing Azure Monitor
# optional
external:
  - metric: queuemessages #this is the name that will be referenced by hpa 
    # The metric properties described above
# optional
custom:
  - metric: rps # this is the name that will be referenced by the hpa
    # metric properties described above
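
The authentication value could then point at a plain Kubernetes Secret in the team's namespace, something like the sketch below (the key names and placeholder values are made up; the real shape would be part of the design):

apiVersion: v1
kind: Secret
metadata:
  name: order-metrics-auth-secret
  namespace: order-service
type: Opaque
stringData:
  azure-tenant-id: <tenant-guid>                  # placeholder values
  azure-client-id: <service-principal-app-id>
  azure-client-secret: <service-principal-password>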

@jsturtevant
Collaborator Author

@tomkerkhove I think you're right about the additional overhead for external metrics. When it comes to App Insights, I think having the extra metadata will become important in the long run anyway. If you use a single App Insights instance for more than one app, you will need to add that metadata to be able to filter by application. In addition, I could see the need for getting detailed metrics across a given application and filtering by pod to find a misbehaving instance. But that is something we can easily handle through the configuration.

Adding the authentication to the metric groups will give that added check I was concerned about. Unless anyone else has concerns or input, we are good on the final design and can iterate on it as it's developed. Thanks for all the input!
