merge conflicts with cni-metrics-helper chart
Fix compilation errors (aws#1751)

add support for running canary script in different regions (aws#1752)

Regenerate pod eni values for new instance types (aws#1754)

* Regenerate pod eni values for new instance types

Co-authored-by: Senthil Kumaran <[email protected]>

Closed issue message (aws#1761)

* closed issue message

* update message

fix typo in upload script (aws#1763)

Update calico file path

Use an unique s3 bucket name (aws#1760)

Update region

Workflow to build arm and x86 images (aws#1764)

DataStore.GetStats() refactoring to simplify adding new fields (aws#1704)

* DataStore.GetStats() refactoring to simplify adding new fields

* cleanup

* cleanup

* cleanup

* goimports

* rename test to TestGetStatsV4

* address comments

* fix typo

* update

* update "IP pool is too low" logging

* GetStats() -> GetIpStats()

* GetStats() -> GetIpStats() in tests and comments

* update test

* cleanup test

* add logPoolStats comment

Fix KOPS_STATE_STORE (aws#1770)

Automation script for running IT  (aws#1759)

Update issue template

Update issue template with email address

Update issue template

Update go.mod for integration folder (aws#1741)

* Update go.mod for integration folder

- Update go.mod for integration folder

* Change integration test to use new K8s test framework

* Modify server pod image

* Switch to Nginx port 80 for server pod

* Switch server port in client test

* Remove custom command directive for Nginx pod

* Added ping command for host checks

README: mention arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy (aws#1768)

Co-authored-by: Shreya027 <[email protected]>

Add dl1.24xlarge to ENILimits override list (aws#1777)

Chart and Manifest updates (aws#1771)

* Chart and Manifest updates

* Update probe timeout values

Change workflow to use git install (aws#1785)

- Change workflow to use git install as the go get command was
  altering go.mod file without updating go.sum file
Chinmay Gadgil committed Dec 9, 2021
1 parent f32af68 commit 32a756a
Showing 12 changed files with 155 additions and 193 deletions.
30 changes: 1 addition & 29 deletions charts/cni-metrics-helper/templates/clusterrole.yaml
@@ -5,34 +5,6 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
1 change: 1 addition & 0 deletions charts/cni-metrics-helper/values.yaml
@@ -12,6 +12,7 @@ image:

env:
USE_CLOUDWATCH: "true"
AWS_CLUSTER_ID: ""

fullnameOverride: "cni-metrics-helper"

88 changes: 88 additions & 0 deletions cmd/cni-metrics-helper/README.md
@@ -13,6 +13,94 @@ The following diagram shows how `cni-metrics-helper` works in a cluster:

![](../../docs/images/cni-metrics-helper.png)

### Using IRSA
As per the [AWS EKS Security Best Practices](https://docs.aws.amazon.com/eks/latest/userguide/best-practices-security.html), if you are using IRSA (IAM Roles for Service Accounts) for pods, the following requirements must be satisfied to successfully publish metrics to CloudWatch:

1. The IAM role for your service account must have the following policy attached:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        }
    ]
}
```

2. You should have the following ClusterRole and ClusterRoleBinding for the IRSA service account:

```
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: cni-metrics-helper
rules:
  - apiGroups: [""]
    resources:
      - pods
      - pods/proxy
    verbs: ["get", "watch", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cni-metrics-helper
  labels:
    app.kubernetes.io/name: cni-metrics-helper
    app.kubernetes.io/instance: cni-metrics-helper
    app.kubernetes.io/version: "v1.9.3"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cni-metrics-helper
subjects:
  - kind: ServiceAccount
    name: <IRSA name>
    namespace: kube-system
```

3. Specify this IRSA service account in the cni-metrics-helper deployment spec, along with AWS_CLUSTER_ID as the metric dimension:

```
kind: Deployment
apiVersion: apps/v1
metadata:
  name: cni-metrics-helper
  namespace: kube-system
  labels:
    k8s-app: cni-metrics-helper
spec:
  selector:
    matchLabels:
      k8s-app: cni-metrics-helper
  template:
    metadata:
      labels:
        k8s-app: cni-metrics-helper
    spec:
      containers:
      - env:
        - name: USE_CLOUDWATCH
          value: "true"
        # main.go reads this value via os.LookupEnv("AWS_CLUSTER_ID")
        - name: AWS_CLUSTER_ID
          value: "demo-cluster"
        name: cni-metrics-helper
        image: <image>
      serviceAccountName: <IRSA name>
```
With IRSA, the AWS_REGION environment variable is auto-injected into the pods created from the above deployment spec, and it is used to determine the Region.

Possible scenarios for the above configuration:
1. If you are not using IRSA, the Region and cluster ID are fetched using IMDS (the pod must have access to it).
2. If you are using IRSA but have not specified AWS_CLUSTER_ID, the cluster ID can still be fetched, provided IMDS access is not blocked.
3. If you have blocked IMDS access, you must specify a value for AWS_CLUSTER_ID (the metric dimension) in the deployment spec.
4. If you have not blocked IMDS access but have specified an AWS_CLUSTER_ID value, the specified value is used.
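
The four scenarios above reduce to one rule: an explicitly set environment variable wins, and the metadata service is only a fallback. The following is a minimal sketch of that rule, not the helper's actual implementation. The names `resolve`, `lookupFunc`, `metadataFunc`, and `errMetadataBlocked` are hypothetical, and the metadata sources are stand-in functions rather than real IMDS calls.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errMetadataBlocked simulates a metadata endpoint that cannot be reached
// (hypothetical; it stands in for blocked IMDS access).
var errMetadataBlocked = errors.New("instance metadata is not reachable")

// lookupFunc matches the signature of os.LookupEnv so that callers can
// substitute a map-backed lookup.
type lookupFunc func(key string) (string, bool)

// metadataFunc is a hypothetical stand-in for a metadata (IMDS) query.
type metadataFunc func() (string, error)

// resolve prefers a non-empty environment value and only falls back to the
// metadata source when the variable is unset or empty.
func resolve(lookup lookupFunc, envKey string, fromMetadata metadataFunc) (string, error) {
	if v, ok := lookup(envKey); ok && v != "" {
		return v, nil
	}
	v, err := fromMetadata()
	if err != nil {
		return "", fmt.Errorf("%s is unset and the metadata fallback failed: %w", envKey, err)
	}
	return v, nil
}

func main() {
	// Stand-in metadata sources; a real helper would query IMDS here.
	regionFromMetadata := func() (string, error) { return "us-west-2", nil }
	blockedMetadata := func() (string, error) { return "", errMetadataBlocked }

	region, err := resolve(os.LookupEnv, "AWS_REGION", regionFromMetadata)
	fmt.Println("region:", region, "err:", err)

	clusterID, err := resolve(os.LookupEnv, "AWS_CLUSTER_ID", blockedMetadata)
	fmt.Println("cluster ID:", clusterID, "err:", err)
}
```

With the metadata source blocked and AWS_CLUSTER_ID unset, `resolve` returns an error, which corresponds to scenario 3: the value has to be supplied in the deployment spec.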

### Installing the cni-metrics-helper
```
kubectl apply -f v1.6/cni-metrics-helper.yaml
16 changes: 15 additions & 1 deletion cmd/cni-metrics-helper/main.go
@@ -80,9 +80,23 @@ func main() {
}
}

// Fetch the region. With IRSA, AWS_REGION is auto-injected as an env variable in the pod spec.
// If it is not found, region stays empty and we fall back to fetching it from IMDS (existing approach).
// An empty value can also mean the customer is not using IRSA, so we must not enforce IRSA as a requirement.
region, _ := os.LookupEnv("AWS_REGION")

// should be name/identifier for the cluster if specified
clusterID, _ := os.LookupEnv("AWS_CLUSTER_ID")

log.Infof("Using REGION=%s and CLUSTER_ID=%s", region, clusterID)

log.Infof("Starting CNIMetricsHelper. Sending metrics to CloudWatch: %v, LogLevel %s", options.submitCW, logConfig.LogLevel)

clientSet, err := k8sapi.GetKubeClientSet()
if err != nil {
log.Fatalf("Error Fetching Kubernetes Client: %s", err)
os.Exit(1)
}

rawK8SClient, err := k8sapi.CreateKubeClient()
if err != nil {
Expand All @@ -98,7 +112,7 @@ func main() {
var cw publisher.Publisher

if options.submitCW {
cw, err = publisher.New(ctx)
cw, err = publisher.New(ctx, region, clusterID)
if err != nil {
log.Fatalf("Failed to create publisher: %v", err)
}
10 changes: 5 additions & 5 deletions cmd/cni-metrics-helper/metrics/metrics.go
@@ -238,11 +238,11 @@ func postProcessingHistogram(convert metricsConvert, log logger.Logger) bool {
func processMetric(family *dto.MetricFamily, convert metricsConvert, log logger.Logger) (bool, error) {
resetDetected := false

mType := family.GetType()
metricType := family.GetType()
for _, metric := range family.GetMetric() {
for _, act := range convert.actions {
if act.matchFunc(metric) {
switch mType {
switch metricType {
case dto.MetricType_GAUGE:
processGauge(metric, &act)
case dto.MetricType_HISTOGRAM:
Expand All @@ -256,7 +256,7 @@ func processMetric(family *dto.MetricFamily, convert metricsConvert, log logger.
}
}

switch mType {
switch metricType {
case dto.MetricType_COUNTER:
curResetDetected := postProcessingCounter(convert, log)
if curResetDetected {
@@ -316,9 +316,9 @@ func filterMetrics(originalMetrics map[string]*dto.MetricFamily,
func produceCloudWatchMetrics(t metricsTarget, families map[string]*dto.MetricFamily, convertDef map[string]metricsConvert, cw publisher.Publisher) {
for key, family := range families {
convertMetrics := convertDef[key]
mType := family.GetType()
metricType := family.GetType()
for _, action := range convertMetrics.actions {
switch mType {
switch metricType {
case dto.MetricType_COUNTER:
if t.submitCloudWatch() {
dataPoint := &cloudwatch.MetricDatum{
30 changes: 1 addition & 29 deletions config/master/cni-metrics-helper-cn.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
30 changes: 1 addition & 29 deletions config/master/cni-metrics-helper-us-gov-east-1.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
30 changes: 1 addition & 29 deletions config/master/cni-metrics-helper-us-gov-west-1.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
33 changes: 4 additions & 29 deletions config/master/cni-metrics-helper.yaml
@@ -18,37 +18,9 @@ metadata:
rules:
- apiGroups: [""]
resources:
- nodes
- pods
- pods/proxy
- services
- resourcequotas
- replicationcontrollers
- limitranges
- persistentvolumeclaims
- persistentvolumes
- namespaces
- endpoints
verbs: ["list", "watch", "get"]
- apiGroups: ["extensions"]
resources:
- daemonsets
- deployments
- replicasets
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources:
- statefulsets
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources:
- cronjobs
- jobs
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources:
- horizontalpodautoscalers
verbs: ["list", "watch"]
verbs: ["get", "watch", "list"]
---
# Source: cni-metrics-helper/templates/clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
@@ -89,6 +61,9 @@ spec:
- env:
- name: USE_CLOUDWATCH
value: "true"
# Optional: Should be ClusterName/ClusterIdentifier used as the metric dimension
- name: AWS_CLUSTER_ID
value: ""
name: cni-metrics-helper
image: "602401143452.dkr.ecr.us-west-2.amazonaws.com/cni-metrics-helper:v1.10.1"
serviceAccountName: cni-metrics-helper