Add storage cluster to stress environments. Update federated auth documentation. (#8588)

* Document federated identity credential auth in stress clusters

* Add custom storage cluster to stress infra configs. Update AKS base version

* Update tools/stress-cluster/chaos/README.md

Co-authored-by: Richard Park <[email protected]>

* Update tools/stress-cluster/chaos/README.md

Co-authored-by: Richard Park <[email protected]>

* Update tools/stress-cluster/chaos/README.md

Co-authored-by: Richard Park <[email protected]>

* Update tools/stress-cluster/chaos/README.md

Co-authored-by: Richard Park <[email protected]>

* Support regex/negative regex filters for stress test discovery. Add storage env defaults

* Add storage environment pipeline and deploy conditionals for stress

* Update addons changelog

* Doc feedback

---------

Co-authored-by: Richard Park <[email protected]>
benbp and richardpark-msft authored Jul 11, 2024
1 parent ac07661 commit 602310a
Showing 13 changed files with 147 additions and 58 deletions.
@@ -75,7 +75,7 @@ function ParseChart([string]$chartFile) {

function MatchesAnnotations([hashtable]$chart, [hashtable]$filters) {
foreach ($filter in $filters.GetEnumerator()) {
if (!$chart["annotations"] -or $chart["annotations"][$filter.Key] -ne $filter.Value) {
if (!$chart["annotations"] -or $chart["annotations"][$filter.Key] -notmatch $filter.Value) {
return $false
}
}
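
The switch from `-ne` to `-notmatch` means each filter value is treated as a regular expression rather than a literal string. The sketch below approximates the per-annotation check in shell. `annotation_matches` is a hypothetical helper and `storage-perf` is a made-up annotation value; `grep -E` only approximates .NET regex behavior, and `-i` stands in for PowerShell's default case-insensitive matching. An unset annotation is modeled as the empty string, on the assumption that PowerShell treats a `$null` left operand like an empty string.

```shell
# Sketch only: approximates PowerShell's -match/-notmatch with grep -E.
# annotation_matches VALUE PATTERN -> exit status 0 when VALUE matches PATTERN.
annotation_matches() {
    # The trailing newline ensures an empty value is still one (empty) line,
    # so the pattern '^$' has something to match against.
    printf '%s\n' "$1" | grep -Eqi -- "$2"
}

# An exact value still matches, as it did with -ne.
annotation_matches "storage" "storage" && echo "storage vs storage: match"

# Patterns are unanchored, so a substring also matches (hypothetical value).
annotation_matches "storage-perf" "storage" && echo "storage-perf vs storage: match"

# '^$' matches only an empty (unset) annotation value.
annotation_matches "" '^$' && echo "unset vs ^\$: match"
annotation_matches "storage" '^$' || echo "storage vs ^\$: no match"
```

In the PowerShell function, a chart with no `annotations` entry is rejected before any pattern is tested, so a pattern such as `^$` only selects charts that have annotations but leave that particular key unset.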
@@ -122,6 +122,12 @@ function DeployStressTests(
}
$clusterGroup = 'rg-stress-cluster-prod'
$subscription = 'Azure SDK Test Resources'
} elseif ($environment -eq 'storage') {
if ($clusterGroup -or $subscription) {
Write-Warning "Overriding cluster group and subscription with defaults for 'storage' environment."
}
$clusterGroup = 'rg-stress-cluster-storage'
$subscription = 'XClient'
} elseif (!$clusterGroup -or !$subscription) {
throw "clusterGroup and subscription parameters must be specified when deploying to an environment that is not pg or prod."
}
28 changes: 28 additions & 0 deletions eng/pipelines/stress-test-release-storage.yml
@@ -0,0 +1,28 @@
pr: none

trigger: none

parameters:
- name: Environment
type: string
default: storage
values:
- storage
- pg
- prod
- name: TestRepository
displayName: Stress Test Repository
type: string
default: java-storage
values:
- java-storage
- name: DeployFromBranchOrCommit
type: string
default: main

extends:
template: /eng/pipelines/templates/jobs/stress-test-release.yml
parameters:
Environment: ${{ parameters.Environment }}
TestRepository: ${{ parameters.TestRepository }}
DeployFromBranchOrCommit: ${{ parameters.DeployFromBranchOrCommit }}
2 changes: 1 addition & 1 deletion eng/pipelines/stress-test-release.yml
@@ -16,8 +16,8 @@ parameters:
values:
- all
- examples
- javascript
- java
- javascript
- net
- python
- go
14 changes: 10 additions & 4 deletions eng/pipelines/templates/jobs/stress-test-release.yml
@@ -15,17 +15,21 @@ jobs:
- template: /eng/pipelines/templates/variables/globals.yml
strategy:
matrix:
${{ if eq(parameters.TestRepository, 'java-storage') }}:
java_storage:
Repository: Azure/azure-sdk-for-java
Filters: '@{ "environment" = "storage" }'
${{ if or(eq(parameters.TestRepository, 'examples'), eq(parameters.TestRepository, 'all')) }}:
examples:
Repository: Azure/azure-sdk-tools
Filters: '@{ "example" = "true" }'
${{ if or(eq(parameters.TestRepository, 'javascript'), eq(parameters.TestRepository, 'all')) }}:
javascript:
Repository: Azure/azure-sdk-for-js
Filters: '@{}'
${{ if or(eq(parameters.TestRepository, 'java'), eq(parameters.TestRepository, 'all')) }}:
java:
Repository: Azure/azure-sdk-for-java
Filters: '@{ "environment" = "^$" }'
${{ if or(eq(parameters.TestRepository, 'javascript'), eq(parameters.TestRepository, 'all')) }}:
javascript:
Repository: Azure/azure-sdk-for-js
Filters: '@{}'
${{ if or(eq(parameters.TestRepository, 'net'), eq(parameters.TestRepository, 'all')) }}:
net:
@@ -67,6 +71,8 @@ jobs:
azureSubscription: Azure SDK Test Resources
${{ if eq(parameters.Environment, 'pg') }}:
azureSubscription: Azure SDK Playground
${{ if eq(parameters.Environment, 'storage') }}:
azureSubscription: storage-sdk-stress-tests
scriptType: pscore
scriptPath: $(System.DefaultWorkingDirectory)/$(Repository)/eng/common/scripts/stress-testing/deploy-stress-tests.ps1
arguments:
41 changes: 34 additions & 7 deletions tools/stress-cluster/chaos/README.md
@@ -10,6 +10,7 @@ The chaos environment is an AKS cluster (Azure Kubernetes Service) with several
* [Creating a Stress Test](#creating-a-stress-test)
* [Layout](#layout)
* [Stress Test Metadata](#stress-test-metadata)
* [Stress Test Auth](#stress-test-auth)
* [Stress Test Secrets and Environment](#stress-test-secrets-and-environment)
* [Stress Test File Share](#stress-test-file-share)
* [Stress Test Azure Resources](#stress-test-azure-resources)
@@ -181,6 +182,33 @@ Fields in `Chart.yaml`
1. Extra fields in `annotations` can be set arbitrarily, and used via the `-Filters` argument to the [stress test deploy
script](https://github.com/Azure/azure-sdk-tools/blob/main/eng/common/scripts/stress-testing/deploy-stress-tests.ps1).

### Stress Test Auth

Stress tests are authenticated via [workload identity for AKS](https://learn.microsoft.com/azure/aks/workload-identity-overview).

To authenticate to Azure, tests can choose to use `DefaultAzureCredential` or `WorkloadIdentityCredential`,
though it is recommended to use `DefaultAzureCredential` for ease of switching between local machine runs and container runs in the cluster.
The container that a stress test runs in will have a federated identity token available in its environment that can be used to authenticate.
This credential is made available automatically by the cluster.

Azure CLI can also be used for scripting purposes provided a login step is added to the test invocation.
See an example [here](https://github.com/Azure/azure-sdk-tools/blob/main/tools/stress-cluster/chaos/examples/stress-deployment-example/templates/deploy-job.yaml)
or in the following snippet:

```
command: ['bash', '-c']
args:
- |
source $ENV_FILE &&
az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" &&
az account set -s $AZURE_SUBSCRIPTION_ID &&
<your custom test script here using az cli commands>
```

Each federated identity is backed by a managed identity with authorization to the cluster subscription. For all permissions
given to this principal, see [workload app roles](https://github.com/Azure/azure-sdk-tools/blob/main/tools/stress-cluster/cluster/azure/cluster/workloadapproles.bicep).
When a new namespace is created in a stress cluster, a [federated identity](https://learn.microsoft.com/graph/api/resources/federatedidentitycredentials-overview?view=graph-rest-1.0) specific to that namespace is created against a pool of managed identities by the stress watcher service (there is a hard limit of 20 federated identities per managed identity). When a namespace is deleted, the corresponding federated identity is also deleted.

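The limit of 20 federated identities per managed identity puts a floor on the pool size. The arithmetic below is a rough capacity check, not the stress watcher's actual allocation logic. `identities_needed` is a hypothetical helper, and it assumes each namespace consumes exactly one federated identity and that identities are filled to the limit.

```shell
# Hypothetical helper: minimum number of managed identities needed to hold
# one federated identity per namespace, given the limit of 20 per identity.
identities_needed() {
    namespaces=$1
    limit=20
    # Integer ceiling division: (n + limit - 1) / limit
    echo $(( (namespaces + limit - 1) / limit ))
}

identities_needed 45   # prints 3
```
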
### Stress Test Secrets and Environment

For ease of implementation regarding merging secrets from various Keyvault sources, secret values injected into the stress
@@ -197,19 +225,14 @@ The following environment variables are currently populated by default into the
[bicep template outputs](https://docs.microsoft.com/azure/azure-resource-manager/bicep/outputs) specified.

```
AZURE_CLIENT_ID=<value>
AZURE_CLIENT_OID=<value>
AZURE_CLIENT_SECRET=<value>
AZURE_TENANT_ID=<value>
AZURE_SUBSCRIPTION_ID=<value>
APPINSIGHTS_CONNECTION_STRING=<value>
APPINSIGHTS_INSTRUMENTATIONKEY=<value>
# Bicep template outputs inserted here as well, for example
RESOURCE_GROUP=<value>
```

Additionally, several values are made available as environment variables via the `stress-test-addons.container-env` template (see [job manifest](#job-manifest)):
Additionally, several values are made available as environment variables via the `stress-test-addons.container-env` template (see [job manifest](#job-manifest)) or by the AKS cluster:

- `GIT_COMMIT` - Matches the git commit of the repository in which the stress test was deployed from. Useful for telemetry queries.
- `ENV_FILE` - Path to the env file that can be dot sourced to load deployment and other secrets.
@@ -218,6 +241,10 @@ Additionally, several values are made available as environment variables via the
- `POD_NAMESPACE` - The kubernetes namespace the container is running in, useful for custom telemetry.
- `DEBUG_SHARE` - See [stress test file share](#stress-test-file-share)
- `DEBUG_SHARE_ROOT` - See [stress test file share](#stress-test-file-share)
- `AZURE_SUBSCRIPTION_ID` - The Azure subscription id the stress test will authenticate and deploy resources to.
- `AZURE_TENANT_ID` - The Azure tenant id the stress test will authenticate to. Set by AKS.
- `AZURE_CLIENT_ID` - The Entra principal the stress test will authenticate as. Set by AKS.
- `AZURE_FEDERATED_TOKEN_FILE` - The path to the federated identity token that can be used to log in with Azure CLI or the Identity SDK. Set by AKS.

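A test script that logs in with Azure CLI depends on several of the variables listed above, so it can help to fail early when one is missing. The sketch below shows one way to do that. `require_auth_env` is a hypothetical helper, not part of the stress test addons, and the choice of which variables to check is an assumption based on the `az login` example in this README.

```shell
# Hypothetical preflight helper: reports any workload identity variables that
# are unset or empty, and returns non-zero if any are missing or if the
# federated token file is not readable.
require_auth_env() {
    missing=""
    for name in AZURE_CLIENT_ID AZURE_TENANT_ID AZURE_SUBSCRIPTION_ID AZURE_FEDERATED_TOKEN_FILE; do
        # Indirect lookup of the variable named by $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing environment variables:$missing" >&2
        return 1
    fi
    # The token file must exist and be readable before it can be passed to az login.
    [ -r "$AZURE_FEDERATED_TOKEN_FILE" ]
}

# Intended use inside a test container, after sourcing $ENV_FILE:
#   require_auth_env || exit 1
```
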
### Stress Test File Share

@@ -400,7 +427,7 @@ spec:
args:
- |
source $ENV_FILE &&
az login --service-principal -u $AZURE_CLIENT_ID -p $AZURE_CLIENT_SECRET --tenant $AZURE_TENANT_ID &&
az login --federated-token "$(cat $AZURE_FEDERATED_TOKEN_FILE)" --service-principal -u "$AZURE_CLIENT_ID" -t "$AZURE_TENANT_ID" &&
az account set -s $AZURE_SUBSCRIPTION_ID &&
az group show -g $RESOURCE_GROUP -o json
{{- include "stress-test-addons.container-env" . | nindent 6 }}
44 changes: 2 additions & 42 deletions tools/stress-cluster/cluster/README.md
@@ -7,7 +7,6 @@ Table of Contents
* [Prod Cluster](#prod-cluster)
* [Local Cluster](#local-cluster)
* [Deploying Stress Test Addons](#deploying-stress-test-addons)
* [Rotating Cluster Secrets](#rotating-cluster-secrets)
* [Development](#development)
* [Bicep templates](#bicep-templates)
* [Helm templates](#helm-templates)
@@ -68,7 +67,7 @@ Cluster buildout and deployment involves three main steps which are automated in
First, update the `./azure/parameters/dev.json` parameters file with the values marked `// add me`, then run:

```
./provision.ps1 -env dev -LocalAddonsPath `pwd`/kubernetes/stress-test-addons
./provision.ps1 -env dev -LocalAddonsPath "$(pwd)/kubernetes/stress-test-addons"
```

To deploy stress test packages to the dev environment
@@ -79,7 +78,7 @@ resource values from the newly provisioned dev environment that are required by
Avoid checking in the updated dev values, they are for local use only.

```
<tools repo>/eng/common/scripts/stress-testing/deploy-stress-tests.ps1 -Environment dev
<tools repo>/eng/common/scripts/stress-testing/deploy-stress-tests.ps1 -Environment dev -LocalAddonsPath "$(pwd)/kubernetes/stress-test-addons"
```

## Playground Cluster
@@ -126,45 +125,6 @@ Steps for deploying the stress test addons helm chart:
1. Run `kubectl get pods -n examples -w` to monitor the status of each pod and look for Running/Completed and make sure there are no errors.
1. Update all the stress tests' Chart.yaml files across the other repos in the same manner.

# Rotating Cluster Secrets

Each stress cluster provisions one app/service principal with permissions to deploy resources to a subscription. This is used for stress tests that define bicep templates for live resources.

The secret is initialized in the `rg-stress-secrets-<env>` resource group in the subscription. There will be a keyvault named `stress-secrets-<env>` and will have one secret named `public`. This secret takes the format of a .env file like:

```
AZURE_CLIENT_SECRET=<secret>
AZURE_TENANT_ID=<tenant id>
AZURE_CLIENT_ID=<client id>
AZURE_SUBSCRIPTION_ID=<sub id>
AZURE_CLIENT_OID=<oid>
STRESS_CLUSTER_RESOURCE_GROUP=<rg>
```

During cluster buildout (`provision.ps1`), this is all initialized automatically, however sometimes this secret needs to be rotated on-demand (for expiration or security reasons).

To rotate the secret, find the underlying app registration for the cluster. This will match the `AZURE_CLIENT_ID` of the secret, or you can search in Azure Portal for `stress-provisioner-<env>`. Navigate to the application/app registration page, and click `Certificates & secrets` on the left side. Click `New client secret`, set expiration to 12 months and name/describe it `rbac`. When the secret is created, you will be able to copy the value.

Next, run the following to get the existing .env file secret for the stress cluster:

```
az keyvault secret show --vault-name stress-secrets-<env> -n public -o tsv --query value > stress-secret
```

Update the file, replacing the `AZURE_CLIENT_SECRET` value with the new secret value, then run:

```
az keyvault secret set --vault-name stress-secrets-<env> -n public -f ./stress-secret
```

To verify the rotation is complete, do a test run of the deployment example. From the root of `azure-sdk-tools`:

```
eng/common/scripts/stress-testing/deploy-stress-tests.ps1 -Environment <env> -SearchDirectory ./tools/stress-cluster/chaos/examples/stress-deployment-example
```

Then monitor the stress deployment and make sure the resources deployed successfully in the `init-azure-deployer` init container.

# Development

## Bicep templates
2 changes: 1 addition & 1 deletion tools/stress-cluster/cluster/azure/cluster/cluster.bicep
@@ -13,7 +13,7 @@ param updateNodes bool = false
// monitoring parameters
param workspaceId string

var kubernetesVersion = '1.26.6'
var kubernetesVersion = '1.29.4'
var nodeResourceGroup = 'rg-nodes-${dnsPrefix}-${clusterName}-${groupSuffix}'

var systemAgentPool = {
32 changes: 32 additions & 0 deletions tools/stress-cluster/cluster/azure/parameters/storage.json
@@ -0,0 +1,32 @@
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"subscriptionId": {
"value": "ba45b233-e2ef-4169-8808-49eb0d8eba0d"
},
"groupSuffix": {
"value": "storage"
},
"clusterName": {
"value": "stress-storage"
},
"clusterLocation": {
"value": "southcentralus"
},
"defaultAgentPoolMinNodes": {
"value": 5
},
"defaultAgentPoolMaxNodes": {
"value": 200
},
"tags": {
"value": {
"environment": "storage",
"owners": "bebroder",
"purpose": "stress and load testing for storage SDKs - maintained by Azure SDK (devdiv)",
"DoNotDelete": ""
}
}
}
}
@@ -1,5 +1,11 @@
# Release History

## 0.3.3 (2024-07-10)

### Features Added

Added new cluster 'storage' to addons environment config

## 0.3.2 (2024-05-15)

### Features Added
@@ -2,5 +2,5 @@ apiVersion: v2
name: stress-test-addons
description: Baseline resources and templates for stress testing clusters

version: 0.3.2
version: 0.3.3
appVersion: v0.1
@@ -1,6 +1,15 @@
apiVersion: v1
entries:
stress-test-addons:
- apiVersion: v2
appVersion: v0.1
created: "2024-07-10T15:50:11.882316755-04:00"
description: Baseline resources and templates for stress testing clusters
digest: a10ab977bd37c9eff59cd82334a14e63612ff568dfbe591cfa521a90565d1762
name: stress-test-addons
urls:
- https://stresstestcharts.blob.core.windows.net/helm/stress-test-addons-0.3.3.tgz
version: 0.3.3
- apiVersion: v2
appVersion: v0.1
created: "2024-05-15T19:50:35.339373231-04:00"
@@ -217,4 +226,4 @@ entries:
urls:
- https://stresstestcharts.blob.core.windows.net/helm/stress-test-addons-0.1.2.tgz
version: 0.1.2
generated: "2024-05-15T19:50:35.33115471-04:00"
generated: "2024-07-10T15:50:11.88175622-04:00"