docs: Azure authentication refactoring design #1940

Open
wants to merge 3 commits into
base: dev
165 changes: 165 additions & 0 deletions docs/design/Refactor Azure authentication.md
@@ -0,0 +1,165 @@
# **Azure Authentication Refactoring in Ratify**

## **Introduction**
Authentication is a critical process in Ratify: it secures access to artifacts in container registries and to keys, secrets, and certificates in cloud key vaults, among other resources. Azure offers two primary SDKs for authentication in Go:

- **Azure Identity ([azidentity](https://learn.microsoft.com/en-us/azure/developer/go/azure-sdk-authentication?tabs=bash))**: Designed for seamless integration with Azure services.
- **Microsoft Authentication Library ([MSAL](https://learn.microsoft.com/en-us/entra/identity-platform/msal-overview))**: Provides advanced token management capabilities.

Currently, Ratify uses both SDKs across different components, leading to complexity and maintenance overhead. This document proposes a comprehensive refactoring of Azure authentication in Ratify to improve maintainability, reduce duplication, and streamline the user experience.

---

## **Existing Azure Authentication in Ratify**

### **ACR Token Retrieval**
Located in the **ORAS auth providers (`pkg/common/oras/authprovider/azure`)**:
1. **Azure Managed Identity (`azureidentity.go`)**:
   - Uses `azidentity.NewManagedIdentityCredential` to retrieve an access token.
   - Requires only the `clientID`:
   ```go
   id := azidentity.ClientID(clientID)
   opts := azidentity.ManagedIdentityCredentialOptions{ID: id}
   cred, err := azidentity.NewManagedIdentityCredential(&opts)
   ```

2. **Azure Workload Identity (`azureworkloadidentity.go`)**:
   - Uses `confidential.NewCredFromAssertionCallback` from the **MSAL** package.

### **Key Management Provider and Certificate Provider**
Both components recently replaced the deprecated `autorest` SDK with `azidentity` and now use workload identity credentials for authentication.

---

## **Challenges with the Current Design**

### 1. **Multiple SDKs**
Ratify employs both **`MSAL`** and **`azidentity`**, increasing the maintenance burden. Consolidating to a single SDK simplifies dependency management, reduces upgrade complexity, and enhances maintainability.

### 2. **Code Duplication**
Significant code duplication exists across components, particularly between Azure workload identity and managed identity implementations. Consolidating shared logic improves maintainability.

### 3. **Explicit Authentication Selection**
Currently, users must explicitly specify the authentication type. In well-defined environments like Azure Kubernetes Service (AKS), this should be inferred automatically based on environment variables.
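
As a rough illustration of the inference described above, the sketch below picks an authentication type from environment variables. The variable names are the ones used in this document's proposed snippet; the `AuthType` constants and the `inferAuthType` function are hypothetical names, and the rule that a client ID on its own implies managed identity is an assumption made only for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// AuthType is a hypothetical label for the credential kind to use.
type AuthType string

const (
	AuthWorkloadIdentity AuthType = "workloadIdentity"
	AuthManagedIdentity  AuthType = "managedIdentity"
	AuthUnknown          AuthType = "unknown"
)

// inferAuthType reads the environment through getenv so that callers and
// tests can supply a fake environment instead of mutating the real one.
func inferAuthType(getenv func(string) string) AuthType {
	tenantID := getenv("AZURE_TENANT_ID")
	clientID := getenv("AZURE_CLIENT_ID")
	tokenFile := getenv("AZURE_FEDERATED_TOKEN_FILE")

	switch {
	case tenantID != "" && clientID != "" && tokenFile != "":
		// All three variables are present: treat as workload identity.
		return AuthWorkloadIdentity
	case clientID != "":
		// Assumption: a client ID alone indicates a managed identity.
		return AuthManagedIdentity
	default:
		return AuthUnknown
	}
}

func main() {
	fmt.Println(inferAuthType(os.Getenv))
}
```

Passing `getenv` as a parameter keeps the function free of global state, which makes it straightforward to exercise each branch in a unit test.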

---

## **Proposed Refactoring**

### **Goals**
1. Design a **common package** for Azure authentication logic.
2. **Infer authentication type** automatically based on the environment, reducing user configuration overhead.
> **Collaborator:** This is a great addition for almost all scenarios, but in some scenarios the user may need an override to specify exactly which credential type to use. Notation CLI encountered this too. We should consider exposing an override capability that bypasses the chained credential.

> **Contributor Author:** That's a good point. As you suggested, we can provide this ability by accepting the override from user input.

3. **Unify implementations** for workload identity and managed identity in ORAS auth providers.
4. Implement a **chained authentication process**:
   - Workload Identity → Managed Identity → Azure CLI.
> **Collaborator:** In terms of Azure CLI, does that mean the Ratify CLI will also support authentication to Azure?

> **Contributor Author:** Yes, one of the goals of this work is to support Azure authentication in the CLI scenario. The `azidentity` SDK appears to facilitate this through several credential types, such as `AzureCLICredential`, `AzureDeveloperCLICredential`, `DefaultAzureCredential`, and `ChainedTokenCredential`. `ChainedTokenCredential` seems to be the right choice for Ratify to consolidate all scenarios in one place.

5. Use a single SDK (**`azidentity`**) for all authentication workflows to improve maintainability and alignment with Azure best practices.
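
To make the chained process and the override raised in review more concrete, here is a minimal sketch of the control flow only. It does not call `azidentity`; `TokenSource`, `namedSource`, `getToken`, and the credential type names are hypothetical, and in the proposed package `ChainedTokenCredential` would perform the fallback itself.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnavailable marks a credential that cannot be used in this environment.
var errUnavailable = errors.New("credential unavailable")

// TokenSource is a hypothetical stand-in for a credential that fetches a token.
type TokenSource func() (string, error)

// namedSource pairs a credential type name with its token source.
type namedSource struct {
	name   string
	source TokenSource
}

// getToken returns a token from the chain. With an empty override, sources
// are tried in order and the first success wins. With a non-empty override,
// only the source with that name is used and the chain is bypassed.
func getToken(chain []namedSource, override string) (string, error) {
	if override != "" {
		for _, s := range chain {
			if s.name == override {
				return s.source()
			}
		}
		return "", fmt.Errorf("unknown credential type %q", override)
	}

	var errs []error
	for _, s := range chain {
		token, err := s.source()
		if err == nil {
			return token, nil
		}
		errs = append(errs, fmt.Errorf("%s: %w", s.name, err))
	}
	if len(errs) == 0 {
		return "", errors.New("no credentials configured")
	}
	// Report every failure so the user can see why each credential was skipped.
	return "", errors.Join(errs...)
}

func main() {
	chain := []namedSource{
		{"workloadIdentity", func() (string, error) { return "", errUnavailable }},
		{"managedIdentity", func() (string, error) { return "mi-token", nil }},
		{"azureCLI", func() (string, error) { return "cli-token", nil }},
	}
	token, err := getToken(chain, "")
	fmt.Println(token, err)
}
```

Keeping the override path separate from the chain means an explicitly requested credential fails loudly instead of silently falling through to another identity.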

---

### **Refactoring Plan**

#### **1. Introduce a New Azure Authentication Package**
- A new package, `pkg/common/cloudauthproviders/azure`, will consolidate shared Azure authentication logic.
- Authentication will use `ChainedTokenCredential` to sequentially try:
  - **Workload Identity**
  - **Managed Identity**
  - **Azure CLI**
- If all attempts fail, the process will return an error.

##### **Proposed Code Snippet**
```go
package azure

import (
	"errors"
	"os"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
	"github.com/Azure/azure-sdk-for-go/sdk/azidentity"
)

// NewChainedCredential builds a credential chain that tries Workload Identity,
// then Managed Identity, then Azure CLI.
func NewChainedCredential() (*azidentity.ChainedTokenCredential, error) {
	// The chain accepts any azcore.TokenCredential implementation.
	var creds []azcore.TokenCredential

	tenantID := os.Getenv("AZURE_TENANT_ID")
	clientID := os.Getenv("AZURE_CLIENT_ID")
	tokenFile := os.Getenv("AZURE_FEDERATED_TOKEN_FILE")

	// Add Workload Identity if all required environment variables are set
	if tenantID != "" && clientID != "" && tokenFile != "" {
		wiCred, err := azidentity.NewWorkloadIdentityCredential(&azidentity.WorkloadIdentityCredentialOptions{
			TenantID:      tenantID,
			ClientID:      clientID,
			TokenFilePath: tokenFile,
		})
		if err == nil {
			creds = append(creds, wiCred)
		}
	}

	// Add Managed Identity
	if clientID != "" {
		miCred, err := azidentity.NewManagedIdentityCredential(&azidentity.ManagedIdentityCredentialOptions{
			ID: azidentity.ClientID(clientID),
		})
		if err == nil {
			creds = append(creds, miCred)
		}
	}

	// Add Azure CLI Credential
	cliCred, err := azidentity.NewAzureCLICredential(nil)
	if err == nil {
		creds = append(creds, cliCred)
	}

	if len(creds) == 0 {
		return nil, errors.New("no valid credentials detected; check environment configuration")
	}

	// Combine credentials into a chain
	return azidentity.NewChainedTokenCredential(creds, nil)
}
```

> **Collaborator:** nit: I assume this proposed implementation will change to take into account client IDs that are not specified via environment variables?

> **Contributor Author:** That's correct. This is just a simple example of how it can be done. The client ID can be passed explicitly and used when provided, instead of always being read from environment variables.

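
Following the review exchange on `NewChainedCredential`, one possible way to accept an explicitly configured client ID while still falling back to the environment is sketched below. `resolveClientID` is a hypothetical helper, not part of `azidentity` or Ratify, and the precedence (explicit value first, then `AZURE_CLIENT_ID`) is an assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// resolveClientID returns the explicitly configured client ID when present,
// otherwise the value of AZURE_CLIENT_ID. The boolean reports whether any
// client ID was found.
func resolveClientID(explicit string, getenv func(string) string) (string, bool) {
	if id := strings.TrimSpace(explicit); id != "" {
		return id, true
	}
	if id := strings.TrimSpace(getenv("AZURE_CLIENT_ID")); id != "" {
		return id, true
	}
	return "", false
}

func main() {
	// With no explicit value, the result depends on the process environment.
	id, ok := resolveClientID("", os.Getenv)
	fmt.Println(id, ok)
}
```

The resolved value could then be passed into the credential options in place of the direct `os.Getenv("AZURE_CLIENT_ID")` lookups.
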
#### **2. Refactor ORAS Auth Providers**
- Combine `azureidentity.go` and `azureworkloadidentity.go` into a single file.
> **Collaborator:** How will we maintain backward compatibility and ensure no breaking changes? We'll need to keep supporting the existing workload identity and managed identity providers when the user specifies them.

> **Contributor Author:** I think we can ensure this by providing the override ability, as you pointed out above.

> **Collaborator:** Does this mean we will introduce a new auth provider, or just refactor the existing one and add the necessary new fields?

> **Contributor Author:** I think we can combine the two existing ORAS auth providers into one. If chained authentication is used, there is no need to have both. We can override the chained credential process based on user input to explicitly use workload identity or managed identity.

> **Collaborator:** Since we are keeping the existing auth providers and introducing a new Azure auth provider, should we just keep the current implementation as is (to reduce risk) and introduce a new implementation in a new file for the new auth provider? This is similar to how we deprecated CertProvider and introduced the KMP CR.

> **Collaborator:** +1

> **Contributor Author:** Sounds good to me too.

- Update the implementation to use the `pkg/common/cloudauthproviders/azure` package for authentication.
- Authentication type will be inferred based on environment variables.
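
The review thread in this section leans toward keeping the existing providers untouched and adding the chained provider alongside them. A minimal sketch of that arrangement, using a name-to-constructor registry, might look like the following. The `Provider` interface, the `register` and `lookup` functions, and all three provider names are assumptions for illustration and are not taken from Ratify's code.

```go
package main

import (
	"fmt"
	"sort"
)

// Provider is a hypothetical, reduced view of an auth provider.
type Provider interface {
	Name() string
}

type simpleProvider struct{ name string }

func (p simpleProvider) Name() string { return p.name }

// registry maps a provider name from user configuration to a constructor.
var registry = map[string]func() Provider{}

// register adds a constructor under name and panics on duplicates, so a new
// provider cannot silently replace an existing one.
func register(name string, ctor func() Provider) {
	if _, exists := registry[name]; exists {
		panic("auth provider already registered: " + name)
	}
	registry[name] = ctor
}

func init() {
	// Existing providers stay registered under their current names (assumed here).
	register("azureWorkloadIdentity", func() Provider { return simpleProvider{"azureWorkloadIdentity"} })
	register("azureManagedIdentity", func() Provider { return simpleProvider{"azureManagedIdentity"} })
	// The new chained provider is added under a hypothetical name.
	register("azureChained", func() Provider { return simpleProvider{"azureChained"} })
}

// lookup resolves the provider named in user configuration.
func lookup(name string) (Provider, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown auth provider %q", name)
	}
	return ctor(), nil
}

func main() {
	names := make([]string, 0, len(registry))
	for name := range registry {
		names = append(names, name)
	}
	sort.Strings(names)
	fmt.Println(names)
}
```

Existing configurations keep resolving to the same provider names, while new configurations can opt in to the chained provider.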

#### **3. Refactor Key Management and Certificate Providers**
- Update the providers to leverage the new `pkg/common/cloudauthproviders/azure` package for authentication.
> **Collaborator:** From a user perspective, will there be any change in how the credentials are configured? Is the KMP AKV setup (client ID, etc.) still decoupled from the ORAS Azure auth providers?

> **Contributor Author (@shahramk64, Nov 20, 2024):** Good point. This requires more thought. There are a few alternatives here:
> 1. Decoupled scenario: both can provide their own configurations. It is unclear how the chained credential should work in this case. For example, if both define a client ID, which one is set in the environment variable that the chained credential reads?
> 2. Coupled scenario: the configuration common to both is extracted into a separate resource that represents both. Both would then use the same credential type and the same credential configuration (unless overridden explicitly?).
>
> I think a decision needs to be made on whether to support different credential types for ORAS and KMP at the same time (for example, workload identity for KMP and managed identity for ORAS), and, when both use the same credential type, whether to support different identities for them (for example, a different client ID for ORAS and for KMP).

- Remove redundant logic and ensure consistent authentication processes across all providers.
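
One way to reconcile the decoupled and coupled alternatives raised in the review thread is a shared default configuration that each component may override field by field. The sketch below assumes values are passed explicitly rather than through process-wide environment variables; `AuthConfig`, its fields, and `resolve` are hypothetical names, and no decision on this design has been made in this document.

```go
package main

import "fmt"

// AuthConfig is a hypothetical per-component credential configuration.
// Empty fields mean "inherit from the shared defaults".
type AuthConfig struct {
	CredentialType string // for example "workloadIdentity" or "managedIdentity"
	ClientID       string
}

// resolve merges a component's own settings over the shared defaults,
// field by field.
func resolve(shared, component AuthConfig) AuthConfig {
	out := shared
	if component.CredentialType != "" {
		out.CredentialType = component.CredentialType
	}
	if component.ClientID != "" {
		out.ClientID = component.ClientID
	}
	return out
}

func main() {
	shared := AuthConfig{CredentialType: "workloadIdentity", ClientID: "shared-client-id"}

	// ORAS inherits everything; KMP uses its own identity.
	oras := resolve(shared, AuthConfig{})
	kmp := resolve(shared, AuthConfig{ClientID: "kmp-client-id"})

	fmt.Printf("oras=%+v kmp=%+v\n", oras, kmp)
}
```

Because each component receives its own resolved configuration, two components can use different client IDs without competing for a single `AZURE_CLIENT_ID` value.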

---

### **Advantages of the Proposed Refactoring**

1. **Improved Maintainability**:
   - A single SDK (`azidentity`) reduces dependencies and simplifies code management.
   - Consolidated authentication logic minimizes duplication and enhances clarity.

2. **Enhanced User Experience**:
   - Automatic detection of authentication type eliminates the need for explicit configuration in most environments.
> **Collaborator (@FeynmanZhou, Nov 21, 2024):** Could you please elaborate on which auth configuration will be simplified after refactoring? Will Ratify detect whether users use an Azure Workload Identity or an Azure Managed Identity? Maybe we could reference this doc to clarify which configuration could be removed: https://ratify.dev/docs/quickstarts/ratify-on-azure#create-a-custom-resource-for-accessing-acr


3. **Extensibility**:
   - Centralized authentication logic makes it easier to extend support for new scenarios or credential types in the future.

4. **Alignment with Azure Best Practices**:
   - `azidentity` is the credential library of the Azure SDK for Go and includes a workload identity credential for Kubernetes workloads, so its credentials integrate directly with other Azure SDK clients.

---

### **Proposed Tasks**
> **Collaborator:** If we plan to deprecate any existing auth provider or config, we should add this to the V2 tracking issue that @binbin-li created.


1. **Create the New Azure Authentication Package**:
   - Implement shared authentication logic using `azidentity` and `ChainedTokenCredential`.

2. **Refactor ORAS Auth Providers**:
   - Combine `azureidentity.go` and `azureworkloadidentity.go`.
   - Use the new package for authentication.

3. **Refactor Key Management and Certificate Providers**:
   - Update the providers to leverage the common Azure authentication package.

4. **Test and Validate**:
   - Thoroughly test the refactored components across different environments (e.g., AKS, local development) to ensure correctness and reliability.
