1. Overview
2. Architecture Overview
3. Azure Service Topology
4. Implementation Guide
5. Decision Criteria
6. Cost Model
7. Anti-Patterns to Avoid
8. References
Overview
Azure offers two primary container platforms — AKS (Azure Kubernetes Service) and Azure Container Apps — supported by Azure Container Registry (ACR) for image storage and Azure Container Instances (ACI) for on-demand single containers. The containerisation journey on Azure starts with ACR for image storage, moves through a CI/CD pipeline that builds and pushes images, and ends with either Container Apps or AKS running those images behind an Application Gateway or Azure Load Balancer.
Container Apps is built on Kubernetes and KEDA internally, but that complexity is fully managed. You declare revision traffic weights, scaling rules, and environment variables; Azure provisions and scales the underlying infrastructure. AKS exposes the full Kubernetes API — node pools, taints, custom admission controllers, CNI selection, and GPU scheduling — suited to teams that need those primitives or have existing Kubernetes expertise to justify the overhead.
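The declarative model described above can be sketched as a standalone Bicep file. This is a minimal illustration, not a reference deployment: the parameter names (`appName`, `environmentId`, `image`, `stableRevisionName`), the port, the 90/10 traffic split, the `LOG_LEVEL` variable, and the concurrency threshold are all hypothetical values chosen for the example.

```bicep
// Hypothetical parameters, not part of the original guide.
param appName string
param location string = resourceGroup().location
param environmentId string        // resource ID of an existing Container Apps environment
param image string                // for example '<registry>.azurecr.io/api:1.0.0'
param stableRevisionName string   // name of the revision currently serving traffic

resource containerApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: appName
  location: location
  properties: {
    managedEnvironmentId: environmentId
    configuration: {
      // Multiple active revisions are needed to split traffic between them.
      activeRevisionsMode: 'Multiple'
      ingress: {
        external: true
        targetPort: 8080
        traffic: [
          {
            revisionName: stableRevisionName
            weight: 90
          }
          {
            latestRevision: true
            weight: 10
          }
        ]
      }
    }
    template: {
      containers: [
        {
          name: 'api'
          image: image
          env: [
            {
              name: 'LOG_LEVEL'
              value: 'info'
            }
          ]
          resources: {
            cpu: json('0.5')
            memory: '1Gi'
          }
        }
      ]
      scale: {
        minReplicas: 0   // scale-to-zero when idle
        maxReplicas: 10
        rules: [
          {
            name: 'http-concurrency'
            http: {
              metadata: {
                concurrentRequests: '50'
              }
            }
          }
        ]
      }
    }
  }
}
```

Nothing in this sketch mentions nodes, node pools, or a CNI, which is the practical difference from the AKS template later in this guide.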
Architecture Overview
%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
  START([Developer Pushes Code])
  START --> CI[CI Pipeline\nBuild container image\nRun tests and SAST]
  CI --> SCAN{Image Security\nScan Passed?}
  SCAN -->|Vulnerabilities found| FAIL[Block deployment\nNotify engineer\nLog to Defender for Containers]
  SCAN -->|Clean| ACR_PUSH[Push to ACR\nImmutable tags\nRetention policy active]
  ACR_PUSH --> DEPLOY{Deployment\nTarget}
  DEPLOY -->|No K8s complexity needed| CAPP[Azure Container Apps\nRevision-based deployment\nScale-to-zero supported]
  DEPLOY -->|Kubernetes ecosystem| AKS[AKS Cluster\nUser node pool\nHelm or GitOps]
  CAPP --> AGWY[Application Gateway\nor Azure Front Door\nHTTPS ingress]
  AKS --> AGWY
  AGWY --> USERS([Users Reach the Application])
  style START fill:#4f8ef7,color:#fff
  style USERS fill:#10b981,color:#fff
  style FAIL fill:#fef3c7
  style ACR_PUSH fill:#e0f2fe
Azure Service Topology
%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
  INTERNET[Internet Traffic]
  subgraph REGISTRY["Azure Container Registry (Premium)"]
    REPO[Private Repository\nPublic network disabled\nPrivate endpoint only]
    SCAN_ACR[Defender for Containers\nContinuous CVE scanning\nGeo-replication enabled]
  end
  subgraph AKS_CLUSTER["AKS Cluster"]
    subgraph SYSPOOL["System Node Pool: D4s_v5 × 3"]
      COREDNS[CoreDNS\nCluster DNS]
      KONNECT[Konnectivity\nAPI server tunnel]
      METRICS[metrics-server\nResource metrics]
    end
    subgraph USERPOOL["User Node Pool: D8s_v5 (3–20, autoscaler)"]
      APP1[App Pod A\nWorkload Identity\nno host process]
      APP2[App Pod B\nWorkload Identity\nno host process]
      APP3[App Pod C\nWorkload Identity\nno host process]
    end
  end
  subgraph IDENTITY["Identity: Entra ID"]
    WI[Workload Identity\nOIDC federation\nno secrets in pods]
    MI[Managed Identity\nKubelet identity\nAcrPull role]
  end
  subgraph OBSERVE["Observability"]
    MON[Azure Monitor\nContainer Insights\nPrometheus scrape]
    LAW[Log Analytics Workspace\nQuery and alerts\nRetention policy]
  end
  INTERNET --> AKS_CLUSTER
  REPO --> MI
  MI -->|AcrPull| AKS_CLUSTER
  SCAN_ACR -.->|Scans| REPO
  WI -->|OIDC token exchange| APP1 & APP2 & APP3
  SYSPOOL -.->|taint: CriticalAddonsOnly| USERPOOL
  AKS_CLUSTER --> MON --> LAW
  style INTERNET fill:#4f8ef7,color:#fff
  style SCAN_ACR fill:#e0f2fe
  style WI fill:#e0f2fe
Implementation Guide
Bicep — AKS Managed Cluster
// Microsoft.ContainerService/managedClusters@2023-07-01
// Parameters the template refers to. nodeSubnetId is an assumption added here:
// outboundType 'userAssignedNATGateway' expects the node pools to sit in a
// subnet that already has a NAT gateway attached.
param clusterName string
param location string = resourceGroup().location
param logAnalyticsWorkspaceId string
param nodeSubnetId string

// Built-in AcrPull role definition ID
var acrPullRoleId = '7f951dda-4ed3-4680-a7ca-43fe172d538d'

resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-07-01' = {
  name: clusterName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    // Reusing the cluster name as the DNS prefix is an assumption
    dnsPrefix: clusterName
    enableRBAC: true
    aadProfile: {
      managed: true
      enableAzureRBAC: true
    }
    // OIDC issuer plus Workload Identity: pods federate with Entra ID, no stored secrets
    oidcIssuerProfile: {
      enabled: true
    }
    securityProfile: {
      workloadIdentity: {
        enabled: true
      }
      imageCleaner: {
        enabled: true
        intervalHours: 48
      }
    }
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'azure'
      loadBalancerSku: 'standard'
      outboundType: 'userAssignedNATGateway'
    }
    agentPoolProfiles: [
      {
        // System pool: tainted so only critical add-ons schedule here
        name: 'system'
        mode: 'System'
        count: 3
        vmSize: 'Standard_D4s_v5'
        vnetSubnetID: nodeSubnetId
        availabilityZones: [ '1', '2', '3' ]
        osDiskType: 'Ephemeral'
        upgradeSettings: {
          maxSurge: '33%'
        }
        nodeTaints: [ 'CriticalAddonsOnly=true:NoSchedule' ]
      }
      {
        // User pool: application workloads, autoscaled between 3 and 20 nodes
        name: 'apppool'
        mode: 'User'
        count: 3
        minCount: 3
        maxCount: 20
        enableAutoScaling: true
        vmSize: 'Standard_D8s_v5'
        vnetSubnetID: nodeSubnetId
        availabilityZones: [ '1', '2', '3' ]
        osDiskType: 'Ephemeral'
        upgradeSettings: {
          maxSurge: '33%'
        }
      }
    ]
    addonProfiles: {
      azurePolicy: {
        enabled: true
      }
      omsAgent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: logAnalyticsWorkspaceId
        }
      }
    }
  }
}

// Grant kubelet identity AcrPull on the registry
resource acrPullAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: containerRegistry
  name: guid(containerRegistry.id, aksCluster.id, acrPullRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPullRoleId)
    principalId: aksCluster.properties.identityProfile.kubeletidentity.objectId
    principalType: 'ServicePrincipal'
  }
}
Bicep — Azure Container Registry (Premium)
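// Microsoft.ContainerRegistry/registries@2023-07-01
// registryName must be globally unique; location is assumed to be declared
// alongside the cluster template.
param registryName string

resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-07-01' = {
  name: registryName
  location: location
  sku: {
    name: 'Premium'
  }
  properties: {
    // Identity-based access only: no shared admin credentials
    adminUserEnabled: false
    // Reachable only through a private endpoint
    publicNetworkAccess: 'Disabled'
    zoneRedundancy: 'Enabled'
    policies: {
      // Purge untagged manifests after 30 days
      retentionPolicy: {
        days: 30
        status: 'enabled'
      }
    }
  }
}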
Terraform equivalent: Use `azurerm_kubernetes_cluster` with `identity { type = "SystemAssigned" }`, `oidc_issuer_enabled = true`, `workload_identity_enabled = true`, and a separate `azurerm_kubernetes_cluster_node_pool` for the user pool. Use `azurerm_container_registry` with `sku = "Premium"`, `public_network_access_enabled = false`, `zone_redundancy_enabled = true`, and an `azurerm_role_assignment` binding `AcrPull` to the kubelet identity principal.
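The registry above disables public network access, so nodes and pipelines can only reach it through a private endpoint, as the topology diagram indicates. A minimal sketch of that endpoint follows, assuming it lives in the same template as `containerRegistry`, `registryName`, and `location`. The `privateEndpointSubnetId` parameter and the endpoint name are hypothetical.

```bicep
// Hypothetical parameter: a subnet reserved for private endpoints.
param privateEndpointSubnetId string

resource acrPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-05-01' = {
  name: '${registryName}-pe'
  location: location
  properties: {
    subnet: {
      id: privateEndpointSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'acr'
        properties: {
          privateLinkServiceId: containerRegistry.id
          // 'registry' is the private link sub-resource for ACR
          groupIds: [ 'registry' ]
        }
      }
    ]
  }
}
```

Name resolution is not covered here: clients in the VNet also need the `privatelink.azurecr.io` private DNS zone (or equivalent records) so the registry login server resolves to the endpoint's private IP.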
Decision Criteria
| Criteria | AKS | Container Apps |
|---|---|---|
| Operational overhead | Higher — own node pools, upgrades, CNI | Lower — fully managed infrastructure |
| Kubernetes ecosystem | Full — CRDs, operators, admission webhooks | Limited — KEDA scaling rules, Dapr |
| Cluster upgrades | You manage — runbook required | Automatic — no runbook needed |
| Custom schedulers | Yes | No |
| Scale-to-zero | Partial: user node pools can autoscale to zero nodes, but the system pool always keeps at least one node | Yes, built-in |
| Cost at scale | Lower — pack workloads on reserved VMs | Higher — per-vCPU-second consumption |
| GPU / spot nodes | Yes | No |
| Best for | Custom admission controllers, CNI, GPU, K8s primitives, client-required Kubernetes | Greenfield microservices, event-driven workloads, smaller teams |
Recommendation for most Ascendion projects: Start with Azure Container Apps. Graduate to AKS when you need a specific Kubernetes capability — custom operators, advanced network policy, GPU workloads, or client-mandated Kubernetes.
Cost Model
| Component | Cost Driver | Optimisation |
|---|---|---|
| AKS node pools | VM SKU × node count × uptime | Reserved Instances (1yr/3yr) for baseline; Spot nodes for batch |
| System node pool | Always-on × 3 nodes minimum | Use smallest SKU that satisfies CoreDNS and Konnectivity headroom |
| NAT Gateway | Hourly + data processed | Share one NAT Gateway across all subnets in the VNet |
| ACR Premium | Storage GB + geo-replication pairs | Retention policy removes untagged manifests; delete unused repos |
| Azure Monitor | Ingestion GB + retention days | Filter noisy container logs at the agent; set 30-day hot retention |
| Log Analytics | Pay-per-GB ingestion | Commitment tiers from 100 GB/day upward; move cold data to the lower-cost archive tier |
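The Azure Monitor and Log Analytics rows can be made concrete with a workspace definition. This is a sketch under assumptions: `workspaceName` is a hypothetical parameter, and the 50 GB daily cap is an illustrative figure rather than a recommendation.

```bicep
// Hypothetical parameter; location is redeclared so the sketch stands alone.
param workspaceName string
param location string = resourceGroup().location

resource logWorkspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: workspaceName
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'   // pay-per-GB ingestion
    }
    retentionInDays: 30   // the 30-day hot retention from the table above
    workspaceCapping: {
      dailyQuotaGb: 50    // illustrative cap; ingestion stops for the day once it is reached
    }
  }
}
```

A daily cap trades cost certainty for possible data loss during incidents, so it suits non-production workspaces better than production ones. At sustained volumes of 100 GB/day or more, the `CapacityReservation` SKU with a `capacityReservationLevel` is the commitment-tier alternative.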
Cost optimisation levers:
- Use Ephemeral OS disks on all node pools — no managed disk cost per node, faster node provisioning.
- Enable cluster autoscaler on the user pool; set `scale-down-delay-after-add` to 10 minutes to avoid thrashing.
- Spot node pools for non-production environments and batch workloads reduce VM cost by up to 90%.
- ACR geo-replication adds a second replica charge — only enable it for regions that actually serve production traffic.
- Defender for Containers is charged per vCore of running nodes; audit enablement scope to production only.
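The Spot lever above could look like the following additional node pool, assuming it is declared in the same template as `aksCluster`. The pool name, VM size, node counts, and the `workload` label are hypothetical.

```bicep
resource spotPool 'Microsoft.ContainerService/managedClusters/agentPools@2023-07-01' = {
  parent: aksCluster
  name: 'spotpool'
  properties: {
    mode: 'User'
    vmSize: 'Standard_D8s_v5'
    scaleSetPriority: 'Spot'
    scaleSetEvictionPolicy: 'Delete'
    // -1 means: pay up to the on-demand price, never evict on price alone
    spotMaxPrice: json('-1')
    enableAutoScaling: true
    count: 1
    minCount: 0
    maxCount: 10
    osDiskType: 'Ephemeral'
    nodeLabels: {
      workload: 'batch'
    }
  }
}
```

AKS taints Spot nodes with `kubernetes.azure.com/scalesetpriority=spot:NoSchedule`, so only pods that carry a matching toleration land there. Because Spot capacity can be evicted at short notice, this pool fits interruptible batch jobs rather than latency-sensitive services.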
Anti-Patterns to Avoid
Scheduling workloads on the system node pool alongside CoreDNS, Konnectivity, and metrics-server. A misbehaving application pod can starve system components of CPU/memory, causing DNS failures and breaking the entire cluster.
Taint the system node pool with CriticalAddonsOnly=true:NoSchedule. Application pods would need a matching toleration to land there; by default they have none, so the scheduler places them only on the user pool. Always provision a dedicated user node pool.
Deploying AKS with networkPolicy: none, leaving all pods able to reach all other pods across namespaces with no restriction. A compromised pod can scan and connect to any in-cluster service.
Set networkPolicy: azure (or calico) at cluster creation time. Changing the policy engine on a running cluster is disruptive and was historically not supported at all. Define NetworkPolicy manifests that default-deny all ingress and egress per namespace, then add explicit allow rules for required traffic paths.
Configuring the AKS cluster with a service principal and client secret. Secrets expire, require manual rotation, and are frequently leaked in pipeline logs or Bicep parameter files.
Use identity: { type: 'SystemAssigned' } for the cluster control plane and Workload Identity (OIDC federation with Entra ID) for application pods. No secrets are stored anywhere; the OIDC token is issued per-pod and expires automatically.
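On the Azure side, Workload Identity is a user-assigned managed identity with a federated credential that trusts one Kubernetes service account. A minimal sketch follows, assuming it sits in the same template as `aksCluster` and `location`. The `workloadNamespace` and `serviceAccountName` parameters and the identity names are hypothetical.

```bicep
// Hypothetical parameters identifying the Kubernetes service account.
param workloadNamespace string
param serviceAccountName string

resource appIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-app-workload'
  location: location
}

resource federatedCredential 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: appIdentity
  name: 'aks-federation'
  properties: {
    // Issuer URL published by the cluster because oidcIssuerProfile is enabled
    issuer: aksCluster.properties.oidcIssuerProfile.issuerURL
    // Only tokens for this exact service account are accepted
    subject: 'system:serviceaccount:${workloadNamespace}:${serviceAccountName}'
    audiences: [ 'api://AzureADTokenExchange' ]
  }
}
```

The Kubernetes half is not shown: the service account is annotated with the identity's client ID, and pods opt in with the `azure.workload.identity/use: "true"` label.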
Hardcoding count: 5 on the user node pool with no autoscaler. Traffic spikes exhaust node capacity, leaving new pods Pending and requests failing; quiet periods waste compute budget.
Enable enableAutoScaling: true with a minCount/maxCount range on every user node pool. Set Horizontal Pod Autoscaler (HPA) or KEDA scalers on each deployment. The node autoscaler responds to pending pods; HPA/KEDA creates pending pods when load rises.
Setting adminUserEnabled: true on the container registry and distributing the admin username and password to pipelines and node pools. Admin credentials are permanent, shared, and not audited per consumer.
Set adminUserEnabled: false. Grant the AKS kubelet identity the AcrPull role via role assignment. Grant pipeline service principals AcrPush via role assignment. All access is identity-based, audited in Entra ID sign-in logs, and revocable per principal.
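The pipeline half of this fix mirrors the AcrPull assignment in the Implementation Guide. A sketch, assuming the same template as `containerRegistry`; `pipelinePrincipalId` is a hypothetical parameter holding the object ID of the pipeline's service principal or managed identity.

```bicep
// Hypothetical parameter: object ID of the CI/CD pipeline's principal.
param pipelinePrincipalId string

// Built-in AcrPush role definition ID
var acrPushRoleId = '8311e382-0749-4cb8-b61a-304f252e45ec'

resource acrPushAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: containerRegistry
  name: guid(containerRegistry.id, pipelinePrincipalId, acrPushRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPushRoleId)
    principalId: pipelinePrincipalId
    principalType: 'ServicePrincipal'
  }
}
```

Scoping the assignment to the registry rather than the resource group keeps the pipeline's rights limited to pushing and pulling images.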
References
- Microsoft — AKS best practices. https://learn.microsoft.com/azure/aks/best-practices
- Microsoft — Workload Identity overview. https://learn.microsoft.com/azure/aks/workload-identity-overview
- Microsoft — Container Apps vs AKS. https://learn.microsoft.com/azure/container-apps/compare-options
- Microsoft — Upgrade an AKS cluster. https://learn.microsoft.com/azure/aks/upgrade-cluster
- CNCF — Cloud Native Landscape. https://landscape.cncf.io
- Portal: AWS container platform comparison. /technology/cloud/aws-container-platform/