Overview

Azure offers two primary container platforms, Azure Kubernetes Service (AKS) and Azure Container Apps, supported by Azure Container Registry (ACR) for image storage and Azure Container Instances (ACI) for on-demand single containers. A typical path starts with a CI/CD pipeline that builds images and pushes them to ACR, and ends with either Container Apps or AKS running those images behind an Application Gateway or Azure Load Balancer.

Container Apps is built on Kubernetes and KEDA internally, but that complexity is fully managed. You declare revision traffic weights, scaling rules, and environment variables; Azure provisions and scales the underlying infrastructure. AKS exposes the full Kubernetes API — node pools, taints, custom admission controllers, CNI selection, and GPU scheduling — suited to teams that need those primitives or have existing Kubernetes expertise to justify the overhead.
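To make the declarative model concrete, the following is a minimal sketch of a Container Apps resource that sets revision traffic weights, one scaling rule, and one environment variable. The app name, parameter names, port, and the 90/10 split are illustrative assumptions, not values from this page; registry authentication for a private ACR is left out for brevity.

```bicep
// Hypothetical parameters: none of these names come from the original article.
param location string = resourceGroup().location
param environmentId string       // resource ID of an existing Container Apps environment
param image string               // full image reference, for example one stored in ACR
param stableRevisionName string  // name of the revision currently serving production traffic

resource orderApi 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'order-api' // illustrative app name
  location: location
  properties: {
    managedEnvironmentId: environmentId
    configuration: {
      // Multiple-revision mode is required to split traffic between revisions
      activeRevisionsMode: 'Multiple'
      ingress: {
        external: true
        targetPort: 8080
        traffic: [
          {
            revisionName: stableRevisionName
            weight: 90
          }
          {
            latestRevision: true
            weight: 10
          }
        ]
      }
    }
    template: {
      containers: [
        {
          name: 'order-api'
          image: image
          env: [
            {
              name: 'LOG_LEVEL'
              value: 'info'
            }
          ]
          resources: {
            cpu: json('0.5')
            memory: '1Gi'
          }
        }
      ]
      scale: {
        minReplicas: 0 // scale to zero when idle
        maxReplicas: 10
        rules: [
          {
            name: 'http-concurrency'
            http: {
              metadata: {
                concurrentRequests: '50'
              }
            }
          }
        ]
      }
    }
  }
}
```

Nothing in this sketch refers to nodes, node pools, or the KEDA installation itself; that is the part Azure manages on your behalf.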

Architecture Overview

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    START([Developer Pushes Code])
    START --> CI[CI Pipeline\nBuild container image\nRun tests and SAST]
    CI --> SCAN{Image Security\nScan Passed?}
    SCAN -->|Vulnerabilities found| FAIL[Block deployment\nNotify engineer\nLog to Defender for Containers]
    SCAN -->|Clean| ACR_PUSH[Push to ACR\nImmutable tags\nRetention policy active]
    ACR_PUSH --> DEPLOY{Deployment\nTarget}
    DEPLOY -->|No K8s complexity needed| CAPP[Azure Container Apps\nRevision-based deployment\nScale-to-zero supported]
    DEPLOY -->|Kubernetes ecosystem| AKS[AKS Cluster\nUser node pool\nHelm or GitOps]
    CAPP --> AGWY[Application Gateway\nor Azure Front Door\nHTTPS ingress]
    AKS --> AGWY
    AGWY --> USERS([Users Reach the Application])
    style START fill:#4f8ef7,color:#fff
    style USERS fill:#10b981,color:#fff
    style FAIL fill:#fef3c7
    style ACR_PUSH fill:#e0f2fe

Azure Service Topology

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    INTERNET[Internet Traffic]
    subgraph REGISTRY["Azure Container Registry (Premium)"]
        REPO[Private Repository\nPublic network disabled\nPrivate endpoint only]
        SCAN_ACR[Defender for Containers\nContinuous CVE scanning\nGeo-replication enabled]
    end
    subgraph AKS_CLUSTER["AKS Cluster"]
        subgraph SYSPOOL["System Node Pool - D4s_v5 × 3"]
            COREDNS[CoreDNS\nCluster DNS]
            KONNECT[Konnectivity\nAPI server tunnel]
            METRICS[metrics-server\nResource metrics]
        end
        subgraph USERPOOL["User Node Pool - D8s_v5 (3–20, autoscaler)"]
            APP1[App Pod A\nWorkload Identity\nno host process]
            APP2[App Pod B\nWorkload Identity\nno host process]
            APP3[App Pod C\nWorkload Identity\nno host process]
        end
    end
    subgraph IDENTITY["Identity - Entra ID"]
        WI[Workload Identity\nOIDC federation\nno secrets in pods]
        MI[Managed Identity\nKubelet identity\nAcrPull role]
    end
    subgraph OBSERVE["Observability"]
        MON[Azure Monitor\nContainer Insights\nPrometheus scrape]
        LAW[Log Analytics Workspace\nQuery and alerts\nRetention policy]
    end
    INTERNET --> AKS_CLUSTER
    REPO --> MI
    MI -->|AcrPull| AKS_CLUSTER
    SCAN_ACR -.->|Scans| REPO
    WI -->|OIDC token exchange| APP1 & APP2 & APP3
    SYSPOOL -.->|taint: CriticalAddonsOnly| USERPOOL
    AKS_CLUSTER --> MON --> LAW
    style INTERNET fill:#4f8ef7,color:#fff
    style SCAN_ACR fill:#e0f2fe
    style WI fill:#e0f2fe

Implementation Guide

Bicep — AKS Managed Cluster

// Microsoft.ContainerService/managedClusters@2023-07-01
resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-07-01' = {
  name: clusterName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    enableRBAC: true
    aadProfile: {
      managed: true
      enableAzureRBAC: true
    }
    oidcIssuerProfile: {
      enabled: true
    }
    securityProfile: {
      workloadIdentity: {
        enabled: true
      }
      imageCleaner: {
        enabled: true
        intervalHours: 48
      }
    }
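    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'azure'
      loadBalancerSku: 'standard'
      // userAssignedNATGateway requires a bring-your-own subnet that already has a
      // NAT Gateway attached; each pool references that subnet through vnetSubnetID.
      outboundType: 'userAssignedNATGateway'
    }
    agentPoolProfiles: [
      {
        name: 'system'
        mode: 'System'
        count: 3
        vmSize: 'Standard_D4s_v5'
        vnetSubnetID: aksSubnetId // assumed parameter: resource ID of the AKS node subnet
        availabilityZones: [ '1', '2', '3' ]
        // Ephemeral OS disks need a VM size whose cache or temp disk can hold the
        // OS disk; confirm support for the chosen SKU before deploying.
        osDiskType: 'Ephemeral'
        upgradeSettings: {
          maxSurge: '33%'
        }
        // Keeps application pods off the system pool
        nodeTaints: [ 'CriticalAddonsOnly=true:NoSchedule' ]
      }
      {
        name: 'apppool'
        mode: 'User'
        count: 3
        minCount: 3
        maxCount: 20
        enableAutoScaling: true
        vmSize: 'Standard_D8s_v5'
        vnetSubnetID: aksSubnetId // same assumed subnet parameter as the system pool
        availabilityZones: [ '1', '2', '3' ]
        osDiskType: 'Ephemeral'
        upgradeSettings: {
          maxSurge: '33%'
        }
      }
    ]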
    // Add-on keys are written in lowercase, matching Microsoft's published samples
    addonProfiles: {
      azurepolicy: {
        enabled: true
      }
      omsagent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: logAnalyticsWorkspaceId
        }
      }
    }
  }
}

// Grant kubelet identity AcrPull on the registry
resource acrPullAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: containerRegistry
  name: guid(containerRegistry.id, aksCluster.id, acrPullRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPullRoleId)
    principalId: aksCluster.properties.identityProfile.kubeletidentity.objectId
    principalType: 'ServicePrincipal'
  }
}

Bicep — Azure Container Registry (Premium)

// Microsoft.ContainerRegistry/registries@2023-07-01
resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-07-01' = {
  name: registryName
  location: location
  sku: {
    name: 'Premium'
  }
  properties: {
    adminUserEnabled: false
    publicNetworkAccess: 'Disabled'
    zoneRedundancy: 'Enabled'
    policies: {
      retentionPolicy: {
        days: 30
        status: 'enabled'
      }
    }
  }
}
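With publicNetworkAccess set to 'Disabled', the registry is reachable only through a private endpoint, so cluster nodes cannot pull images until one exists. The following is a minimal sketch of that endpoint as a separate module. The parameter names and the endpoint name are assumptions, and the private DNS zone (privatelink.azurecr.io) and its zone group are deliberately left out.

```bicep
// Hypothetical parameters: names are illustrative, not taken from the article.
param location string = resourceGroup().location
param registryName string
param privateEndpointSubnetId string // resource ID of a subnet reserved for private endpoints

// Reference the registry defined elsewhere
resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
  name: registryName
}

resource acrPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-05-01' = {
  name: '${registryName}-pe'
  location: location
  properties: {
    subnet: {
      id: privateEndpointSubnetId
    }
    privateLinkServiceConnections: [
      {
        name: 'acr'
        properties: {
          privateLinkServiceId: containerRegistry.id
          groupIds: [ 'registry' ] // sub-resource name for ACR
        }
      }
    ]
  }
}
```

Name resolution still has to point the registry's login server at the private IP, which is normally handled by linking the private DNS zone to the cluster's VNet.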

Terraform equivalent: Use azurerm_kubernetes_cluster with identity { type = "SystemAssigned" }, oidc_issuer_enabled = true, workload_identity_enabled = true, and separate azurerm_kubernetes_cluster_node_pool for the user pool. Use azurerm_container_registry with sku = "Premium", public_network_access_enabled = false, zone_redundancy_enabled = true, and azurerm_role_assignment binding AcrPull to the kubelet identity principal.

Decision Criteria

| Criteria | AKS | Container Apps |
| --- | --- | --- |
| Operational overhead | Higher: own node pools, upgrades, CNI | Lower: fully managed infrastructure |
| Kubernetes ecosystem | Full: CRDs, operators, admission webhooks | Limited: KEDA scaling rules, Dapr |
| Cluster upgrades | You manage: runbook required | Automatic: no runbook needed |
| Custom schedulers | Yes | No |
| Scale-to-zero | No (the system node pool must keep at least one node running) | Yes: built-in |
| Cost at scale | Lower: pack workloads on reserved VMs | Higher: per-vCPU-second consumption |
| GPU / spot nodes | Yes | No |
| Best for | Custom admission controllers, CNI, GPU, K8s primitives, client-required Kubernetes | Greenfield microservices, event-driven workloads, smaller teams |

Recommendation for most Ascendion projects: Start with Azure Container Apps. Graduate to AKS when you need a specific Kubernetes capability — custom operators, advanced network policy, GPU workloads, or client-mandated Kubernetes.

Cost Model

| Component | Cost Driver | Optimisation |
| --- | --- | --- |
| AKS node pools | VM SKU × node count × uptime | Reserved Instances (1yr/3yr) for baseline; Spot nodes for batch |
| System node pool | Always-on × 3 nodes minimum | Use smallest SKU that satisfies CoreDNS and Konnectivity headroom |
| NAT Gateway | Hourly + data processed | Share one NAT Gateway across all subnets in the VNet |
| ACR Premium | Storage GB + geo-replication pairs | Retention policy removes untagged manifests; delete unused repos |
| Azure Monitor | Ingestion GB + retention days | Filter noisy container logs at the agent; set 30-day hot retention |
| Log Analytics | Pay-per-GB ingestion | Commitment tiers from 100 GB/day upward; archive cold data cheaply |

Cost optimisation levers:

  • Use Ephemeral OS disks on all node pools — no managed disk cost per node, faster node provisioning.
  • Enable cluster autoscaler on the user pool; set scale-down-delay-after-add to 10 minutes to avoid thrashing.
  • Spot node pools for non-production environments and batch workloads reduce VM cost by up to 90%.
  • ACR geo-replication adds a second replica charge — only enable it for regions that actually serve production traffic.
  • Defender for Containers is charged per vCore of running nodes; audit enablement scope to production only.
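The Spot lever above can be sketched as an extra node pool attached to the existing cluster. This is an illustrative module under stated assumptions: the pool name, VM size, and count range are hypothetical, and the cluster is referenced by an assumed clusterName parameter.

```bicep
// Hypothetical parameter: name of the AKS cluster deployed elsewhere.
param clusterName string

resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-07-01' existing = {
  name: clusterName
}

resource spotPool 'Microsoft.ContainerService/managedClusters/agentPools@2023-07-01' = {
  parent: aksCluster
  name: 'spotbatch' // illustrative pool name
  properties: {
    mode: 'User' // Spot priority is only available on user pools
    vmSize: 'Standard_D8s_v5'
    scaleSetPriority: 'Spot'
    scaleSetEvictionPolicy: 'Delete'
    // -1 means nodes are never evicted for price reasons; the price is capped at the on-demand rate
    spotMaxPrice: json('-1')
    enableAutoScaling: true
    count: 1
    minCount: 0 // lets the pool drain completely when no batch work is queued
    maxCount: 10
  }
}
```

AKS taints Spot nodes with kubernetes.azure.com/scalesetpriority=spot:NoSchedule, so only workloads that carry a matching toleration are scheduled there. That keeps eviction-sensitive services off the pool by default.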

Anti-Patterns to Avoid

⚠ 1. Running Application Pods on the System Node Pool

Scheduling workloads on the system node pool alongside CoreDNS, Konnectivity, and metrics-server. A misbehaving application pod can starve system components of CPU/memory, causing DNS failures and breaking the entire cluster.

Correct Approach

Taint the system node pool with CriticalAddonsOnly=true:NoSchedule. Application pods would need a matching toleration to land there; by default they have none, so the scheduler places them only on the user pool. Always provision a dedicated user node pool.

⚠ 2. Disabling Network Policies

Deploying AKS with networkPolicy: none, leaving all pods able to reach all other pods across namespaces with no restriction. A compromised pod can scan and connect to any in-cluster service.

Correct Approach

Set networkPolicy: azure (or Calico) at cluster creation time and treat the choice as fixed: changing the policy engine on a running cluster is disruptive where it is supported at all. Define NetworkPolicy manifests that default-deny all ingress and egress per namespace, then add explicit allow rules for required traffic paths.

⚠ 3. Service Principal with Password for Cluster Identity

Configuring the AKS cluster with a service principal and client secret. Secrets expire, require manual rotation, and are frequently leaked in pipeline logs or Bicep parameter files.

Correct Approach

Use identity: { type: 'SystemAssigned' } for the cluster control plane and Workload Identity (OIDC federation with Entra ID) for application pods. No secrets are stored anywhere; the OIDC token is issued per-pod and expires automatically.
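On the Azure side, Workload Identity comes down to a federated credential that trusts the cluster's OIDC issuer for one specific Kubernetes service account. The following is a minimal sketch, assuming a user-assigned managed identity per application; the identity name, namespace, and service account name are hypothetical.

```bicep
// Hypothetical parameters and names: adjust to your own cluster and workload.
param location string = resourceGroup().location
param clusterName string
param serviceAccountNamespace string = 'orders'
param serviceAccountName string = 'order-api'

resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-07-01' existing = {
  name: clusterName
}

// Identity the application pods will assume
resource appIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-order-api'
  location: location
}

// Trust tokens issued by the cluster for exactly one service account
resource federatedCredential 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: appIdentity
  name: 'aks-order-api'
  properties: {
    issuer: aksCluster.properties.oidcIssuerProfile.issuerURL
    subject: 'system:serviceaccount:${serviceAccountNamespace}:${serviceAccountName}'
    audiences: [ 'api://AzureADTokenExchange' ]
  }
}

// Client ID to place in the service account annotation
output clientId string = appIdentity.properties.clientId
```

The Kubernetes service account is then annotated with azure.workload.identity/client-id set to the output value, and pods opt in with the label azure.workload.identity/use: "true". The only value that crosses the boundary is a client ID, which is not a secret.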

⚠ 4. Fixed Node Pool Count in Production

Hardcoding count: 5 on the user node pool with no autoscaler. Traffic spikes exhaust node capacity, leaving new pods unschedulable and requests failing; quiet periods waste compute budget.

Correct Approach

Set enableAutoScaling: true with a minCount/maxCount range on every user node pool, and configure a Horizontal Pod Autoscaler (HPA) or KEDA scaler on each deployment. The two layers work together: HPA/KEDA adds pod replicas when load rises, and the cluster autoscaler adds nodes when those replicas are left pending.

⚠ 5. ACR Admin Credentials Enabled

Setting adminUserEnabled: true on the container registry and distributing the admin username and password to pipelines and node pools. Admin credentials are permanent, shared, and not audited per consumer.

Correct Approach

Set adminUserEnabled: false. Grant the AKS kubelet identity the AcrPull role via role assignment. Grant pipeline service principals AcrPush via role assignment. All access is identity-based, audited in Entra ID sign-in logs, and revocable per principal.
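The pipeline side of this can be sketched as a role assignment that mirrors the AcrPull assignment in the Implementation Guide. The parameter names are assumptions, and the role definition GUID is the built-in AcrPush ID as I understand it; verify it against the Azure built-in roles list before use.

```bicep
// Hypothetical parameters: names are illustrative, not taken from the article.
param registryName string
param pipelinePrincipalId string // object ID of the pipeline's service principal or managed identity

// Built-in AcrPush role definition ID (verify against the Azure built-in roles list)
var acrPushRoleId = '8311e382-0749-4cb8-b61a-304f252e45ec'

resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
  name: registryName
}

resource acrPushAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: containerRegistry
  // Deterministic name so that redeployments do not create duplicate assignments
  name: guid(containerRegistry.id, pipelinePrincipalId, acrPushRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPushRoleId)
    principalId: pipelinePrincipalId
    principalType: 'ServicePrincipal'
  }
}
```

Scoping the assignment to the registry rather than the resource group keeps the pipeline's permissions limited to pushing and pulling images.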

Flowchart

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    INTERNET[Internet Traffic]
    subgraph REGISTRY["Azure Container Registry (Premium)"]
        REPO[Private Repository\nPublic network disabled\nPrivate endpoint only]
        SCAN_ACR[Defender for Containers\nContinuous CVE scanning\nGeo-replication enabled]
    end
    subgraph AKS_CLUSTER["AKS Cluster - Azure CNI + azure networkPolicy"]
        subgraph SYSPOOL["System Node Pool - D4s_v5 × 3 - taint: CriticalAddonsOnly"]
            COREDNS[CoreDNS\nCluster DNS]
            KONNECT[Konnectivity\nAPI server tunnel]
            METRICS[metrics-server\nResource metrics]
        end
        subgraph USERPOOL["User Node Pool - D8s_v5 (3–20, cluster autoscaler on)"]
            APP1[App Pod A\nWorkload Identity\nOIDC token]
            APP2[App Pod B\nWorkload Identity\nOIDC token]
            APP3[App Pod C\nWorkload Identity\nOIDC token]
        end
    end
    subgraph IDENTITY["Identity Layer - Entra ID"]
        WI[Workload Identity\nOIDC federation\nNo secrets in pods]
        MI[Managed Identity\nKubelet identity\nAcrPull role assignment]
    end
    subgraph OBSERVE["Observability"]
        MON[Azure Monitor\nContainer Insights\nPrometheus scrape]
        LAW[Log Analytics Workspace\nQuery and alerts\n30-day retention]
    end
    INTERNET --> AKS_CLUSTER
    REPO --> MI
    MI -->|AcrPull\nrole assignment| AKS_CLUSTER
    SCAN_ACR -.->|Scans images| REPO
    WI -->|OIDC token\nexchange| APP1 & APP2 & APP3
    SYSPOOL -.->|NoSchedule for\napp workloads| USERPOOL
    AKS_CLUSTER --> MON
    MON --> LAW
    style INTERNET fill:#4f8ef7,color:#fff
    style SCAN_ACR fill:#e0f2fe
    style WI fill:#e0f2fe
    style MI fill:#DBEAFE
