Container Platform on Azure

On This Page

1. Overview
2. Architecture Overview
3. Azure Service Topology
4. Implementation Guide
5. Decision Criteria
6. Cost Model
7. Anti-Patterns to Avoid
8. References

Overview

Azure offers two primary container platforms, AKS (Azure Kubernetes Service) and Azure Container Apps, supported by Azure Container Registry (ACR) for image storage and Azure Container Instances (ACI) for on-demand single containers. A typical path starts with a CI/CD pipeline that builds images and pushes them to ACR, and ends with either Container Apps or AKS running those images behind an Application Gateway or Azure Load Balancer.

Container Apps is built on Kubernetes and KEDA internally, but that complexity is fully managed. You declare revision traffic weights, scaling rules, and environment variables; Azure provisions and scales the underlying infrastructure. AKS exposes the full Kubernetes API — node pools, taints, custom admission controllers, CNI selection, and GPU scheduling — suited to teams that need those primitives or have existing Kubernetes expertise to justify the overhead.
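As a rough illustration of what "declaring" a Container App looks like, the sketch below sets revision traffic weights, one HTTP scaling rule, and an environment variable. The app name, parameters, port, and thresholds are all hypothetical, and registry authentication for a private ACR is left out for brevity.

```bicep
// Hypothetical parameters; none of these names come from the guide itself.
param location string = resourceGroup().location
param environmentId string        // resource ID of an existing Container Apps environment
param imageName string            // full image reference, for example from your ACR
param previousRevisionName string // revision that currently serves production traffic

resource app 'Microsoft.App/containerApps@2023-05-01' = {
  name: 'orders-api' // illustrative name
  location: location
  properties: {
    managedEnvironmentId: environmentId
    configuration: {
      // Multiple active revisions are needed to split traffic between them.
      activeRevisionsMode: 'Multiple'
      ingress: {
        external: true
        targetPort: 8080
        traffic: [
          {
            revisionName: previousRevisionName
            weight: 90
          }
          {
            latestRevision: true
            weight: 10
          }
        ]
      }
    }
    template: {
      containers: [
        {
          name: 'orders-api'
          image: imageName
          env: [
            {
              name: 'LOG_LEVEL'
              value: 'info'
            }
          ]
          resources: {
            cpu: json('0.5')
            memory: '1Gi'
          }
        }
      ]
      scale: {
        minReplicas: 0 // scale-to-zero when idle
        maxReplicas: 10
        rules: [
          {
            name: 'http-concurrency'
            http: {
              metadata: {
                concurrentRequests: '50'
              }
            }
          }
        ]
      }
    }
  }
}
```

Shifting the weights in the traffic array is how a canary rollout would progress under this model; the 90/10 split is only an example.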

Architecture Overview

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    START([Developer Pushes Code])

    START --> CI[CI Pipeline\nBuild container image\nRun tests and SAST]
    CI --> SCAN{Image Security\nScan Passed?}
    SCAN -->|Vulnerabilities found| FAIL[Block deployment\nNotify engineer\nLog to Defender for Containers]
    SCAN -->|Clean| ACR_PUSH[Push to ACR\nImmutable tags\nRetention policy active]

    ACR_PUSH --> DEPLOY{Deployment\nTarget}
    DEPLOY -->|No K8s complexity needed| CAPP[Azure Container Apps\nRevision-based deployment\nScale-to-zero supported]
    DEPLOY -->|Kubernetes ecosystem| AKS[AKS Cluster\nUser node pool\nHelm or GitOps]

    CAPP --> AGWY[Application Gateway\nor Azure Front Door\nHTTPS ingress]
    AKS --> AGWY

    AGWY --> USERS([Users Reach the Application])

    style START fill:#4f8ef7,color:#fff
    style USERS fill:#10b981,color:#fff
    style FAIL fill:#fef3c7
    style ACR_PUSH fill:#e0f2fe

Azure Service Topology

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    INTERNET[Internet Traffic]

    subgraph REGISTRY["Azure Container Registry (Premium)"]
        REPO[Private Repository\nPublic network disabled\nPrivate endpoint only]
        SCAN_ACR[Defender for Containers\nContinuous CVE scanning\nGeo-replication enabled]
    end

    subgraph AKS_CLUSTER["AKS Cluster"]
        subgraph SYSPOOL["System Node Pool — D4s_v5 × 3"]
            COREDNS[CoreDNS\nCluster DNS]
            KONNECT[Konnectivity\nAPI server tunnel]
            METRICS[metrics-server\nResource metrics]
        end
        subgraph USERPOOL["User Node Pool — D8s_v5 (3–20, autoscaler)"]
            APP1[App Pod A\nWorkload Identity\nno host process]
            APP2[App Pod B\nWorkload Identity\nno host process]
            APP3[App Pod C\nWorkload Identity\nno host process]
        end
    end

    subgraph IDENTITY["Identity — Entra ID"]
        WI[Workload Identity\nOIDC federation\nno secrets in pods]
        MI[Managed Identity\nKubelet identity\nAcrPull role]
    end

    subgraph OBSERVE["Observability"]
        MON[Azure Monitor\nContainer Insights\nPrometheus scrape]
        LAW[Log Analytics Workspace\nQuery and alerts\nRetention policy]
    end

    INTERNET --> AKS_CLUSTER
    REPO --> MI
    MI -->|AcrPull| AKS_CLUSTER
    SCAN_ACR -.->|Scans| REPO
    WI -->|OIDC token exchange| APP1 & APP2 & APP3
    SYSPOOL -.->|taint: CriticalAddonsOnly| USERPOOL
    AKS_CLUSTER --> MON --> LAW

    style INTERNET fill:#4f8ef7,color:#fff
    style SCAN_ACR fill:#e0f2fe
    style WI fill:#e0f2fe

Implementation Guide

Bicep — AKS Managed Cluster

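// Parameters assumed by the resources in this guide (names are illustrative).
param clusterName string
param registryName string
param location string = resourceGroup().location
param logAnalyticsWorkspaceId string

// Built-in AcrPull role definition ID.
var acrPullRoleId = '7f951dda-4ed3-4680-a7ca-43fe172d538d'

// Microsoft.ContainerService/managedClusters@2023-07-01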
resource aksCluster 'Microsoft.ContainerService/managedClusters@2023-07-01' = {
  name: clusterName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    enableRBAC: true
    aadProfile: {
      managed: true
      enableAzureRBAC: true
    }
    oidcIssuerProfile: {
      enabled: true
    }
    securityProfile: {
      workloadIdentity: {
        enabled: true
      }
      imageCleaner: {
        enabled: true
        intervalHours: 48
      }
    }
    networkProfile: {
      networkPlugin: 'azure'
      networkPolicy: 'azure'
      loadBalancerSku: 'standard'
      // Assumes each agent pool sets vnetSubnetID to a subnet that already has a NAT gateway attached.
      outboundType: 'userAssignedNATGateway'
    }
    agentPoolProfiles: [
      {
        name: 'system'
        mode: 'System'
        count: 3
        vmSize: 'Standard_D4s_v5'
        availabilityZones: [ '1', '2', '3' ]
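        // Ephemeral OS disks need a VM size whose cache or temp disk can hold the OS image; confirm the chosen size supports them.
        osDiskType: 'Ephemeral'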
        upgradeSettings: {
          maxSurge: '33%'
        }
        nodeTaints: [ 'CriticalAddonsOnly=true:NoSchedule' ]
      }
      {
        name: 'apppool'
        mode: 'User'
        count: 3
        minCount: 3
        maxCount: 20
        enableAutoScaling: true
        vmSize: 'Standard_D8s_v5'
        availabilityZones: [ '1', '2', '3' ]
        osDiskType: 'Ephemeral'
        upgradeSettings: {
          maxSurge: '33%'
        }
      }
    ]
    addonProfiles: {
      azurePolicy: {
        enabled: true
      }
      omsAgent: {
        enabled: true
        config: {
          logAnalyticsWorkspaceResourceID: logAnalyticsWorkspaceId
        }
      }
    }
  }
}

// Grant kubelet identity AcrPull on the registry
resource acrPullAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: containerRegistry
  name: guid(containerRegistry.id, aksCluster.id, acrPullRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPullRoleId)
    principalId: aksCluster.properties.identityProfile.kubeletidentity.objectId
    principalType: 'ServicePrincipal'
  }
}

Bicep — Azure Container Registry (Premium)

// Microsoft.ContainerRegistry/registries@2023-07-01
resource containerRegistry 'Microsoft.ContainerRegistry/registries@2023-07-01' = {
  name: registryName
  location: location
  sku: {
    name: 'Premium'
  }
  properties: {
    adminUserEnabled: false
    publicNetworkAccess: 'Disabled'
    zoneRedundancy: 'Enabled'
    policies: {
      retentionPolicy: {
        days: 30
        status: 'enabled'
      }
    }
  }
}
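The topology diagram labels the registry "Geo-replication enabled", which is configured through a separate child resource rather than on the registry itself. A minimal sketch, assuming a hypothetical replicaLocation parameter that names a second region serving production traffic:

```bicep
param registryName string
param replicaLocation string // hypothetical: second region that serves production traffic

// Reference the Premium registry defined above.
resource registry 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
  name: registryName
}

// One replication resource per additional region; each replica is billed separately.
resource replica 'Microsoft.ContainerRegistry/registries/replications@2023-07-01' = {
  parent: registry
  name: replicaLocation
  location: replicaLocation
  properties: {
    regionEndpointEnabled: true
  }
}
```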

Terraform equivalent: Use azurerm_kubernetes_cluster with identity { type = "SystemAssigned" }, oidc_issuer_enabled = true, and workload_identity_enabled = true, plus a separate azurerm_kubernetes_cluster_node_pool resource for the user pool. Use azurerm_container_registry with sku = "Premium", public_network_access_enabled = false, and zone_redundancy_enabled = true, and an azurerm_role_assignment binding AcrPull to the kubelet identity principal.

Decision Criteria

Criteria	AKS	Container Apps
Operational overhead	Higher — own node pools, upgrades, CNI	Lower — fully managed infrastructure
Kubernetes ecosystem	Full — CRDs, operators, admission webhooks	Limited — KEDA scaling rules, Dapr
Cluster upgrades	You manage — runbook required	Automatic — no runbook needed
Custom schedulers	Yes	No
Scale-to-zero	Partial: user node pools can scale to zero nodes, but the system pool keeps at least one node running	Yes — built-in
Cost at scale	Lower — pack workloads on reserved VMs	Higher — per-vCPU-second consumption
GPU / spot nodes	Yes	Limited: GPU workload profiles in some regions; no spot capacity
Best for	Custom admission controllers, CNI, GPU, K8s primitives, client-required Kubernetes	Greenfield microservices, event-driven workloads, smaller teams

Recommendation for most Ascendion projects: Start with Azure Container Apps. Graduate to AKS when you need a specific Kubernetes capability — custom operators, advanced network policy, GPU workloads, or client-mandated Kubernetes.

Cost Model

Component	Cost Driver	Optimisation
AKS node pools	VM SKU × node count × uptime	Reserved Instances (1yr/3yr) for baseline; Spot nodes for batch
System node pool	Always-on × 3 nodes minimum	Use smallest SKU that satisfies CoreDNS and Konnectivity headroom
NAT Gateway	Hourly + data processed	Share one NAT Gateway across all subnets in the VNet
ACR Premium	Storage GB + geo-replication pairs	Retention policy removes untagged manifests; delete unused repos
Azure Monitor	Ingestion GB + retention days	Filter noisy container logs at the agent; set 30-day hot retention
Log Analytics	Pay-per-GB ingestion	Commitment tiers start at 100 GB/day; archive cold data cheaply

Cost optimisation levers:

Use Ephemeral OS disks on all node pools — no managed disk cost per node, faster node provisioning.
Enable cluster autoscaler on the user pool; keep scale-down-delay-after-add at 10 minutes (the default) or higher to avoid thrashing.
Spot node pools for non-production environments and batch workloads reduce VM cost by up to 90%.
ACR geo-replication adds a charge for each replica region, so enable it only for regions that actually serve production traffic.
Defender for Containers is charged per vCore of running nodes; audit enablement scope to production only.
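The Spot lever above could be sketched as an extra user pool attached to an existing cluster. The pool name, VM size, and count limits below are illustrative assumptions, not values from this guide.

```bicep
param clusterName string

// Reference the cluster that already exists.
resource aks 'Microsoft.ContainerService/managedClusters@2023-07-01' existing = {
  name: clusterName
}

resource spotPool 'Microsoft.ContainerService/managedClusters/agentPools@2023-07-01' = {
  parent: aks
  name: 'spotbatch' // hypothetical pool name
  properties: {
    mode: 'User' // Spot pools cannot be system pools
    vmSize: 'Standard_D8s_v5'
    scaleSetPriority: 'Spot'
    scaleSetEvictionPolicy: 'Delete'
    spotMaxPrice: -1 // -1 caps the price at the on-demand rate
    enableAutoScaling: true
    count: 1
    minCount: 0 // lets the pool drain fully when no batch work is queued
    maxCount: 10
  }
}
```

AKS adds the taint kubernetes.azure.com/scalesetpriority=spot:NoSchedule to Spot pools automatically, so only workloads that tolerate eviction and carry the matching toleration are scheduled there.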

Anti-Patterns to Avoid

⚠ 1. Running Application Pods on the System Node Pool

Scheduling workloads on the system node pool alongside CoreDNS, Konnectivity, and metrics-server. A misbehaving application pod can starve system components of CPU/memory, causing DNS failures and breaking the entire cluster.

↺ Correct Approach

Taint the system node pool with CriticalAddonsOnly=true:NoSchedule. Only pods that carry a matching toleration can be scheduled there; application pods do not carry one by default, so they land on the user pool. Always provision a dedicated user node pool.

⚠ 2. Disabling Network Policies

Deploying AKS with networkPolicy: none, leaving all pods able to reach all other pods across namespaces with no restriction. A compromised pod can scan and connect to any in-cluster service.

↺ Correct Approach

Set networkPolicy: azure (or Calico) when the cluster is created. Switching the policy engine on an existing cluster has historically been unsupported or disruptive, so treat it as a creation-time decision. Define NetworkPolicy manifests that default-deny all ingress and egress per namespace, then add explicit allow rules for required traffic paths.

⚠ 3. Service Principal with Password for Cluster Identity

Configuring the AKS cluster with a service principal and client secret. Secrets expire, require manual rotation, and are frequently leaked in pipeline logs or Bicep parameter files.

↺ Correct Approach

Use identity: { type: 'SystemAssigned' } for the cluster control plane and Workload Identity (OIDC federation with Entra ID) for application pods. No secrets are stored anywhere; the OIDC token is issued per-pod and expires automatically.
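One way to wire up the pod side of this is a user-assigned managed identity with a federated credential that trusts the cluster's OIDC issuer. The sketch below assumes hypothetical identity, namespace, and service account names; only the audience value and the subject format are fixed by the Workload Identity feature.

```bicep
param location string = resourceGroup().location
param clusterName string
param serviceAccountNamespace string = 'apps'  // hypothetical namespace
param serviceAccountName string = 'orders-api' // hypothetical service account

resource aks 'Microsoft.ContainerService/managedClusters@2023-07-01' existing = {
  name: clusterName
}

// Identity the application pod will assume; role assignments target this principal.
resource appIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'id-orders-api' // illustrative name
  location: location
}

// Trust tokens issued by the cluster for exactly one Kubernetes service account.
resource federation 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: appIdentity
  name: 'aks-orders-api'
  properties: {
    issuer: aks.properties.oidcIssuerProfile.issuerURL
    subject: 'system:serviceaccount:${serviceAccountNamespace}:${serviceAccountName}'
    audiences: [
      'api://AzureADTokenExchange'
    ]
  }
}

// Annotate the Kubernetes service account with this value.
output clientId string = appIdentity.properties.clientId
```

On the Kubernetes side, the service account carries the annotation azure.workload.identity/client-id with the output value, and the pod template carries the label azure.workload.identity/use: "true".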

⚠ 4. Fixed Node Pool Count in Production

Hardcoding count: 5 on the user node pool with no autoscaler. Traffic spikes exhaust node capacity, so new pods stay pending and requests fail; quiet periods waste compute budget.

↺ Correct Approach

Set enableAutoScaling: true with a minCount/maxCount range on every user node pool, and configure a Horizontal Pod Autoscaler (HPA) or KEDA scaler on each deployment. The two layers work together: HPA/KEDA adds pod replicas when load rises, and the cluster autoscaler adds nodes when those replicas cannot be scheduled.

⚠ 5. ACR Admin Credentials Enabled

Setting adminUserEnabled: true on the container registry and distributing the admin username and password to pipelines and node pools. Admin credentials are permanent, shared, and not audited per consumer.

↺ Correct Approach

Set adminUserEnabled: false. Grant the AKS kubelet identity the AcrPull role via role assignment. Grant pipeline service principals AcrPush via role assignment. All access is identity-based, audited in Entra ID sign-in logs, and revocable per principal.
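The pipeline half of this approach mirrors the AcrPull assignment in the Implementation Guide. A minimal sketch, assuming a hypothetical pipelinePrincipalId parameter that holds the object ID of the pipeline's identity; the role GUID is the built-in AcrPush role and is worth verifying against the built-in roles list before use.

```bicep
param registryName string
param pipelinePrincipalId string // hypothetical: object ID of the pipeline's service principal or managed identity

// Built-in AcrPush role definition ID (verify against the Azure built-in roles list).
var acrPushRoleId = '8311e382-0749-4cb8-b61a-304f252e45ec'

resource registry 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
  name: registryName
}

// Scope the assignment to the registry, not the resource group or subscription.
resource acrPushAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: registry
  name: guid(registry.id, pipelinePrincipalId, acrPushRoleId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPushRoleId)
    principalId: pipelinePrincipalId
    principalType: 'ServicePrincipal'
  }
}
```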

Flowchart

%%{init:{'theme':'base','themeVariables':{'fontSize':'14px','fontFamily':'Inter, system-ui, sans-serif','primaryColor':'#DBEAFE','primaryTextColor':'#1e3a5f','primaryBorderColor':'#2563EB','lineColor':'#374151','clusterBkg':'#F9FAFB','clusterBorder':'#D1D5DB','edgeLabelBackground':'#FFFFFF'},'flowchart':{'curve':'orthogonal','padding':30,'nodeSpacing':65,'rankSpacing':75,'useMaxWidth':true}}}%%
flowchart TD
    INTERNET[Internet Traffic]

    subgraph REGISTRY["Azure Container Registry (Premium)"]
        REPO[Private Repository\nPublic network disabled\nPrivate endpoint only]
        SCAN_ACR[Defender for Containers\nContinuous CVE scanning\nGeo-replication enabled]
    end

    subgraph AKS_CLUSTER["AKS Cluster — Azure CNI + azure networkPolicy"]
        subgraph SYSPOOL["System Node Pool — D4s_v5 × 3 — taint: CriticalAddonsOnly"]
            COREDNS[CoreDNS\nCluster DNS]
            KONNECT[Konnectivity\nAPI server tunnel]
            METRICS[metrics-server\nResource metrics]
        end
        subgraph USERPOOL["User Node Pool — D8s_v5 (3–20, cluster autoscaler on)"]
            APP1[App Pod A\nWorkload Identity\nOIDC token]
            APP2[App Pod B\nWorkload Identity\nOIDC token]
            APP3[App Pod C\nWorkload Identity\nOIDC token]
        end
    end

    subgraph IDENTITY["Identity Layer — Entra ID"]
        WI[Workload Identity\nOIDC federation\nNo secrets in pods]
        MI[Managed Identity\nKubelet identity\nAcrPull role assignment]
    end

    subgraph OBSERVE["Observability"]
        MON[Azure Monitor\nContainer Insights\nPrometheus scrape]
        LAW[Log Analytics Workspace\nQuery and alerts\n30-day retention]
    end

    INTERNET --> AKS_CLUSTER
    REPO --> MI
    MI -->|AcrPull\nrole assignment| AKS_CLUSTER
    SCAN_ACR -.->|Scans images| REPO
    WI -->|OIDC token\nexchange| APP1 & APP2 & APP3
    SYSPOOL -.->|NoSchedule for\napp workloads| USERPOOL
    AKS_CLUSTER --> MON
    MON --> LAW

    style INTERNET fill:#4f8ef7,color:#fff
    style SCAN_ACR fill:#e0f2fe
    style WI fill:#e0f2fe
    style MI fill:#DBEAFE

References

Microsoft — AKS best practices. https://learn.microsoft.com/azure/aks/best-practices
Microsoft — Workload Identity overview. https://learn.microsoft.com/azure/aks/workload-identity-overview
Microsoft — Container Apps vs AKS. https://learn.microsoft.com/azure/container-apps/compare-options
Microsoft — Upgrade an AKS cluster. https://learn.microsoft.com/azure/aks/upgrade-cluster
CNCF — Cloud Native Landscape. https://landscape.cncf.io
Portal: AWS container platform comparison. /technology/cloud/aws-container-platform/
