Compute: Lesson 2 of 16

VM Scale Sets

VM Scale Sets (VMSS) let you deploy and manage a group of identical, auto-scaling VMs behind a load balancer — giving you elastic capacity for variable workloads.

Simple Explanation

Imagine hiring 2 workers for normal load and automatically bringing in 8 more when it gets busy — then sending them home when the rush ends. VM Scale Sets do exactly that for VMs.

When to Use VMSS

VMSS vs App Service Auto-scaling

If your app can run on App Service, auto-scaling there is simpler: there is no OS to manage and no VM to boot. Use VMSS when you need full OS control at scale. A new VMSS instance typically takes 2–5 minutes to boot and start the app, while App Service scale-out usually adds capacity much faster.

How VMSS Works

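VMSS Auto-scaling Flow

  1. Trigger: CPU > 70%, a schedule, or a custom metric.
  2. Scale out: add instances from the same image and register them with the load balancer.
  3. Scale in: when CPU < 30%, remove instances, respecting a cooldown period between scale actions.
  4. Load balancer: distributes traffic, runs health probes, and drains instances on scale-in.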
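To make the trigger, scale-out, scale-in, and cooldown steps concrete, here is a simplified model in bash. It is not Azure's autoscale engine (which, for example, also tries to avoid flapping); it only mirrors this lesson's example values: add 2 above 70% CPU, remove 1 below 30%, stay within 2 to 10 instances, and wait 5 minutes (the CLI's default rule cooldown) between actions. The function name next_capacity is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of the autoscale flow. NOT Azure's real engine.
# Values mirror this lesson's example rules.
MIN=2
MAX=10
SCALE_OUT_STEP=2   # add 2 instances when average CPU > 70%
SCALE_IN_STEP=1    # remove 1 instance when average CPU < 30%
COOLDOWN_MIN=5     # minutes to wait after any scale action

# next_capacity CURRENT AVG_CPU MINUTES_SINCE_LAST_ACTION
# Prints the instance count this model would choose.
next_capacity() {
  local current=$1 cpu=$2 since=$3
  local target=$current

  # Still cooling down: make no change.
  if (( since < COOLDOWN_MIN )); then
    echo "$current"
    return
  fi

  if (( cpu > 70 )); then
    target=$(( current + SCALE_OUT_STEP ))
  elif (( cpu < 30 )); then
    target=$(( current - SCALE_IN_STEP ))
  fi

  # Clamp to the autoscale setting's min/max bounds.
  if (( target > MAX )); then target=$MAX; fi
  if (( target < MIN )); then target=$MIN; fi
  echo "$target"
}

next_capacity 2 85 10   # prints 4  (scale out by 2)
next_capacity 9 90 10   # prints 10 (clamped to max)
next_capacity 4 20 2    # prints 4  (still in cooldown)
next_capacity 2 10 10   # prints 2  (already at min)
```

The clamp step is why a rule that "adds 2" can add fewer: the min/max of the autoscale setting always wins.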

Commands

Azure CLI
# Create a scale set
az vmss create \
  --resource-group rg-app \
  --name vmss-web \
  --image Ubuntu2204 \
  --vm-sku Standard_D2s_v3 \
  --instance-count 2 \
  --admin-username azureuser \
  --generate-ssh-keys \
  --upgrade-policy-mode automatic

# Define autoscale (2–10 instances based on CPU)
az monitor autoscale create \
  --resource-group rg-app \
  --resource vmss-web \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name autoscale-web \
  --min-count 2 --max-count 10 --count 2

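# Add scale-out rule (CPU > 70% → add 2)
az monitor autoscale rule create \
  --resource-group rg-app \
  --autoscale-name autoscale-web \
  --scale out 2 \
  --condition "Percentage CPU > 70 avg 5m"

# Add scale-in rule (CPU < 30% → remove 1); without it the set never scales back in
az monitor autoscale rule create \
  --resource-group rg-app \
  --autoscale-name autoscale-web \
  --scale in 1 \
  --condition "Percentage CPU < 30 avg 5m"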

# List instances
az vmss list-instances --resource-group rg-app --name vmss-web --output table

# Manual scale (with autoscale enabled, the count can change again at the next scale action)
az vmss scale --resource-group rg-app --name vmss-web --new-capacity 5

Hands-on

  1. Create a VMSS with 2 instances (Standard_B2s for cost).
  2. Attach an autoscale policy: scale out at CPU > 70%, scale in at CPU < 30%.
  3. Generate CPU load on every instance with a stress tool (such as stress-ng) and watch autoscale trigger.
  4. Check the load balancer backend pool to see the new instances register.
  5. Manually scale in and verify that the instance count drops.
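For step 3, if stress or stress-ng is not installed on the instances, a pure-bash busy loop is a minimal stand-in. Because the CPU rule uses the average across the scale set over 5 minutes, loading a single instance may not cross 70%; run it on each instance (for example over SSH) for longer than the rule's window. burn_cpu is a hypothetical helper name; it relies only on coreutils timeout and nproc.

```shell
#!/usr/bin/env bash
# Minimal CPU load generator: a stand-in for stress/stress-ng.
# Starts one busy loop per worker; timeout stops each after SECS seconds.

# burn_cpu SECS [WORKERS]   (WORKERS defaults to the number of cores)
burn_cpu() {
  local secs=$1
  local workers=${2:-$(nproc)}
  local i
  for (( i = 0; i < workers; i++ )); do
    # Each worker spins in an infinite loop until timeout kills it.
    timeout "$secs" bash -c 'while :; do :; done' &
  done
  wait   # return once every worker has been stopped
}

# Example on an instance: load all cores for 10 minutes.
# burn_cpu 600
```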

Debugging Scenario

Issue: New instances are not getting traffic after scale-out.
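Common culprits are a missing load-balancing rule for the app port, a missing or failing health probe, or instances that are still starting. The read-only helper below sketches how to inspect those with the Azure CLI. It assumes the load balancer has the default name az vmss create generates (the scale set name plus "LB"); names can differ by CLI version and orchestration mode, which is why it lists the load balancers first. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Read-only checks for "new instances get no traffic after scale-out".
# Assumption: default LB name is <scale-set-name>LB; pass the real name as $3 otherwise.

default_lb_name() { echo "${1}LB"; }

# vmss_traffic_diag RESOURCE_GROUP VMSS_NAME [LB_NAME]
vmss_traffic_diag() {
  local rg=$1 vmss=$2
  local lb=${3:-$(default_lb_name "$vmss")}

  # 0. Which load balancers actually exist in the group?
  az network lb list --resource-group "$rg" --output table

  # 1. Did every instance provision and start successfully?
  az vmss get-instance-view --resource-group "$rg" --name "$vmss" --instance-id '*'

  # 2. Is there a health probe, and which port/path does it check?
  az network lb probe list --resource-group "$rg" --lb-name "$lb" --output table

  # 3. Is there a load-balancing rule that forwards the app port to the pool?
  az network lb rule list --resource-group "$rg" --lb-name "$lb" --output table
}

# Usage (requires az login):
# vmss_traffic_diag rg-app vmss-web
```

If a probe and rule exist, the next thing to confirm on an affected instance is that the app really listens on the probed port (for example with ss -ltn).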

Interview Questions

Beginner

What is a VM Scale Set?

A group of identical VMs managed together that can auto-scale in/out based on demand metrics or schedules.

What is the minimum instance count for VMSS?

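You can set the minimum to 0 or to 1 or more. A minimum of 0 suits batch jobs, but scaling out from 0 needs a trigger that exists while no instances run, such as a schedule or a metric from another resource (for example, queue length), because CPU metrics stop when there are no instances. Keep the minimum at 2 or more for production web apps.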

Scenario-based

Traffic spikes every day at noon. How do you configure VMSS?

Use scheduled autoscale: scale out to 10 instances at 11:45 AM, scale in to 2 at 2:00 PM. Combine with metric autoscale as a safety net for unpredictable spikes.
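A sketch of the scheduled part with the Azure CLI, adding a recurring profile to the autoscale-web setting from the Commands section. The profile name noon-peak, the helper name, and the UTC default are placeholders; the --start/--end time format for weekly recurrence is worth confirming with az monitor autoscale profile create --help for your CLI version.

```shell
#!/usr/bin/env bash
# Sketch: a daily "noon peak" profile that holds the scale set at 10 instances.
# Outside the window, the default profile (2 to 10 instances, CPU rules) applies.

# add_noon_profile RESOURCE_GROUP AUTOSCALE_NAME [TIMEZONE]
add_noon_profile() {
  local rg=$1 setting=$2 tz=${3:-UTC}
  az monitor autoscale profile create \
    --resource-group "$rg" \
    --autoscale-name "$setting" \
    --name noon-peak \
    --recurrence week mon tue wed thu fri sat sun \
    --start 11:45 --end 14:00 \
    --timezone "$tz" \
    --min-count 10 --count 10 --max-count 10
}

# Usage (requires az login); use a Windows time zone name such as "Pacific Standard Time":
# add_noon_profile rg-app autoscale-web "Pacific Standard Time"
```

Pinning min, default, and max to 10 makes the window predictable; the CPU rules in the default profile remain the safety net for spikes at other times of day.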

VMSS is scaling but response time is still high.

New instances may still be bootstrapping (cloud-init running, app starting). Point the load balancer health probe at an endpoint that succeeds only once the app is fully ready, so new instances receive traffic only after they pass the probe.
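A hedged sketch of that probe with the Azure CLI: an HTTP probe on a hypothetical /health path plus a rule on port 80 gated by it. The names vmss-webLB, vmss-webLBBEPool and loadBalancerFrontEnd are assumed defaults from az vmss create and may differ in your deployment; the app itself must return a non-200 status (or not listen) on /health until startup finishes.

```shell
#!/usr/bin/env bash
# Sketch: HTTP health probe + LB rule so an instance gets traffic only
# after /health returns HTTP 200. LB, pool and frontend names are assumed defaults.

# add_http_probe RESOURCE_GROUP VMSS_NAME
add_http_probe() {
  local rg=$1 vmss=$2
  local lb="${vmss}LB" pool="${vmss}LBBEPool"

  # Probe: an instance is healthy only while /health answers 200.
  az network lb probe create \
    --resource-group "$rg" --lb-name "$lb" \
    --name probe-health --protocol Http --port 80 --path /health

  # Rule: forward port 80 to the backend pool, gated by the probe.
  az network lb rule create \
    --resource-group "$rg" --lb-name "$lb" \
    --name rule-http --protocol Tcp \
    --frontend-port 80 --backend-port 80 \
    --frontend-ip-name loadBalancerFrontEnd \
    --backend-pool-name "$pool" \
    --probe-name probe-health
}

# Usage (requires az login):
# add_http_probe rg-app vmss-web
```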

Summary

VMSS provides elastic VM capacity with automatic scaling and integrated load balancing. Use it for stateless, VM-level workloads that need dynamic capacity. For PaaS-compatible workloads, App Service with auto-scale is simpler.