Hands-on | Lesson 15 of 16

Debugging Bicep Deployments

Use structured runbooks to triage Bicep failures at every layer: compile, pre-flight, dependency, and runtime.

Simple Explanation (ELI5)

A Bicep deployment can fail at four different stages. Knowing which stage you are in tells you exactly where to look. Stage 1 is compile (Bicep syntax errors caught before anything touches Azure). Stage 2 is pre-flight validation (Azure checks your template structure and permissions). Stage 3 is dependency ordering (a resource tries to create before something it depends on is ready). Stage 4 is runtime (Azure accepted the plan but a resource operation failed after it started).

Failure Layer Map

Stage | When it happens | Where to look | CLI command
Compile | az bicep build | Terminal error output | az bicep build --file main.bicep
Pre-flight validation | az deployment group validate | Azure response body | az deployment group validate ...
What-if | az deployment group what-if | Unexpected deletes or replacements | az deployment group what-if ...
Runtime | During az deployment group create | Deployment operations list | az deployment operation group list ...

Essential Debugging Commands

bash
# Step 1 — Compile to catch syntax errors
az bicep build --file main.bicep

# Step 2 — Validate template against Azure (pre-flight)
az deployment group validate \
  --resource-group rg-platform-dev \
  --template-file main.bicep \
  --parameters @dev.parameters.json

# Step 3 — What-if preview (shows changes without deploying)
az deployment group what-if \
  --resource-group rg-platform-dev \
  --template-file main.bicep \
  --parameters @dev.parameters.json

# Step 4 — List deployment operations after a failed deploy
az deployment operation group list \
  --resource-group rg-platform-dev \
  --name my-deployment \
  --query "[?properties.provisioningState=='Failed']" -o table

# Step 5 — Get the full error message for a specific operation
az deployment operation group list \
  --resource-group rg-platform-dev \
  --name my-deployment \
  --query "[?properties.provisioningState=='Failed'].properties.statusMessage" -o json

# Step 6 — Check activity log for policy denials
az monitor activity-log list \
  --resource-group rg-platform-dev \
  --status Failed \
  --query "[].{time:eventTimestamp,op:operationName.value,msg:properties.statusMessage}" -o table
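
Steps 4 and 5 only work if you know the deployment name. A minimal sketch of one way to make that reliable is to deploy under an explicit, timestamped name. The helper names deployment_name and deploy_named are hypothetical and not part of the Azure CLI; the az deployment group create flags are the standard ones.

```shell
# Hypothetical helper: build a unique, sortable deployment name.
deployment_name() {
  prefix="${1:-main}"
  # UTC timestamp, e.g. main-20240131-142501
  printf '%s-%s\n' "$prefix" "$(date -u +%Y%m%d-%H%M%S)"
}

# Hypothetical wrapper: deploy under that explicit name so the
# operations list commands can be pointed at it afterwards.
deploy_named() {
  rg="$1"; template="$2"; params="$3"
  name="$(deployment_name "$(basename "$template" .bicep)")"
  echo "Deploying as: $name" >&2
  az deployment group create \
    --resource-group "$rg" \
    --name "$name" \
    --template-file "$template" \
    --parameters "@$params"
}
```

Calling deploy_named rg-platform-dev main.bicep dev.parameters.json prints the generated name to stderr, which is the value to pass as --name in Steps 4 and 5.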

Runbook 1: Build Fails with Compile Error

Symptom: az bicep build exits with a non-zero code and shows a syntax or type error in the terminal.

  1. Read the exact error message — it includes the Bicep file name, line number, and a description.
  2. Common causes: misspelled property name, wrong type (string vs int), unclosed bracket, or referencing a symbolic name that is not declared anywhere in the file. Declaration order itself does not matter in Bicep.
  3. Use VS Code with the Bicep extension — it highlights errors in real time before you even run build.
  4. For property path errors, check the ARM resource type documentation for the correct property names at your declared API version.
  5. After fixing, re-run az bicep build until it exits cleanly with no output.
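
The last step above can be turned into a small CI check. This is a sketch, assuming the Azure CLI writes Bicep diagnostics to stderr; the function name bicep_builds_cleanly is hypothetical.

```shell
# Hypothetical helper: succeed only if the build exits 0 AND prints no diagnostics.
bicep_builds_cleanly() {
  file="$1"
  # --stdout prints the compiled JSON instead of writing a .json file;
  # that JSON is discarded, while stderr (diagnostics) is captured.
  diagnostics="$(az bicep build --file "$file" --stdout 2>&1 >/dev/null)"
  status=$?
  if [ "$status" -ne 0 ] || [ -n "$diagnostics" ]; then
    printf '%s\n' "$diagnostics" >&2
    return 1
  fi
  return 0
}
```

Treating any diagnostic output as a failure is deliberately strict: it also blocks on linter warnings. Relax the second condition if warnings are acceptable in your pipeline.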

Runbook 2: Pre-flight Validation Fails (Template Accepted by Bicep, Rejected by Azure)

Symptom: Build succeeds but az deployment group validate returns an error like InvalidTemplateDeployment or InvalidParameter.

  1. Read the full JSON response body — it contains a message field with the exact rejection reason.
  2. Common causes: @allowed value mismatch, resource name exceeds max length, incompatible parameter combination (e.g. OS disk type not valid for VM size), or subscription-level quota exceeded.
  3. If the error is quota-related, check quota with az vm list-usage --location <region> -o table before requesting an increase.
  4. If the error is permission-related (AuthorizationFailed), check that the service principal or identity running the deployment has Contributor, or another role that grants the required actions, on the target resource group.
  5. Fix the input values or RBAC assignment, then re-validate before deploying.
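
Some of the causes listed above, such as name length, can be checked locally before calling Azure at all. As one illustration, storage account names must be 3 to 24 characters long and contain only lowercase letters and digits. The function name below is hypothetical.

```shell
# Hypothetical pre-check: storage account names are 3-24 characters,
# lowercase letters and digits only.
valid_storage_account_name() {
  name="$1"
  len=${#name}
  [ "$len" -ge 3 ] && [ "$len" -le 24 ] || return 1
  case "$name" in
    # Explicit character list avoids locale-dependent [a-z] ranges.
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  return 0
}
```

A check like this only covers one resource type. Other resource types have their own naming rules, so treat it as a pattern rather than a general validator.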

Runbook 3: Runtime Dependency Order Failure

Symptom: Deployment starts but fails with ResourceNotFound or LinkedInvalidPropertyId, meaning resource B tried to reference a property from resource A before A finished creating.

  1. Identify the failing resource by running the operations list command above and finding the first Failed entry.
  2. Look at the resource properties — find any ID or name that references another resource in the same template.
  3. In Bicep, implicit dependency is created by referencing a resource using its symbolic name (e.g. vnet.id). Explicit dependsOn is only needed for side-effects not reflected in a property reference.
  4. If the code uses string interpolation like '${vnetName}' instead of vnet.id, Bicep does not see the dependency — change it to use the symbolic reference.
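  5. What-if does not show deployment order, so confirm the fix by compiling with az bicep build and checking that the dependent resource now lists the other resource under dependsOn in the generated JSON, then redeploy.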
bicep
// Illustrative names and address ranges; adjust to your environment
param vnetName string = 'vnet-platform-dev'

resource vnet 'Microsoft.Network/virtualNetworks@2023-09-01' = {
  name: vnetName
  location: resourceGroup().location
  properties: {
    addressSpace: { addressPrefixes: [ '10.0.0.0/16' ] }
    subnets: [ { name: 'snet-workloads', properties: { addressPrefix: '10.0.1.0/24' } } ]
  }
}

// WRONG: an ID built from name strings is opaque to Bicep, so no dependency on vnet is recorded
var subnetIdFromStrings = resourceId('Microsoft.Network/virtualNetworks/subnets', vnetName, 'snet-workloads')

// CORRECT: the symbolic reference tells Bicep that vnet must exist first
var subnetId = vnet.properties.subnets[0].id
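
Step 1 of this runbook asks for the first Failed entry. A sketch of how that could be automated is to sort the failed operations by timestamp and print the target resource ID of the earliest one. The function name first_failed_resource is hypothetical; the query assumes each operation exposes properties.timestamp and properties.targetResource.id.

```shell
# Hypothetical helper: print the resource ID of the earliest failed operation.
first_failed_resource() {
  rg="$1"; deployment="$2"
  az deployment operation group list \
    --resource-group "$rg" \
    --name "$deployment" \
    --query "sort_by([?properties.provisioningState=='Failed'], &properties.timestamp)[0].properties.targetResource.id" \
    -o tsv
}
```

For example, first_failed_resource rg-platform-dev my-deployment prints a single resource ID, or nothing if no operation failed.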

Runbook 4: Azure Policy Blocks the Deployment

Symptom: Deployment fails with RequestDisallowedByPolicy. The template is valid but a resource violates an assigned Azure Policy.

  1. Get the policy denial details: az deployment operation group list --name <deploy-name> -g <rg> --query "[?properties.provisioningState=='Failed'].properties.statusMessage" -o json
  2. Find the policyDefinitionId in the statusMessage — this tells you which policy triggered the denial.
  3. Common policies that cause denials: require HTTPS-only storage, disallow public IPs, restrict allowed regions, enforce SKU tiers, require specific tags.
  4. Update your Bicep to comply: set supportsHttpsTrafficOnly: true on storage accounts, remove disallowed public IPs, add required tags, use an allowed SKU or region.
  5. If the policy is incorrect (applied to a wrong scope), escalate to the platform team — do not attempt to bypass it.
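
Step 2 can be scripted when a denial lists several policies. The sketch below pulls every policyDefinitionId out of JSON read from stdin using only grep and sed. It assumes the status message arrives as plain (unescaped) JSON; the function name is hypothetical.

```shell
# Hypothetical helper: list the unique policyDefinitionId values found on stdin.
extract_policy_ids() {
  grep -o '"policyDefinitionId"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*:[[:space:]]*"\(.*\)"$/\1/' \
    | sort -u
}
```

Pipe the output of the command in step 1 into extract_policy_ids to get one policy definition ID per line, ready to hand to the platform team if escalation is needed.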

Runbook 5: What-if Shows Unexpected Resource Deletion

Symptom: What-if output shows a resource marked for deletion that should not be deleted. This is most common in complete deployment mode or after a module is refactored.

  1. Identify the resource shown as to-be-deleted in what-if output.
  2. Check whether it was removed from the Bicep template intentionally or accidentally.
  3. Check the deployment mode — complete mode deletes all resources in the RG not present in the template. Incremental mode only creates or updates, never deletes based on omission.
  4. If the resource is still needed, add it back to the template before deploying.
  5. Switch to incremental mode (--mode Incremental) for most use cases to prevent accidental deletions.
⚠️
Complete Mode Risk

Complete deployment mode deletes any Azure resource in the target resource group that is not present in the Bicep template. Always review what-if output before deploying in complete mode in any shared or production environment.
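
The review described above can be backed by an automated guard. This sketch asks what-if for machine-readable output and fails if any change has type Delete. The function names are hypothetical; --no-pretty-print, --mode, and --query are standard CLI flags, and the changes[].changeType shape is assumed from the what-if JSON output.

```shell
# Hypothetical helper: print the IDs of resources what-if would delete.
whatif_deletes() {
  rg="$1"; template="$2"; params="$3"; mode="${4:-Incremental}"
  az deployment group what-if \
    --resource-group "$rg" \
    --template-file "$template" \
    --parameters "@$params" \
    --mode "$mode" \
    --no-pretty-print \
    --query "changes[?changeType=='Delete'].resourceId" \
    -o tsv
}

# Hypothetical gate: 0 = no deletes, 1 = deletes found, 2 = what-if failed.
guard_no_deletes() {
  deletes="$(whatif_deletes "$@")" || return 2
  if [ -n "$deletes" ]; then
    echo "What-if reports deletions:" >&2
    printf '%s\n' "$deletes" >&2
    return 1
  fi
}
```

Pass Complete as the fourth argument to preview a complete-mode deployment, for example guard_no_deletes rg-platform-dev main.bicep dev.parameters.json Complete.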

Interview Questions

Beginner

What is the first thing you do when a Bicep deployment fails?

Check which stage failed: compile, validation, or runtime. Each stage gives different errors in different places. Start with the CLI error output, then check deployment operations in Azure if the template was accepted.

What does az deployment group validate actually check?

It checks the ARM template structure, parameter types, allowed values, and basic resource type validity against Azure. It does not create any resources, but it does verify that the deployment would be accepted by ARM before execution.

What is deployment what-if and why is it important?

What-if previews the changes ARM expects to make without executing them. It lists resources that would be created, modified, deleted, or left unchanged. Running it before every production deploy is a control that helps catch unexpected changes and policy violations before they reach real resources.

Intermediate

What is the difference between incremental and complete deployment mode?

Incremental (the default) only creates or updates resources present in the template; it never deletes resources based on omission. Complete mode deletes any resource in the target resource group that is not in the template. Complete mode is more dangerous and requires review of what-if output before every run.

How do you trace which individual resource inside a deployment failed and why?

Use az deployment operation group list filtered to provisioningState=Failed. This returns each failed operation with a statusMessage containing the detailed ARM error code and message for that specific resource.

Why does string interpolation break implicit dependency in Bicep?

When you build a resource name or ID from a string such as 'rg-${environment}', Bicep sees only a string expression and cannot link it to a resource declaration. Use symbolic references like vnet.id so that Bicep can build the correct dependency graph and deploy resources in the right order.

How do you find and understand a policy denial in a failed deployment?

Retrieve the failed operation status message with the operations list command. The message includes a policyDefinitionId and a description of the rule that was violated. Fix the resource properties to comply with the policy, then redeploy.

Scenario-based

A prod deployment was accepted by validate but failed halfway through. Some resources were created, others were not. What do you do?

Do not delete and rebuild blindly. List the deployment operations to see exactly which resources succeeded and which failed. Fix the root cause of the failure, then re-run the deployment. In incremental mode ARM re-applies the template idempotently: resources that already match are left as they are, and only the failed or missing ones are created or updated, so no duplicates result.

A junior engineer says what-if is optional and they always just deploy. What is the risk?

Without what-if, unexpected deletes, replacements, or policy violations are only discovered after resources are affected. In complete mode, skipping what-if can cause production resources to be deleted. What-if creates no resources and costs nothing to run, so it should be a mandatory pipeline gate for any environment beyond dev.

A deployment fails with RequestDisallowedByPolicy. The engineer wants to add an exception. What is the right process?

First, check whether the template actually violates the intent of the policy. If it does, fix the template to comply. If the policy is misapplied or wrong, escalate to the platform or governance team to evaluate an exception through the proper policy review process. Never bypass policy controls unilaterally in production.

Real-world Usage

Production platform teams build a four-gate deployment pipeline: (1) az bicep build in CI to catch compile errors on every pull request, (2) az deployment group validate in the dev stage to verify ARM acceptance, (3) az deployment group what-if with mandatory human review before any production deployment, (4) az deployment group create with operation-level logging captured to the pipeline dashboard. Any gate failure stops the pipeline and routes to the owning team for triage.
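
The first three gates can be sketched as a single function whose return code names the gate that failed. The function name run_gates and the return-code convention are assumptions for illustration. Gate 4 is left out on purpose, because it sits behind the human review.

```shell
# Hypothetical gate runner: returns 0 if all gates pass,
# otherwise the number of the first gate that failed.
run_gates() {
  rg="$1"; template="$2"; params="$3"

  echo "Gate 1: build" >&2
  az bicep build --file "$template" --stdout >/dev/null || return 1

  echo "Gate 2: validate" >&2
  az deployment group validate \
    --resource-group "$rg" \
    --template-file "$template" \
    --parameters "@$params" >/dev/null || return 2

  echo "Gate 3: what-if" >&2
  az deployment group what-if \
    --resource-group "$rg" \
    --template-file "$template" \
    --parameters "@$params" || return 3

  # Gate 4 (az deployment group create) runs only after human approval.
  return 0
}
```

In a pipeline step, run_gates rg-platform-dev main.bicep dev.parameters.json || exit $? stops the job and preserves the gate number as the exit code.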

Summary

The failures covered in this lesson map to five runbooks: compile error, pre-flight rejection, dependency ordering, policy denial, and unexpected deletion surfaced by what-if. Knowing which runbook to follow removes guesswork and reduces mean time to recovery. Always run build, validate, and what-if before create: these three commands move failure detection as early and as cheaply as possible.
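
As a quick-reference sketch, the error codes named in this lesson can be mapped to their runbooks. The function name is hypothetical and the mapping follows this lesson only. Real deployments can return other codes, and a code such as InvalidTemplateDeployment may wrap a more specific inner error.

```shell
# Hypothetical lookup: map an error code from this lesson to its runbook.
runbook_for_code() {
  case "$1" in
    BCP*)
      echo "Runbook 1: compile error" ;;
    InvalidTemplateDeployment|InvalidParameter|AuthorizationFailed)
      echo "Runbook 2: pre-flight validation" ;;
    ResourceNotFound|LinkedInvalidPropertyId)
      echo "Runbook 3: dependency order" ;;
    RequestDisallowedByPolicy)
      echo "Runbook 4: policy denial" ;;
    *)
      echo "Unmapped: inspect the deployment operations list" ;;
  esac
}
```

Runbook 5 has no entry because an unexpected deletion shows up in what-if output rather than as an error code.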