Advanced · Lesson 12 of 16

Drift, Testing & Operations

Operate Terraform beyond the happy path: detect drift, validate refactors, handle imports, and reduce risk in long-lived infrastructure estates.

🧒 Simple Explanation (ELI5)

Even if Terraform built your infrastructure, the real world changes over time. Someone edits a resource in the portal, a provider starts behaving differently, or a refactor changes resource addresses. Handling drift and day-to-day operations is about keeping Terraform's view and reality aligned.

🤔 Why Do We Need It?

🔧 Technical Explanation

Operational Terraform work often includes importing existing resources, detecting drift through plan output, using lifecycle settings intentionally, and refactoring resource addresses with moved blocks or state operations when necessary.

hcl
resource "azurerm_resource_group" "platform" {
  name     = "rg-platform-prod"
  location = "eastus"

  lifecycle {
    # Terraform rejects any plan that would destroy this resource group.
    prevent_destroy = true
  }
}

| Operational Task | Purpose |
| --- | --- |
| Drift detection | Find differences between code, state, and reality |
| Import | Bring existing resources under Terraform management |
| Lifecycle controls | Protect critical resources or ignore noisy changes intentionally |
| Refactor | Restructure modules or resource names without unnecessary recreation |
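
As a minimal sketch of the lifecycle controls row above, `ignore_changes` tells Terraform to stop reporting differences on selected attributes. The example assumes that an external process owns the `tags` on this resource group; that ownership split is an assumption for illustration only.

```hcl
resource "azurerm_resource_group" "platform" {
  name     = "rg-platform-prod"
  location = "eastus"

  lifecycle {
    # Guardrail from the earlier example: reject plans that would destroy this group.
    prevent_destroy = true

    # Assumption: tags are managed outside Terraform, so changes to them
    # should not show up as drift in every plan.
    ignore_changes = [tags]
  }
}
```

Keep the list narrow: every ignored attribute is drift that Terraform will no longer surface.
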
🧪 Operational Discipline

Do not confuse a clean syntax check with operational safety. The hard part is often preserving resource identity and avoiding accidental destruction during change.

🌍 Real-World Use Case

A platform team migrates a manually created Azure network into Terraform, imports it, then gradually refactors the configuration into modules. CI/CD runs regular plans to surface drift. Critical production resource groups use prevent_destroy as a last-resort safety net.

🛠️ Hands-on

Useful Operational Commands

bash
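# Adopt the existing resource group into state under the given address
terraform import azurerm_resource_group.platform /subscriptions/.../resourceGroups/rg-platform-prod
# List every resource address currently tracked in state
terraform state list
# Compare configuration, state, and live infrastructure; review before applying
terraform plan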
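
On Terraform 1.5 or later, the same import can also be declared in configuration, so it appears in plan output and goes through code review. The sketch below reuses the address and the elided ID from the command above; the `...` segment is kept exactly as written there and has to be replaced with the real subscription path.

```hcl
# Declarative alternative to the `terraform import` command (Terraform 1.5+).
import {
  # Must point at a resource block that exists in configuration,
  # such as the azurerm_resource_group.platform example earlier in this lesson.
  to = azurerm_resource_group.platform

  # Elided as in the command above; substitute the full resource ID.
  id = "/subscriptions/.../resourceGroups/rg-platform-prod"
}
```

Once `terraform plan` shows the import with no unexpected changes and `terraform apply` has completed, the `import` block can be removed from the configuration.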

Testing Mindset
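
A minimal sketch of what a testing mindset can look like in practice, assuming Terraform 1.6 or later and its built-in `terraform test` command. The file name, run label, and choice of assertions are hypothetical, and the provider still needs valid configuration and credentials for the plan to run.

```hcl
# platform.tftest.hcl (hypothetical file name)
run "resource_group_identity_is_stable" {
  # Plan only: this run does not create or change any infrastructure.
  command = plan

  assert {
    condition     = azurerm_resource_group.platform.name == "rg-platform-prod"
    error_message = "Unexpected name for the production resource group."
  }

  assert {
    condition     = azurerm_resource_group.platform.location == "eastus"
    error_message = "Unexpected location for the production resource group."
  }
}
```

Assertions like these pin the attributes that define a resource's identity, so a refactor that silently changes them fails the test instead of reaching a production plan.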

🐛 Debugging Scenario

Problem: A harmless-looking refactor causes Terraform to destroy and recreate production resources.

⚠️ Refactor Trap

Renaming a resource in code can look trivial to a human and still look like delete-and-create to Terraform.
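
A `moved` block (Terraform 1.1 or later) is one way to tell Terraform that an address change is a rename rather than a delete-and-create. The sketch assumes the resource group from earlier in this lesson is being renamed in code from `platform` to `core`; the new name `core` is hypothetical.

```hcl
# Record the rename so Terraform updates the state address
# instead of planning a destroy and a create.
moved {
  from = azurerm_resource_group.platform
  to   = azurerm_resource_group.core
}

# The resource block now lives under the new (hypothetical) name.
resource "azurerm_resource_group" "core" {
  name     = "rg-platform-prod"
  location = "eastus"

  lifecycle {
    prevent_destroy = true
  }
}
```

With the block in place, `terraform plan` should report the resource as moved, with no destroy or create actions. If it still shows a replacement, stop and inspect the addresses before applying.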

📋 Interview Questions

Beginner

What is drift in Terraform?

Drift is when the real infrastructure no longer matches what Terraform expects based on the configuration and state.

What does terraform import do?

It brings an existing infrastructure object under Terraform state so it can be managed by configuration.

What does prevent_destroy do?

It makes Terraform reject any plan that would destroy the resource while the setting is present in configuration, helping guard critical infrastructure.

Why run regular plans even when you are not actively changing code?

To detect drift and surface changes that happened outside the normal Terraform workflow.

Why are unexpected replacements dangerous?

Because they may cause downtime, data loss, or disruptive infrastructure recreation if applied carelessly.

Intermediate

Why is import often harder than it sounds?

Because the configuration must accurately describe the imported resource, ownership boundaries must be clear, and drift may already exist.

When is ignore_changes useful and when is it risky?

It is useful for intentionally externally managed fields, but risky if it hides important drift or bad ownership design.

How do module refactors create risk?

They can change resource addresses and make Terraform think existing objects must be replaced or destroyed.

Why treat operational Terraform changes differently from greenfield builds?

Because there is already live infrastructure and existing consumers, so the cost of mistakes is much higher.

How does CI/CD support drift detection?

Scheduled or pull-request-driven plan runs can surface unexpected infrastructure differences before they become bigger problems.

Scenario-Based

A production plan suddenly shows resource destruction after a refactor. What is your first move?

I stop, inspect address changes and state implications, and plan a safe migration rather than applying the destructive diff.

A team made emergency portal edits during an outage. How do you restore Terraform trust?

Compare live infrastructure, state, and code, then reconcile intentionally by importing, updating configuration, or reverting unsupported changes.

Why is lifecycle.prevent_destroy not a full safety strategy?

It is only a guardrail, and it stops applying if the resource block itself is removed from configuration. Good review, environment protection, and clear ownership still matter more.

How would you test a module refactor safely?

Start in lower environments, compare plans carefully, validate resource identities, and avoid mixing broad refactors with unrelated changes.

What does mature Terraform operations look like?

Drift is monitored, imports are deliberate, refactors are staged, and unexpected replacements are treated as serious signals rather than normal noise.

🧾 Summary

Terraform operations is where platform engineering gets real. The mature skill is not just creating infrastructure, but evolving and protecting it safely over time.