Do not confuse a clean syntax check with operational safety. The hard part is often preserving resource identity and avoiding accidental destruction during change.
Drift, Testing & Operations
Operate Terraform beyond the happy path: detect drift, validate refactors, handle imports, and reduce risk in long-lived infrastructure estates.
🧒 Simple Explanation (ELI5)
Even if Terraform built your infrastructure, the real world changes over time. Someone edits the portal, a provider behaves differently, or a refactor changes resource addresses. Drift and operations are about keeping Terraform’s view and reality aligned.
🤔 Why Do We Need It?
- Long-lived infrastructure always changes.
- Manual edits and emergency fixes create drift.
- Refactoring Terraform code can be risky if done carelessly.
🔧 Technical Explanation
Operational Terraform work often includes importing existing resources, detecting drift through plan output, using lifecycle settings intentionally, and refactoring resource addresses with moved blocks or state operations when necessary.
resource "azurerm_resource_group" "platform" {
name = "rg-platform-prod"
location = "eastus"
lifecycle {
prevent_destroy = true
}
}| Operational Task | Purpose |
|---|---|
| Drift detection | Find differences between code, state, and reality |
| Import | Bring existing resources under Terraform management |
| Lifecycle controls | Protect critical resources or ignore noisy changes intentionally |
| Refactor | Restructure modules or resource names without unnecessary recreation |
🌍 Real-World Use Case
A platform team migrates a manually created Azure network into Terraform, imports it, then gradually refactors the configuration into modules. CI/CD runs regular plans to surface drift. Critical production resource groups use prevent_destroy as a last-resort safety net.
🛠️ Hands-on
Useful Operational Commands
terraform import azurerm_resource_group.platform /subscriptions/.../resourceGroups/rg-platform-prod terraform state list terraform plan
Testing Mindset
- Run plans against lower environments before production.
- Test module changes with example stacks.
- Treat any unexpected replacement as a high-signal event.
🐛 Debugging Scenario
Problem: A harmless-looking refactor causes Terraform to destroy and recreate production resources.
- Check whether resource addresses changed.
- Review module path changes and
for_eachkey changes. - Use migration techniques rather than applying destructive plans blindly.
Renaming a resource in code can look trivial to a human and still look like delete-and-create to Terraform.
📋 Interview Questions
Beginner
Drift is when the real infrastructure no longer matches what Terraform expects based on the configuration and state.
It brings an existing infrastructure object under Terraform state so it can be managed by configuration.
It blocks Terraform from destroying a resource unless that protection is removed, helping guard critical infrastructure.
To detect drift and surface changes that happened outside the normal Terraform workflow.
Because they may cause downtime, data loss, or disruptive infrastructure recreation if applied carelessly.
Intermediate
Because the configuration must accurately describe the imported resource, ownership boundaries must be clear, and drift may already exist.
It is useful for intentionally externally managed fields, but risky if it hides important drift or bad ownership design.
They can change resource addresses and make Terraform think existing objects must be replaced or destroyed.
Because there is already live infrastructure and existing consumers, so the cost of mistakes is much higher.
Scheduled or pull-request-driven plan runs can surface unexpected infrastructure differences before they become bigger problems.
Scenario-Based
I stop, inspect address changes and state implications, and plan a safe migration rather than applying the destructive diff.
Compare live infrastructure, state, and code, then reconcile intentionally by importing, updating configuration, or reverting unsupported changes.
It is only a guardrail. Good review, environment protection, and clear ownership still matter more.
Start in lower environments, compare plans carefully, validate resource identities, and avoid mixing broad refactors with unrelated changes.
Drift is monitored, imports are deliberate, refactors are staged, and unexpected replacements are treated as serious signals rather than normal noise.
🧾 Summary
Terraform operations is where platform engineering gets real. The mature skill is not just creating infrastructure, but evolving and protecting it safely over time.