Troubleshooting Azure Networking
Production-first troubleshooting playbook for the most common networking failures: unreachable VMs, blocked ports, DNS errors, and load balancer issues.
Troubleshooting Method
- Validate symptom and impacted path (source, destination, port, protocol).
- Check DNS resolution before deeper packet analysis.
- Check route path and security rules (NSG, firewall, UDR).
- Check endpoint health (service listening, health probes).
- Capture logs/metrics and confirm fix with repeatable test.
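The layered order above can be sketched as a small check runner that stops at the first failing layer. This is a minimal illustration, not part of the original playbook: the function names (`dns_resolves`, `tcp_port_open`, `first_failing_layer`) are hypothetical, and the two concrete checks only cover name resolution and a TCP handshake from the machine where the script runs.

```python
import socket
from typing import Callable, List, Optional, Tuple


def dns_resolves(host: str) -> bool:
    """Name-resolution layer: does the name resolve from this machine?"""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Path layer: does a TCP handshake complete (route, filtering, listener)?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failing_layer(
    checks: List[Tuple[str, Callable[[], bool]]]
) -> Optional[str]:
    """Run checks in order and return the name of the first one that fails.

    Returns None when every check passes. Later checks are skipped once one
    fails, which mirrors "check DNS before deeper packet analysis".
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Example wiring (not executed here), using the host and port from the
# Hands-on Commands section:
#   first_failing_layer([
#       ("dns", lambda: dns_resolves("app.internal.contoso.local")),
#       ("tcp", lambda: tcp_port_open("app.internal.contoso.local", 1433)),
#   ])
```

Running the same ordered list before and after a change gives the repeatable test that the last step asks for.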
Case 1: VM Not Reachable
| Check | Why |
|---|---|
| NSG inbound rule for 22 (SSH) / 3389 (RDP) | Admin port may be blocked |
| Public IP / Bastion path | A private VM has no entry path without one |
| NIC effective routes | Traffic may follow wrong route |
| Guest firewall | OS-level firewall may block traffic |
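To read an effective route table and see why traffic "may follow the wrong route", it helps to recall that Azure picks the route with the longest matching prefix, and for identical prefixes prefers user-defined routes, then BGP routes, then system routes. The sketch below is a simplified model of that rule, not Azure's implementation: the `Route` type, the `select_route` function, and the sample table are hypothetical, and it ignores special cases such as service endpoint routes.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str    # address prefix, e.g. "10.30.3.0/24"
    next_hop: str  # next hop type, e.g. "VnetLocal", "VirtualAppliance"
    source: str    # "User" (UDR), "VirtualNetworkGateway" (BGP), "Default" (system)


# Tie-break order for identical prefixes: user-defined, then BGP, then system.
SOURCE_RANK = {"User": 0, "VirtualNetworkGateway": 1, "Default": 2}


def select_route(dest_ip: str, routes: List[Route]) -> Optional[Route]:
    """Return the route a packet to dest_ip would follow, or None if none match."""
    dest = ipaddress.ip_address(dest_ip)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r.prefix)]
    if not candidates:
        return None
    # Longest prefix wins; source rank only breaks ties between equal prefixes.
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r.prefix).prefixlen,
            SOURCE_RANK.get(r.source, 99),
        ),
    )


# Hypothetical effective routes for a NIC.
routes = [
    Route("10.30.0.0/16", "VnetLocal", "Default"),
    Route("0.0.0.0/0", "Internet", "Default"),
    Route("10.30.3.0/24", "VirtualAppliance", "User"),
]

chosen = select_route("10.30.3.4", routes)
print(chosen.next_hop if chosen else "no route")
```

In this sample table a more specific user-defined /24 route sends traffic for 10.30.3.4 to a virtual appliance even though the /16 VNet route also matches, which is the kind of surprise the NIC effective routes check is meant to catch.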
Case 2: Blocked Port
- Use Network Watcher IP Flow Verify to test allow/deny decision.
- Inspect NSG rule priority conflicts.
- Check Azure Firewall deny logs for blocked egress/ingress.
- Confirm application listens on expected interface and port.
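NSG priority conflicts are easier to spot with the evaluation order in mind: rules are processed from the lowest priority number upward, and the first match decides the outcome. The following is a simplified model of that behavior, assuming only source prefix and destination port are matched; the `Rule` type, `evaluate_inbound`, and the rule names are hypothetical, and protocol, destination address, service tags, and the default allow rules are deliberately left out.

```python
import ipaddress
from typing import List, NamedTuple


class Rule(NamedTuple):
    name: str
    priority: int   # lower number is evaluated first
    access: str     # "Allow" or "Deny"
    source: str     # CIDR prefix, or "*" for any source
    dest_port: str  # single port, "low-high" range, or "*"


def _port_matches(spec: str, port: int) -> bool:
    if spec == "*":
        return True
    if "-" in spec:
        low, high = spec.split("-")
        return int(low) <= port <= int(high)
    return int(spec) == port


def _source_matches(spec: str, ip: str) -> bool:
    return spec == "*" or ipaddress.ip_address(ip) in ipaddress.ip_network(spec)


def evaluate_inbound(rules: List[Rule], source_ip: str, dest_port: int) -> Rule:
    """Return the rule that decides the flow: first match in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if _source_matches(rule.source, source_ip) and _port_matches(rule.dest_port, dest_port):
            return rule
    # Simplification: fall through to the built-in deny-all rule.
    return Rule("DenyAllInBound", 65500, "Deny", "*", "*")


# Hypothetical conflict: a broad deny at priority 200 shadows the allow at 300.
rules = [
    Rule("allow-api", 300, "Allow", "10.30.0.0/16", "8443"),
    Rule("deny-high-ports", 200, "Deny", "*", "8000-9000"),
]

decision = evaluate_inbound(rules, "10.30.1.10", 8443)
print(decision.name, decision.access)
```

Moving the allow rule to a priority number below 200, or narrowing the deny range, would change the outcome. IP Flow Verify reports the same kind of answer (the deciding rule and its access) against the real configuration.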
Case 3: DNS Resolution Failure
- Run nslookup/dig from source host.
- Check VNet DNS server settings and private DNS zone links.
- Validate private endpoint record resolves to private IP.
- Flush client cache and retest.
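The private endpoint check above can be scripted as "resolve the name, then classify the answers". This is a rough sketch with hypothetical helper names (`resolve_ipv4`, `classify`), and it assumes the VNet uses RFC 1918 address space, which is common but not guaranteed.

```python
import ipaddress
import socket
from typing import List

# Assumption: private endpoint IPs come from RFC 1918 ranges.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def resolve_ipv4(name: str) -> List[str]:
    """Resolve a name with the host's configured resolver; [] on failure."""
    try:
        infos = socket.getaddrinfo(
            name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def classify(addresses: List[str]) -> str:
    """Summarize answers as 'no-answer', 'private', 'public', or 'mixed'."""
    if not addresses:
        return "no-answer"
    flags = [
        any(ipaddress.ip_address(a) in net for net in RFC1918) for a in addresses
    ]
    if all(flags):
        return "private"
    if not any(flags):
        return "public"
    return "mixed"


# Example (not executed here), using the name from the Hands-on Commands section:
#   classify(resolve_ipv4("app.internal.contoso.local"))
```

A "public" result for a private endpoint name usually points at a missing private DNS zone link or a custom DNS server that does not forward to Azure DNS. A "no-answer" result points at the resolver configuration itself.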
Case 4: Load Balancer Not Working
- Probe unhealthy: verify backend returns expected response code/path.
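- Backend pool empty or misconfigured: confirm the expected NICs/IPs are registered in the pool.
- NSG denies health probe traffic: allow the AzureLoadBalancer service tag (probe source 168.63.129.16) inbound on the probe port.
- Asymmetric routing due to UDR/firewall path mismatch: confirm return traffic leaves by the same path the request arrived on.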
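For the "probe unhealthy" case, it can help to reproduce the probe from a machine in the same VNet. An Azure Load Balancer HTTP probe treats only a 200 response within the timeout as healthy, so redirects and other status codes count as failures. The sketch below approximates that with the standard library; `http_probe` is a hypothetical helper, the 5-second timeout is an assumed value, and a test run from another VM does not originate from the real probe source address, so it cannot detect NSG rules that block the probe itself.

```python
import http.client


def http_probe(host: str, port: int, path: str = "/", timeout: float = 5.0) -> bool:
    """Return True only if GET path answers with HTTP 200 within the timeout.

    http.client does not follow redirects, so a 301/302 is reported as
    unhealthy, which matches how a redirecting health endpoint fails a probe.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        # Refused, timed out, reset, or malformed response: treat as down.
        return False
    finally:
        conn.close()


# Example (not executed here): probe a backend on the port from the
# real-world use case; the path "/healthz" is a hypothetical probe path.
#   http_probe("10.30.3.4", 8443, "/healthz")
```

If this returns False from inside the VNet while the application works in a browser, compare the probe path and port configured on the load balancer with what the backend actually serves.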
Hands-on Commands
Diagnostics CLI
    # Effective NSG rules for a NIC
    az network nic list-effective-nsg -g rg-net --name vm1Nic

    # Effective route table for the same NIC
    az network nic show-effective-route-table -g rg-net --name vm1Nic

    # Network Watcher connectivity test
    # (supply your subscription ID between /subscriptions/ and /resourceGroups)
    az network watcher test-connectivity \
      --source-resource /subscriptions//resourceGroups/rg-net/providers/Microsoft.Compute/virtualMachines/vm1 \
      --dest-address 10.30.3.4 --dest-port 1433

    # Load balancer backend pool membership (pool configuration, not probe health)
    az network lb address-pool show -g rg-lb --lb-name lb-web -n be-web

    # DNS resolution test (run inside the VM)
    nslookup app.internal.contoso.local
Real-world Use Case
A release caused API downtime because a new NSG rule accidentally blocked backend port 8443. Effective NSG analysis identified the rule priority conflict, and correcting the rule restored traffic within minutes.
Interview Questions
Beginner
First three checks when VM is unreachable?
DNS/IP validity, NSG rules, and route path to destination.
Intermediate
Why can healthy VMs still get no traffic from LB?
Probe mismatch, backend registration errors, or NSG blocking probe/client traffic.
Scenario-based
DNS resolves correctly on one subnet but not another.
Check private DNS VNet links and custom DNS forwarding for affected subnet/VNet.
Summary
Most networking incidents are path problems: name resolution, route selection, policy filtering, or endpoint health. A disciplined layered method finds root cause quickly.