Advanced · Lesson 11 of 16

IIS Logging, Monitoring, and HTTP Error Codes

Read IIS logs like an incident responder: W3C fields, sc-status, sc-substatus, and sc-win32-status, FREB traces, Event Viewer correlation, and practical monitoring for early detection of outages and latency regressions.

🧒 Simple Explanation (ELI5)

IIS logs are a flight recorder for your website. Every request leaves a trace: who asked, which URL, what status code came back, and how long it took. Error codes are like a doctor's diagnosis: 404 means "not found", 500 means "the server broke", 503 means "service unavailable".

🔧 Why Do We Need It?

🌍 Real-world Analogy

Like CCTV plus attendance logs in a building: you can prove who came in, where they tried to go, whether they were denied, and how long each action took.

⚙️ Technical Explanation

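IIS W3C logs live by default under C:\inetpub\logs\LogFiles\W3SVC<siteId>. Key fields: date and time (both UTC), c-ip, cs-method, cs-uri-stem, sc-status, sc-substatus, sc-win32-status, and time-taken (milliseconds). Use all three status fields for diagnosis. Examples: 401 2 5 (401.2, "logon failed due to server configuration") means the request matched no enabled authentication method, such as an anonymous request to a site with Anonymous Authentication disabled; on Windows Authentication sites it is also the normal first leg of the challenge handshake, so it only signals an auth configuration issue when clients never get past it. 500 19 5 (500.19 with win32 status 5, shown as 0x80070005 on the detailed error page) means the worker process was denied access to configuration, typically web.config.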
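Because the #Fields directive at the top of each log file declares which fields are enabled, columns are safer to look up by name than by position. A minimal Python sketch of that idea follows; the function names are hypothetical, and it assumes values contain no spaces (IIS writes spaces inside values such as the User-Agent as "+").

```python
"""Minimal W3C extended log parser: maps each entry to field names via #Fields."""
from typing import Dict, Iterable, Iterator, List


def parse_w3c(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the most recent #Fields header."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#Fields:"):
            # The header can reappear mid-file (e.g., after a restart), so keep the latest one
            fields = line[len("#Fields:"):].split()
            continue
        if line.startswith("#"):
            continue  # other directives such as #Software, #Version, #Date
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines instead of misreading columns
        yield dict(zip(fields, values))


def status_triad(entry: Dict[str, str]) -> str:
    """Format the outcome as 'status.substatus win32', e.g. '500.19 5'."""
    return f"{entry['sc-status']}.{entry['sc-substatus']} {entry['sc-win32-status']}"
```

In practice you would pass an open u_ex*.log file (UTF-8) to parse_w3c and, for example, count entries per status_triad to get a quick triad histogram.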

FREB (Failed Request Event Buffering, shown as Failed Request Tracing in IIS Manager) writes module-level pipeline traces as XML files under C:\inetpub\logs\FailedReqLogFiles. Scope FREB rules to targeted status codes (e.g., 500, 502) or time-taken thresholds; tracing every request bloats the disk.

⚠️
Status Code Decoding Rule

Always interpret IIS outcomes as status.substatus.win32, not status alone. This triad usually narrows down whether the failure lies in authentication, configuration, the file system, or the backend runtime.
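To make the triad rule concrete, here is a small Python lookup sketch. The substatus and Win32 meanings paraphrase Microsoft's IIS status-code and Windows error-code documentation; the auth/config/file system/platform labels are this sketch's own grouping, and both tables are deliberately incomplete.

```python
"""Rough first-pass hints for common IIS status triads (illustrative, not exhaustive)."""
from typing import Dict, Tuple

# (sc-status, sc-substatus) -> meaning; category prefixes are this sketch's own labels
SUBSTATUS_HINTS: Dict[Tuple[int, int], str] = {
    (401, 1): "auth: logon failed",
    (401, 2): "auth: logon failed due to server configuration",
    (401, 3): "file system: unauthorized due to ACL on resource",
    (404, 0): "not found",
    (500, 19): "config: configuration data is invalid",
    (502, 5): "backend runtime: ASP.NET Core Module process failure",
    (503, 0): "platform: application pool unavailable",
    (503, 2): "platform: concurrent request limit exceeded",
}

# sc-win32-status is a decimal Windows error code
WIN32_HINTS: Dict[int, str] = {
    0: "success",
    2: "file not found",
    3: "path not found",
    5: "access denied",
    64: "network name no longer available (often client disconnected)",
    1236: "connection aborted by local system",
}


def explain(status: int, substatus: int, win32: int) -> str:
    """Combine both lookups into one human-readable line."""
    sub = SUBSTATUS_HINTS.get((status, substatus), "unknown substatus")
    w32 = WIN32_HINTS.get(win32, f"win32 error {win32}")
    return f"{status}.{substatus} ({sub}); win32 {win32} ({w32})"
```

For example, explain(500, 19, 5) reads as a configuration file the worker process could not access, which is the 500.19 case described above.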

📊 Visual Representation

IIS Observability Stack
Request Logs: W3C entry per request (status and timing)
Pipeline Tracing: FREB XML traces (module-level failures)
System Events: Event Viewer, WAS/W3SVC sources (crash and pool-disable events)
Metrics: performance counters (CPU, requests/sec, queue length)

⌨️ Commands / Syntax

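PowerShell
# Newest W3C log for site 1 (default path; change W3SVC1 to W3SVC<siteId>)
$log = Get-ChildItem C:\inetpub\logs\LogFiles\W3SVC1\u_ex*.log |
  Sort-Object LastWriteTime -Descending | Select-Object -First 1

# Locate columns by name from the last #Fields header (assumes the field set is stable within the file)
$fields   = ((Get-Content $log.FullName | Where-Object { $_ -like '#Fields:*' } | Select-Object -Last 1) -replace '^#Fields:\s*', '') -split ' '
$statusIx = [array]::IndexOf($fields, 'sc-status')
$uriIx    = [array]::IndexOf($fields, 'cs-uri-stem')

# Latest log lines with a 5xx in the sc-status column (a plain " 5\d\d " match can also hit time-taken or port)
Get-Content $log.FullName |
  Where-Object { $_ -notmatch '^#' -and ($_ -split ' ')[$statusIx] -match '^5\d\d$' }

# Top 20 URLs producing 500 errors
Get-Content $log.FullName |
  Where-Object { $_ -notmatch '^#' -and ($_ -split ' ')[$statusIx] -eq '500' } |
  ForEach-Object { ($_ -split ' ')[$uriIx] } |
  Group-Object | Sort-Object Count -Descending | Select-Object -First 20

# WAS app pool events (e.g., 5002 pool disabled, 5009/5010/5011 worker process failures) are in the System log
Get-WinEvent -FilterHashtable @{LogName='System'; ProviderName='Microsoft-Windows-WAS'} -MaxEvents 50

# Worker process crashes (Application Error, event 1000) are in the Application log
Get-WinEvent -FilterHashtable @{LogName='Application'; Id=1000} -MaxEvents 50

# Active worker processes and their app pools (appcmd is not on PATH by default)
& "$env:windir\system32\inetsrv\appcmd.exe" list wp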

💼 Example (Real-world Use Case)

A traffic spike caused bursts of 503s. The logs showed 503 2 0 (503.2, concurrent request limit exceeded) with high time-taken, and the event logs showed app pool recycling and queue pressure. The team increased backend DB capacity and removed blocking calls, after which the 503s stopped.

🧪 Hands-on

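  1. Generate a 404 by requesting a missing file, then find it in the W3C log. Entries can take up to about a minute to appear because HTTP.sys buffers log writes; running netsh http flush logbuffer from an elevated prompt forces a flush.
  2. Enable FREB for status 500 and reproduce an app error.
  3. Open the FREB XML and identify the failing module.
  4. Correlate the request timestamp (UTC in W3C logs) with Application events in Event Viewer, which displays local time.
  5. Create a simple alert: raise an incident if 5xx responses exceed 2% of requests over the last 5 minutes.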
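Step 5 can be sketched as a sliding-window check over (timestamp, status) pairs, for instance built from parsed W3C entries. The min_requests floor is a hypothetical addition that keeps a single failed request during quiet hours from paging anyone; timestamps are assumed to be UTC, matching W3C logs.

```python
"""Sketch of the hands-on alert rule: 5xx share above 2% in the last 5 minutes."""
from datetime import datetime, timedelta
from typing import Iterable, Tuple

WINDOW = timedelta(minutes=5)
THRESHOLD = 0.02  # 2% of requests


def window_counts(entries: Iterable[Tuple[datetime, int]], now: datetime) -> Tuple[int, int]:
    """Return (5xx count, total count) for requests in (now - WINDOW, now]."""
    total = errors = 0
    for ts, status in entries:
        if now - WINDOW < ts <= now:
            total += 1
            if 500 <= status <= 599:
                errors += 1
    return errors, total


def should_alert(entries: Iterable[Tuple[datetime, int]], now: datetime,
                 min_requests: int = 50) -> bool:
    """Alert only when traffic is high enough for the ratio to be meaningful."""
    errors, total = window_counts(entries, now)
    return total >= min_requests and errors / total > THRESHOLD
```

Running this every minute against the tail of the current log is one simple way to wire it up; a metrics pipeline would do the same arithmetic on counters instead of raw lines.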
💡
Retention and Rotation

IIS rolls log files over (daily by default) but never deletes them. Move IIS logs off the system drive when possible and enforce a retention policy (e.g., 30-90 days hot, then archive to cold storage) to avoid disk-full incidents.
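Since IIS leaves cleanup to you, a retention job first needs the list of expired files. This Python sketch only selects candidates (archiving or deleting them is left to your tooling); the function name is hypothetical, and it assumes the default u_ex*.log naming and uses the file modification time as the age signal.

```python
"""List IIS log files older than a retention window (candidates to archive or delete)."""
import time
from pathlib import Path
from typing import List, Optional


def logs_past_retention(log_dir: str, days: int, now: Optional[float] = None) -> List[Path]:
    """Return u_ex*.log files anywhere under log_dir last written more than `days` ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    # rglob walks every W3SVC<siteId> subfolder below the LogFiles root
    return sorted(p for p in Path(log_dir).rglob("u_ex*.log") if p.stat().st_mtime < cutoff)
```

Pointing it at C:\inetpub\logs\LogFiles (or the relocated log volume) with days=90 would match the upper end of the example policy.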

🐛 Debugging Scenario

Failure: Users report intermittent 502.5 (ASP.NET Core Module process failure) on an ASP.NET Core app.

🎯 Interview Questions

Beginner

Difference between 404 and 500?

404 means the requested resource was not found. 500 means the server hit an internal error while processing the request.

Where are IIS logs stored?

By default in C:\inetpub\logs\LogFiles\W3SVC<siteId>.

What does 503 usually indicate in IIS?

Service Unavailable: usually the application pool is stopped or disabled (for example, after rapid-fail protection trips) or its request queue is saturated.

What is FREB?

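Failed Request Event Buffering (Failed Request Tracing in IIS Manager): detailed XML traces of request pipeline execution, captured only for requests that match configured rules such as status codes or time-taken thresholds.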

Why track time-taken?

It exposes latency regressions even when every response is a 200. IIS records time-taken in milliseconds.
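One way to act on time-taken is to compare a per-URL high percentile between releases. This sketch computes the nearest-rank p95 from (cs-uri-stem, time-taken) pairs; the percentile method and function name are illustrative choices, not an IIS feature.

```python
"""Per-URL p95 of time-taken (milliseconds) to spot latency regressions."""
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def p95_by_url(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """rows are (cs-uri-stem, time-taken) pairs; uses the nearest-rank percentile."""
    samples: Dict[str, List[int]] = defaultdict(list)
    for url, ms in rows:
        samples[url].append(ms)
    result: Dict[str, int] = {}
    for url, values in samples.items():
        values.sort()
        rank = (95 * len(values) + 99) // 100  # integer ceil(0.95 * n), 1-based
        result[url] = values[rank - 1]
    return result
```

Comparing this dictionary for the hour before and after a deployment highlights URLs whose tail latency moved even though their status codes did not.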

Intermediate

How do you distinguish app error vs platform error?

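Correlate the IIS status triad with Event Viewer and the FREB module trace. Platform errors typically come with WAS or ANCM events and telltale substatus codes (e.g., 500.19, 502.5, 503.0), while application errors more often show 500.0 with a matching exception in the app's own logs.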

What does 500.19 mean?

Configuration data is invalid/unreadable; often malformed web.config, locked section, or permission issue.

How do you monitor app pool health?

Track recycle frequency, stopped/disabled pools, worker process crashes, and queue length counters.

How do you reduce logging overhead?

Log only required fields, sample debug traces, and scope FREB to specific codes/time windows.

How do you correlate across layers?

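Use timestamps plus request IDs where available to tie IIS log entries, app logs, and infrastructure metrics together. Normalize time zones first: W3C logs record UTC, while Event Viewer displays local time.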
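Timestamp correlation is easier to get right when both sides are in UTC and the request's duration is taken into account. This sketch assumes the W3C time field marks when the request completed, as it is commonly described (so the request started time-taken milliseconds earlier), and that app log timestamps are already UTC; the function names and the one-second slack are hypothetical.

```python
"""Find app-log events that overlap an IIS request, assuming both sides are in UTC."""
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def request_window(date: str, time: str, time_taken_ms: int) -> Tuple[datetime, datetime]:
    """Turn W3C date/time/time-taken into a UTC (start, end) interval."""
    end = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return end - timedelta(milliseconds=time_taken_ms), end


def events_in_window(window: Tuple[datetime, datetime],
                     events: List[Tuple[datetime, str]],
                     slack: timedelta = timedelta(seconds=1)) -> List[str]:
    """Return messages of events inside the window (plus slack), oldest first."""
    start, end = window
    return [msg for ts, msg in sorted(events) if start - slack <= ts <= end + slack]
```

The slack absorbs clock skew between servers; when a request ID is available in both logs, matching on it is more reliable than any time window.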

Scenario-based

5xx spike after deployment. First 10 minutes plan?

Freeze the rollout, inspect the latest IIS logs and Event Viewer entries, compare error rates with the previous release, and roll back if customer impact persists.

Disk fills due to logs. What then?

Archive/compress old logs, move path to larger volume, set retention and alerts for free space thresholds.

Intermittent 401.3 seen only on one node in farm?

Likely NTFS ACL drift on that node. Compare ACL baseline and app pool identity permissions across nodes.

503 only at peak hour?

Check queue length, worker CPU/thread saturation, and backend latency. Scale out and remove blocking calls.

No app logs but users see errors.

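Start with IIS logs, FREB, and platform events; the request may fail before app code executes. Requests that HTTP.sys rejects before they reach IIS (for example, with reason QueueFull or AppOffline) appear only in the HTTPERR log under %SystemRoot%\System32\LogFiles\HTTPERR.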

🌐 Real-world Usage

Teams that can read IIS status triads and correlate logs/events resolve incidents faster and avoid blind restarts.

📝 Summary

IIS observability depends on W3C logs, FREB, Event Viewer, and core counters together. Treat error analysis as correlation, not guesswork.