Always interpret IIS outcomes as status.substatus.win32, not status alone. This triad helps identify whether a failure comes from authentication, configuration, the file system, or the backend runtime.
IIS Logging, Monitoring, and HTTP Error Codes
Read IIS logs like an incident responder: W3C fields, sc-status/substatus/win32-status, FREB traces, Event Viewer correlation, and practical monitoring for early detection of outages and latency regressions.
🧒 Simple Explanation (ELI5)
IIS logs are a flight recorder for your website. Every request leaves a trace: who asked, what URL, what status code, and how long it took. Error codes are like doctor reports: 404 means "not found", 500 means "server broke", 503 means "service unavailable".
🔧 Why Do We Need It?
- Find root causes quickly during incidents.
- Detect trends before outages (rising 5xx or latency).
- Support security investigations with source IP and user-agent evidence.
- Measure SLA performance (response time, error rate, throughput).
🌍 Real-world Analogy
Like CCTV plus attendance logs in a building: you can prove who came in, where they tried to go, whether they were denied, and how long each action took.
⚙️ Technical Explanation
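IIS W3C logs live under C:\inetpub\logs\LogFiles\W3SVC<siteId> by default. Key fields: date, time, c-ip, cs-method, cs-uri-stem, sc-status, sc-substatus, sc-win32-status, time-taken. Timestamps are in UTC by default and time-taken is in milliseconds. Use all three status fields for diagnosis; in the log they appear as three space-separated columns, with sc-win32-status written in decimal. Example: 401 2 5 is HTTP 401.2 with Win32 error 5 (access denied), which points to an authentication configuration issue; 500.19 with error code 0x80070005 on the detailed error page means IIS was denied access while reading the configuration file.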
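As a minimal sketch of reading the status triad, the Python below parses W3C lines by name using the #Fields header instead of fixed column positions, so it keeps working when the logged field set is customized. The sample lines, URLs, and IP address are hypothetical, and the helper names parse_w3c and triad are illustrative only.

```python
def parse_w3c(lines):
    """Yield one dict per request, keyed by the names in the #Fields header."""
    fields = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#Fields:"):
            # The header can reappear mid-file (e.g. after a restart), so re-read it.
            fields = line[len("#Fields:"):].split()
        elif line and not line.startswith("#") and fields:
            values = line.split(" ")
            if len(values) == len(fields):  # skip truncated or malformed rows
                yield dict(zip(fields, values))


def triad(row):
    """Format the three status columns as status.substatus.win32."""
    return f"{row['sc-status']}.{row['sc-substatus']}.{row['sc-win32-status']}"


# Hypothetical sample lines, not taken from a real server.
sample = [
    "#Fields: date time cs-method cs-uri-stem c-ip sc-status sc-substatus sc-win32-status time-taken",
    "2024-05-01 10:15:02 GET /api/orders 203.0.113.7 401 2 5 31",
    "2024-05-01 10:15:03 GET /index.html 203.0.113.7 200 0 0 12",
]
rows = list(parse_w3c(sample))
print([triad(r) for r in rows])  # ['401.2.5', '200.0.0']
```

Keying on field names rather than column indexes is a design choice: it trades a little speed for robustness when someone adds or removes a logged field.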
FREB (Failed Request Event Buffering, shown in IIS Manager as Failed Request Tracing) writes module-level pipeline traces as XML under C:\inetpub\logs\FailedReqLogFiles. Enable it only for targeted status codes (e.g., 500, 502) or time-taken thresholds, and cap the number of trace files kept, to avoid disk bloat.
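For a quick overview of a folder of FREB traces without opening each file in a browser, a sketch like the following reads only the root element of each XML file. The attribute names url, statusCode, and timeTaken are assumptions to verify against a real trace file; any attribute that is absent comes back as None. The function name summarize_freb is hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def summarize_freb(folder):
    """Return (file name, url, statusCode, timeTaken) for each XML trace in folder.

    Attribute names are assumed, not guaranteed; missing ones yield None.
    """
    summary = []
    for path in sorted(Path(folder).glob("*.xml")):
        root = ET.parse(path).getroot()
        summary.append((
            path.name,
            root.attrib.get("url"),
            root.attrib.get("statusCode"),
            root.attrib.get("timeTaken"),
        ))
    return summary
```

Pointing it at one site's subfolder of FailedReqLogFiles and sorting the result by the last element is one way to find the slowest traced requests first.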
⌨️ Commands / Syntax
# Latest IIS log: lines whose sc-status is 5xx
# Assumes the default field order, where sc-status, sc-substatus, sc-win32-status and time-taken end each line
$latest = Get-ChildItem C:\inetpub\logs\LogFiles\W3SVC1\u_ex*.log | Sort-Object LastWriteTime -Descending | Select-Object -First 1
Select-String -Path $latest.FullName -Pattern ' 5\d\d \d+ \d+ \d+$'
# Top 20 URLs producing 500 errors (basic parse)
# Index 4 is cs-uri-stem in the default field order; check the #Fields header if logging was customized
Get-Content $latest.FullName |
Where-Object { $_ -notmatch '^#' -and $_ -match ' 500 \d+ \d+ \d+$' } |
ForEach-Object { ($_ -split ' ')[4] } |
Group-Object | Sort-Object Count -Descending | Select-Object -First 20
# Event IDs related to app pool failures (WAS events are written to the System log, application crashes to Application)
Get-WinEvent -FilterHashtable @{LogName='System','Application'; Id=5005,5010,1000} -MaxEvents 50
# Check active worker processes and pools (call operator so the command runs from PowerShell)
& "$env:windir\system32\inetsrv\appcmd.exe" list wp
& "$env:windir\system32\inetsrv\appcmd.exe" list apppool
💼 Example (Real-world Use Case)
A traffic spike caused bursts of 503s. The W3C logs showed 503 2 0 (503.2, concurrent request limit exceeded) with high time-taken values. Event logs showed app pool recycling and queue pressure. The team increased backend database capacity and optimized blocking calls, after which the 503s stopped.
🧪 Hands-on
- Generate a 404 by requesting a missing file; find it in W3C logs.
- Enable FREB for status 500 and reproduce an app error.
- Open FREB XML and identify the failing module.
- Correlate request timestamp with Event Viewer app events.
- Create a simple alert threshold: if 5xx > 2% in last 5 min, raise incident.
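The alert threshold in the last step can be sketched as a sliding-window error rate. This assumes the log has already been parsed into (timestamp, status) pairs; the function names and the sample data are illustrative, not part of any IIS tooling.

```python
from datetime import datetime, timedelta


def five_xx_rate(entries, now, window=timedelta(minutes=5)):
    """Fraction of requests in the last `window` that returned a 5xx status.

    entries: iterable of (timestamp, status) tuples with integer status codes.
    """
    recent = [status for ts, status in entries if now - window <= ts <= now]
    if not recent:
        return 0.0  # no traffic in the window: nothing to alert on
    errors = sum(1 for status in recent if 500 <= status <= 599)
    return errors / len(recent)


def should_alert(entries, now, threshold=0.02):
    """True when the 5xx rate over the last 5 minutes exceeds the threshold."""
    return five_xx_rate(entries, now) > threshold


# Hypothetical data: 97 successes and 3 failures in the window, 1 old failure outside it.
now = datetime(2024, 5, 1, 10, 5)
recent_ok = [(now - timedelta(seconds=i), 200) for i in range(97)]
recent_bad = [(now - timedelta(seconds=i), 500) for i in range(3)]
stale_bad = [(now - timedelta(minutes=30), 503)]
print(should_alert(recent_ok + recent_bad + stale_bad, now))  # True (3 of 100 = 3%)
```

On low-traffic sites a pure percentage is noisy, since one failure out of ten requests is already 10%. Adding a minimum request count before alerting is a common refinement.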
Move IIS logs off system drive when possible and enforce retention policy (e.g., 30-90 days hot, archive cold storage) to avoid disk-full incidents.
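A retention policy like this can be automated with a small scheduled job. The sketch below compresses W3C log files older than a cutoff into an archive folder and deletes the originals; the 30-day default, the folder arguments, and the function name archive_old_logs are assumptions to adapt, and it should be tried on a copy of the logs first.

```python
import gzip
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, max_age_days=30, now=None):
    """Gzip u_ex*.log files older than max_age_days into archive_dir.

    Returns the names of the archives created. Originals are deleted only
    after the compressed copy has been fully written.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    created = []
    for log in sorted(Path(log_dir).glob("u_ex*.log")):
        if log.stat().st_mtime < cutoff:
            target = archive / (log.name + ".gz")
            with open(log, "rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            log.unlink()
            created.append(target.name)
    return created
```

Because the cutoff is based on last-modified time, the log file IIS is currently writing to is left alone as long as max_age_days is at least 1.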
🐛 Debugging Scenario
Failure: Users report intermittent 502.5 errors on an ASP.NET Core app.
- IIS log shows sc-status 502 with sc-substatus 5 and a short time-taken.
- Event Viewer shows ASP.NET Core Module (ANCM) startup failures after a recycle.
- stdout logs reveal a missing runtime dependency.
- Install the matching .NET Hosting Bundle and restart the app pool.
- Verify that the health endpoint returns a stable 200.
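The final verification step can be scripted so that "stable" means several consecutive 200 responses, not one lucky request. This is a sketch using only the Python standard library; the health URL is hypothetical, and the check count and interval are arbitrary starting points.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx responses still carry a status code
    except (urllib.error.URLError, OSError):
        return None


def is_stable(probe, checks=5, interval_s=2.0, sleep=time.sleep):
    """True only if probe() returns 200 on every one of `checks` consecutive calls."""
    for i in range(checks):
        if probe() != 200:
            return False
        if i < checks - 1:
            sleep(interval_s)
    return True


# Usage (hypothetical URL):
# is_stable(lambda: http_status("http://localhost/health"))
```

Passing the probe as a callable keeps the stability logic testable without a live server. For an intermittent failure such as the one above, a longer run that spans at least one app pool recycle gives more confidence than a few seconds of polling.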
🎯 Interview Questions
Beginner
- 404 vs 500: 404 means the requested resource was not found. 500 means the server hit an internal error while processing the request.
- Default log location: C:\inetpub\logs\LogFiles\W3SVC<siteId>.
- 503: Service unavailable, often because the app pool is stopped or disabled, or the request queue is saturated.
- FREB: Failed Request Event Buffering, which produces detailed XML traces of request pipeline execution for selected failures.
- Why log time-taken: it helps detect latency regressions even when the status code is 200.
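The point about time-taken can be made concrete by comparing a high percentile between two periods, since averages hide slow tails. The sketch below uses the nearest-rank percentile; the 1.5x tolerance and the function names are illustrative assumptions, and the inputs are time-taken values in milliseconds.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_regressed(baseline_ms, current_ms, pct=95, tolerance=1.5):
    """True when the current percentile exceeds the baseline percentile by more than `tolerance` times."""
    return percentile(current_ms, pct) > tolerance * percentile(baseline_ms, pct)


baseline = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(percentile(baseline, 50))                # 50
print(percentile(baseline, 95))                # 100
print(latency_regressed(baseline, [200] * 10)) # True: 200 > 1.5 * 100
```

Every request in the second sample could still have returned 200, which is why a status-code-only dashboard would miss this regression.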
Intermediate
- Platform error vs application error: Correlate the IIS status triad with Event Viewer and the FREB module trace. Platform errors often show WAS/ANCM events and specific substatus patterns.
- 500.19: Configuration data is invalid or unreadable, often because of a malformed web.config, a locked section, or a permission issue.
- App pool health monitoring: Track recycle frequency, stopped or disabled pools, worker process crashes, and queue length counters.
- Keeping logging overhead low: Log only the required fields, sample debug traces, and scope FREB to specific codes and time windows.
- Cross-system correlation: Use timestamps plus request IDs, where available, to tie IIS log entries, app logs, and infrastructure metrics together.
Scenario-based
- 5xx spike after a release: Freeze the rollout, inspect the latest IIS logs and Event IDs, compare with the previous release, and roll back if customer impact persists.
- Logs filling the disk: Archive or compress old logs, move the log path to a larger volume, and set retention plus alerts on free-space thresholds.
- Access errors on a single node: Likely NTFS ACL drift on that node. Compare the ACL baseline and app pool identity permissions across nodes.
- 503s or slow responses under load: Check queue length, worker CPU and thread saturation, and backend latency. Scale out and remove blocking calls.
- Failures with nothing in the application logs: Use IIS logs, FREB, and platform events first; the request may fail before app code executes.
🌐 Real-world Usage
Teams that can read IIS status triads and correlate logs/events resolve incidents faster and avoid blind restarts.
📝 Summary
IIS observability depends on W3C logs, FREB, Event Viewer, and core performance counters working together. Treat error analysis as correlation, not guesswork.