Troubleshooting
Diagnose and fix the most common Splunk problems — no data indexed, forwarder failures, license warnings, slow searches, and field extraction issues.
Simple Explanation (ELI5)
When something goes wrong in Splunk, there's always a trace. Splunk records its own activity in internal indexes, forwarders keep their own logs, and indexers track every piece of data they process. Troubleshooting in Splunk means following the data trail from source to search result.
Troubleshooting Framework
Approach every Splunk problem by tracing the data path from source to search: App/Server → Forwarder → Network → Indexer Pipeline → Index → Search Head → Results. Identify at which stage data is missing or incorrect.
Issue 1: No Data in Search Results
    # Step 1: Verify data exists in the index (no extra filters)
    index=prod_app | head 5 | table _time host sourcetype _raw

    # Step 2: Check the index list to confirm the index name is correct
    | rest /services/data/indexes | table title currentDBSizeMB totalEventCount

    # Step 3: Check forwarder connection status from the search head
    index=_internal sourcetype=splunkd source=*metrics.log* component=Metrics group=tcpin_connections
    | stats count by hostname, sourceIp

    # Step 4: Check indexing throughput per host (per_host_thruput reports the host name in the "series" field)
    index=_internal source=*metrics.log* group=per_host_thruput
    | timechart span=5m avg(kbps) by series

    # Step 5: Verify the indexer is receiving cooked (forwarder) connections
    index=_internal source=*metrics.log* group=tcpin_connections connectionType=cooked
    | stats count by host
Issue 2: Universal Forwarder Not Sending Data
    # On the forwarder host: list monitored files
    $SPLUNK_HOME/bin/splunk list monitor

    # Check forwarder connection status
    $SPLUNK_HOME/bin/splunk list forward-server

    # Review the merged outputs.conf settings
    $SPLUNK_HOME/bin/splunk btool outputs list --debug

    # Review the merged inputs.conf settings
    $SPLUNK_HOME/bin/splunk btool inputs list --debug

    # Restart the forwarder
    $SPLUNK_HOME/bin/splunk restart

    # Check the forwarder log for errors
    tail -f $SPLUNK_HOME/var/log/splunk/splunkd.log | grep -i error
Issue 3: License Warning / Quota Exceeded
    # Find top data contributors by sourcetype
    # (type=Usage excludes the daily RolloverSummary events, which would otherwise be counted twice)
    index=_internal source=*license_usage.log* type=Usage earliest=-1d
    | stats sum(b) AS bytes by st
    | eval MB=round(bytes/1024/1024,2)
    | sort - MB | head 20
    | rename st AS sourcetype

    # Check the daily ingestion trend
    index=_internal source=*license_usage.log* type=Usage
    | timechart span=1d sum(b) AS daily_bytes
    | eval GB=round(daily_bytes/1024/1024/1024,3)

    # Identify which host is driving volume
    index=_internal source=*license_usage.log* type=Usage earliest=-1d
    | stats sum(b) AS bytes by h
    | eval MB=round(bytes/1024/1024,2)
    | sort - MB | head 10
    | rename h AS host
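The eval steps above divide a byte count by 1024 twice for MB and three times for GB. A minimal shell sketch of the same arithmetic, useful for sanity-checking a number read off the license report; the helper names bytes_to_mb and bytes_to_gb are hypothetical and not part of Splunk:

```shell
#!/usr/bin/env bash
# Hypothetical helpers that mirror the eval expressions in the searches above.

# round(bytes/1024/1024, 2)
bytes_to_mb() {
  awk -v b="$1" 'BEGIN { printf "%.2f\n", b / 1024 / 1024 }'
}

# round(bytes/1024/1024/1024, 3)
bytes_to_gb() {
  awk -v b="$1" 'BEGIN { printf "%.3f\n", b / 1024 / 1024 / 1024 }'
}

# 5368709120 bytes is exactly 5 * 1024^3
bytes_to_mb 5368709120   # prints 5120.00
bytes_to_gb 5368709120   # prints 5.000
```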
Issue 4: Slow Search Performance
    # Open the Job Inspector: Activity → Jobs, then inspect the job
    # Key metrics: scan count, event count, run time

    # Use tstats for queries that need only indexed fields (often much faster than a raw event search)
    | tstats count WHERE index=prod_app by sourcetype, host

    # Check scheduler activity: are too many scheduled searches running in the same window?
    index=_internal sourcetype=scheduler
    | timechart span=5m count AS scheduled_search_runs

    # Find long-running scheduled searches
    index=_internal sourcetype=scheduler status=completed
    | stats avg(run_time) AS avg_runtime by saved_search_name
    | sort - avg_runtime | head 10

    # Check indexer search load
    index=_internal sourcetype=splunkd component=SearchOperator
    | stats count by host | sort - count
Issue 5: Field Not Extracted
    # Step 1: Verify the raw event contains the expected field
    index=prod_app | head 5 | table _raw

    # Step 2: Test the rex extraction inline
    index=prod_app | rex field=_raw "duration=(?P<dur_ms>\d+)" | table _time dur_ms _raw

    # Step 3: Check which sourcetype is being applied
    index=prod_app | head 10 | table sourcetype _raw

    # If the sourcetype is wrong, fix it in inputs.conf:
    [monitor:///var/log/app/app.log]
    sourcetype = my_custom_app
    index = prod_app

    # In props.conf, define the extraction:
    [my_custom_app]
    EXTRACT-duration = duration=(?P<duration_ms>\d+)
    TIME_FORMAT = %Y-%m-%dT%H:%M:%S
    MAX_TIMESTAMP_LOOKAHEAD = 25

    # Validate that btool shows the new extraction
    $SPLUNK_HOME/bin/splunk btool props list my_custom_app --debug
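Before deploying the EXTRACT pattern above, it can help to try the same capture against a sample log line from the shell. This is a rough sketch only: the helper name extract_duration and the sample line are invented for illustration, and sed uses POSIX extended regular expressions rather than Splunk's PCRE, so [0-9]+ stands in for \d+.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the digits that follow "duration=" in one log line.
# Prints nothing when the line has no duration=<digits> pair.
extract_duration() {
  printf '%s\n' "$1" | sed -nE 's/.*duration=([0-9]+).*/\1/p'
}

# Invented sample line; substitute a real line copied from the _raw column.
sample='2024-05-01T12:00:00 level=INFO path=/checkout duration=245 status=200'
extract_duration "$sample"   # prints 245
```

An empty result here suggests the log format differs from what the regex expects, which is the same failure the inline rex test in Step 2 would reveal inside Splunk.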
Issue 6: Indexer Not Receiving Data
    # Check whether the indexer listener is active (run on the indexer host)
    $SPLUNK_HOME/bin/splunk list inputstatus

    # Check network connectivity to indexer port 9997
    telnet indexer.company.com 9997

    # Check the indexer receiving log
    index=_internal source=*splunkd.log* host=indexer01 "Message from"
    | head 20 | table _time source _raw

    # Check for parsing errors on the indexer
    index=_internal source=*splunkd.log* log_level=ERROR host=indexer01
    | head 20 | table _time _raw
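On hosts where telnet is not installed, the connectivity check above can be approximated with bash alone. The sketch below assumes bash was built with /dev/tcp support and that the coreutils timeout command is available; port_open is a hypothetical helper name, and the 3-second limit is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 3 seconds.
# Returns 2 for a non-numeric port, otherwise the exit status of the connection attempt.
port_open() {
  local host="$1" port="$2"
  case "$port" in
    ''|*[!0-9]*) echo "port must be numeric" >&2; return 2 ;;
  esac
  # /dev/tcp/<host>/<port> is a bash pseudo-device; opening it attempts a TCP connect.
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example (indexer.company.com is the placeholder hostname used above):
# if port_open indexer.company.com 9997; then
#   echo "9997 reachable"
# else
#   echo "9997 blocked or closed"
# fi
```

A failure here points at the network or at the listener, not at the forwarder configuration, which narrows the search to firewalls, routing, or the receiving port on the indexer.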
Troubleshooting Checklist
| Symptom | First Check | SPL/Command |
|---|---|---|
| No results in search | Is the index correct? | index=X \| head 5 |
| Forwarder silent | outputs.conf / connectivity | splunk list forward-server |
| License warning | Top sourcetype volume | index=_internal source=*license_usage* |
| Search very slow | Job Inspector scan count | Click job ID → Job Inspector |
| Field missing | sourcetype assignment | index=X \| head 1 \| table sourcetype _raw |
| Wrong timestamps | TIME_FORMAT in props.conf | splunk btool props list <sourcetype> --debug |
Debugging Scenarios
- Data stops arriving suddenly from one server: Check if the UF service is still running; verify disk space hasn't filled on the forwarder host; confirm the log file is being written to regularly.
- All data arrives 1 hour late: Check for a timezone mismatch. The events are indexed, but the extracted timestamp is off by an hour, so they land in the wrong time bucket. Set TZ in props.conf for the sourcetype, or include the offset (%z) in TIME_FORMAT if the raw timestamp carries one.
- "Access denied" when searching an index: The user's Splunk role doesn't have access to that index. Add the index to the role's allowed indexes in Settings → Roles.
- Dashboard loads stale data: The dashboard is showing cached results from a scheduled search. Check the search's schedule and adjust its frequency to match the expected freshness.
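The first scenario's three checks (forwarder process, disk space, log file activity) can be scripted on the forwarder host. The following is a sketch under stated assumptions: the function names are hypothetical, the process is assumed to be named splunkd, and the paths and thresholds in the usage comment are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the "data stops arriving from one server" scenario.

# Succeed if a process named splunkd is running.
uf_running() {
  pgrep -x splunkd >/dev/null
}

# Print the used-space percentage (digits only) of the filesystem holding PATH.
disk_used_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed if FILE exists and was modified within the last MAX_AGE seconds.
log_is_fresh() {
  local file="$1" max_age="$2" now mtime
  [ -f "$file" ] || return 1
  now=$(date +%s)
  # Try GNU stat first, then fall back to BSD stat.
  mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file")
  [ $(( now - mtime )) -le "$max_age" ]
}

# Example usage (path and thresholds are placeholders):
# uf_running || echo "splunkd is not running"
# [ "$(disk_used_pct /opt)" -lt 95 ] || echo "disk nearly full"
# log_is_fresh /var/log/app/app.log 600 || echo "log not written in the last 10 minutes"
```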
Real-world Use Case
A production team reported "no logs for the last 2 hours." The SRE used index=_internal source=*metrics.log* group=tcpin_connections to verify the specific forwarder was not connecting. They then checked splunk list forward-server on the host and found the outputs.conf was pointing to a decommissioned indexer IP. They updated outputs.conf, restarted the forwarder, and logs began flowing again within 60 seconds. Total resolution time: 8 minutes.
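A stale indexer address like the one in this case can be spotted by listing every target that outputs.conf names. The sketch below reads outputs.conf-style text on stdin and prints each host:port from its server lines; list_forward_targets is a hypothetical helper, and it assumes plain "server = a:9997, b:9997" lines such as those printed by btool without --debug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each host:port found on "server = ..." lines, one per line.
# Comment lines beginning with # are skipped.
list_forward_targets() {
  awk -F= '
    /^[ \t]*#/ { next }
    $1 ~ /^[ \t]*server[ \t]*$/ {
      n = split($2, targets, ",")
      for (i = 1; i <= n; i++) {
        gsub(/[ \t\r]/, "", targets[i])     # strip whitespace around each target
        if (targets[i] != "") print targets[i]
      }
    }'
}

# Example usage on the forwarder:
# "$SPLUNK_HOME/bin/splunk" btool outputs list | list_forward_targets
```

Each printed target can then be compared against the list of live indexers, or probed individually on its port.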
Interview Questions
Beginner
In $SPLUNK_HOME/var/log/splunk/splunkd.log locally, and indexed into the _internal index for searchable access via SPL.
It stores Splunk's own operational metrics and events — forwarder connections, indexing throughput, license usage, search job activity.
Run splunk list forward-server on the forwarder, or search index=_internal sourcetype=splunkd group=tcpin_connections on the indexer.
A Splunk CLI tool that validates and displays the merged effective configuration from all conf files, showing which settings apply to a given stanza.
Verify the index name is correct, the time range is appropriate, and data is actually indexed with index=X | head 5.
Intermediate
Query index=_internal source=*license_usage.log* | stats sum(b) by st to find which sourcetypes are consuming the most license volume.
Use the Job Inspector to compare scan count vs event count. High ratio means poor index selectivity — add index/sourcetype filters and narrow the time range.
Add TIME_FORMAT and TIME_PREFIX to props.conf for the sourcetype. Validate with btool. May need to re-index affected data if historical records have wrong timestamps.
The app changed its log format — the regex no longer matches. Run | head 5 | table _raw to inspect current format and update the EXTRACT regex accordingly.
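Use splunk btool props list <sourcetype> --debug to verify the merged effective config. Search-time settings such as EXTRACT generally take effect without a restart; index-time settings in props.conf or transforms.conf require a restart of the indexer or heavy forwarder and apply only to newly indexed data.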
Scenario-based
1. Check index=_internal source=*metrics.log* group=tcpin_connections | stats count by hostname to see whether the forwarder is connected. 2. SSH to the server and check that the UF is running and the log file is active. 3. Check splunk list monitor on the forwarder. 4. Check network connectivity to indexer port 9997.
Find the top sourcetypes by volume. Apply a nullQueue transform at the heavy forwarder to drop DEBUG events from the top contributor before they are indexed. This reduces indexed volume without affecting the remaining service logs.
The timestamp isn't being extracted from the event properly — Splunk uses the indexing time as the event timestamp. Fix TIME_FORMAT and TIME_PREFIX in props.conf for the sourcetype.
Check if the specific index used by this dashboard has data: index=X earliest=-2h | timechart span=10m count — find the gap. Then check forwarder connectivity for that index's sources.
Set an explicit sourcetype in inputs.conf, add timestamp extraction to props.conf, define the field with an EXTRACT regex in props.conf (or a REPORT that references a transforms.conf stanza), test the regex inline with rex, validate with btool, then deploy the configuration app via Deployment Server.
Summary
Systematic troubleshooting in Splunk follows the data path: source → forwarder → indexer → search. The _internal index is your best friend — it contains metrics, connection status, and error events for every Splunk component. Combine btool for config validation with SPL queries against _internal for operational diagnosis.