Intermediate · Lesson 5 of 16

Working with Files and Directories

Read and write files safely, traverse directory structures, list files by pattern, and work with file paths across platforms using the pathlib module—essential for log processing and configuration management.

🧒 Simple Explanation (ELI5)

Files are where data lives on disk—configuration files, logs, data exports. Your script needs to read a config, maybe write a report. The tricky part: file paths differ on Windows (C:\data\log.txt) vs Linux (/var/log/app.log). Python provides tools to abstract this difference, so you write once and it works everywhere.

🔧 Why Do We Need File Operations?

Log analysis: read log files, parse lines, extract errors or metrics.
Configuration: read config files, update settings, write back safely.
Data export: gather operational data and write to CSV or JSON for analysis.
Cross-platform: scripts need to work on Linux servers, Windows machines, and macOS dev boxes—pathlib handles differences.
File safety: use context managers (with statement) to ensure files are closed properly even if errors occur.

⚙️ Technical Explanation

File I/O: opening a file returns a file object; reading retrieves data from it, and writing sends data to it.
Context managers (with statement): automatically close the file when the block exits, even if an error occurs.
pathlib.Path: an object-oriented, cross-platform way to work with file paths; it covers most of what the older os.path functions do.
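To make the cross-platform claim concrete, here is a minimal sketch using pathlib's "pure" path classes, which only manipulate path strings and never touch the disk. The directory names are made up for illustration. The same `/` join produces the right separator for each platform flavor:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical locations, used only to show how joining behaves.
posix_path = PurePosixPath("/srv") / "data" / "logs" / "app.log"
windows_path = PureWindowsPath("C:/srv") / "data" / "logs" / "app.log"

print(posix_path)              # /srv/data/logs/app.log
print(windows_path)            # C:\srv\data\logs\app.log
print(windows_path.as_posix()) # C:/srv/data/logs/app.log

# Components are the same regardless of the separator in use.
print(posix_path.name == windows_path.name)  # True
```

In everyday scripts you would normally use plain `Path`, which picks the flavor that matches the machine the script runs on.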

🔒 Always Use Context Managers for Files

Never use `f = open(...); f.read()` without a matching `f.close()`. Prefer `with open(...) as f:`. The with statement closes the file when the block exits, even on an exception, which prevents file descriptor leaks and ensures buffered writes are flushed.
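A small sketch of that guarantee, assuming a throwaway scratch file created just for the demonstration: the block is left through an exception, and the file object is still closed afterwards.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file, created only for this demonstration.
demo_path = Path(tempfile.mkdtemp()) / "demo.txt"
demo_path.write_text("first line\n", encoding="utf-8")

try:
    with open(demo_path, "r", encoding="utf-8") as f:
        f.readline()
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# The with block exited via an exception, yet the file was closed.
print(f.closed)  # True

demo_path.unlink()          # remove the scratch file
demo_path.parent.rmdir()    # and its now-empty directory
```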

⌨️ File and Directory Operations

python

# ===== PATHLIB: MODERN FILE PATH HANDLING =====
from pathlib import Path

# Create path objects (cross-platform)
log_file = Path("/var/log/app.log")
config_dir = Path("/etc/myapp")

# Path operations
filename = log_file.name           # "app.log"
parent_dir = log_file.parent       # Path("/var/log")
stem = log_file.stem              # "app" (without .log)
suffix = log_file.suffix          # ".log"

# Check if path exists
if log_file.exists():
    print("Log file found")

if log_file.is_file():
    print("It is a file")

if log_file.parent.is_dir():
    print("Parent is a directory")

# ===== READING FILES =====
# Read entire file
with open("/var/log/app.log", "r") as file:
    content = file.read()
    print(content)

# Read line by line (memory-efficient for large files)
with open("/var/log/app.log", "r") as file:
    for line in file:
        line = line.rstrip("\n")    # remove newline
        if "ERROR" in line:
            print(f"Error found: {line}")

# Read all lines into list
with open("/var/log/app.log", "r") as file:
    lines = file.readlines()       # includes newlines
    lines = [line.strip() for line in lines]  # strip surrounding whitespace

# ===== WRITING FILES =====
# Write (overwrites if exists)
output_file = Path("report.txt")
with open(output_file, "w") as file:
    file.write("Report Header\n")
    file.write("==============\n")
    file.write("Data here\n")

# Append to file
with open(output_file, "a") as file:
    file.write("New line appended\n")

# Write multiple lines
lines = ["Line 1", "Line 2", "Line 3"]
with open(output_file, "w") as file:
    file.writelines([f"{line}\n" for line in lines])

# ===== LISTING FILES =====
# All files in directory
log_dir = Path("/var/log")
all_files = log_dir.iterdir()       # generator of all items
for item in all_files:
    print(item)

# List only files (not directories)
files = [f for f in log_dir.iterdir() if f.is_file()]

# Find files by pattern
log_files = log_dir.glob("*.log")   # all .log files in dir
all_logs = log_dir.rglob("*.log")   # .log files recursively

# ===== CREATING/DELETING DIRECTORIES =====
new_dir = Path("/tmp/myapp")
new_dir.mkdir(parents=True, exist_ok=True)  # create with parents

# Create file (touch)
new_file = new_dir / "data.txt"
new_file.touch()

# Remove file
new_file.unlink()

# Delete directory (if empty)
new_dir.rmdir()

# ===== FILE METADATA AND PERMISSIONS =====
# stat() returns metadata; the permission bits are part of st_mode
stat_info = log_file.stat()
mode = stat_info.st_mode           # file type and permission bits (integer)
size = stat_info.st_size           # file size in bytes
mtime = stat_info.st_mtime         # modification time (seconds since epoch)

# ===== SAFE FILE READING: HANDLING ERRORS =====
def safe_read_log(filepath):
    """Read log file, handling common errors."""
    try:
        with open(filepath, "r") as f:
            return f.read()
    except FileNotFoundError:
        print(f"File not found: {filepath}")
        return None
    except PermissionError:
        print(f"Permission denied: {filepath}")
        return None
    except (OSError, UnicodeDecodeError) as e:  # other I/O or decoding failures
        print(f"Error reading file: {e}")
        return None

# ===== PARSING STRUCTURED FILE DATA =====
# Read CSV-like data
def parse_csv_log(filepath):
    """Parse comma-separated log file."""
    data = []
    try:
        with open(filepath, "r") as f:
            for line in f:
                # Split into at most 3 fields so commas inside the message survive
                parts = line.strip().split(",", 2)
                if len(parts) >= 3:
                    data.append({
                        "timestamp": parts[0],
                        "level": parts[1],
                        "message": parts[2]
                    })
    except Exception as e:
        print(f"Error parsing: {e}")
    return data

# ===== TEMPORARY FILES =====
import tempfile

# Create temporary file
with tempfile.NamedTemporaryFile(mode="w", delete=False) as tmp:
    tmp.write("temporary data")
    tmp_path = tmp.name

# Use the file
with open(tmp_path, "r") as f:
    content = f.read()

# Clean up: required here, because delete=False disables automatic deletion
Path(tmp_path).unlink()
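The `st_mode` and `st_mtime` values returned by `stat()` in the listing above are raw numbers. As a rough sketch of how they might be turned into something readable, the standard `stat` and `datetime` modules can decode them. The sample file here is a temporary one created only for the example:

```python
import stat
import tempfile
from datetime import datetime, timezone
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.log"   # hypothetical sample file
    sample.write_bytes(b"hello\n")          # 6 bytes on every platform

    info = sample.stat()
    permissions = stat.filemode(info.st_mode)       # e.g. "-rw-r--r--"
    octal_bits = oct(stat.S_IMODE(info.st_mode))    # e.g. "0o644"
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    size = info.st_size

print(permissions, octal_bits, size, modified.isoformat())
```

The exact permission string depends on the operating system and the process umask, so treat the values in the comments as examples only.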

💼 Example (Real-world Use Case)

A monitoring script reads log files from /var/log/app, finds all ERROR lines, counts them by service, and writes a report to /var/reports/errors_today.txt. It uses pathlib to iterate directories cross-platform, safely reads large files line-by-line, and writes the report atomically (writes to temp file first, then renames—ensuring no partial output).
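A compact sketch of that workflow follows. It makes two assumptions that the description does not spell out: each `.log` file's stem stands in for the service name, and the report format is one `service: count` line per service. The function names are hypothetical, and the demo uses a temporary directory instead of /var/log/app so it can run anywhere.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def count_errors_by_service(log_dir):
    """Count ERROR lines per log file (file stem is assumed to be the service)."""
    counts = Counter()
    for log_path in sorted(Path(log_dir).glob("*.log")):
        with open(log_path, "r", encoding="utf-8", errors="replace") as f:
            for line in f:                      # line by line, never the whole file
                if "ERROR" in line:
                    counts[log_path.stem] += 1
    return counts


def write_report_atomically(report_path, counts):
    """Write to a temp file in the target directory, then rename over the target."""
    report_path = Path(report_path)
    report_path.parent.mkdir(parents=True, exist_ok=True)
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=report_path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for service, count in sorted(counts.items()):
                tmp.write(f"{service}: {count}\n")
        os.replace(tmp_name, report_path)       # readers see old or new, not partial
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # do not leave a stray temp file
        raise


with tempfile.TemporaryDirectory() as base:
    logs = Path(base) / "logs"
    logs.mkdir()
    (logs / "api.log").write_text("INFO ok\nERROR boom\nERROR again\n", encoding="utf-8")
    (logs / "worker.log").write_text("ERROR failed\nINFO done\n", encoding="utf-8")

    report = Path(base) / "reports" / "errors_today.txt"
    write_report_atomically(report, count_errors_by_service(logs))
    report_text = report.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in report.parent.iterdir())

print(report_text)
```

Note that `tempfile.mkstemp` creates the file readable only by its owner, so a real script may want to adjust the report's permissions after the rename.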

🧪 Hands-on

Write a script that lists all .log files in /var/log (recursively).
Read a log file, count lines containing "ERROR", and print the count.
Create a script that reads a config file, modifies a value, and writes it back safely (using a temp file).
Write a function that finds the newest file in a directory.
Create a CSV report from parsed log data and write it to a file.

🎮 Try It Yourself

Create a script that: (1) reads a file line by line, (2) finds all lines containing a keyword (user input), (3) writes matching lines to a new file called "output.txt". Test with a sample log file or create one.

🐛 Debugging Scenario

Problem: "FileNotFoundError: No such file or directory" when trying to read a file that you believe exists.

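Cause: the path is wrong (often a relative path resolved against a different working directory than you expect), the name has a typo or different letter case, or the file lives on a different machine or mount. A parent directory you are not allowed to search normally raises PermissionError instead.
Diagnose: print the working directory (Path.cwd()) and the full path (absolute_path = Path(filepath).resolve()), use Path.exists() to check before opening, and list the parent directory to confirm the exact file name.
Fix: use absolute paths or build relative paths from a known base directory. Ensure parent directories exist before creating files. Use pathlib to avoid path separator confusion across platforms.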
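The diagnosis steps can be bundled into a small helper. This is only a sketch: `describe_path` is a hypothetical name, and the path passed to it is deliberately one that should not exist.

```python
from pathlib import Path


def describe_path(filepath):
    """Collect the facts that usually explain a FileNotFoundError."""
    path = Path(filepath)
    resolved = path.resolve()           # absolute form of the path
    return {
        "given": str(path),
        "cwd": str(Path.cwd()),         # relative paths are resolved against this
        "resolved": str(resolved),
        "exists": resolved.exists(),
        "parent_exists": resolved.parent.exists(),
    }


# Hypothetical path that is expected to be missing.
report = describe_path("definitely_missing_dir_for_demo/app.log")
for key, value in report.items():
    print(f"{key}: {value}")
```

If `parent_exists` is false, the problem is the directory part of the path; if it is true but `exists` is false, look for a typo or case mismatch in the file name.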

🎯 Interview Questions

Beginner

Why use a context manager (with statement) for file operations?

The with statement automatically closes the file when exiting the block, even if an error occurs. Without it, you risk leaving files open (a file descriptor leak), losing buffered writes that were never flushed, and eventually failing with "Too many open files" once the process reaches its file descriptor limit.

What is the difference between read(), readline(), and readlines()?

read() reads the entire file into memory (risky for huge files). readline() reads one line. readlines() reads all lines into a list. For large log files, loop over the file object directly (for line in file:) or use readline() to avoid loading everything into memory at once.
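A short illustration of the four approaches on a tiny sample file (the file contents are invented for the example):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.log"   # hypothetical sample log
    sample.write_text("INFO start\nERROR disk full\nINFO stop\n", encoding="utf-8")

    with open(sample, "r", encoding="utf-8") as f:
        whole = f.read()            # one string holding the entire file

    with open(sample, "r", encoding="utf-8") as f:
        first = f.readline()        # "INFO start\n" (newline included)
        second = f.readline()       # the next call continues where the last stopped

    with open(sample, "r", encoding="utf-8") as f:
        all_lines = f.readlines()   # list of 3 strings, each ending in "\n"

    with open(sample, "r", encoding="utf-8") as f:
        # Iterating the file object holds only one line in memory at a time.
        error_count = sum(1 for line in f if "ERROR" in line)

print(len(whole), repr(first), len(all_lines), error_count)
```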

When should you use pathlib instead of os.path?

Prefer pathlib for new code. It is more readable, represents paths as objects rather than plain strings, and handles Windows and Linux path differences for you. os.path is not deprecated and still appears in a lot of existing code, but pathlib covers most everyday path operations.
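As a rough side-by-side, here are the same three lookups done with each API. The path is illustrative, and `PurePosixPath` is used so the result does not depend on the machine running the snippet:

```python
import os.path
from pathlib import PurePosixPath

raw = "/var/log/app/server.log"     # illustrative path, not read from disk

# os.path: free functions that take and return strings
name_os = os.path.basename(raw)
ext_os = os.path.splitext(raw)[1]
parent_os = os.path.dirname(raw)

# pathlib: attributes and methods on a path object
p = PurePosixPath(raw)
name_pl = p.name
ext_pl = p.suffix
parent_pl = str(p.parent)
backup = p.with_suffix(".bak")      # /var/log/app/server.bak

print(name_os == name_pl, ext_os == ext_pl, parent_os == parent_pl)
print(backup)
```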

Scenario-based

Write a function that safely reads a config file, handles FileNotFoundError and PermissionError, and returns a default config if either error occurs.

Open the file in a with statement inside a try block. Catch FileNotFoundError and PermissionError in separate except clauses, log the error in each, and return a default dict such as {"default": "values"}. On success, return the parsed config.
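One possible shape for such a function is sketched below. The question does not name a config format, so JSON with a top-level object is assumed here; `load_config`, `DEFAULT_CONFIG`, and the keys inside it are all hypothetical.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger(__name__)

# Hypothetical defaults, chosen only for illustration.
DEFAULT_CONFIG = {"log_level": "INFO", "retries": 3}


def load_config(filepath, defaults=DEFAULT_CONFIG):
    """Return the parsed config merged over the defaults, or a copy of the defaults on error."""
    try:
        with open(filepath, "r", encoding="utf-8") as f:
            loaded = json.load(f)       # assumes the file holds a JSON object
    except FileNotFoundError:
        logger.warning("Config file not found: %s (using defaults)", filepath)
        return dict(defaults)
    except PermissionError:
        logger.warning("Permission denied reading %s (using defaults)", filepath)
        return dict(defaults)
    return {**defaults, **loaded}       # values from the file win over defaults


with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "app.json"
    missing_result = load_config(config_path)          # file does not exist yet
    config_path.write_text('{"retries": 5}', encoding="utf-8")
    found_result = load_config(config_path)

print(missing_result)
print(found_result)
```

Malformed JSON is deliberately not caught in this sketch, so a broken config file fails loudly rather than silently falling back to defaults; whether that is the right trade-off depends on the application.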

🌐 Real-world Usage

Container runtimes under Kubernetes write container logs to files on each node, log aggregators read those files, monitoring scripts parse config files from /etc, deployment scripts write state files, and CI/CD systems manage artifact directories. File I/O is everywhere in DevOps.

📝 Summary

Use with statements for file operations to ensure cleanup. pathlib handles cross-platform paths correctly. Read large files line by line to conserve memory. Write to a temp file and then rename it for atomicity. Check that files exist and handle permission errors explicitly rather than assuming a path is valid. These practices prevent common issues like file descriptor leaks, data corruption, and portability problems.
