Intermediate · Lesson 7 of 16

Working with JSON and YAML

Parse JSON APIs, read YAML config files, transform data structures, and work with structured data—the lingua franca of modern cloud infrastructure automation.

🧒 Simple Explanation (ELI5)

JSON and YAML are file formats for storing structured data. Think of them as filing cabinets: instead of writing text, you organize data in labeled boxes (keys) with values inside. Kubernetes manifests are YAML. Cloud APIs return JSON. Your script reads these files, extracts the data, and works with it as Python dictionaries (labeled boxes with data inside).

🔧 Why Do We Need JSON and YAML?

⚙️ Technical Explanation

JSON represents objects as key-value pairs inside {}, arrays inside [], and strings in double quotes. YAML expresses the same structures with indentation instead of braces and colon-separated key-value pairs, and adds native support for dates and multiline strings. In Python, both parse into dictionaries and lists, so you work with native Python data structures.
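To make the "both parse into dictionaries and lists" point concrete, here is a minimal sketch that writes the same record in both formats and compares the results. The sample data is made up for illustration, and the snippet assumes PyYAML is installed (it is not part of the standard library).

```python
import json
import yaml  # PyYAML, installed separately (pip install pyyaml)

# The same record written in both formats (sample data for illustration)
json_text = '{"hostname": "web01", "port": 8080, "tags": ["production", "monitored"]}'
yaml_text = """
hostname: web01
port: 8080
tags:
  - production
  - monitored
"""

from_json = json.loads(json_text)
from_yaml = yaml.safe_load(yaml_text)

print(from_json == from_yaml)     # True
print(type(from_yaml))            # <class 'dict'>
print(type(from_yaml["tags"]))    # <class 'list'>
```

Once parsed, the rest of your script does not need to know which format the data came from.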

💡
YAML Whitespace Matters

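YAML uses indentation to show structure. Two spaces per level is the common convention; whatever width you choose, keep it consistent within a file. Tab characters are not allowed for indentation and cause a parse error, so indent with spaces only. Run a YAML linter such as yamllint to validate files before committing them to version control.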
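The effect of a stray tab can be seen with a small experiment. This is a sketch assuming PyYAML; try_parse is a hypothetical helper written for this example, not a library function.

```python
import yaml  # PyYAML

good = "server:\n  port: 8080\n"   # indented with two spaces
bad = "server:\n\tport: 8080\n"    # indented with a tab character

def try_parse(text):
    """Return (data, None) on success or (None, exception) on a YAML error."""
    try:
        return yaml.safe_load(text), None
    except yaml.YAMLError as exc:
        return None, exc

parsed, error = try_parse(good)
print(parsed)                     # {'server': {'port': 8080}}

parsed_bad, error_bad = try_parse(bad)
print(parsed_bad)                 # None
print(error_bad)                  # parser message pointing at the tab
```

Catching yaml.YAMLError covers all PyYAML parsing errors, so the same pattern works for other malformed input.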

⌨️ JSON and YAML Operations

python
import json
import yaml
from pathlib import Path

# ===== JSON PARSING =====
# JSON string to Python dict
json_str = '{"hostname": "web01", "port": 8080, "ssl": true}'
data = json.loads(json_str)
print(data["hostname"])     # "web01"
print(data["port"])         # 8080 (int, not string!)

# JSON null becomes Python None
json_with_null = '{"status": null}'
parsed = json.loads(json_with_null)
print(parsed["status"])     # None

# ===== JSON FROM FILE =====
# Assumes config.json already exists in the working directory
with open("config.json", "r") as f:
    config = json.load(f)   # note: load() not loads()
    print(config)

# ===== PYTHON TO JSON =====
# Python dict to JSON string
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web-app"},
    "spec": {"replicas": 3}
}
json_output = json.dumps(deployment, indent=2)  # indent for readability
print(json_output)

# ===== JSON TO FILE =====
with open("deployment.json", "w") as f:
    json.dump(deployment, f, indent=2)

# ===== YAML PARSING =====
yaml_str = """
hostname: web01
port: 8080
ssl: true
tags:
  - production
  - monitored
services:
  - name: nginx
    status: running
  - name: postgres
    status: stopped
"""

data = yaml.safe_load(yaml_str)
print(data["hostname"])         # "web01"
print(data["port"])             # 8080
print(data["tags"])             # ['production', 'monitored']
print(data["services"][0])      # {"name": "nginx", "status": "running"}

# ===== YAML FROM FILE =====
# Assumes deployment.yaml already exists and contains spec.replicas
with open("deployment.yaml", "r") as f:
    manifest = yaml.safe_load(f)
    replicas = manifest["spec"]["replicas"]
    print(f"Replicas: {replicas}")

# ===== YAML MULTILINE STRINGS =====
# YAML supports multiline text
yaml_multiline = """
script: |
  #!/bin/bash
  echo "Deploying..."
  kubectl apply -f deployment.yaml
description: >
  This script deploys the application
  to the Kubernetes cluster
"""

parsed = yaml.safe_load(yaml_multiline)
print(parsed["script"])
# #!/bin/bash
# echo "Deploying..."
# kubectl apply -f deployment.yaml

# ===== PYTHON TO YAML =====
config = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config"},
    "data": {
        "log_level": "INFO",
        "debug": False
    }
}

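# yaml.dump() sorts keys alphabetically by default;
# pass sort_keys=False (PyYAML 5.1+) to keep the dict's insertion order
yaml_output = yaml.dump(config, default_flow_style=False)
print(yaml_output)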

# Write to file
with open("configmap.yaml", "w") as f:
    yaml.dump(config, f, default_flow_style=False)

# ===== WORKING WITH NESTED DATA =====
# Access nested values safely
deployment = {
    "metadata": {"name": "app"},
    "spec": {"replicas": 3, "selector": {"matchLabels": {"app": "web"}}}
}

# Direct access (risky if key missing)
name = deployment["metadata"]["name"]

# Safe access using .get()
replicas = deployment.get("spec", {}).get("replicas")   # returns None if missing

# ===== TRANSFORMING DATA =====
# Kubernetes Deployment to scaled-down version
full_deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web-app", "namespace": "default"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "web"}},
        "template": {
            "metadata": {"labels": {"app": "web"}},
            "spec": {"containers": [{"name": "web", "image": "nginx:latest", "ports": [{"containerPort": 80}]}]}
        }
    }
}

# Extract just the essential parts for a smaller manifest
minimal = {
    "apiVersion": full_deployment["apiVersion"],
    "kind": full_deployment["kind"],
    "metadata": {"name": full_deployment["metadata"]["name"]},
    "spec": {"replicas": 1}  # Reduce replicas
}

# ===== VALIDATION =====
# Check if required fields exist
def validate_deployment(deploy_dict):
    """Check if deployment has required fields."""
    required_fields = ["apiVersion", "kind", "metadata", "spec"]
    for field in required_fields:
        if field not in deploy_dict:
            raise ValueError(f"Missing required field: {field}")
    if deploy_dict["kind"] != "Deployment":
        raise ValueError("kind must be 'Deployment'")
    return True

# ===== READING MULTIPLE FILES =====
yaml_dir = Path("./manifests")
all_manifests = []

for yaml_file in yaml_dir.glob("*.yaml"):
    with open(yaml_file) as f:
        manifest = yaml.safe_load(f)
        all_manifests.append(manifest)

print(f"Loaded {len(all_manifests)} manifests")

# ===== COMMON PITFALL: JSON IN JSON =====
# API returns a JSON string containing JSON
api_response = '{"data": "{\\"name\\": \\"app\\"}"}'
outer = json.loads(api_response)
inner = json.loads(outer["data"])
print(inner["name"])  # "app"

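# ===== USE SAFE_LOAD FOR YAML =====
# Use yaml.safe_load() for any input you do not fully control.
# yaml.load() with an unsafe Loader can construct arbitrary Python objects
# and run code (security risk); safe_load() only constructs simple Python objects.
yaml_str_untrusted = "hostname: web01\nport: 8080\n"  # stand-in for external input
data = yaml.safe_load(yaml_str_untrusted)  # SAFE
# data = yaml.load(yaml_str_untrusted, Loader=yaml.UnsafeLoader)  # DANGEROUS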

💼 Example (Real-world Use Case)

A deployment script reads a Kubernetes manifest (YAML), updates the image tag from "v1.0" to "v2.0", scales replicas based on environment (dev=1, prod=3), converts the result to JSON, sends it to the API, and parses the response. Every step is JSON/YAML parsing and transformation, a daily DevOps task.
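The transformation part of that workflow might look roughly like the sketch below. The manifest text, the REPLICAS_BY_ENV mapping, and the prepare_manifest helper are all hypothetical names invented for this example, and the code assumes images are written in the simple name:tag form. The HTTP call is left out because the target API is not specified here.

```python
import json
import yaml  # PyYAML

# Hypothetical manifest used only for this example
MANIFEST_YAML = """
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: web
          image: myapp:v1.0
"""

# Replica counts per environment, taken from the scenario description
REPLICAS_BY_ENV = {"dev": 1, "prod": 3}

def prepare_manifest(manifest_text, environment, new_tag):
    """Parse a Deployment manifest, retag its images, set replicas, return JSON."""
    manifest = yaml.safe_load(manifest_text)
    manifest["spec"]["replicas"] = REPLICAS_BY_ENV[environment]
    for container in manifest["spec"]["template"]["spec"]["containers"]:
        image = container["image"]
        # Assumes a plain "name:tag" reference (no registry port in the name)
        name, sep, _old_tag = image.rpartition(":")
        if not sep:              # no tag present: the whole string is the name
            name = image
        container["image"] = f"{name}:{new_tag}"
    return json.dumps(manifest, indent=2)

payload = prepare_manifest(MANIFEST_YAML, "prod", "v2.0")
print(payload)
```

The returned string is what a script would send as the request body; the response could then be parsed with json.loads() in the same way as any other JSON text.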

🧪 Hands-on

  1. Read a JSON file, modify a value, and write it back.
  2. Read a YAML file (Kubernetes manifest), extract the image tag, and print it.
  3. Convert a Python dictionary to both JSON and YAML and write to files.
  4. Read a JSON API response (from a file), extract a nested value, and check if it meets a condition.
  5. Parse YAML with multiline strings and manipulate the extracted data.
🎮
Try It Yourself

Create a YAML file representing a mock Kubernetes Deployment. Write a Python script that reads it, scales the replicas to 5, changes the image to "myapp:v2", and prints the modified manifest as formatted JSON.

🐛 Debugging Scenario

Problem: "json.decoder.JSONDecodeError: Expecting value" when parsing JSON.
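One way to investigate this error is to catch json.JSONDecodeError and print where parsing stopped. The sketch below is an illustration, not a definitive diagnosis: parse_json_or_explain is a hypothetical helper, and the sample inputs reflect two common causes of "Expecting value" (an empty body and an HTML error page returned instead of JSON).

```python
import json

def parse_json_or_explain(text):
    """Return (data, None) on success or (None, message) describing the failure."""
    if not text or not text.strip():
        return None, "input is empty"
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # exc.pos is the character offset where the parser gave up
        snippet = text[max(exc.pos - 20, 0):exc.pos + 20]
        return None, f"{exc.msg} at line {exc.lineno}, column {exc.colno}; near {snippet!r}"

samples = [
    '{"status": "ok"}',               # valid JSON
    "",                               # empty response body
    "<html>502 Bad Gateway</html>",   # HTML error page instead of JSON
]

for sample in samples:
    data, problem = parse_json_or_explain(sample)
    if problem is None:
        print(data)
    else:
        print(f"could not parse: {problem}")
```

Printing the text around the failure position usually shows quickly whether the input is JSON at all.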

🎯 Interview Questions

Beginner

What is the difference between json.load() and json.loads()?

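json.load() reads from a file object; json.loads() parses a string (the "s" stands for string). The same split applies to json.dump() and json.dumps(). PyYAML does not follow this pattern: there is no yaml.loads(). yaml.safe_load() accepts either a string or an open file, and yaml.safe_load_all() does the same for streams that contain several documents separated by ---.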
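The file-versus-string distinction can be tried without touching the disk. This sketch uses io.StringIO as a stand-in for an open file; the sample text is made up.

```python
import io
import json

text = '{"hostname": "web01", "port": 8080}'

from_string = json.loads(text)              # loads: parse a str
from_file = json.load(io.StringIO(text))    # load: parse a file-like object

print(from_string == from_file)             # True

as_string = json.dumps(from_string)         # dumps: serialize to a str
buffer = io.StringIO()
json.dump(from_string, buffer)              # dump: serialize into a file-like object

print(json.loads(buffer.getvalue()) == from_string)   # True
```

Passing a string to json.load() or a file object to json.loads() raises an error, which is a quick way to spot that the wrong function was used.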

When would you use YAML vs JSON?

Use YAML for configuration files that people read and edit, and JSON for APIs and machine-to-machine exchange, where it is compact and fast to parse. Kubernetes and Helm manifests are written in YAML because the indented layout is easier to read and the format allows comments. Both formats parse to the same Python structures.

Why use yaml.safe_load() instead of yaml.load()?

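yaml.load() with an unsafe loader can construct arbitrary Python objects, which lets a crafted document execute code (security risk). Since PyYAML 6.0, yaml.load() also requires an explicit Loader argument. safe_load() only constructs simple Python objects such as dicts, lists, strings, and numbers. Always use safe_load unless you have a specific reason not to (rare in DevOps).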
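The difference is easy to observe with a document that carries a Python-specific tag. In this sketch (assuming PyYAML), the payload string is a harmless illustration of the kind of input an attacker might supply; safe_load() refuses to build it and nothing is executed.

```python
import yaml  # PyYAML

# A document that asks the loader to call os.system (never executed here)
untrusted = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    rejected = True
    print(f"safe_load refused the document: {type(exc).__name__}")

print(rejected)   # True
```

Ordinary configuration data never needs tags like this, so rejecting them costs nothing in typical DevOps use.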

Scenario-based

Write code to read a Kubernetes manifest, extract the image, and check if it is using the latest tag. What would you do if the image is missing?

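Use yaml.safe_load() to parse. Walk down to the container list with nested .get() calls so that missing keys do not raise: containers = manifest.get("spec", {}).get("template", {}).get("spec", {}).get("containers") or []. Check that the list is non-empty before indexing it (a default of [{}] does not help when the key is present but the list is empty), then read image = containers[0].get("image"). If the image is missing, fall back to a sentinel such as "unknown" or report the manifest as invalid. Otherwise inspect the tag rather than searching the whole string for "latest": flag the image if the tag is "latest" or if there is no tag at all, since an untagged image resolves to latest.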
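A possible implementation of that answer is sketched below. find_image_issues is a hypothetical helper that works on an already-parsed manifest (the dict returned by yaml.safe_load()), so the example needs only the standard library; the sample manifest is made up.

```python
def find_image_issues(manifest):
    """Return (container_name, problem) tuples for a parsed Deployment dict."""
    pod_spec = ((manifest.get("spec") or {}).get("template") or {}).get("spec") or {}
    containers = pod_spec.get("containers") or []
    if not containers:
        return [("<none>", "no containers defined")]

    issues = []
    for container in containers:
        name = container.get("name", "<unnamed>")
        image = container.get("image")
        if not image:
            issues.append((name, "image is missing"))
            continue
        # Look for the tag only after the last "/", so a registry port such as
        # "registry:5000/api" is not mistaken for a tag.
        last_part = image.rsplit("/", 1)[-1]
        if "@" in last_part:
            continue  # pinned by digest, no tag to check
        _, sep, tag = last_part.partition(":")
        if not sep:
            issues.append((name, "no tag (defaults to latest)"))
        elif tag == "latest":
            issues.append((name, "uses the latest tag"))
    return issues

# Hypothetical parsed manifest for demonstration
sample_manifest = {
    "spec": {"template": {"spec": {"containers": [
        {"name": "web", "image": "nginx:latest"},
        {"name": "api", "image": "registry:5000/api"},
        {"name": "db", "image": "postgres:16"},
        {"name": "sidecar"},
    ]}}}
}

for container_name, problem in find_image_issues(sample_manifest):
    print(f"{container_name}: {problem}")
# web: uses the latest tag
# api: no tag (defaults to latest)
# sidecar: image is missing
```

Returning a list of problems instead of raising on the first one lets a CI check report every offending container in a single run.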

🌐 Real-world Usage

Kubernetes manifests and Helm values are YAML. Terraform can emit JSON (for example, terraform output -json). CloudFormation templates can be written in JSON or YAML. Most cloud APIs return JSON. Nearly every DevOps workflow involves reading and writing JSON or YAML, so fluency with both is essential.

📝 Summary

json.load(file) and json.loads(string) parse JSON into Python dicts/lists. yaml.safe_load() parses YAML from a string or a file; avoid yaml.load() with an unsafe loader because it is a security risk. json.dump(obj, file) writes to a file, and json.dumps(obj, indent=2) returns a formatted string. Both JSON and YAML parse to the same Python structures, so work with them as dicts and lists. Use .get() for safe access to nested keys. Always validate input and handle parsing errors (corrupted files, malformed data).