Azure SDK is modular: `pip install azure-identity azure-mgmt-compute` installs identity and VM management. You do not install everything—only the services you need. Same for Kubernetes: `pip install kubernetes`.
Cloud SDKs: Azure and Kubernetes
Use official Python SDKs to manage cloud resources programmatically: provision VMs in Azure, deploy to Kubernetes, and query cloud APIs from ordinary Python code, so that cloud automation becomes code.
🧒 Simple Explanation (ELI5)
Cloud providers publish libraries (SDKs) so you do not need to manually craft HTTP requests. Instead of writing raw API calls, you use Python classes and methods that do the work: `compute_client.virtual_machines.begin_create_or_update()` creates a VM, with authentication, error handling, and retries handled by the library. The same goes for Kubernetes: instead of `kubectl` commands or hand-crafted API calls, use the Python client to manage clusters programmatically.
🔧 Why Do We Need Cloud SDKs?
- Abstraction: SDKs handle authentication, pagination, retries—you write 5 lines instead of 50.
- Type safety: IDEs can autocomplete and validate parameters instead of guessing.
- Infrastructure-as-code: define infrastructure in Python instead of Terraform or YAML.
- Programmatic control: integrate cloud operations into orchestration workflows.
- Multi-cloud: same pattern (install SDK, authenticate, call operations) works for AWS, Azure, GCP—DevOps learns once, applies everywhere.
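To make the abstraction point concrete, here is a rough sketch of the pagination loop that an SDK pager saves you from writing. The `fetch_page` callable and the `items` / `next_token` keys are hypothetical stand-ins for illustration, not a real Azure or Kubernetes API.

```python
from typing import Callable, Iterator, Optional


def iter_all_items(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item from a paginated API, following continuation tokens.

    fetch_page is a hypothetical callable: it takes a continuation token
    (None for the first page) and returns {"items": [...], "next_token": ...}.
    """
    token = None
    while True:
        page = fetch_page(token)
        yield from page["items"]
        token = page.get("next_token")
        if not token:
            break


# Fake two-page API, used only to demonstrate the loop.
_PAGES = {
    None: {"items": [{"name": "vm-1"}, {"name": "vm-2"}], "next_token": "p2"},
    "p2": {"items": [{"name": "vm-3"}], "next_token": None},
}

names = [item["name"] for item in iter_all_items(lambda token: _PAGES[token])]
print(names)
```

With the Azure SDK, a call such as `compute_client.virtual_machines.list(resource_group)` returns an iterable that fetches further pages as you loop over it, so this bookkeeping stays inside the library.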
⚙️ Technical Explanation
Azure SDK: provides clients for each service (compute for VMs, storage for blobs, etc.). Kubernetes Python client: mirrors the API—same resources (Pod, Deployment, Service) exist as Python classes. Authentication: SDKs support multiple auth methods (env variables, managed identity, credentials file, interactive).
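The idea of trying several authentication methods in order can be sketched in plain Python. This is a conceptual model only, not how azure-identity is implemented: `first_available_token`, `from_environment`, and the fake tokens are hypothetical names made up for the illustration. Only the three `AZURE_*` variable names are the real ones read for service principal auth.

```python
from typing import Callable, Optional, Sequence, Tuple

# Each provider is a (name, callable) pair. The callable returns a token string,
# or None when that auth method is not available. All names are hypothetical.
Provider = Tuple[str, Callable[[], Optional[str]]]


def first_available_token(providers: Sequence[Provider]) -> Tuple[str, str]:
    """Return (provider_name, token) from the first provider that succeeds."""
    tried = []
    for name, get_token in providers:
        token = get_token()
        if token is not None:
            return name, token
        tried.append(name)
    raise RuntimeError(f"No credential available; tried: {', '.join(tried)}")


def from_environment(env: dict) -> Optional[str]:
    """Pretend to build a token from service principal environment variables."""
    required = ("AZURE_TENANT_ID", "AZURE_CLIENT_ID", "AZURE_CLIENT_SECRET")
    if all(env.get(key) for key in required):
        return f"token-for-{env['AZURE_CLIENT_ID']}"
    return None


chain = [
    ("environment", lambda: from_environment({})),  # no variables set
    ("managed_identity", lambda: None),             # not running in Azure
    ("cli", lambda: "token-from-az-login"),         # developer ran `az login`
]
source, token = first_available_token(chain)
print(source)
```

The same script picks up whichever method is available where it runs, which is why a chained credential suits code that moves between a laptop, a CI runner, and an Azure-hosted service.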
⌨️ Cloud SDK Patterns
# ===== AZURE AUTHENTICATION =====
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient
from azure.mgmt.resource import ResourceManagementClient
# DefaultAzureCredential tries multiple auth methods in order, including:
# 1. Environment variables (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, etc.)
# 2. Managed identity (if running in an Azure service)
# 3. Developer tool credentials such as the Azure CLI (az login)
# 4. Interactive browser login (only if explicitly enabled; off by default)
credential = DefaultAzureCredential()
subscription_id = "your-subscription-id"
# Create clients for different Azure services
compute_client = ComputeManagementClient(credential, subscription_id)
resource_client = ResourceManagementClient(credential, subscription_id)
# ===== CREATE AZURE VM =====
# Define VM parameters
vm_params = {
    "location": "eastus",
    "os_profile": {
        "computer_name": "myvm",
        "admin_username": "azureuser",
        "linux_configuration": {
            "disable_password_authentication": True,
            "ssh_public_keys": [{
                "path": "/home/azureuser/.ssh/authorized_keys",
                "key_data": "ssh-rsa AAAAB3NzaC1yc2E..."  # Your public key
            }]
        }
    },
    "hardware_profile": {"vm_size": "Standard_B2s"},
    "storage_profile": {
        "image_reference": {
            "publisher": "Canonical",
            "offer": "0001-com-ubuntu-server-focal",
            "sku": "20_04-lts-gen2",
            "version": "latest"
        }
    },
    "network_profile": {
        "network_interfaces": [{
            # The NIC must already exist; the SDK model exposes "primary" directly
            "id": "/subscriptions/.../networkInterfaces/mynic",
            "primary": True
        }]
    }
}
# Create VM (long-running operation: begin_* returns a poller immediately)
resource_group = "my-rg"
vm_name = "myvm"
async_vm_create = compute_client.virtual_machines.begin_create_or_update(
    resource_group,
    vm_name,
    vm_params
)
# Wait for creation to complete
vm = async_vm_create.result()
print(f"VM created: {vm.name}")
# ===== LIST AZURE RESOURCES =====
# List all VMs in resource group
vms = compute_client.virtual_machines.list(resource_group)
for vm in vms:
    print(f"VM: {vm.name}, Location: {vm.location}")
# List all resource groups
resource_groups = resource_client.resource_groups.list()
for rg in resource_groups:
    print(f"RG: {rg.name}")
# ===== DELETE AZURE VM =====
async_vm_delete = compute_client.virtual_machines.begin_delete(
    resource_group,
    vm_name
)
async_vm_delete.wait()  # wait for deletion
print(f"VM deleted: {vm_name}")
# ===== KUBERNETES CLIENT =====
from kubernetes import client, config
# Load kubeconfig (from ~/.kube/config or KUBECONFIG env var)
config.load_kube_config()
# Or load from a specific file instead (a second call replaces the first config)
# config.load_kube_config(config_file="/path/to/kubeconfig")
# Create API client
v1 = client.CoreV1Api()
apps_v1 = client.AppsV1Api()
# ===== KUBERNETES: LIST PODS =====
# List all pods in a namespace
namespace = "default"
pods = v1.list_namespaced_pod(namespace)
for pod in pods.items:
    print(f"Pod: {pod.metadata.name}, Status: {pod.status.phase}")
# ===== KUBERNETES: CREATE DEPLOYMENT =====
deployment_manifest = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "myapp", "namespace": "default"},
    "spec": {
        "replicas": 3,
        "selector": {"matchLabels": {"app": "myapp"}},
        "template": {
            "metadata": {"labels": {"app": "myapp"}},
            "spec": {
                "containers": [{
                    "name": "myapp",
                    "image": "nginx:latest",
                    "ports": [{"containerPort": 80}]
                }]
            }
        }
    }
}
# Create deployment using apps API
apps_v1.create_namespaced_deployment(
    namespace=namespace,
    body=deployment_manifest
)
print("Deployment created")
# ===== KUBERNETES: WATCH FOR CHANGES =====
from kubernetes import watch
# Watch all pods in namespace and print changes
# (timeout_seconds ends the stream; without it the loop runs until interrupted)
w = watch.Watch()
for event in w.stream(v1.list_namespaced_pod, namespace, timeout_seconds=60):
    event_type = event["type"]
    pod = event["object"]
    print(f"{event_type}: {pod.metadata.name} in {pod.status.phase}")
# ===== KUBERNETES: EXEC INTO POD =====
from kubernetes.stream import stream
# Execute command in pod
exec_command = [
    "/bin/sh", "-c", "ls -la /app"
]
resp = stream(
    v1.connect_get_namespaced_pod_exec,
    "myapp-abcde",
    namespace,
    command=exec_command,
    stderr=True,
    stdin=False,
    stdout=True,
    tty=False
)
print(f"Exec output:\n{resp}")
# ===== REAL-WORLD EXAMPLE: SCALE DEPLOYMENT =====
def scale_kubernetes_deployment(name, namespace, replicas):
    """Scale a deployment to specified replicas."""
    apps_v1 = client.AppsV1Api()
    # Send only the field that changes; patching back the full object read
    # from the API can conflict with concurrent updates
    body = {"spec": {"replicas": replicas}}
    # Patch the deployment
    apps_v1.patch_namespaced_deployment(name, namespace, body)
    print(f"Scaled {name} to {replicas} replicas")
# ===== REAL-WORLD EXAMPLE: WAIT FOR DEPLOYMENT =====
def wait_for_deployment_ready(name, namespace, timeout=300):
    """Wait for deployment to be fully rolled out."""
    import time
    apps_v1 = client.AppsV1Api()
    start = time.time()
    while time.time() - start < timeout:
        deployment = apps_v1.read_namespaced_deployment(name, namespace)
        # Check if all replicas are ready (ready_replicas is None until a pod is ready)
        if deployment.spec.replicas == (deployment.status.ready_replicas or 0):
            print(f"Deployment {name} is ready")
            return True
        time.sleep(5)
    raise TimeoutError(f"Deployment {name} did not become ready within {timeout}s")
# ===== ERROR HANDLING =====
from kubernetes.client.rest import ApiException
try:
    # Try to get a pod that might not exist
    pod = v1.read_namespaced_pod("nonexistent", "default")
except ApiException as e:
    if e.status == 404:
        print("Pod not found")
    else:
        print(f"API error: {e.status} - {e.reason}")
except Exception as e:
    print(f"Error: {e}")
# ===== INSTALL MODULES =====
# Typically done outside Python with pip:
# pip install azure-identity azure-mgmt-compute azure-mgmt-resource
# pip install kubernetes
# These are large packages with many dependencies, so install only what you need
💼 Example (Real-world Use Case)
A Python script provisions infrastructure: creates an Azure resource group, deploys a Kubernetes cluster in AKS, waits for it to be ready, then deploys a microservices application to it. All using Python SDKs without manually running CLI commands or crafting HTTP requests. The same script works across environments (dev, staging, prod) because it is code, not manual steps.
🧪 Hands-on
- List all resource groups in your Azure subscription using the SDK.
- List all pods in a Kubernetes namespace and print their status.
- Create a Kubernetes Deployment and wait for it to be ready.
- Scale a Deployment using the Kubernetes Python client.
- Handle API exceptions (404, 500, etc.) gracefully.
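For the last exercise, one possible starting point is to separate the decision ("what does this status code mean for my script?") from the SDK call itself. The function name and the category labels below are hypothetical choices for illustration; adjust the mapping to your own retry policy.

```python
def classify_api_error(status: int) -> str:
    """Map an HTTP status code to a coarse handling decision."""
    if status == 404:
        return "not_found"   # often expected, e.g. the resource is already gone
    if status in (401, 403):
        return "auth"        # credentials or RBAC problem; retrying will not help
    if status == 409:
        return "conflict"    # e.g. the resource already exists
    if status == 429 or 500 <= status < 600:
        return "retry"       # throttling or a server-side failure
    return "fail"


# Typical use with the Kubernetes client (not executed here):
#     except ApiException as e:
#         if classify_api_error(e.status) == "retry": ...
for code in (404, 403, 500):
    print(code, classify_api_error(code))
```

Keeping the mapping in one small function makes it easy to unit test without a cluster.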
Write a script that: (1) lists all running pods in your cluster, (2) for each pod, checks its resource requests (CPU, memory), (3) logs pods that exceed thresholds, (4) optionally scales down underutilized deployments. Use the Kubernetes Python client.
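Step (2) requires comparing quantities such as "500m" CPU or "128Mi" memory, which the client returns as strings in each container's `resources.requests` dict. Below is a minimal sketch of that conversion, assuming only the common suffixes (m for CPU; Ki/Mi/Gi/Ti and k/M/G/T for memory). The helper names and the threshold parameters are hypothetical.

```python
from decimal import Decimal

_BINARY = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}
_DECIMAL = {"k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '500m' or '2' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '128Mi' or '1G' to bytes."""
    # Binary suffixes are checked first so 'Mi' is never mistaken for 'M'.
    for suffix, factor in {**_BINARY, **_DECIMAL}.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(Decimal(quantity))  # plain number of bytes


def exceeds_thresholds(requests, max_cpu_millicores: int, max_memory_bytes: int) -> bool:
    """Return True if a container's requests dict is above either threshold."""
    requests = requests or {}  # requests is None when nothing is set
    cpu = cpu_to_millicores(requests.get("cpu", "0"))
    memory = memory_to_bytes(requests.get("memory", "0"))
    return cpu > max_cpu_millicores or memory > max_memory_bytes


print(exceeds_thresholds({"cpu": "2", "memory": "64Mi"}, 1000, 2**30))
```

Note that requests describe what a pod reserves, not what it uses; judging a deployment as underutilized for step (4) needs actual usage data from a metrics source.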
🐛 Debugging Scenario
Problem: "403 Forbidden: User does not have permission" when calling Azure SDK.
- Cause: authenticated user/service principal lacks role permissions (RBAC).
- Diagnose: check your account role in Azure Portal > Resource Group > Access Control. Use `az role assignment list` to see granted roles.
- Fix: ask admin to assign appropriate role (Contributor, Owner, or specific role like Virtual Machine Contributor). For service principals, explicitly grant the required scope/role.
🎯 Interview Questions
Beginner
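DefaultAzureCredential tries multiple authentication methods in order (environment variables, managed identity, developer tool credentials such as az login, and an interactive browser login only when explicitly enabled). Using it makes scripts work locally (CLI), in CI/CD (service principal), and in Azure services (managed identity) without code changes. It is a convenient default; for production workloads, Microsoft's guidance leans toward a specific credential type (for example ManagedIdentityCredential) so the authentication path is predictable.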
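Creating a VM takes time. Azure SDK methods prefixed with begin_ return a poller object right away while Azure processes the request in the background; call .result() to block until it finishes and get the resource, or .wait() to block without a return value. The Kubernetes client has no poller: a create call returns once the API server accepts the object, and you poll its status (or use a watch) to learn when it is actually ready.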
Use try/except for client.rest.ApiException. Check e.status: 404 means not found, 500 means server error. Each status needs different handling—404 might be expected, 500 usually means retry or alert.
Scenario-based
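Use apps_v1.create_namespaced_deployment() to deploy. Then call wait_for_deployment_ready(), which polls status.ready_replicas. If the replicas do not match within the timeout, it raises TimeoutError. Wrap the create call in try/except for ApiException to handle requests the API server rejects (invalid manifest, name conflict, quota exceeded). Problems such as a bad image do not raise ApiException; they surface as pods that never become ready, so the timeout is what catches them.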
🌐 Real-world Usage
Infrastructure-as-code tools such as Terraform and Pulumi are built on cloud provider SDKs and APIs under the hood. CI/CD platforms use SDKs to deploy and manage resources. Monitoring systems query cloud APIs with SDKs. Infrastructure automation is built on these libraries.
📝 Summary
Cloud SDKs abstract common cloud operations. Azure SDK provides clients for each service (compute, storage, etc.). Kubernetes SDK mirrors Kubernetes API resources. Authenticate with DefaultAzureCredential for flexibility. Long-running operations use async patterns (.begin_*, .result()). Handle API exceptions by status code (404 not found, 500 server error). Use SDKs to programmatically automate cloud infrastructure instead of manual CLI commands or HTTP requests.