Hands-on: Practical Design

Real-world Architectures

Design & deploy complete architectures for real scenarios: e-commerce platform, SaaS applications, data analytics, and enterprise systems.

Lab 1: E-commerce Platform (99.95% SLA)

Requirements

Global users (US, EU, Asia)
Millions of transactions daily
99.95% availability SLA
Latency <100ms for 95% of users
Real-time analytics on transactions
Budget: Enterprise (cost not primary driver)
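
The two numeric targets above can be turned into checks. The sketch below assumes a 30-day month for the downtime budget and treats "95% of users" as the 95th percentile of per-request latency (nearest-rank method), which is an approximation. The latency samples are hypothetical values, not measurements from this design.

```python
def downtime_budget_minutes(sla: float, days: int = 30) -> float:
    """Maximum downtime an availability SLA allows over `days` days."""
    return days * 24 * 60 * (1 - sla)


def percentile(samples_ms: list, pct: int) -> float:
    """Nearest-rank percentile (pct in 1..100), using integer math to avoid float rounding."""
    ordered = sorted(samples_ms)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct * n / 100), 1-based rank
    return ordered[rank - 1]


# Hypothetical latency samples (ms) from synthetic probes, including one slow outlier
SAMPLES_MS = [35, 42, 48, 51, 55, 58, 60, 62, 65, 67,
              70, 72, 75, 78, 80, 84, 88, 92, 97, 240]

budget = downtime_budget_minutes(0.9995)
p95 = percentile(SAMPLES_MS, 95)
meets_slo = p95 < 100  # target: <100 ms for 95% of requests
print(f"99.95% allows {budget:.1f} min of downtime per 30 days")
print(f"p95 = {p95} ms, target met: {meets_slo}")
```

A 99.95% SLA leaves roughly 21.6 minutes of downtime per 30-day month, which is why the design below leans on automatic failover rather than manual recovery.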

Proposed Architecture


        GLOBAL LAYER
        └── Azure Front Door (geo-routing, DDoS protection)

        REGION 1 (East US - Primary)
        ├── Public IP + Azure Firewall
        ├── App Service (3 instances, zone-redundant, auto-scale)
        ├── App Insights (monitoring)
        ├── SQL DB (geo-replicated to EU)
        ├── Redis Cache (session store)
        ├── Service Bus (async messaging)
        └── Cosmos DB (global replication)

        REGION 2 (West Europe - HA)
        ├── App Service (active, handles EU traffic)
        ├── SQL DB (geo-replica)
        └── Redis Cache (replica)

        DATA LAYER
        ├── SQL DB: Orders, products, inventory
        ├── Cosmos DB: Carts (eventual consistency)
        ├── Event Hub: Transaction stream
        └── Data Lake: Analytics

Key Decisions

Front Door: Global load balancer, auto-routes based on latency, DDoS edge protection
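App Service + zones: Zone-redundant deployment spreads instances across availability zones (supports the 99.95% SLA)
SQL DB + geo-replication: Strong consistency for order writes on the primary; the EU geo-replica serves local reads (replication is asynchronous, so replica reads can lag slightly)
Cosmos DB + multi-region: Shopping carts (eventual consistency acceptable), single-digit-millisecond reads (<10 ms at the 99th percentile)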
Auto-scale: CPU-based scaling for busy periods (Black Friday)
Cache (Redis): Session store, prevent DB overload
Service Bus: Decouple checkout from inventory (async), prevent timeouts
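
The Redis decision above follows the cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache with an expiry. The sketch below is a minimal illustration in which a plain dict stands in for Redis and `fake_db_lookup` is a hypothetical stand-in for a SQL query; with the redis-py client the same flow would use `get` and `set(..., ex=ttl)` against a real server.

```python
import time
from typing import Callable


class CacheAside:
    """Cache-aside: try the cache first, fall back to the DB, then populate the cache."""

    def __init__(self, load_from_db: Callable[[str], dict], ttl_seconds: float = 300.0) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._store: dict = {}  # key -> (expiry, value); stands in for Redis
        self.db_calls = 0       # counts how often the database was actually hit

    def get(self, key: str) -> dict:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                      # cache hit: no database round trip
        self.db_calls += 1
        value = self._load(key)                  # cache miss: go to the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes (e.g., a price update)."""
        self._store.pop(key, None)


def fake_db_lookup(product_id: str) -> dict:  # hypothetical stand-in for a SQL query
    return {"id": product_id, "price": 19.99}


cache = CacheAside(fake_db_lookup)
cache.get("sku-1")
cache.get("sku-1")  # served from the cache
print(cache.db_calls)
```

Under Black Friday load, every repeated read served this way is one less query against the SQL primary; the TTL bounds how stale a cached product can get.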

Failure Scenarios

App instance fails (1 of 3): LB routes to healthy instances, no downtime
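Zone fails (entire zone): Instances are spread across 3 zones, so at most 1/3 of capacity is lost; auto-scale adds instances in the surviving zones
Primary region fails (entire East US): Front Door routes all traffic to EU, and the SQL geo-replica is promoted to primary (e.g., via a failover group). Replication is asynchronous, so up to ~5 seconds of recent writes can be lost
Cosmos DB partition fails: Each partition is replicated within the region and across regions, so Cosmos DB serves requests from another replica or region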
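
Whether this design actually reaches 99.95% depends on how component availabilities combine: components in a chain multiply, while redundant regions only fail together. The sketch below uses illustrative SLA figures (assumptions; check the current Azure SLA pages before relying on them) and assumes regions fail independently with instant failover, which real systems only approximate.

```python
from functools import reduce


def serial(*availabilities: float) -> float:
    """Every component in the chain must be up: multiply availabilities."""
    return reduce(lambda acc, a: acc * a, availabilities, 1.0)


def parallel(*availabilities: float) -> float:
    """At least one redundant path must be up: 1 - product of failure probabilities."""
    return 1.0 - reduce(lambda acc, a: acc * (1.0 - a), availabilities, 1.0)


# Illustrative inputs only (assumed, not authoritative SLA values)
FRONT_DOOR = 0.9999
APP_SERVICE = 0.9995
SQL_DB = 0.9999
REDIS = 0.999

one_region = serial(APP_SERVICE, SQL_DB, REDIS)    # app -> cache -> database in one region
two_regions = parallel(one_region, one_region)      # East US and West Europe as redundant stacks
end_to_end = serial(FRONT_DOOR, two_regions)        # every request still passes Front Door

print(f"single-region stack: {one_region:.4%}")
print(f"end to end, two regions: {end_to_end:.4%}")
```

With these inputs a single region lands near 99.84%, below the target, while two independent regions behind Front Door clear 99.95%. That is the quantitative reason for the active EU region.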

Cost Estimate

App Service (auto-scale 3-10): ~$2k/month
SQL DB (Premium): ~$1.5k/month
Cosmos DB (multi-region, provisioned RU/s): ~$800/month
Front Door: ~$500/month
Data egress (inter-region): ~$600/month
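Total: ~$5.4k/month (listed items only; excludes Redis, Service Bus, Event Hub, Azure Firewall, Data Lake, and monitoring)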

Lab 2: SaaS Analytics Platform

Requirements

Multi-tenant SaaS (1000s of customers)
Real-time dashboards & reports
Data isolation (customer A can't see B's data)
99.9% SLA
Cost-sensitive (small and medium customers are price-sensitive)

Proposed Architecture


        IDENTITY LAYER
        └── Azure AD B2C (customer login, single sign-on)

        API GATEWAY
        └── API Management (rate limiting, auth, billing)

        APPLICATION TIER
        └── Container Instances (auto-scale, 5-50 replicas)

        DATABASE TIER (Per-Tenant Isolation)
        ├── Option 1: Separate DB per customer (max isolation)
        ├── Option 2: Shared DB + row-level security (cost savings)
        └── Choose a mix: Enterprise = separate, SMB = shared

        REAL-TIME ANALYTICS
        ├── Event Hub (1000s of events/sec)
        ├── Stream Analytics (processes the event stream 24/7)
        ├── Power BI Embedded (personalized dashboards)
        └── Data Lake (query history)

        COST OPTIMIZATION
        ├── Spot containers for non-critical tasks
        ├── Auto-scale down at night (off-peak)
        └── Reserved capacity for baseline

Key Decisions

Multi-tenant isolation: Separate DBs for enterprise (high isolation/cost), shared for SMB
Row-level security (RLS): If shared DB, enforce RLS at database level (customer sees only their rows)
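Container instances: Cheap, stateless, fast to start. Good for SaaS (Azure Container Instances has no built-in autoscaler, so scaling between 5 and 50 replicas needs an orchestrator such as Azure Container Apps or AKS)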
API Management: Rate limiting per tenant (premium = higher limit), billing integration
Power BI Embedded: Personalized dashboards per customer (branding, custom metrics)
Spot containers: Run analytics/batch jobs on spot capacity (up to ~70% cheaper); jobs must tolerate eviction (checkpoint progress and retry)
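
API Management expresses per-tenant limits declaratively (for example with its `rate-limit-by-key` policy). To show what "premium = higher limit" means mechanically, here is a minimal token-bucket sketch in plain Python; the tier names and limits in `TIER_LIMITS` are hypothetical values, not APIM defaults.

```python
import time
from typing import Optional

# Hypothetical per-tier limits: (tokens refilled per second, burst capacity)
TIER_LIMITS = {"premium": (100.0, 200), "standard": (10.0, 20)}


class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        if self.updated is not None:
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller would answer HTTP 429 Too Many Requests


class TenantRateLimiter:
    """One bucket per tenant, sized by the tenant's subscription tier."""

    def __init__(self) -> None:
        self._buckets: dict = {}

    def allow(self, tenant_id: str, tier: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self._buckets.get(tenant_id)
        if bucket is None:
            rate, burst = TIER_LIMITS[tier]
            bucket = self._buckets[tenant_id] = TokenBucket(rate, burst)
        return bucket.allow(now)


limiter = TenantRateLimiter()
burst = [limiter.allow("acme", "standard", now=0.0) for _ in range(21)]
print(burst.count(True))  # the standard tier's burst capacity
```

Keying the bucket on the tenant (not the client IP) keeps one noisy customer from consuming another customer's share.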

Scaling Pattern

Customer onboarding: New customer → provisioned in the shared DB (or a new isolated DB if enterprise) → dashboard created in Power BI
Load growth: As a customer's volume grows, the customer can be migrated to an isolated DB (near-zero downtime: replicate, let the copy catch up, then cut over)
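
The onboarding and migration steps above imply a tenant catalog: the application looks up which database holds a tenant's data before querying. The sketch below covers only that routing side; the database names and `TenantCatalog` class are hypothetical. Isolation inside the shared database would still be enforced by SQL row-level security (a `CREATE SECURITY POLICY` filter predicate, typically keyed on a tenant ID set in `SESSION_CONTEXT`), not by application code.

```python
from dataclasses import dataclass

SHARED_DB = "shared-db-01"  # hypothetical name of the shared SMB database


@dataclass
class Tenant:
    tenant_id: str
    tier: str      # "enterprise" or "smb"
    database: str  # where this tenant's rows live


class TenantCatalog:
    """Maps each tenant to its database (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._tenants: dict = {}

    def onboard(self, tenant_id: str, tier: str) -> Tenant:
        # Enterprise tenants get an isolated DB; SMB tenants share one (protected by RLS)
        database = f"tenant-{tenant_id}-db" if tier == "enterprise" else SHARED_DB
        tenant = Tenant(tenant_id, tier, database)
        self._tenants[tenant_id] = tenant
        return tenant

    def database_for(self, tenant_id: str) -> str:
        return self._tenants[tenant_id].database

    def promote_to_isolated(self, tenant_id: str) -> Tenant:
        # Call only after the tenant's rows are copied and replication has caught up;
        # flipping this pointer is the cut-over step.
        tenant = self._tenants[tenant_id]
        tenant.database = f"tenant-{tenant_id}-db"
        return tenant


catalog = TenantCatalog()
catalog.onboard("contoso", "enterprise")
catalog.onboard("fabrikam", "smb")
print(catalog.database_for("fabrikam"))
```

Because every request resolves its database through the catalog, moving a growing tenant out of the shared DB is a data copy plus one pointer change, with no application redeploy.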

Lab 3: Enterprise Data Lake (Batch + Real-time)

Requirements

Ingest: Batch (daily uploads) + Real-time (logs, IoT, APIs)
Governance: Multi-team access, PII masking, audit trails
Analytics: SQL queries + Spark notebooks

Proposed Architecture


        INGESTION
        ├── Batch: Data Factory (daily ETL jobs)
        ├── Real-time: Event Hub → Stream Analytics → ADLS
        └── API: Logic Apps (webhook triggers)

        DATA LAKE (Raw → Processed → Curated)
        ├── Raw layer: ADLS container (immutable, audit logged)
        ├── Processed layer: Delta tables (versioned, ACID)
        └── Curated layer: Optimized for analytics

        GOVERNANCE
        ├── Purview (data catalog, lineage, PII detection)
        ├── Synapse access control (who sees what)
        └── Audit logging (track data access)

        ANALYTICS
        ├── Synapse SQL (distributed SQL queries)
        ├── Spark (ML, complex ETL)
        └── Power BI (dashboards)
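
The Raw → Processed → Curated flow can be illustrated with plain Python lists. In the real design each layer would be ADLS files or Delta tables written by Data Factory or Spark; here the sample rows are hypothetical, and the sketch only shows what each layer is responsible for.

```python
from collections import defaultdict

# Raw layer: records exactly as ingested (kept immutable; duplicates and bad rows included)
RAW = [
    {"order_id": "o1", "region": "EU", "amount": "120.50"},
    {"order_id": "o1", "region": "EU", "amount": "120.50"},        # duplicate delivery
    {"order_id": "o2", "region": "US", "amount": "80.00"},
    {"order_id": "o3", "region": "US", "amount": "not-a-number"},  # malformed row
]


def to_processed(raw_rows: list) -> tuple:
    """Processed layer: deduplicate, enforce types, and set aside rows that fail validation."""
    seen, clean, rejected = set(), [], []
    for row in raw_rows:
        if row["order_id"] in seen:
            continue  # drop duplicate deliveries
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)  # quarantine instead of silently dropping
            continue
        seen.add(row["order_id"])
        clean.append({**row, "amount": amount})
    return clean, rejected


def to_curated(processed_rows: list) -> dict:
    """Curated layer: business-level aggregate ready for dashboards."""
    totals = defaultdict(float)
    for row in processed_rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


processed, rejected = to_processed(RAW)
print(to_curated(processed))
```

Keeping the raw layer untouched means a bug in the processing logic can be fixed and replayed from the original data.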

Key Decisions

Data Factory: Schedules daily ETL pipelines, handles partial failures with retry
Delta Lake: ACID transactions on data lake (prevents corruption, enables time travel)
Purview: Automatic PII detection, masking rules per team (Finance sees customer IDs, Marketing sees masked)
Synapse: Distributed SQL for fast queries over TBs of data, plus Spark for ML
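
The per-team rule ("Finance sees customer IDs, Marketing sees masked") has to be applied wherever data is read, for example with dynamic data masking in the SQL engine. The sketch below shows only the rule logic in plain Python; the column names, team names, and "keep the last 4 characters" format are hypothetical choices.

```python
# Hypothetical policy: columns classified as PII, and which of them each team may see unmasked
PII_COLUMNS = {"customer_id", "email"}
UNMASKED_FOR_TEAM = {"finance": {"customer_id"}, "marketing": set()}


def mask_value(value: str) -> str:
    """Keep the last 4 characters and mask the rest; fully mask very short values."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


def apply_masking(row: dict, team: str) -> dict:
    """Return a copy of `row` with PII masked according to the team's policy."""
    allowed = UNMASKED_FOR_TEAM.get(team, set())  # unknown teams see all PII masked
    return {
        col: mask_value(str(val)) if col in PII_COLUMNS and col not in allowed else val
        for col, val in row.items()
    }


row = {"customer_id": "C-100234", "email": "ana@example.com", "amount": 42.0}
print(apply_masking(row, "finance"))
print(apply_masking(row, "marketing"))
```

Defaulting unknown teams to "everything masked" keeps a misconfigured or new team on the safe side.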

Best Practices Summary

Design Patterns Found in Real Architectures

Layering: Public → Apps → Data (separation of concerns)
Async patterns: Use Service Bus/Event Hub to decouple components (prevent timeouts)
Caching layer: Redis for hot data (serves repeated reads from memory, cutting database load and latency)
Multi-tenancy: Choose between an isolated DB per customer and a shared DB + RLS (isolation vs cost tradeoff)
Monitoring everywhere: Application Insights + Azure Monitor in every design (observability = debugging faster)
Cost consciousness: Auto-scale, spot VMs, tiered storage (real architects optimize cost)
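
The async pattern in this list boils down to "accept now, process later". The sketch below uses an `asyncio.Queue` as a stand-in for a Service Bus queue (an assumption for illustration; it is in-process and not durable): checkout publishes an event and returns immediately, while a separate worker updates inventory at its own pace.

```python
import asyncio
import contextlib


async def checkout(queue: asyncio.Queue, order_id: str, sku: str, qty: int) -> str:
    """Accept the order and publish an event; do not wait for inventory."""
    await queue.put({"order_id": order_id, "sku": sku, "qty": qty})
    return f"order {order_id} accepted"


async def inventory_worker(queue: asyncio.Queue, stock: dict) -> None:
    """Consume events at its own pace (stand-in for a Service Bus consumer)."""
    while True:
        msg = await queue.get()
        try:
            await asyncio.sleep(0.01)  # simulate a slow inventory system
            stock[msg["sku"]] -= msg["qty"]
        finally:
            queue.task_done()


async def main() -> dict:
    queue: asyncio.Queue = asyncio.Queue()
    stock = {"sku-1": 10}
    worker = asyncio.create_task(inventory_worker(queue, stock))
    replies = [await checkout(queue, f"o{i}", "sku-1", 2) for i in range(3)]
    print(replies)      # returned before inventory was updated
    await queue.join()  # demo only: wait until the worker has drained the queue
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return stock


final_stock = asyncio.run(main())
print(final_stock)
```

If the inventory system slows down, checkout latency stays flat and the backlog simply grows in the queue; a durable broker like Service Bus also keeps that backlog safe across restarts.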

Summary

E-commerce: Front Door (global), zones for HA, SQL + Cosmos for consistency/performance, Service Bus for async
SaaS: Multi-tenancy (separate or shared + RLS), containers for scaling, Power BI for dashboards
Data Lake: Raw/Processed/Curated layers, Delta for ACID, Purview for governance
Common themes: Async patterns, caching, monitoring, cost optimization, multi-layer security

Your Turn: Design Challenge

Scenario: A logistics company needs to track shipments globally. Drivers upload location + package info every 30 seconds. Dashboard shows routes + delivery ETAs. 500,000 drivers. Must survive region failure.

Design this architecture on paper (30 min). Pick components for ingestion (hint: 500k devices reporting every 30 seconds calls for Event Hub, not a request/response API), storage (a time-series DB?), real-time processing, and the dashboard. Then compare with classmates!
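
A useful first step is sizing the ingestion load. The sketch below assumes a ~1 KB payload per update (an assumption; measure the real message size) and uses the Event Hubs Standard-tier rule that one throughput unit admits up to 1 MB/s or 1,000 events/s of ingress, whichever limit is hit first.

```python
import math

DRIVERS = 500_000
INTERVAL_S = 30
PAYLOAD_BYTES = 1_000  # assumption: ~1 KB per location + package update

events_per_sec = DRIVERS / INTERVAL_S
mb_per_sec = events_per_sec * PAYLOAD_BYTES / 1_000_000
events_per_day = DRIVERS * (86_400 // INTERVAL_S)

# One throughput unit: up to 1,000 events/s or 1 MB/s ingress; the tighter limit wins
tus = max(math.ceil(events_per_sec / 1_000), math.ceil(mb_per_sec / 1.0))

print(f"{events_per_sec:,.0f} events/s, {mb_per_sec:.1f} MB/s, {events_per_day:,} events/day")
print(f"minimum throughput units (before headroom): {tus}")
```

Roughly 16,700 events per second and about 1.44 billion events (~1.4 TB at 1 KB each) per day is far beyond what a request/response API tier should absorb, which is the reasoning behind the hint.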