Real-world Architectures
Design & deploy complete architectures for real scenarios: e-commerce platform, SaaS applications, data analytics, and enterprise systems.
Lab 1: E-commerce Platform (99.95% SLA)
Requirements
- Global users (US, EU, Asia)
- Millions of transactions daily
- 99.95% availability SLA
- Latency <100ms for 95% of users
- Real-time analytics on transactions
- Budget: Enterprise (cost not primary driver)
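An availability SLA is easier to reason about as a downtime budget. The minimal sketch below converts an SLA into allowed minutes of downtime. It assumes a 30-day month, and the helper name `downtime_budget_minutes` is hypothetical.

```python
def downtime_budget_minutes(sla: float, period_days: float = 30) -> float:
    """Minutes of downtime allowed in a period for a given availability (0-1)."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla)

for sla in (0.999, 0.9995, 0.9999):
    print(f"{sla:.2%} -> {downtime_budget_minutes(sla):.1f} min per 30-day month")
```

At 99.95%, the budget is 21.6 minutes per 30-day month. That covers all causes combined: deployments, zone outages, and regional failovers.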
Proposed Architecture
GLOBAL LAYER
└── Azure Front Door (geo-routing, DDoS protection)
REGION 1 (East US - Primary)
├── Public IP + Azure Firewall
├── App Service (3 instances spread across availability zones, auto-scale)
├── App Insights (monitoring)
├── SQL DB (geo-replicated to EU)
├── Redis Cache (session store)
├── Service Bus (async messaging)
└── Cosmos DB (global replication)
REGION 2 (West Europe - active secondary)
├── App Service (active, handles EU traffic)
├── SQL DB (geo-replica)
└── Redis Cache (replica)
DATA LAYER
├── SQL DB: Orders, products, inventory
├── Cosmos DB: Carts (eventual consistency)
├── Event Hub: Transaction stream
└── Data Lake: Analytics
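To check whether a design like this can meet 99.95%, a common approach is to compute a composite SLA. Components that are all required multiply their availabilities. Redundant copies fail only when every copy fails. The sketch below applies both rules. Only the App Service figure comes from this lab; the other availability numbers are illustrative assumptions, not quoted Azure SLAs.

```python
from math import prod

def serial(*availabilities: float) -> float:
    """Every component is required: the chain is up only when all of them are up."""
    return prod(availabilities)

def parallel(*availabilities: float) -> float:
    """Redundant copies: down only when every copy is down (assumes independent failures)."""
    return 1 - prod(1 - a for a in availabilities)

# Illustrative availability figures (assumptions, not quoted from Azure SLA pages).
FRONT_DOOR = 0.9999
APP_SERVICE = 0.9995   # the App Service figure used in this lab
SQL_DB = 0.9999
REDIS = 0.999

region = serial(APP_SERVICE, SQL_DB, REDIS)
single_region = serial(FRONT_DOOR, region)
two_regions = serial(FRONT_DOOR, parallel(region, region))

print(f"one region:  {single_region:.4%}")
print(f"two regions: {two_regions:.4%}")
```

With these inputs, one region lands around 99.83%, which is below the 99.95% target. Two active regions behind Front Door come out near 99.99%. Treat that second number as an upper bound. The regions here are not fully independent: West Europe depends on the SQL primary for writes until the geo-replica is promoted.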
Key Decisions
- Front Door: Global load balancer that routes each request to the lowest-latency healthy region, with DDoS protection at the edge
- App Service + availability zones: Spreads instances across zones within the region (App Service SLA: 99.95%)
- SQL DB + geo-replication: Strong consistency for orders on the primary; an asynchronous geo-replica in EU serves local reads
- Cosmos DB + multi-region: Shopping carts (eventual consistency acceptable), single-digit-millisecond reads
- Auto-scale: CPU-based scaling for busy periods (Black Friday)
- Cache (Redis): Session store and cache for hot reads, which keeps load off the database
- Service Bus: Decouples checkout from inventory updates (async), so a slow inventory service does not cause checkout timeouts
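The CPU-based auto-scale decision can be pictured as a rule evaluated periodically against average CPU. The sketch below uses the 3-10 instance range from the cost estimate. The 70%/30% thresholds and the step size of 1 are hypothetical values for illustration.

```python
from dataclasses import dataclass

@dataclass
class ScaleRule:
    min_instances: int = 3       # baseline from the architecture above
    max_instances: int = 10      # cap from the cost estimate (3-10)
    scale_out_cpu: float = 70.0  # hypothetical threshold, percent
    scale_in_cpu: float = 30.0   # hypothetical threshold, percent
    step: int = 1

    def desired(self, current: int, avg_cpu: float) -> int:
        """Return the instance count after one evaluation of the rule."""
        if avg_cpu > self.scale_out_cpu:
            target = current + self.step
        elif avg_cpu < self.scale_in_cpu:
            target = current - self.step
        else:
            target = current  # inside the dead band: do nothing
        # Clamp to the configured bounds.
        return max(self.min_instances, min(self.max_instances, target))

rule = ScaleRule()
count = 3
for cpu in [85, 90, 88, 60, 20, 15]:
    count = rule.desired(count, cpu)
    print(f"cpu={cpu}% -> {count} instances")
```

The gap between the scale-out and scale-in thresholds keeps the instance count from flapping. Azure Monitor autoscale settings express similar metric rules and also add a cooldown period between actions, which this sketch omits.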
Failure Scenarios
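- App instance fails (1 of 3): The built-in load balancer routes to the healthy instances; no downtime
- Zone fails (entire zone): With instances spread across 3 zones, at most 1/3 of capacity is lost; auto-scale adds instances in the remaining zones
- Primary region fails (entire East US): Front Door routes all traffic to West Europe, and the SQL geo-replica is promoted to primary (automatically with a failover group, otherwise manually). Replication is asynchronous, so the last few seconds of writes can be lost (RPO ~5 seconds)
- Cosmos DB replica or region fails: Each partition is replicated within the region and across regions, so requests are served by a surviving replica or region (for single-write-region accounts, write failover requires service-managed failover to be enabled)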
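The regional failover above relies on Front Door sending traffic only to origins that pass health probes, and preferring the fastest ones. The sketch below is a simplified model of that choice. Among healthy origins, it keeps those within a "latency sensitivity" band of the fastest. The 50 ms band and the `Origin` fields are assumptions, and the real service also applies priorities and weights.

```python
from dataclasses import dataclass

@dataclass
class Origin:
    name: str
    healthy: bool      # result of the latest health probes
    latency_ms: float  # measured latency from the client's edge location

def eligible_origins(origins: list[Origin], sensitivity_ms: float = 50.0) -> list[str]:
    """Healthy origins within `sensitivity_ms` of the fastest healthy origin."""
    healthy = [o for o in origins if o.healthy]
    if not healthy:
        raise RuntimeError("no healthy origin: the whole application is down")
    fastest = min(o.latency_ms for o in healthy)
    return [o.name for o in healthy if o.latency_ms <= fastest + sensitivity_ms]

# A client in Europe with both regions up: only West Europe is within the band.
print(eligible_origins([Origin("eastus", True, 95), Origin("westeurope", True, 20)]))
# East US is down (the "primary region fails" scenario), seen from a US client.
print(eligible_origins([Origin("eastus", False, 15), Origin("westeurope", True, 90)]))
```

Rerouting is only half of the regional failover. West Europe can accept order writes only after the SQL geo-replica has been promoted.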
Cost Estimate
- App Service (auto-scale 3-10): ~$2k/month
- SQL DB (Premium): ~$1.5k/month
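- Cosmos DB (multi-region, provisioned throughput): ~$800/month (400 RU/s is only the per-container minimum and costs far less; this figure assumes several thousand RU/s replicated across regions)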
- Front Door: ~$500/month
- Data egress (inter-region): ~$600/month
- Total: ~$5.4k/month (items above only; Redis, Service Bus, Azure Firewall, Event Hub, and Data Lake are not included)
Lab 2: SaaS Analytics Platform
Requirements
- Multi-tenant SaaS (1000s of customers)
- Real-time dashboards & reports
- Data isolation (customer A can't see B's data)
- 99.9% SLA
- Cost-sensitive (small/medium customers are price-sensitive)
Proposed Architecture
IDENTITY LAYER
└── Azure AD B2C (customer login, single sign-on)
API GATEWAY
└── API Management (rate limiting, auth, billing)
APPLICATION TIER
└── Azure Container Apps (auto-scale, 5-50 replicas)
DATABASE TIER (Per-Tenant Isolation)
├── Option 1: Separate DB per customer (max isolation)
├── Option 2: Shared DB + row-level security (cost savings)
└── Choose mix: Enterprise = separate, SMB = shared
REAL-TIME ANALYTICS
├── Event Hub (ingests 1000s of events/sec)
├── Stream Analytics (processes the event stream continuously, 24/7)
├── Power BI Embedded (personalized dashboards)
└── Data Lake (historical data for ad-hoc queries)
COST OPTIMIZATION
├── Spot containers for non-critical tasks
├── Auto-scale down at night (off-peak)
└── Reserved capacity for baseline
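Real-time dashboards usually show per-tenant aggregates over fixed time windows. Stream Analytics expresses this with windowing functions such as `TumblingWindow`. The Python below only models the semantics: fixed, non-overlapping windows keyed by tenant. The event tuple shape `(tenant_id, timestamp_seconds)` is an assumption for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple

def tumbling_counts(events: Iterable[Tuple[str, float]], window_s: int = 60) -> Counter:
    """Count events per (tenant, window start) using fixed, non-overlapping windows."""
    counts: Counter = Counter()
    for tenant_id, ts in events:
        window_start = int(ts // window_s) * window_s  # every event falls in exactly one window
        counts[(tenant_id, window_start)] += 1
    return counts

events = [("acme", 5.0), ("acme", 59.9), ("acme", 60.0), ("bakery", 30.0)]
print(tumbling_counts(events))
```

Keying every aggregate by tenant is also how the dashboard layer keeps customer A's numbers out of customer B's view.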
Key Decisions
- Multi-tenant isolation: Separate DBs for enterprise (high isolation/cost), shared for SMB
- Row-level security (RLS): If shared DB, enforce RLS at database level (customer sees only their rows)
- Azure Container Apps: Cheap, stateless, scales fast with built-in auto-scale. Good fit for SaaS
- API Management: Rate limiting per tenant (premium = higher limit), billing integration
- Power BI Embedded: Personalized dashboards per customer (branding, custom metrics)
- Spot containers: Run analytics/batch jobs on spot capacity (up to ~70% cheaper); jobs must tolerate eviction (checkpoint and retry)
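In API Management, per-tenant limits are configured with policies such as `rate-limit-by-key`. The sketch below illustrates the idea in plain Python using a token bucket per tenant. A token bucket is a common rate-limiting technique, not necessarily APIM's internal algorithm. The tier names and limits are hypothetical.

```python
import time
from typing import Dict, Optional

# Hypothetical per-tier limits: (tokens refilled per second, burst capacity).
TIER_LIMITS = {"smb": (5.0, 10), "premium": (50.0, 100)}

class TokenBucket:
    def __init__(self, rate: float, capacity: int, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class TenantRateLimiter:
    """One bucket per tenant, sized by the tenant's tier."""

    def __init__(self) -> None:
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, tier: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(tenant_id)
        if bucket is None:
            rate, burst = TIER_LIMITS[tier]
            bucket = self.buckets[tenant_id] = TokenBucket(rate, burst, now)
        return bucket.allow(now)

limiter = TenantRateLimiter()
# 20 requests at the same instant: the SMB tenant gets its burst of 10, premium gets all 20.
smb_ok = sum(limiter.allow("tenant-a", "smb", now=0.0) for _ in range(20))
premium_ok = sum(limiter.allow("tenant-b", "premium", now=0.0) for _ in range(20))
print(smb_ok, premium_ok)
```

Because each tenant has its own bucket, one noisy SMB customer exhausts only its own allowance and does not slow down other tenants.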
Scaling Pattern
- Customer onboarding: New customer → provisioned in the shared DB (or a new isolated DB if enterprise) → dashboard created in Power BI
- Load growth: As a customer's volume grows, the customer can be migrated to an isolated DB with near-zero downtime (replicate the data, then cut over)
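One way to implement the shared-vs-isolated mix is a tenant catalog. The catalog maps each tenant to a database, and the application looks up that mapping on every request, so migrating a tenant comes down to changing one entry. The sketch below is hypothetical: `TenantCatalog`, `Placement`, and the database names are illustrative, and the data replication step is only described in a comment.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Placement:
    database: str
    shared: bool  # True: tenant rows live in a shared DB, filtered by RLS on tenant_id

class TenantCatalog:
    """Hypothetical tenant -> database map, consulted on every request."""

    SHARED_DB = "shared-smb-01"  # hypothetical shared database name

    def __init__(self) -> None:
        self._placements: Dict[str, Placement] = {}

    def onboard(self, tenant_id: str, enterprise: bool) -> Placement:
        if enterprise:
            placement = Placement(f"tenant-{tenant_id}", shared=False)
        else:
            placement = Placement(self.SHARED_DB, shared=True)
        self._placements[tenant_id] = placement
        return placement

    def migrate_to_isolated(self, tenant_id: str) -> Placement:
        if tenant_id not in self._placements:
            raise KeyError(tenant_id)
        # Real migration: replicate the tenant's rows to the new database first,
        # then flip this entry once it has caught up (the cut-over).
        placement = Placement(f"tenant-{tenant_id}", shared=False)
        self._placements[tenant_id] = placement
        return placement

    def lookup(self, tenant_id: str) -> Placement:
        return self._placements[tenant_id]

catalog = TenantCatalog()
catalog.onboard("acme", enterprise=True)
catalog.onboard("bakery", enterprise=False)
print(catalog.lookup("bakery"))   # shared DB, RLS applies
catalog.migrate_to_isolated("bakery")
print(catalog.lookup("bakery"))   # now isolated
```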
Lab 3: Enterprise Data Lake (Batch + Real-time)
Requirements
- Ingest: Batch (daily uploads) + Real-time (logs, IoT, APIs)
- Governance: Multi-team access, PII masking, audit trails
- Analytics: SQL queries + Spark notebooks
Proposed Architecture
INGESTION
├── Batch: Data Factory (daily ETL jobs)
├── Real-time: Event Hub → Stream Analytics → ADLS
└── API: Logic Apps (webhook triggers)
DATA LAKE (Raw → Processed → Curated)
├── Raw layer: ADLS container (immutable, audit logged)
├── Processed layer: Delta tables (versioned, ACID)
└── Curated layer: Optimized for analytics
GOVERNANCE
├── Purview (data catalog, lineage, PII detection)
├── Synapse access control (who sees what)
└── Audit logging (track data access)
ANALYTICS
├── Synapse SQL (distributed SQL queries)
├── Spark (ML, complex ETL)
└── Power BI (dashboards)
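The Raw → Processed → Curated flow can be illustrated without any Azure services. The raw layer is left untouched. The processed layer parses, validates, and de-duplicates records. The curated layer holds an analytics-ready aggregate. The sketch below assumes JSON-line records with hypothetical `id`, `region`, and `amount` fields. It does not model Delta features such as ACID commits or time travel.

```python
import json
from collections import defaultdict
from typing import Dict, List

def to_processed(raw_lines: List[str]) -> List[dict]:
    """Raw -> Processed: parse, drop malformed rows, de-duplicate by event id."""
    seen, rows = set(), []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            key, amount = str(rec["id"]), float(rec["amount"])
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # a real pipeline would route this row to a quarantine area
        if key in seen:
            continue  # duplicate delivery of the same event
        seen.add(key)
        rows.append({"id": key, "region": rec.get("region", "unknown"), "amount": amount})
    return rows

def to_curated(rows: List[dict]) -> Dict[str, float]:
    """Processed -> Curated: revenue per region, ready for dashboards."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

raw = [
    '{"id": 1, "region": "eu", "amount": 10}',
    '{"id": 1, "region": "eu", "amount": 10}',    # duplicate delivery
    '{"id": 2, "region": "us", "amount": "7.5"}',
    'not json',                                   # malformed
]
print(to_curated(to_processed(raw)))  # the raw list itself is never modified
```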
Key Decisions
- Data Factory: Schedules daily ETL pipelines, handles partial failures with retry
- Delta Lake: ACID transactions on data lake (prevents corruption, enables time travel)
- Purview: Automatic PII discovery and classification; masking rules per team are then enforced in the query layer (Finance sees customer IDs, Marketing sees masked values)
- Synapse: Distributed SQL (fast queries over TBs of data) plus Spark for ML
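The rule "Finance sees customer IDs, Marketing sees masked values" can be written as a small policy table that is applied to each row before results are returned. The sketch below models only the rule, not where Azure enforces it. The team names come from the example above, while the column names, the policy table, and the keep-last-4 masking format are hypothetical.

```python
from typing import Dict, Set

# Hypothetical policy: which PII columns each team may see unmasked.
UNMASKED_PII: Dict[str, Set[str]] = {
    "finance": {"customer_id"},
    "marketing": set(),
}
PII_COLUMNS = {"customer_id", "email"}  # e.g. as classified by a catalog scan

def mask(value: str) -> str:
    """Keep the last 4 characters; fully mask short values so nothing leaks."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]

def apply_policy(row: Dict[str, str], team: str) -> Dict[str, str]:
    allowed = UNMASKED_PII.get(team, set())  # unknown team: every PII column is masked
    return {
        col: (val if col not in PII_COLUMNS or col in allowed else mask(val))
        for col, val in row.items()
    }

row = {"customer_id": "CUST-00123", "email": "ann@example.com", "country": "DE"}
print(apply_policy(row, "finance"))
print(apply_policy(row, "marketing"))
```

Unknown teams default to the most restrictive view. This deny-by-default choice is the safer failure mode for governance rules.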
Best Practices Summary
Design Patterns Found in Real Architectures
- Layering: Public → Apps → Data (separation of concerns)
- Async patterns: Use Service Bus/Event Hub to decouple components (prevent timeouts)
- Caching layer: Redis for hot data (repeated reads are served from memory, cutting database load and latency)
- Multi-tenancy: Decide between an isolated DB per customer and a shared DB + RLS (isolation vs. cost tradeoff)
- Monitoring everywhere: Application Insights + Azure Monitor in every design (observability speeds up debugging)
- Cost consciousness: Auto-scale, spot VMs, tiered storage (cost is a design requirement, not an afterthought)
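The Redis caching layer in these designs usually follows the cache-aside pattern. Reads check the cache first, and a miss loads from the database and populates the cache with a TTL. The runnable sketch below uses an in-memory dict as a stand-in for Redis. The `CacheAside` class and the 30-second TTL are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple

class CacheAside:
    """Cache-aside: read from cache; on a miss, load from the DB and populate."""

    def __init__(self, load: Callable[[str], Any], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load    # the slow path, e.g. a SQL query
        self._ttl = ttl_s
        self._clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.db_calls = 0

    def get(self, key: str) -> Any:
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]              # hit: no database round trip
        self.db_calls += 1
        value = self._load(key)          # miss or expired: go to the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        self._store.pop(key, None)       # call after writing the key to the database

fake_now = [0.0]
cache = CacheAside(load=lambda k: f"product {k}", ttl_s=30, clock=lambda: fake_now[0])
for _ in range(1000):
    cache.get("sku-42")
print(cache.db_calls)  # 1: one database read served 1000 requests
fake_now[0] = 31.0
cache.get("sku-42")
print(cache.db_calls)  # 2: the entry expired and was reloaded
```

The TTL bounds how stale a cached value can get. Calling `invalidate` after writes shortens that window for data that changes often, such as inventory counts.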
Summary
- E-commerce: Front Door (global), zones for HA, SQL + Cosmos for consistency/performance, Service Bus for async
- SaaS: Multi-tenancy (separate or shared + RLS), containers for scaling, Power BI for dashboards
- Data Lake: Raw/Processed/Curated layers, Delta for ACID, Purview for governance
- Common themes: Async patterns, caching, monitoring, cost optimization, multi-layer security
Your Turn: Design Challenge
Scenario: A logistics company needs to track shipments globally. Drivers upload location + package info every 30 seconds. Dashboard shows routes + delivery ETAs. 500,000 drivers. Must survive region failure.
Design this architecture on paper (30 min). Pick components for ingestion (hint: 500k devices means Event Hub, not a per-request API), storage (a time-series database?), real-time processing, and the dashboard. Then compare with classmates!
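A quick back-of-the-envelope calculation shows why the hint points to Event Hub. The sketch below estimates steady-state ingress and the Event Hubs throughput units (TUs) it needs. It assumes the Standard tier's per-TU ingress limits of 1,000 events/s or 1 MB/s, whichever is reached first, and a hypothetical 500-byte payload per update.

```python
import math

def ingestion_sizing(devices: int, interval_s: float, payload_bytes: int,
                     per_tu_events: int = 1_000, per_tu_bytes: int = 1_000_000) -> dict:
    """Rough Event Hubs throughput-unit estimate for steady device telemetry."""
    events_per_s = devices / interval_s
    bytes_per_s = events_per_s * payload_bytes
    # A TU caps both event count and bytes; the tighter limit decides.
    tus = max(math.ceil(events_per_s / per_tu_events),
              math.ceil(bytes_per_s / per_tu_bytes))
    return {"events_per_s": events_per_s, "mb_per_s": bytes_per_s / 1e6, "throughput_units": tus}

print(ingestion_sizing(devices=500_000, interval_s=30, payload_bytes=500))
```

500,000 drivers reporting every 30 seconds is about 16,700 events/s, or 1.44 billion events per day. Under these assumptions, the event-count limit dominates and needs 17 TUs before any headroom for bursts or regional failover.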