Advanced · Lesson 9 of 16

CI/CD for Machine Learning Models

Automate the path from Git commit to validated model promotion with Azure DevOps stages, Azure ML jobs, deployment smoke tests, approvals, and canary releases.

🧒 Simple Explanation (ELI5)

Normal CI/CD checks whether code builds and tests pass. ML CI/CD also checks whether the model itself is good enough and safe enough to release.

🔧 Why Do We Need It?

🌍 Real-world Analogy

A pharmaceutical production line has mandatory inspection points and signed approvals before product release. ML CI/CD provides that same discipline for model delivery.

⚙️ Technical Explanation

CI for ML usually covers repo quality, schema validation, packaging checks, and pipeline integrity. Training and validation stages then produce a candidate model and compare it to the current baseline. CD promotes only validated registry-backed artifacts into staging and production with smoke tests, approvals, and controlled traffic movement. The important distinction is that a successful deployment command is not enough; the release must also prove the model still behaves acceptably.
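
The baseline comparison described above can be sketched as a small promotion gate. This is a minimal illustration, not a prescribed implementation: the metric names (`auc`, `p95_latency_ms`), the tolerances, and the `GateResult` type are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GateResult:
    """Outcome of a promotion gate (hypothetical helper type)."""
    promotable: bool
    reasons: list[str] = field(default_factory=list)


def compare_to_baseline(
    candidate: dict[str, float],
    baseline: dict[str, float],
    max_auc_drop: float = 0.005,            # assumed tolerance
    max_latency_increase_ms: float = 20.0,  # assumed tolerance
) -> GateResult:
    """Decide whether a candidate model may replace the current baseline."""
    reasons: list[str] = []

    # Quality gate: the candidate must not be meaningfully worse than the baseline.
    if candidate["auc"] < baseline["auc"] - max_auc_drop:
        reasons.append(
            f"AUC {candidate['auc']:.3f} is more than {max_auc_drop} "
            f"below baseline {baseline['auc']:.3f}"
        )

    # Serving gate: the candidate must not be meaningfully slower than the baseline.
    latency_limit = baseline["p95_latency_ms"] + max_latency_increase_ms
    if candidate["p95_latency_ms"] > latency_limit:
        reasons.append(
            f"p95 latency {candidate['p95_latency_ms']:.0f} ms exceeds "
            f"limit {latency_limit:.0f} ms"
        )

    return GateResult(promotable=not reasons, reasons=reasons)
```

Returning the reasons alongside the decision keeps the evidence for each promotion (or refusal) in the pipeline log.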

🧪 Strong Practical Pipeline

A strong ML pipeline has three evidence boundaries: CI proves the repo is healthy, training and validation prove the candidate is promotable, and deployment checks prove the serving system behaves correctly.
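
As one example of evidence at the first boundary, a CI schema check might look like the sketch below. The column names and types are hypothetical placeholders for a churn dataset; the lesson does not specify what its schema validation script contains.

```python
# Hypothetical expected schema for one scoring record.
EXPECTED_SCHEMA: dict[str, type] = {
    "customer_id": str,
    "tenure_months": int,
    "monthly_charges": float,
}


def schema_errors(record: dict, expected: dict[str, type]) -> list[str]:
    """Return a list of schema violations; an empty list means the record is valid."""
    errors: list[str] = []

    # Every expected column must be present with the expected type.
    for name, expected_type in expected.items():
        if name not in record:
            errors.append(f"missing column: {name}")
        elif not isinstance(record[name], expected_type):
            errors.append(
                f"wrong type for {name}: expected {expected_type.__name__}, "
                f"got {type(record[name]).__name__}"
            )

    # Unexpected columns often signal upstream drift, so they are reported too.
    for name in sorted(record.keys() - expected.keys()):
        errors.append(f"unexpected column: {name}")

    return errors
```

A CI step would fail the build whenever the returned list is non-empty.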

📊 Visual Representation

ML CI/CD Flow: 🧾 Git Commit → 🧪 CI + Train → ✅ Validation Gates → 🚀 Staging → Prod

⌨️ Commands / Syntax

yaml
trigger:
- main

stages:
- stage: ci
  jobs:
  - job: lint_and_test
    steps:
    - script: pytest tests/
    - script: python validate_schema.py

- stage: train_and_validate
  dependsOn: ci
  jobs:
  - job: train_model
    steps:
    - script: az ml job create --file train.yml --stream
    - script: python check_metrics.py --min_auc 0.84 --max_latency_ms 200

- stage: deploy_staging
  dependsOn: train_and_validate
  jobs:
  - job: smoke_test
    steps:
    - script: az ml online-deployment create --file deployment.yml
    - script: az ml online-endpoint invoke --name churn-endpoint --request-file smoke-test.json

- stage: deploy_prod
  dependsOn: deploy_staging
  condition: succeeded()
  jobs:
  - deployment: canary_release
    environment: ml-prod
    strategy:
      runOnce:
        deploy:
          steps:
          - script: az ml online-endpoint update --name churn-endpoint --traffic "blue=90 green=10"
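
The pipeline calls `check_metrics.py` without showing it. A minimal sketch of what such a gate script could look like follows. The `--min_auc` and `--max_latency_ms` flags come from the pipeline above; the `--metrics_file` flag, the `metrics.json` default, and the `auc` and `latency_ms` keys are assumptions made for illustration.

```python
import argparse
import json
from pathlib import Path


def evaluate(metrics: dict, min_auc: float, max_latency_ms: float) -> list[str]:
    """Return a human-readable failure message for every threshold that is missed."""
    failures: list[str] = []
    if metrics["auc"] < min_auc:
        failures.append(f"auc {metrics['auc']:.3f} is below minimum {min_auc:.3f}")
    if metrics["latency_ms"] > max_latency_ms:
        failures.append(
            f"latency {metrics['latency_ms']:.0f} ms exceeds maximum {max_latency_ms:.0f} ms"
        )
    return failures


def main(argv=None) -> int:
    """Parse thresholds, read the metrics file, and return a process exit code."""
    parser = argparse.ArgumentParser(
        description="Fail the pipeline step when model metrics miss their thresholds."
    )
    parser.add_argument("--min_auc", type=float, required=True)
    parser.add_argument("--max_latency_ms", type=float, required=True)
    # Hypothetical: assumes the training job wrote its metrics to this JSON file.
    parser.add_argument("--metrics_file", type=Path, default=Path("metrics.json"))
    args = parser.parse_args(argv)

    metrics = json.loads(args.metrics_file.read_text())
    failures = evaluate(metrics, args.min_auc, args.max_latency_ms)
    for failure in failures:
        print(f"GATE FAILED: {failure}")

    # A non-zero exit code is what makes the pipeline step fail.
    return 1 if failures else 0
```

In a real script, the entry point would end with `sys.exit(main())` so the return value becomes the process exit code; a `script` step fails when its command exits non-zero, which stops the dependent stages from running.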

💼 Example (Real-world Use Case)

A churn model release begins when the main branch changes. CI validates the scoring code and schema assumptions. Azure ML trains a candidate model. Validation compares the candidate against the production baseline. Azure DevOps deploys to staging, runs smoke tests, waits for approval, and then sends 10% of production traffic to the new version before full promotion.
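
The traffic movement in this example (10% first, then full promotion) can be expressed as a small helper that builds the value passed to `--traffic`. The deployment names `blue` and `green` match the pipeline above; the helper functions and the default step list are illustrative assumptions.

```python
def traffic_split(new_share: int, old: str = "blue", new: str = "green") -> str:
    """Build a traffic string that sends `new_share` percent to the new deployment."""
    if not 0 <= new_share <= 100:
        raise ValueError(f"new_share must be between 0 and 100, got {new_share}")
    return f"{old}={100 - new_share} {new}={new_share}"


def rollout_plan(steps: tuple[int, ...] = (10, 100)) -> list[str]:
    """Return one traffic string per rollout step, in order."""
    return [traffic_split(share) for share in steps]


print(rollout_plan())
```

Teams that want a gentler ramp could pass intermediate steps such as `(10, 50, 100)`, with a health check between each one.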

🧪 Hands-on

  1. List the pipeline stages your ML system should have from commit to production.
  2. For each stage, define what should fail fast automatically and what should require approval.
  3. Choose one production signal that should trigger rollback.
  4. Identify which parts of your current ML release still happen outside the pipeline.
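
For step 3, a rollback signal could be sketched as a comparison between canary and baseline error rates. The choice of error rate as the signal, the tolerance, and the minimum request count are assumptions for illustration; your own signal might be latency, a business metric, or prediction drift.

```python
def should_roll_back(
    canary_error_rate: float,
    baseline_error_rate: float,
    canary_requests: int,
    min_requests: int = 500,   # assumed minimum sample before judging the canary
    tolerance: float = 0.02,   # assumed allowed absolute increase in error rate
) -> bool:
    """Return True when the canary is clearly worse than the baseline."""
    # With too little traffic the error rate is mostly noise, so do not act yet.
    if canary_requests < min_requests:
        return False
    return canary_error_rate > baseline_error_rate + tolerance
```

Requiring a minimum sample size avoids rolling back on a handful of unlucky requests, at the cost of reacting slightly later.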

🎮 Try It Yourself

🎮 Pipeline Design

Design an Azure DevOps pipeline for a fraud model with four stages: CI, training, validation, and deployment. Add one model-quality gate, one latency gate, one smoke test, and one production approval before 100% traffic.

🐛 Debugging Scenario

Problem: the pipeline deploys a model to production even though validation metrics were below threshold.
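
The lesson does not state the root cause, and several are plausible (for example, a gate script that prints a failure but still exits with code 0, or a step configured to continue on error). One cause worth checking is a gate that fails open when a metric is missing. The sketch below contrasts a hypothetical buggy check with a fail-closed version; both function names are invented for this example.

```python
import math


def naive_gate_blocks(metrics: dict, min_auc: float) -> bool:
    """Buggy check: returns True only when the gate should block the release."""
    # A missing metric becomes NaN, and every comparison with NaN is False,
    # so this gate never blocks when the metric is absent.
    auc = metrics.get("auc", float("nan"))
    return auc < min_auc


def gate_passes(metrics: dict, min_auc: float) -> bool:
    """Fail-closed check: anything other than a valid, sufficient metric fails."""
    auc = metrics.get("auc")
    if not isinstance(auc, (int, float)) or math.isnan(auc):
        return False
    return auc >= min_auc


empty_metrics: dict = {}
print(naive_gate_blocks(empty_metrics, 0.84))  # the buggy gate does not block
print(gate_passes(empty_metrics, 0.84))        # the fail-closed gate refuses to pass
```

When debugging, confirm three things in order: the metrics the gate actually read, the exit code the gate step returned, and whether the production stage truly depends on that step succeeding.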

🎯 Interview Questions

Beginner

Why is CI/CD harder for ML than normal apps?

Because ML releases depend on data quality and model behavior, not just code correctness.

What should CI test in an ML repo?

Code quality, schema checks, unit tests, packaging, and pipeline integrity.

What should CD test for a model release?

Model thresholds, smoke tests, deployment health, and release safety signals.

Why use staging for models?

Staging validates serving behavior before production traffic is exposed.

What is a release gate?

It is a rule or approval that must pass before promotion continues.

Intermediate

Why should deployment not start from raw training output?

Because only validated, registered artifacts should be eligible for release.

What ML-specific checks belong in CI/CD?

Metric thresholds, fairness checks, latency limits, calibration checks, and baseline comparisons.

Why separate training and deployment pipelines?

It keeps production release decisions controlled and prevents accidental promotion of unvalidated runs.

What should happen when a gate is overridden?

There should be explicit justification, audit logging, and ideally a temporary exception process.

What is the biggest CI/CD anti-pattern in MLOps?

Using a normal app pipeline that ignores model-specific quality and risk checks.

Scenario-based

A release passed infrastructure tests but business metrics dropped in canary. What failed?

The release process lacked business-aware promotion checks; infrastructure health alone was treated as sufficient.

A team says approvals slow them down. How do you respond?

Approvals should exist only at high-risk boundaries, where they protect the business from expensive mistakes.

A model release pipeline retrains on every pull request. Why is that risky?

It is expensive, noisy, and usually unnecessary; most PRs should validate pipeline integrity rather than trigger full production training.

What if governance requires signoff but the team wants full automation?

Automate everything to staging and keep human approval only at the final production boundary.

How do you prove your ML CI/CD pipeline is working well?

Show faster safe release cycles, fewer bad deployments, shorter rollback time, and clear audit trails for each promotion.
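
Two of these signals, bad deployments and rollback time, can be computed from a simple release log. The `Release` record and its fields are hypothetical; a real team would pull the same facts from its pipeline history.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Release:
    """One production promotion (hypothetical record shape)."""
    failed: bool              # True if the release was rolled back
    rollback_minutes: float   # time to roll back; 0.0 when no rollback happened


def change_failure_rate(releases: list[Release]) -> float:
    """Fraction of releases that had to be rolled back."""
    if not releases:
        return 0.0
    return sum(r.failed for r in releases) / len(releases)


def mean_rollback_minutes(releases: list[Release]) -> Optional[float]:
    """Average rollback time across failed releases, or None if none failed."""
    durations = [r.rollback_minutes for r in releases if r.failed]
    return mean(durations) if durations else None
```

Tracking both numbers over time shows whether the gates are catching problems earlier and whether recovery is getting faster.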

🌐 Real-world Usage

High-performing ML teams treat model delivery like software delivery plus model assurance. The goal is not slower delivery; it is safer speed.

📝 Summary

ML CI/CD is software delivery plus model-quality control. The right pipeline automates evidence and forces discipline where business risk is high.