# Deployment Strategies
A deployment strategy defines how you replace the old version of your application with the new one. The right strategy balances speed, risk, cost, and complexity — the goal is to ship changes safely with minimal (or zero) downtime.
## SLOs, error budgets, and release risk

Deployments consume reliability: every change can break things. If you use SLOs and error budgets, coordinate release cadence with budget health — canaries and blue/green reduce blast radius when the budget is tight, and a big-bang deploy is hard to justify when the budget is already burned. The exact policy is a team or org choice; what matters is making the tradeoff between velocity and safety explicit.
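To make the budget concrete, a quick back-of-the-envelope calculation helps. The 99.9% target and 30-day window below are illustrative assumptions, not prescriptions:

```python
# Error budget for an illustrative 99.9% availability SLO over 30 days.
slo_target = 0.999
window_minutes = 30 * 24 * 60               # 43,200 minutes in the window
budget_minutes = (1 - slo_target) * window_minutes

print(round(budget_minutes, 1))             # → 43.2
```

Roughly 43 minutes of allowed downtime per month: a few risky deploys can consume it quickly, which is why budget health should inform which strategy you pick.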
## Strategy Overview

| Strategy | Downtime | Risk | Complexity | Infrastructure Cost | Rollback Speed |
|---|---|---|---|---|---|
| Recreate | Yes | High | Low | 1x | Slow (redeploy) |
| Rolling Update | No | Medium | Low | 1x–1.25x | Medium |
| Blue/Green | No | Low | Medium | 2x | Fast (switch) |
| Canary | No | Low | High | 1x + small | Fast (route back) |
| Feature Flags | No | Low | Medium | 1x | Instant (toggle) |
## Recreate

Stop the old version, then start the new version. This is the simplest strategy, but it causes downtime.
```
Before:  v1 v1 v1    (running)
Deploy:  --- --- ---  (all stopped)
After:   v2 v2 v2    (started)
```

| Pros | Cons |
|---|---|
| Simplest to implement | Downtime during deployment |
| No version conflicts | Users see errors during the gap |
| Clean state | No gradual validation |
When to use: Development/testing environments, batch processing jobs, or applications where brief downtime is acceptable (scheduled maintenance windows).
### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: Recreate  # Kill all old pods, then create new ones
```

## Rolling Update
Gradually replace old instances with new ones, a few at a time. No downtime — at least some instances are always running.

```
Time 0:  v1 v1 v1 v1
Time 1:  v2 v1 v1 v1   (1 replaced)
Time 2:  v2 v2 v1 v1   (2 replaced)
Time 3:  v2 v2 v2 v1   (3 replaced)
Time 4:  v2 v2 v2 v2   (done)
```

| Pros | Cons |
|---|---|
| Zero downtime | Both versions run simultaneously (compatibility required) |
| Built into Kubernetes, ECS, etc. | Slower rollout |
| Gradual validation | Rollback requires another rolling update |
When to use: Default strategy for most production workloads. Works well when v1 and v2 are compatible (API, database schema).
### Kubernetes

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most 1 extra pod during update
      maxUnavailable: 0  # Always keep all replicas available
```

### AWS CodeDeploy

In-place rolling deployment for EC2 instances:

```yaml
version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/stop.sh
  AfterInstall:
    - location: scripts/start.sh
  ValidateService:
    - location: scripts/healthcheck.sh
```

## Blue/Green Deployment
Maintain two identical environments (blue = current, green = new). Deploy to green, test it, then switch traffic. If something goes wrong, switch back instantly.

```
          Load Balancer
                │
        ┌───────┴───────┐
        ▼               ▼
    Blue (v1)       Green (v2)
 (live traffic)  (idle, being tested)

After switch:
        ▼               ▼
    Blue (v1)       Green (v2)
     (idle)      (live traffic)
```

| Pros | Cons |
|---|---|
| Instant rollback (switch back) | 2x infrastructure cost (both environments running) |
| Full testing of new version before traffic | Database migrations need careful handling |
| Zero downtime | More infrastructure to manage |
When to use: Critical applications where instant rollback is required and infrastructure cost is acceptable.
### Implementation Patterns

| Platform | How |
|---|---|
| Kubernetes | Two Deployments + Service selector swap |
| AWS | Two Auto Scaling Groups + ALB target group swap, or Elastic Beanstalk swap |
| Azure | App Service deployment slots (swap slot) |
| DNS | Switch DNS record (slower due to TTL, not recommended) |
### Kubernetes Example

```bash
# Deploy v2 as "green"
kubectl apply -f deployment-green.yaml

# Test green
curl https://green.internal.myapp.com/health

# Switch traffic: update Service selector
kubectl patch service myapp -p '{"spec":{"selector":{"version":"v2"}}}'

# Rollback: switch selector back to v1
kubectl patch service myapp -p '{"spec":{"selector":{"version":"v1"}}}'
```

### Azure Deployment Slots
```bash
# Deploy v2 to staging slot
az webapp deployment source config-zip \
  --resource-group myapp-rg \
  --name myapp \
  --slot staging \
  --src app.zip

# Test the staging slot
curl https://myapp-staging.azurewebsites.net/health

# Swap staging → production (instant, zero downtime)
az webapp deployment slot swap \
  --resource-group myapp-rg \
  --name myapp \
  --slot staging \
  --target-slot production

# Rollback: swap again
az webapp deployment slot swap \
  --resource-group myapp-rg \
  --name myapp \
  --slot production \
  --target-slot staging
```

## Canary Deployment

Route a small percentage of traffic to the new version. Monitor for errors. If healthy, gradually increase traffic until 100%.
```
Time 0:  100% ──► v1
Time 1:   95% ──► v1     5% ──► v2   (canary — watch error rate, latency)
Time 2:   70% ──► v1    30% ──► v2   (looking good)
Time 3:            100% ──► v2       (fully rolled out)
```

| Pros | Cons |
|---|---|
| Lowest risk — real traffic validates the change | Complex (needs traffic splitting, monitoring) |
| Issues caught early with small blast radius | Requires good observability |
| Gradual confidence building | Both versions must be compatible |
When to use: High-traffic production services where even a small error rate has significant impact. Requires mature monitoring and traffic management.
### Implementation Patterns

| Platform | How |
|---|---|
| Kubernetes + Istio | VirtualService weight-based routing |
| Kubernetes + Argo Rollouts | CRD with automated canary analysis |
| AWS CodeDeploy | canary deployment configuration |
| AWS ALB | Weighted target groups |
| Nginx | split_clients or weighted upstreams |
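As one concrete shape for the Argo Rollouts row above, a `Rollout` resource declares the canary steps declaratively and the controller walks through them. A minimal sketch — the image name, weights, and pause durations are illustrative:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: myapp
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:v2
  strategy:
    canary:
      steps:
        - setWeight: 5            # send 5% of traffic to the new version
        - pause: {duration: 10m}  # watch metrics before continuing
        - setWeight: 30
        - pause: {duration: 10m}
        # after the last step, the rollout shifts to 100%
```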
### Kubernetes with Istio

For mesh architecture, istioctl, and common 503 / mTLS issues, see Istio.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
spec:
  hosts: [myapp]
  http:
    - route:
        - destination:
            host: myapp
            subset: stable   # v1
          weight: 95
        - destination:
            host: myapp
            subset: canary   # v2
          weight: 5
```

### AWS CodeDeploy Canary

```json
{
  "deploymentConfigName": "CodeDeployDefault.LambdaCanary10Percent5Minutes",
  "computePlatform": "Lambda"
}
```

This routes 10% of traffic to the new version, waits 5 minutes, then shifts 100%.
## Feature Flags

Decouple deployment from release. Deploy code with a feature hidden behind a flag, then enable it gradually — without redeploying.

```python
# Feature flag check in application code
if feature_flags.is_enabled("new-checkout-flow", user=current_user):
    return new_checkout()
else:
    return old_checkout()
```

```
Deploy v2 (flag off):   All users see old behavior
Enable flag for 5%:     5% see new behavior
Enable flag for 50%:    50% see new behavior
Enable flag for 100%:   Everyone sees new behavior
Disable flag:           Instant rollback (no redeploy)
```

| Pros | Cons |
|---|---|
| Instant enable/disable (no redeploy) | Flag logic adds code complexity |
| Granular targeting (% of users, user segments, regions) | Stale flags accumulate (“flag debt”) |
| A/B testing built in | Requires a feature flag service |
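Percentage targeting is typically implemented with deterministic bucketing, so a given user's experience stays stable as the rollout grows. A minimal sketch — the function name and hashing scheme are illustrative, not any particular service's API:

```python
import hashlib

def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministic percentage rollout: hash flag+user into a 0-99 bucket.

    The same user always maps to the same bucket, so raising the
    percentage only adds users; nobody flips back to the old behavior.
    """
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Once a user is in at some percentage, they stay in at every higher one.
print(is_enabled("new-checkout-flow", "user-42", 0))    # → False
print(is_enabled("new-checkout-flow", "user-42", 100))  # → True
```

Hashing the flag name together with the user ID also means different flags roll out to different slices of users, rather than always hitting the same 5% first.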
### Feature Flag Services

| Service | Type |
|---|---|
| LaunchDarkly | SaaS (most popular) |
| Unleash | Open source |
| Flagsmith | Open source + SaaS |
| Split.io | SaaS |
| AWS AppConfig | AWS-native |
| Azure App Configuration | Azure-native |
| Flipt | Open source, self-hosted |
## Feature Flags vs Canary

| | Feature Flags | Canary |
|---|---|---|
| Granularity | Code path (specific feature) | Entire application version |
| Targeting | User segments, %, region | Traffic % (all-or-nothing per request) |
| Rollback | Toggle flag off (instant) | Route traffic back (fast) |
| Complexity | In application code | In infrastructure / routing |
| Best for | New features, experiments | New versions, infrastructure changes |
## A/B Testing vs Canary

Both serve a subset of users, but the goal is different:

| | A/B Testing | Canary |
|---|---|---|
| Goal | Measure which version performs better (metrics) | Validate the new version is safe (errors) |
| Duration | Days to weeks (statistical significance) | Minutes to hours |
| Variants | 2+ (A, B, C…) | 2 (old, new) |
| Decision | Which variant wins? | Is the new version healthy? |
| Tool | Feature flag service with analytics | Deployment tooling (Istio, Argo Rollouts, CodeDeploy) |
## Rollback Strategies

When a deployment goes wrong, you need to get back to the previous working version:
| Strategy | Speed | How |
|---|---|---|
| Blue/Green swap | Seconds | Switch traffic back to the old environment |
| Feature flag toggle | Seconds | Disable the flag |
| Canary abort | Seconds | Route 100% traffic back to stable |
| Kubernetes rollback | Seconds–minutes | `kubectl rollout undo deployment/myapp` |
| Revert and redeploy | Minutes | Git revert + push + full pipeline run |
| Restore from backup | Minutes–hours | Database restore + redeploy (last resort) |
### Kubernetes Rollback

```bash
# View rollout history
kubectl rollout history deployment/myapp

# Undo to the previous revision
kubectl rollout undo deployment/myapp

# Undo to a specific revision
kubectl rollout undo deployment/myapp --to-revision=3
```

### Best Practices for Rollback

- Always keep the previous version deployable — don’t delete old artifacts or images.
- Database migrations must be backward compatible — v1 and v2 should work with the same schema during transition.
- Automate rollback triggers — if error rate exceeds threshold, roll back automatically.
- Test rollback procedures — don’t discover your rollback is broken during an incident.
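The automated-trigger practice above can be sketched as a simple guard that compares the new version's error rate against both an absolute floor and the stable baseline. The threshold values here are illustrative assumptions; real systems would feed this from monitoring:

```python
def should_rollback(canary_error_rate: float,
                    stable_error_rate: float,
                    multiplier: float = 2.0,
                    floor: float = 0.01) -> bool:
    """Roll back when the canary errors meaningfully (above the floor)
    AND noticeably more than stable (above multiplier x baseline)."""
    return (canary_error_rate >= floor and
            canary_error_rate > stable_error_rate * multiplier)

print(should_rollback(0.005, 0.001))  # → False (below the 1% floor)
print(should_rollback(0.05, 0.01))    # → True  (5% vs a 1% baseline)
```

Requiring both conditions avoids rolling back on noise: a tiny absolute error rate is tolerated even if it is relatively higher than stable, and a rate merely tracking the baseline never triggers.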
## Choosing a Strategy

| Scenario | Recommended Strategy |
|---|---|
| Dev/test environment | Recreate (simple, fast) |
| Standard production deploy | Rolling update (zero downtime, built-in) |
| Critical service, need instant rollback | Blue/green |
| High-traffic service, need gradual validation | Canary |
| New feature, want granular control | Feature flags |
| Regulatory requirement for approval | Blue/green or canary with manual gate |
| Database-heavy migration | Blue/green (test fully before switching) |
## Platform References

For platform-specific deployment configuration, see:
- Kubernetes: Rolling updates and rollback — Production Patterns
- AWS: CodeDeploy strategies (in-place, blue/green, canary) — CI/CD on AWS
- Azure: Deployment slots and swap — DevOps on Azure
## Key Takeaways

- Recreate is simplest but causes downtime — use for dev/test only.
- Rolling update is the default for most production workloads — zero downtime, gradual replacement.
- Blue/green gives instant rollback by maintaining two environments — higher cost but lower risk.
- Canary routes a small % of traffic to the new version first — lowest risk, highest complexity.
- Feature flags decouple deploy from release — toggle features without redeploying.
- Every deployment strategy needs a rollback plan — test it before you need it.
- Database migrations must be backward compatible when running two versions simultaneously.