
# Deployment Strategies

By Atif Alam

A deployment strategy defines how you replace the old version of your application with the new one. The right strategy balances speed, risk, cost, and complexity — the goal is to ship changes safely with minimal (or zero) downtime.

Deployments consume reliability: every change can break things. If you use SLOs and error budgets, coordinate release cadence with budget health — canaries and blue/green reduce blast radius when the budget is tight, while a big-bang deploy is hard to justify when the budget is already burned. Release policy is a team or org choice; what matters is making the tradeoff between velocity and safety explicit.

| Strategy | Downtime | Risk | Complexity | Infrastructure Cost | Rollback Speed |
| --- | --- | --- | --- | --- | --- |
| Recreate | Yes | High | Low | 1x | Slow (redeploy) |
| Rolling Update | No | Medium | Low | 1x–1.25x | Medium |
| Blue/Green | No | Low | Medium | 2x | Fast (switch) |
| Canary | No | Low | High | 1x + small | Fast (route back) |
| Feature Flags | No | Low | Medium | 1x | Instant (toggle) |

## Recreate

Stop the old version, then start the new version. The simplest strategy, but it causes downtime.

```
Before:  v1 v1 v1   (running)
Deploy:  -- -- --   (all stopped)
After:   v2 v2 v2   (started)
```

| Pros | Cons |
| --- | --- |
| Simplest to implement | Downtime during deployment |
| No version conflicts | Users see errors during the gap |
| Clean state | No gradual validation |

When to use: Development/testing environments, batch processing jobs, or applications where brief downtime is acceptable (scheduled maintenance windows).

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    type: Recreate  # kill all old pods, then create new ones
```

## Rolling Update

Gradually replace old instances with new ones, a few at a time. There is no downtime: some instances are always running.

```
Time 0:  v1 v1 v1 v1
Time 1:  v2 v1 v1 v1   (1 replaced)
Time 2:  v2 v2 v1 v1   (2 replaced)
Time 3:  v2 v2 v2 v1   (3 replaced)
Time 4:  v2 v2 v2 v2   (done)
```

| Pros | Cons |
| --- | --- |
| Zero downtime | Both versions run simultaneously (compatibility required) |
| Built into Kubernetes, ECS, etc. | Slower rollout |
| Gradual validation | Rollback requires another rolling update |

When to use: Default strategy for most production workloads. Works well when v1 and v2 are compatible (API, database schema).

```yaml
apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most 1 extra pod during the update
      maxUnavailable: 0  # always keep all replicas available
```
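One way to see how `maxSurge` and `maxUnavailable` shape a rollout is a small simulation of the replacement schedule. This is a sketch of the mechanics only, not Kubernetes code — the real controller also waits for readiness probes between steps, and the function name is mine:

```python
# Sketch: the replacement schedule a rolling update walks through.
# Each step replaces up to (maxSurge + maxUnavailable) pods, but never
# more than the old pods that remain.
def rolling_update_schedule(replicas: int, max_surge: int, max_unavailable: int):
    batch = max(1, max_surge + max_unavailable)  # pods replaced per step
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0:
        replaced = min(old, batch)
        old -= replaced   # old pods terminated once replacements are ready
        new += replaced
        steps.append((old, new))
    return steps

for old, new in rolling_update_schedule(replicas=4, max_surge=1, max_unavailable=0):
    print(f"v1: {old}  v2: {new}")
```

With `maxSurge: 1, maxUnavailable: 0` the schedule replaces one pod at a time, so four steps are needed and available capacity never drops below four.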

AWS CodeDeploy performs an in-place rolling deployment for EC2 instances:

appspec.yml

```yaml
version: 0.0
os: linux
hooks:
  BeforeInstall:
    - location: scripts/stop.sh
  AfterInstall:
    - location: scripts/start.sh
  ValidateService:
    - location: scripts/healthcheck.sh
```

## Blue/Green

Maintain two identical environments (blue = current, green = new). Deploy to green, test it, then switch traffic. If something goes wrong, switch back instantly.

```
             Load Balancer
           ┌───────┴───────┐
           ▼               │
      Blue (v1)       Green (v2)
   (live traffic)  (idle, being tested)

After switch:
           │               ▼
      Blue (v1)       Green (v2)
       (idle)       (live traffic)
```

| Pros | Cons |
| --- | --- |
| Instant rollback (switch back) | 2x infrastructure cost (both environments running) |
| Full testing of the new version before traffic | Database migrations need careful handling |
| Zero downtime | More infrastructure to manage |

When to use: Critical applications where instant rollback is required and infrastructure cost is acceptable.

| Platform | How |
| --- | --- |
| Kubernetes | Two Deployments + Service selector swap |
| AWS | Two Auto Scaling Groups + ALB target group swap, or Elastic Beanstalk environment swap |
| Azure | App Service deployment slots (swap slot) |
| DNS | Switch DNS record (slower due to TTL; not recommended) |
```sh
# Deploy v2 as "green"
kubectl apply -f deployment-green.yaml

# Test green
curl https://green.internal.myapp.com/health

# Switch traffic: update the Service selector
kubectl patch service myapp -p '{"spec":{"selector":{"version":"v2"}}}'

# Rollback: switch the selector back to v1
kubectl patch service myapp -p '{"spec":{"selector":{"version":"v1"}}}'
```
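A minimal sketch of what `deployment-green.yaml` might contain — the names, labels, and image tag are assumptions; the point is that the pods carry a `version` label the Service selector can match:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 4
  selector:
    matchLabels:
      app: myapp
      version: v2
  template:
    metadata:
      labels:
        app: myapp
        version: v2   # the Service selector patch flips traffic to this label
    spec:
      containers:
        - name: myapp
          image: myapp:v2
```

The blue Deployment is identical except for `version: v1` and the image tag, which is what makes the selector swap an instant, reversible switch.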
```sh
# Deploy v2 to the staging slot
az webapp deployment source config-zip \
  --resource-group myapp-rg \
  --name myapp \
  --slot staging \
  --src app.zip

# Test the staging slot
curl https://myapp-staging.azurewebsites.net/health

# Swap staging → production (instant, zero downtime)
az webapp deployment slot swap \
  --resource-group myapp-rg \
  --name myapp \
  --slot staging \
  --target-slot production

# Rollback: run the same swap again (the operation is symmetric)
az webapp deployment slot swap \
  --resource-group myapp-rg \
  --name myapp \
  --slot staging \
  --target-slot production
```

## Canary

Route a small percentage of traffic to the new version and monitor for errors. If it stays healthy, gradually increase traffic until it reaches 100%.

```
Time 0:  100% ──► v1
Time 1:   95% ──► v1
           5% ──► v2   (canary — watch error rate, latency)
Time 2:   70% ──► v1
          30% ──► v2   (looking good)
Time 3:  100% ──► v2   (fully rolled out)
```

| Pros | Cons |
| --- | --- |
| Lowest risk — real traffic validates the change | Complex (needs traffic splitting, monitoring) |
| Issues caught early with a small blast radius | Requires good observability |
| Gradual confidence building | Both versions must be compatible |

When to use: High-traffic production services where even a small error rate has significant impact. Requires mature monitoring and traffic management.
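The progression above is usually automated rather than done by hand. A hedged sketch of the control loop — `get_error_rate` and `set_canary_weight` are placeholders for your metrics query and routing API (e.g. a Prometheus query and an Istio VirtualService patch), not a real library:

```python
import time

CANARY_STEPS = [5, 30, 100]   # traffic percentages to walk through
ERROR_THRESHOLD = 0.01        # abort if the canary error rate exceeds 1%

def run_canary(get_error_rate, set_canary_weight, bake_seconds=300):
    """Shift traffic to the canary step by step; abort on elevated errors."""
    for weight in CANARY_STEPS:
        set_canary_weight(weight)
        time.sleep(bake_seconds)              # let real traffic exercise v2
        if get_error_rate() > ERROR_THRESHOLD:
            set_canary_weight(0)              # route everything back to stable
            return "aborted"
    return "promoted"
```

Tools like Argo Rollouts and AWS CodeDeploy implement exactly this loop, with the analysis step driven by metric queries you configure.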

| Platform | How |
| --- | --- |
| Kubernetes + Istio | VirtualService weight-based routing |
| Kubernetes + Argo Rollouts | Rollout CRD with automated canary analysis |
| AWS CodeDeploy | Canary deployment configurations |
| AWS ALB | Weighted target groups |
| Nginx | split_clients or weighted upstreams |

For mesh architecture, istioctl, and common 503 / mTLS issues, see Istio.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: myapp
spec:
  hosts: [myapp]
  http:
    - route:
        - destination:
            host: myapp
            subset: stable  # v1
          weight: 95
        - destination:
            host: myapp
            subset: canary  # v2
          weight: 5
```
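The `stable` and `canary` subsets used above are defined separately in a DestinationRule. A sketch, assuming the v1 and v2 pods are labeled `version: v1` and `version: v2` respectively:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: myapp
spec:
  host: myapp
  subsets:
    - name: stable
      labels:
        version: v1   # pods from the v1 Deployment
    - name: canary
      labels:
        version: v2   # pods from the v2 Deployment
```

Without a matching DestinationRule, the VirtualService has nothing to route the subset names to.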
```json
{
  "deploymentConfigName": "CodeDeployDefault.LambdaCanary10Percent5Minutes",
  "computePlatform": "Lambda"
}
```

This routes 10% of traffic to the new version, waits 5 minutes, then shifts the remaining 90%.

## Feature Flags

Decouple deployment from release: deploy code with the feature hidden behind a flag, then enable it gradually, without redeploying.

```python
# Feature flag check in application code
if feature_flags.is_enabled("new-checkout-flow", user=current_user):
    return new_checkout()
else:
    return old_checkout()
```

```
Deploy v2 (flag off):   All users see old behavior
Enable flag for 5%:     5% see new behavior
Enable flag for 50%:    50% see new behavior
Enable flag for 100%:   Everyone sees new behavior
Disable flag:           Instant rollback (no redeploy)
```

| Pros | Cons |
| --- | --- |
| Instant enable/disable (no redeploy) | Flag logic adds code complexity |
| Granular targeting (% of users, user segments, regions) | Stale flags accumulate ("flag debt") |
| A/B testing built in | Requires a feature flag service |
| Service | Type |
| --- | --- |
| LaunchDarkly | SaaS (most popular) |
| Unleash | Open source |
| Flagsmith | Open source + SaaS |
| Split.io | SaaS |
| AWS AppConfig | AWS-native |
| Azure App Configuration | Azure-native |
| Flipt | Open source, self-hosted |
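Percentage rollouts in these services are typically sticky: each user hashes into a fixed bucket, so raising the percentage from 5 to 50 only ever adds users and never flips anyone back to the old behavior. A minimal sketch of the idea — the hashing scheme here is illustrative, not any particular vendor's:

```python
import hashlib

def is_enabled(flag_name: str, user_id: str, rollout_pct: int) -> bool:
    """Deterministically bucket a user into 0-99; enable if below the rollout %."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < rollout_pct
```

Hashing the flag name together with the user ID also decorrelates flags from each other, so the same 5% of users are not the guinea pigs for every experiment.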
### Feature Flags vs Canary

| | Feature Flags | Canary |
| --- | --- | --- |
| Granularity | Code path (specific feature) | Entire application version |
| Targeting | User segments, %, region | Traffic % (all-or-nothing per request) |
| Rollback | Toggle flag off (instant) | Route traffic back (fast) |
| Complexity | In application code | In infrastructure / routing |
| Best for | New features, experiments | New versions, infrastructure changes |

## A/B Testing vs Canary

Both serve a subset of users, but the goals differ:

| | A/B Testing | Canary |
| --- | --- | --- |
| Goal | Measure which version performs better (metrics) | Validate that the new version is safe (errors) |
| Duration | Days to weeks (statistical significance) | Minutes to hours |
| Variants | 2+ (A, B, C…) | 2 (old, new) |
| Decision | Which variant wins? | Is the new version healthy? |
| Tool | Feature flag service with analytics | Deployment tooling (Istio, Argo Rollouts, CodeDeploy) |

## Rollback

When a deployment goes wrong, you need to get back to the previous working version:

| Strategy | Speed | How |
| --- | --- | --- |
| Blue/green swap | Seconds | Switch traffic back to the old environment |
| Feature flag toggle | Seconds | Disable the flag |
| Canary abort | Seconds | Route 100% of traffic back to stable |
| Kubernetes rollback | Seconds–minutes | `kubectl rollout undo deployment/myapp` |
| Revert and redeploy | Minutes | Git revert + push + full pipeline run |
| Restore from backup | Minutes–hours | Database restore + redeploy (last resort) |
```sh
# View rollout history
kubectl rollout history deployment/myapp

# Undo to the previous revision
kubectl rollout undo deployment/myapp

# Undo to a specific revision
kubectl rollout undo deployment/myapp --to-revision=3
```
  • Always keep the previous version deployable — don’t delete old artifacts or images.
  • Database migrations must be backward compatible — v1 and v2 should work with the same schema during transition.
  • Automate rollback triggers — if error rate exceeds threshold, roll back automatically.
  • Test rollback procedures — don’t discover your rollback is broken during an incident.
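"Automate rollback triggers" can be as little as a watchdog that polls your error-rate metric after each deploy and undoes the rollout on the first breach. A sketch under assumed names — `fetch_error_rate` stands in for a metrics query (e.g. against Prometheus), and the rollback callable here shells out to kubectl:

```python
import subprocess
import time

ERROR_THRESHOLD = 0.05  # roll back if more than 5% of requests fail
CHECKS = 5              # post-deploy checks before declaring the rollout healthy

def kubectl_rollback(deployment: str = "myapp") -> None:
    """Undo the most recent rollout (same command as a manual rollback)."""
    subprocess.run(
        ["kubectl", "rollout", "undo", f"deployment/{deployment}"],
        check=True,
    )

def watch_and_rollback(fetch_error_rate, rollback, interval_seconds=60):
    """Poll the error rate after a deploy; trigger rollback on the first breach."""
    for _ in range(CHECKS):
        if fetch_error_rate() > ERROR_THRESHOLD:
            rollback()
            return "rolled-back"
        time.sleep(interval_seconds)
    return "healthy"
```

Injecting the rollback action as a callable keeps the trigger logic testable; in production you would pass `kubectl_rollback` (or a blue/green switch, or a flag toggle) as the second argument.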
## Choosing a Strategy

| Scenario | Recommended Strategy |
| --- | --- |
| Dev/test environment | Recreate (simple, fast) |
| Standard production deploy | Rolling update (zero downtime, built-in) |
| Critical service, need instant rollback | Blue/green |
| High-traffic service, need gradual validation | Canary |
| New feature, want granular control | Feature flags |
| Regulatory requirement for approval | Blue/green or canary with a manual gate |
| Database-heavy migration | Blue/green (test fully before switching) |

For platform-specific deployment configuration, see:

## Key Takeaways

  • Recreate is simplest but causes downtime — use for dev/test only.
  • Rolling update is the default for most production workloads — zero downtime, gradual replacement.
  • Blue/green gives instant rollback by maintaining two environments — higher cost but lower risk.
  • Canary routes a small % of traffic to the new version first — lowest risk, highest complexity.
  • Feature flags decouple deploy from release — toggle features without redeploying.
  • Every deployment strategy needs a rollback plan — test it before you need it.
  • Database migrations must be backward compatible when running two versions simultaneously.