Deployment

Carbon Connect uses a blue/green deployment strategy on AWS ECS Fargate with automated rollback capabilities.


Deployment Strategy

Blue/Green Deployment

The API service uses AWS CodeDeploy-managed blue/green deployments:

sequenceDiagram
    participant GH as GitHub Actions
    participant ECR as ECR Registry
    participant ECS as ECS Service
    participant ALB as Load Balancer
    participant Old as Blue (Current)
    participant New as Green (New)

    GH->>ECR: Push Docker image
    GH->>ECS: Update task definition
    ECS->>New: Launch Green tasks
    Note over New: Health checks pass
    ALB->>New: Route traffic to Green
    ALB--xOld: Stop routing to Blue
    ECS->>Old: Drain and stop Blue tasks
    Note over New: Green is now production

  1. New container image is pushed to ECR
  2. ECS task definition is updated with the new image
  3. New (green) tasks are launched alongside existing (blue) tasks
  4. ALB health checks verify the green tasks are healthy
  5. Traffic is shifted from blue to green
  6. Blue tasks are drained and stopped
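
Steps 2 through 6 could be scripted roughly as follows. This is a minimal sketch rather than the project's actual workflow: the function name `deploy_service` is hypothetical, the new task definition revision is assumed to be registered already, and the sketch relies only on the standard `aws ecs update-service` and `aws ecs wait services-stable` commands.

```shell
# Hypothetical helper: point a service at a task definition revision and
# block until ECS reports the service stable.
deploy_service() (
    cluster=$1
    service=$2
    task_def=$3

    # Start a new deployment with the given task definition revision.
    aws ecs update-service \
        --cluster "$cluster" \
        --service "$service" \
        --task-definition "$task_def" > /dev/null || exit 1

    # The waiter polls until the new tasks are running and the old ones are
    # gone; it exits non-zero if that does not happen within its polling budget.
    aws ecs wait services-stable \
        --cluster "$cluster" \
        --services "$service"
)
```

A call such as `deploy_service carbon-connect-prod carbon-connect-prod-api carbon-connect-prod-api:42` would return non-zero if either command fails, which a pipeline can use as its rollback trigger. The revision number 42 is illustrative only.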

Docker Images

API Image

File: Dockerfile

# Multi-stage build
FROM python:3.11-slim AS builder
# Install Poetry and dependencies
# ...

FROM python:3.11-slim AS runtime
# Copy installed packages
# Run: uvicorn backend.app.main:app --host 0.0.0.0 --port 8000

  • Base: python:3.11-slim
  • Platform: linux/arm64 (Graviton)
  • Entrypoint: uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
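
Because the target platform is linux/arm64, the image has to be built for that architecture even when the build host is x86. The sketch below shows one way to do that with `docker buildx build`. The helper name `build_and_push` and the image reference passed to it are assumptions for illustration, not names taken from the project.

```shell
# Hypothetical helper: build an image for Graviton (linux/arm64) and push it.
# $1 is the full image reference (registry/repository:tag), supplied by the caller.
# $2 is the Dockerfile to use and defaults to "Dockerfile".
build_and_push() (
    image=$1
    dockerfile=${2:-Dockerfile}

    docker buildx build \
        --platform linux/arm64 \
        --file "$dockerfile" \
        --tag "$image" \
        --push \
        .
)
```

The same helper would cover the Celery image by passing `Dockerfile.celery` as the second argument.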

Celery Worker Image

File: Dockerfile.celery

  • Base: Same as API image
  • Entrypoint: celery -A backend.app.worker.celery_app worker -l info

Celery Beat Image

  • Same image as worker
  • Entrypoint override: celery -A backend.app.worker.celery_app beat -l info

ECS Task Definitions

API Task

| Setting | Value |
|---|---|
| Family | carbon-connect-{env}-api |
| CPU | 1024 (1 vCPU) |
| Memory | 2048 MB |
| Port | 8000 |
| Health check | GET /api/v1/health |
| Capacity | Fargate (on-demand) |
| Log driver | awslogs |

Worker Task

| Setting | Value |
|---|---|
| Family | carbon-connect-{env}-worker |
| CPU | 1024 (1 vCPU) |
| Memory | 2048 MB |
| Port | None (no inbound traffic) |
| Capacity | Fargate Spot (up to 70% savings) |
| Log driver | awslogs |

Beat Task

| Setting | Value |
|---|---|
| Family | carbon-connect-{env}-beat |
| CPU | 256 (0.25 vCPU) |
| Memory | 512 MB |
| Port | None |
| Capacity | Fargate (on-demand) |
| Replicas | 1 (exactly one scheduler) |
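
The single-replica requirement for Beat matters because running two schedulers would enqueue every periodic job twice. A pipeline could guard against that with a check along these lines. This is a sketch only: `assert_single_beat` is a hypothetical name, and the ECS service name passed to it is assumed rather than documented here.

```shell
# Hypothetical check: fail unless the given service is configured for exactly
# one task.
assert_single_beat() (
    cluster=$1
    service=$2

    desired=$(aws ecs describe-services \
        --cluster "$cluster" \
        --services "$service" \
        --query 'services[0].desiredCount' \
        --output text) || exit 1

    if [ "$desired" != "1" ]; then
        echo "expected desiredCount=1 for $service, got $desired" >&2
        exit 1
    fi
)
```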

Health Checks

ALB Health Check

| Parameter | Value |
|---|---|
| Path | /api/v1/health |
| Protocol | HTTP |
| Port | 8000 |
| Interval | 30 seconds |
| Timeout | 5 seconds |
| Healthy threshold | 3 |
| Unhealthy threshold | 3 |
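
These settings determine roughly how long a new task waits before it receives traffic. The load balancer needs a run of consecutive results, one per interval, so the transition time is approximately interval × threshold. The helper below is illustrative arithmetic only, and its name is hypothetical.

```shell
# Approximate seconds for a target to change state: `threshold` consecutive
# check results are needed, one per `interval`.
seconds_to_transition() (
    interval=$1
    threshold=$2
    echo $((interval * threshold))
)

seconds_to_transition 30 3   # prints 90
```

With the values in the table, a green task is marked healthy after about 90 seconds of passing checks, and a failing task is marked unhealthy after about the same time.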

ECS Health Check

Container-level health check configured in the task definition:

{
  "command": ["CMD-SHELL", "curl -f http://localhost:8000/api/v1/health || exit 1"],
  "interval": 30,
  "timeout": 5,
  "retries": 3,
  "startPeriod": 60
}

Post-Deployment Smoke Tests

After deployment, the CI pipeline verifies:

  1. API health endpoint returns 200
  2. /grants?limit=1 endpoint responds
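  3. /reference/countries endpoint responds

# Health check with retries: up to 5 attempts, 10 seconds apart
HTTP_CODE=000
for i in {1..5}; do
    HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
        https://app.carbonconnect.eu/api/v1/health)
    if [ "$HTTP_CODE" = "200" ]; then break; fi
    sleep 10
done
# Fail the step if the endpoint never returned 200
if [ "$HTTP_CODE" != "200" ]; then exit 1; fi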
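
The same retry pattern can be wrapped in a function so that all three endpoints are checked in the same way. The sketch below makes two assumptions: `wait_for_200` is a hypothetical name, and "responds" is taken to mean an HTTP 200, which the list above does not state explicitly.

```shell
# Hypothetical helper: poll a URL until it returns HTTP 200, giving up after
# a fixed number of attempts. Returns non-zero if the URL never succeeds.
wait_for_200() (
    url=$1
    attempts=${2:-5}
    delay=${3:-10}

    i=1
    while [ "$i" -le "$attempts" ]; do
        code=$(curl -s -o /dev/null -w "%{http_code}" "$url")
        if [ "$code" = "200" ]; then
            exit 0
        fi
        # Sleep only between attempts, not after the final failure.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    echo "smoke test failed: $url returned $code" >&2
    exit 1
)
```

For example, `wait_for_200 https://app.carbonconnect.eu/api/v1/health` reproduces the loop above. The full URLs for the other two endpoints depend on the API prefix and are left to the caller.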

Database Migrations

Migrations run as one-off ECS tasks before service deployment:

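# SUBNET_IDS and SECURITY_GROUP_ID are placeholders; Fargate tasks require an awsvpc network configuration
aws ecs run-task \
    --cluster carbon-connect-{env} \
    --task-definition carbon-connect-{env}-api \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[$SUBNET_IDS],securityGroups=[$SECURITY_GROUP_ID]}" \
    --overrides '{"containerOverrides":[{"name":"api","command":["poetry","run","alembic","upgrade","head"]}]}'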

The CI pipeline waits for the migration task to complete and checks its exit code before proceeding with the deployment.
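
The wait-and-check step might look like the sketch below. It assumes the task ARN was captured from the `run-task` output and that the migration runs in the first container of the task. The name `wait_for_task` is hypothetical.

```shell
# Hypothetical helper: wait for a one-off ECS task to stop, then succeed only
# if its first container exited with code 0.
wait_for_task() (
    cluster=$1
    task_arn=$2

    # Blocks until the task reaches STOPPED (or the waiter times out).
    aws ecs wait tasks-stopped --cluster "$cluster" --tasks "$task_arn" || exit 1

    exit_code=$(aws ecs describe-tasks \
        --cluster "$cluster" \
        --tasks "$task_arn" \
        --query 'tasks[0].containers[0].exitCode' \
        --output text) || exit 1

    # A container that never started has no exit code, which text output
    # renders as "None"; treat that as a failure too.
    case $exit_code in
        0) exit 0 ;;
        *) echo "migration task failed (exit code: $exit_code)" >&2; exit 1 ;;
    esac
)
```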


Rollback Procedure

Automatic Rollback

If the API deployment fails (health checks do not pass within 15 minutes), the production workflow automatically triggers a rollback:

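# Get the previous task definition
PREVIOUS_TASK_DEF=$(aws ecs describe-services \
    --cluster carbon-connect-prod \
    --services carbon-connect-prod-api \
    --query 'services[0].deployments[1].taskDefinition' \
    --output text)

# Roll back
aws ecs update-service \
    --cluster carbon-connect-prod \
    --service carbon-connect-prod-api \
    --task-definition "$PREVIOUS_TASK_DEF" \
    --force-new-deployment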
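
Detecting the 15-minute failure condition needs an explicit deadline, because the built-in `aws ecs wait services-stable` waiter has its own fixed polling budget. The sketch below polls `describe-services` until only one deployment remains and all desired tasks are running. The function name and the 15-second polling interval are assumptions, and elapsed time is approximated by summing the sleep intervals.

```shell
# Hypothetical helper: poll a service until its deployment has settled or a
# deadline passes. Returns 0 when stable, 1 on timeout.
wait_until_stable() (
    cluster=$1
    service=$2
    timeout=${3:-900}     # 15 minutes
    interval=${4:-15}

    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # One line of tab-separated values: deployment count, running, desired.
        state=$(aws ecs describe-services \
            --cluster "$cluster" \
            --services "$service" \
            --query 'services[0].[length(deployments), runningCount, desiredCount]' \
            --output text) || exit 1
        # Split the three values into $1, $2, $3.
        set -- $state
        # Stable: only one deployment remains and all of its tasks are running.
        if [ "$1" = "1" ] && [ "$2" = "$3" ]; then
            exit 0
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    exit 1
)
```

A workflow could then run the rollback commands only when `wait_until_stable carbon-connect-prod carbon-connect-prod-api` returns non-zero.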

Manual Rollback

# 1. Find the previous task definition revision
aws ecs describe-services \
    --cluster carbon-connect-prod \
    --services carbon-connect-prod-api \
    --query 'services[0].deployments[].taskDefinition'

# 2. Roll back to the previous revision
aws ecs update-service \
    --cluster carbon-connect-prod \
    --service carbon-connect-prod-api \
    --task-definition carbon-connect-prod-api:PREVIOUS_REVISION \
    --force-new-deployment

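# 3. If the database migration needs to be rolled back
#    (SUBNET_IDS and SECURITY_GROUP_ID are placeholders; Fargate tasks require an awsvpc network configuration)
aws ecs run-task \
    --cluster carbon-connect-prod \
    --task-definition carbon-connect-prod-api \
    --launch-type FARGATE \
    --network-configuration "awsvpcConfiguration={subnets=[$SUBNET_IDS],securityGroups=[$SECURITY_GROUP_ID]}" \
    --overrides '{"containerOverrides":[{"name":"api","command":["poetry","run","alembic","downgrade","-1"]}]}'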

Pre-Deployment Backup

Production deployments automatically create an RDS snapshot before any changes:

aws rds create-db-snapshot \
    --db-instance-identifier carbon-connect-prod \
    --db-snapshot-identifier carbon-connect-prod-pre-deploy-$(date +%Y%m%d-%H%M%S)
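
`create-db-snapshot` returns as soon as the snapshot has been requested, not when it is complete. If migrations must not start before the backup is usable, the pipeline can wait on it explicitly. The following sketch assumes that behaviour is wanted. The function name `snapshot_and_wait` is hypothetical, and `aws rds wait db-snapshot-available` is the standard CLI waiter.

```shell
# Hypothetical helper: create a timestamped pre-deploy snapshot, wait until it
# is available, and print its identifier.
snapshot_and_wait() (
    instance=$1
    snapshot_id="$instance-pre-deploy-$(date +%Y%m%d-%H%M%S)"

    aws rds create-db-snapshot \
        --db-instance-identifier "$instance" \
        --db-snapshot-identifier "$snapshot_id" > /dev/null || exit 1

    aws rds wait db-snapshot-available \
        --db-snapshot-identifier "$snapshot_id" || exit 1

    # Print the identifier so later steps (or a restore) can reference it.
    echo "$snapshot_id"
)
```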

Deployment Flow Summary

flowchart TB
    A[GitHub Release / Manual Trigger] --> B[Validate Image in ECR]
    B --> C[Create RDS Snapshot]
    C --> D{Run Migrations?}
    D -->|Yes| E[Run Alembic via ECS Task]
    D -->|No| F[Skip]
    E --> G[Deploy API<br/>Blue/Green]
    F --> G
    G --> H[Deploy Workers]
    H --> I[Deploy Beat]
    I --> J[Smoke Tests]
    J -->|Pass| K[Success Notification]
    J -->|Fail| L[Rollback to Previous]
    G -->|Fail| L
    L --> M[Failure Notification]
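
The control flow in the chart can be expressed as a small driver function. Every step name below (`validate_image`, `create_snapshot`, `run_migrations`, and so on) is a hypothetical placeholder for a command defined elsewhere, and the `RUN_MIGRATIONS` variable is likewise assumed. The chart shows rollback edges only for the API deployment and the smoke tests. Treating failures in the other steps as shown here is an assumption.

```shell
# Hypothetical orchestration of the flow above. Each step is expected to be a
# shell function or command that returns non-zero on failure.
run_pipeline() {
    # Nothing has been deployed yet, so a failure here needs no rollback.
    validate_image && create_snapshot || { notify_failure; return 1; }

    if [ "${RUN_MIGRATIONS:-no}" = "yes" ]; then
        run_migrations || { notify_failure; return 1; }
    fi

    # A failed deployment or failed smoke test takes the rollback path.
    if deploy_api && deploy_workers && deploy_beat && smoke_tests; then
        notify_success
    else
        rollback
        notify_failure
        return 1
    fi
}
```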

Environment Promotion

| Stage | Trigger | Approval | Deployment |
|---|---|---|---|
| Development | Manual | None | Direct apply |
| Staging | Push to staging branch | None | Automatic |
| Production | GitHub Release or manual dispatch | Required (production environment) | Blue/green with rollback |