Deployment¶
Carbon Connect uses a blue/green deployment strategy on AWS ECS Fargate with automated rollback capabilities.
Deployment Strategy¶
Blue/Green Deployment¶
The API service uses AWS CodeDeploy-managed blue/green deployments:
sequenceDiagram
participant GH as GitHub Actions
participant ECR as ECR Registry
participant ECS as ECS Service
participant ALB as Load Balancer
participant Old as Blue (Current)
participant New as Green (New)
GH->>ECR: Push Docker image
GH->>ECS: Update task definition
ECS->>New: Launch Green tasks
Note over New: Health checks pass
ALB->>New: Route traffic to Green
ALB--xOld: Stop routing to Blue
ECS->>Old: Drain and stop Blue tasks
Note over New: Green is now production
- New container image is pushed to ECR
- ECS task definition is updated with the new image
- New (green) tasks are launched alongside existing (blue) tasks
- ALB health checks verify the green tasks are healthy
- Traffic is shifted from blue to green
- Blue tasks are drained and stopped
Docker Images¶
API Image¶
File: Dockerfile
# Multi-stage build
FROM python:3.11-slim AS builder
# Install Poetry and dependencies
# ...
FROM python:3.11-slim AS runtime
# Copy installed packages
# Run: uvicorn backend.app.main:app --host 0.0.0.0 --port 8000
- Base: `python:3.11-slim`
- Platform: `linux/arm64` (Graviton)
- Entrypoint: `uvicorn backend.app.main:app --host 0.0.0.0 --port 8000`
Celery Worker Image¶
File: Dockerfile.celery
- Base: Same as API image
- Entrypoint: `celery -A backend.app.worker.celery_app worker -l info`
Celery Beat Image¶
- Same image as worker
- Entrypoint override: `celery -A backend.app.worker.celery_app beat -l info`
ECS Task Definitions¶
API Task¶
| Setting | Value |
|---|---|
| Family | carbon-connect-{env}-api |
| CPU | 1024 (1 vCPU) |
| Memory | 2048 MB |
| Port | 8000 |
| Health check | GET /api/v1/health |
| Capacity | Fargate (on-demand) |
| Log driver | awslogs |
Worker Task¶
| Setting | Value |
|---|---|
| Family | carbon-connect-{env}-worker |
| CPU | 1024 (1 vCPU) |
| Memory | 2048 MB |
| Port | None (no inbound traffic) |
| Capacity | Fargate Spot (up to 70% savings) |
| Log driver | awslogs |
Beat Task¶
| Setting | Value |
|---|---|
| Family | carbon-connect-{env}-beat |
| CPU | 256 (0.25 vCPU) |
| Memory | 512 MB |
| Port | None |
| Capacity | Fargate (on-demand) |
| Replicas | 1 (exactly one scheduler) |
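The three tables above map fairly directly onto ECS task definition payloads. The following is a minimal sketch of that mapping, not the project's actual definitions: the log group name, the `region` and `image` arguments, and the container names for the worker and beat tasks are assumptions made for illustration, and IAM roles are omitted entirely.

```python
from typing import Any, Dict, Optional

# CPU, memory, and port values come from the tables above.
SETTINGS: Dict[str, Dict[str, Optional[int]]] = {
    "api": {"cpu": 1024, "memory": 2048, "port": 8000},
    "worker": {"cpu": 1024, "memory": 2048, "port": None},
    "beat": {"cpu": 256, "memory": 512, "port": None},
}


def build_task_definition(env: str, service: str, image: str, region: str) -> Dict[str, Any]:
    """Build a Fargate task definition payload for one service (illustrative)."""
    settings = SETTINGS[service]
    container: Dict[str, Any] = {
        "name": service,  # assumed: container named after the service
        "image": image,
        "essential": True,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                # Hypothetical log group name; adjust to the real one.
                "awslogs-group": f"/ecs/carbon-connect-{env}-{service}",
                "awslogs-region": region,
                "awslogs-stream-prefix": service,
            },
        },
    }
    # Only the API task accepts inbound traffic.
    if settings["port"] is not None:
        container["portMappings"] = [
            {"containerPort": settings["port"], "protocol": "tcp"}
        ]
    return {
        "family": f"carbon-connect-{env}-{service}",
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",
        # Task-level CPU and memory are passed as strings for Fargate.
        "cpu": str(settings["cpu"]),
        "memory": str(settings["memory"]),
        "runtimePlatform": {
            "cpuArchitecture": "ARM64",
            "operatingSystemFamily": "LINUX",
        },
        "containerDefinitions": [container],
    }
```

Whether a task runs on on-demand Fargate or Fargate Spot is a property of the service's capacity provider strategy rather than of the task definition, so it does not appear in this payload.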
Health Checks¶
ALB Health Check¶
| Parameter | Value |
|---|---|
| Path | /api/v1/health |
| Protocol | HTTP |
| Port | 8000 |
| Interval | 30 seconds |
| Timeout | 5 seconds |
| Healthy threshold | 3 |
| Unhealthy threshold | 3 |
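To make the thresholds concrete, here is a simplified model of how consecutive probe results move a target between states. It is a sketch of the counting logic only, under the assumption that a target starts in an `initial` state; the function and state names are illustrative and this is not the load balancer's actual implementation.

```python
from typing import Iterable, List


def track_target_health(
    probes: Iterable[bool],
    healthy_threshold: int = 3,
    unhealthy_threshold: int = 3,
) -> List[str]:
    """Return the target state after each probe (True = probe succeeded)."""
    state = "initial"
    ok_streak = 0
    fail_streak = 0
    history: List[str] = []
    for ok in probes:
        if ok:
            ok_streak += 1
            fail_streak = 0
            if ok_streak >= healthy_threshold:
                state = "healthy"
        else:
            fail_streak += 1
            ok_streak = 0
            if fail_streak >= unhealthy_threshold:
                state = "unhealthy"
        history.append(state)
    return history


def seconds_until_transition(threshold: int, interval_seconds: int) -> int:
    """Rough lower bound on the time needed to collect `threshold` probes."""
    return threshold * interval_seconds
```

With the values in the table, `seconds_until_transition(3, 30)` gives 90, so a new green task needs roughly a minute and a half of passing probes before it can receive traffic, and a failing task takes about as long to be marked unhealthy.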
ECS Health Check¶
The task definition also configures a container-level health check:
{
"command": ["CMD-SHELL", "curl -f http://localhost:8000/api/v1/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3,
"startPeriod": 60
}
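Both health checks probe the same endpoint, so it helps to see what the smallest possible responder looks like. The sketch below is a bare ASGI application that a server such as uvicorn could serve; it is a hypothetical stand-in, not the real `backend.app.main:app`, and the `{"status": "ok"}` response body is an assumption.

```python
import asyncio
import json


async def app(scope, receive, send):
    """Minimal ASGI app that answers GET /api/v1/health with 200."""
    if scope["type"] != "http":
        return  # ignore lifespan and websocket scopes in this sketch
    if scope["method"] == "GET" and scope["path"] == "/api/v1/health":
        status, payload = 200, {"status": "ok"}  # assumed response body
    else:
        status, payload = 404, {"detail": "Not Found"}
    body = json.dumps(payload).encode()
    await send(
        {
            "type": "http.response.start",
            "status": status,
            "headers": [
                (b"content-type", b"application/json"),
                (b"content-length", str(len(body)).encode()),
            ],
        }
    )
    await send({"type": "http.response.body", "body": body})


async def call(path: str, method: str = "GET"):
    """Drive the app in-process and return (status, decoded JSON body)."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": method, "path": path}, receive, send)
    return sent[0]["status"], json.loads(sent[1]["body"])


if __name__ == "__main__":
    print(asyncio.run(call("/api/v1/health")))
```

Keeping the health handler free of database or broker calls is a common design choice, since a slow dependency would otherwise cause the 5-second probe timeout to fail healthy containers.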
Post-Deployment Smoke Tests¶
After deployment, the CI pipeline verifies:
- API health endpoint returns 200
- /grants?limit=1 endpoint responds
- /reference/countries endpoint responds
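# Health check with retries; fail the job if the endpoint never returns 200
for i in {1..5}; do
  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
    https://app.carbonconnect.eu/api/v1/health)
  if [ "$HTTP_CODE" = "200" ]; then break; fi
  sleep 10
done
[ "$HTTP_CODE" = "200" ] || { echo "Health check failed (last status: $HTTP_CODE)" >&2; exit 1; }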
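The script above covers only the health endpoint. A sketch of all three checks in Python might look like the following. It assumes that the two other paths live under the same `/api/v1` prefix as the health endpoint and that "responds" means any non-5xx HTTP status; both are guesses, and the function names are illustrative.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Dict

# Health URL prefix from the script above; assumed to apply to the other paths too.
BASE_URL = "https://app.carbonconnect.eu/api/v1"


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code, or 0 if the request could not complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses still carry a status code
    except OSError:
        return 0  # DNS failure, refused connection, timeout, ...


def wait_for_200(
    url: str,
    fetch: Callable[[str], int] = http_status,
    attempts: int = 5,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry until the URL returns 200 or the attempts run out."""
    for attempt in range(1, attempts + 1):
        if fetch(url) == 200:
            return True
        if attempt < attempts:
            sleep(delay)
    return False


def run_smoke_tests(
    base_url: str = BASE_URL,
    fetch: Callable[[str], int] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, bool]:
    """Run the three post-deployment checks and report pass/fail per path."""
    results = {"/health": wait_for_200(base_url + "/health", fetch, sleep=sleep)}
    for path in ("/grants?limit=1", "/reference/countries"):
        # Assumed meaning of "responds": a status was returned and it is not 5xx.
        results[path] = 0 < fetch(base_url + path) < 500
    return results
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without network access or real waiting.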
Database Migrations¶
Migrations run as one-off ECS tasks before service deployment:
aws ecs run-task \
--cluster carbon-connect-{env} \
--task-definition carbon-connect-{env}-api \
--launch-type FARGATE \
--overrides '{"containerOverrides":[{"name":"api","command":["poetry","run","alembic","upgrade","head"]}]}'
The CI pipeline waits for the migration task to complete and checks its exit code before proceeding with the deployment.
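One way to implement the exit-code check is to wait for the task to stop (for example with `aws ecs wait tasks-stopped`) and then inspect the `aws ecs describe-tasks` output. The sketch below shows only the inspection step on an already-parsed response; the function name is hypothetical, and it assumes the migration runs in the container named `api`, as in the override above.

```python
from typing import Any, Dict


def migration_succeeded(
    describe_tasks_response: Dict[str, Any], container_name: str = "api"
) -> bool:
    """Return True only if the one-off task stopped and its container exited 0."""
    tasks = describe_tasks_response.get("tasks", [])
    if len(tasks) != 1:
        return False  # task not found, or an unexpected number of tasks
    task = tasks[0]
    if task.get("lastStatus") != "STOPPED":
        return False  # still running; the caller should keep waiting
    for container in task.get("containers", []):
        if container.get("name") == container_name:
            # exitCode is absent when the container never started
            # (for example on an image pull failure), which counts as failure.
            return container.get("exitCode") == 0
    return False
```

Treating a missing `exitCode` as a failure matters here: a task that could not start has not run the migration, so the deployment should not proceed.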
Rollback Procedure¶
Automatic Rollback¶
If the API deployment fails (health checks do not pass within 15 minutes), the production workflow automatically triggers a rollback:
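# Get the previous task definition (second entry in the service's deployment list)
PREVIOUS_TASK_DEF=$(aws ecs describe-services \
  --cluster carbon-connect-prod \
  --services carbon-connect-prod-api \
  --query 'services[0].deployments[1].taskDefinition' \
  --output text)
# Roll back to it
aws ecs update-service \
  --cluster carbon-connect-prod \
  --service carbon-connect-prod-api \
  --task-definition "$PREVIOUS_TASK_DEF" \
  --force-new-deployment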
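Selecting `deployments[1]` relies on the order of the deployment list. A slightly more defensive variant, sketched below on a parsed `describe-services` response, picks the first deployment that is not the `PRIMARY` one. The function name is hypothetical and this is an illustration of the idea, not the workflow's actual logic.

```python
from typing import Any, Dict, Optional


def previous_task_definition(
    describe_services_response: Dict[str, Any],
) -> Optional[str]:
    """Return the task definition of the first non-PRIMARY deployment, if any."""
    service = describe_services_response["services"][0]
    current = service.get("taskDefinition")
    for deployment in service.get("deployments", []):
        task_definition = deployment.get("taskDefinition")
        if deployment.get("status") != "PRIMARY" and task_definition != current:
            return task_definition
    return None  # only one deployment is known; nothing to roll back to
```

Returning `None` instead of raising lets the caller decide what to do when no older deployment is listed, for instance falling back to the manual procedure.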
Manual Rollback¶
# 1. Find the previous task definition revision
aws ecs describe-services \
--cluster carbon-connect-prod \
--services carbon-connect-prod-api \
--query 'services[0].deployments[].taskDefinition'
# 2. Roll back to the previous revision
aws ecs update-service \
--cluster carbon-connect-prod \
--service carbon-connect-prod-api \
--task-definition carbon-connect-prod-api:PREVIOUS_REVISION \
--force-new-deployment
# 3. If database migration needs rollback
aws ecs run-task \
--cluster carbon-connect-prod \
--task-definition carbon-connect-prod-api \
--overrides '{"containerOverrides":[{"name":"api","command":["poetry","run","alembic","downgrade","-1"]}]}'
Pre-Deployment Backup¶
Production deployments automatically create an RDS snapshot before any changes:
aws rds create-db-snapshot \
--db-instance-identifier carbon-connect-prod \
--db-snapshot-identifier carbon-connect-prod-pre-deploy-$(date +%Y%m%d-%H%M%S)
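The snapshot name has to satisfy RDS identifier rules: letters, digits, and hyphens only, starting with a letter, at most 255 characters, with no trailing hyphen and no two consecutive hyphens. A small sketch of building and validating the same name in Python follows. The function name is hypothetical, and it uses UTC where the shell command above uses the runner's local time.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Letter first, then letters/digits, each optionally preceded by one hyphen.
# This rules out consecutive hyphens and a trailing hyphen.
_VALID_ID = re.compile(r"^[A-Za-z](?:-?[A-Za-z0-9])*$")


def pre_deploy_snapshot_id(instance_id: str, now: Optional[datetime] = None) -> str:
    """Build '<instance>-pre-deploy-YYYYMMDD-HHMMSS' and validate it."""
    if now is None:
        now = datetime.now(timezone.utc)  # assumption: UTC timestamps
    snapshot_id = f"{instance_id}-pre-deploy-{now:%Y%m%d-%H%M%S}"
    if len(snapshot_id) > 255 or not _VALID_ID.match(snapshot_id):
        raise ValueError(f"invalid snapshot identifier: {snapshot_id!r}")
    return snapshot_id
```

For example, `pre_deploy_snapshot_id("carbon-connect-prod", datetime(2024, 1, 31, 23, 59, 59))` yields `carbon-connect-prod-pre-deploy-20240131-235959`. The second-level timestamp keeps identifiers unique across repeated deployments on the same day.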
Deployment Flow Summary¶
flowchart TB
A[GitHub Release / Manual Trigger] --> B[Validate Image in ECR]
B --> C[Create RDS Snapshot]
C --> D{Run Migrations?}
D -->|Yes| E[Run Alembic via ECS Task]
D -->|No| F[Skip]
E --> G[Deploy API<br/>Blue/Green]
F --> G
G --> H[Deploy Workers]
H --> I[Deploy Beat]
I --> J[Smoke Tests]
J -->|Pass| K[Success Notification]
J -->|Fail| L[Rollback to Previous]
G -->|Fail| L
L --> M[Failure Notification]
Environment Promotion¶
| Stage | Trigger | Approval | Deployment |
|---|---|---|---|
| Development | Manual | None | Direct apply |
| Staging | Push to staging branch | None | Automatic |
| Production | GitHub Release or manual dispatch | Required (production environment) | Blue/green with rollback |
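For the production row, the ordering in the deployment flow summary can be read as a short sequence with two failure paths. The sketch below models it with plain callables. The step names are illustrative, and two behaviors are assumptions because the flowchart does not specify them: a failure before the API deployment aborts without a rollback, and a failure in the worker or beat deployment triggers a rollback just like an API or smoke-test failure.

```python
from typing import Callable, Dict, List

PREPARE = ["validate_image", "create_snapshot", "run_migrations"]
DEPLOY = ["deploy_api", "deploy_workers", "deploy_beat", "smoke_tests"]


def run_deployment(
    steps: Dict[str, Callable[[], bool]], run_migrations: bool = True
) -> List[str]:
    """Run the steps in flowchart order and return a log of what happened."""
    events: List[str] = []
    for name in PREPARE + DEPLOY:
        if name == "run_migrations" and not run_migrations:
            events.append("run_migrations: skipped")
            continue
        if steps[name]():
            events.append(f"{name}: ok")
            continue
        events.append(f"{name}: failed")
        if name in DEPLOY:
            # Something was (partially) deployed, so restore the previous version.
            steps["rollback"]()
            events.append("rollback: done")
        events.append("notify: failure")
        return events
    events.append("notify: success")
    return events
```

A caller would supply one callable per step plus a `rollback` callable, each wrapping the corresponding CLI or API call and returning `True` on success.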