Looking for Kubernetes? This guide covers ECS (Elastic Container Service) deployment. For Kubernetes/EKS deployment, see the EKS + Terraform guide.

Overview

Deploy Runlayer on AWS ECS (Elastic Container Service) with Fargate using Terraform. This provides a serverless container deployment without managing Kubernetes clusters. Choose between minimal configuration (5 required parameters) or enterprise configuration (full customization).
Infrastructure Repository: anysource-AI/runlayer-infra
This deployment creates AWS resources that incur costs. Estimated costs are roughly $130-225/month for the minimal configuration and $360-950/month for the enterprise configuration (see Cost Estimation).

Quick Start

Step 1: Get the infrastructure code
Choose one of these options:
# Option A: Clone the infrastructure repository directly
git clone https://github.com/anysource-AI/runlayer-infra.git
cd runlayer-infra

# Option B: If you already have the main project
cd infra  # Infrastructure code is included as a subtree

Configuration Options

Minimal Configuration

Required:
environment       = "production"              # Environment name
region            = "us-east-1"              # AWS region
domain_name       = "ai.yourcompany.com"     # Your domain (required for SSL)
account           = 123456789012             # AWS account ID
Smart production-ready defaults include:
  • Database: Aurora PostgreSQL 16.6, 2-16 ACUs, private subnets, 7-day backups
  • Security: Public ALB with internet access, private database/cache
  • SSL: Automatic ACM certificate creation (requires domain ownership; enable enable_acm_dns_validation to have Terraform manage Route53 validation records and the ALIAS record pointing to the ALB)
  • Scaling: 2 backend + 2 frontend containers, auto-scale to 10 max
  • Resources: Backend 512 CPU/1024 MB, Frontend 512 CPU/1024 MB
  • Resources: Backend 2048 CPU/4096 MB, Frontend 512 CPU/1024 MB
  • Network: 3-AZ VPC with /16 CIDR, public/private subnets
  • Monitoring: CloudWatch logs for all services
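Before running Terraform, it can help to confirm that a tfvars file assigns each of the four required variables shown above. The following is a minimal bash sketch; the function name check_required_vars is hypothetical and not part of the infrastructure repository.

```shell
# Hypothetical helper (not part of runlayer-infra): verify that a tfvars file
# assigns each required variable from the minimal configuration.
check_required_vars() {
  local file=$1 missing=0 name
  for name in environment region domain_name account; do
    # Match "name = ..." at the start of a line, allowing leading whitespace.
    if ! grep -Eq "^[[:space:]]*${name}[[:space:]]*=" "$file"; then
      echo "missing: ${name}"
      missing=1
    fi
  done
  return "$missing"
}

# Example use:
#   check_required_vars production.tfvars && echo "all required variables set"
```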

Enterprise Configuration

All minimal options plus 160+ customizable parameters:
database_name     = "anysource_prod"
database_username = "postgres"        # Database master username
database_config = {
  engine_version      = "16.6"        # PostgreSQL version
  min_capacity        = 4             # Min Aurora capacity (ACUs)
  max_capacity        = 32            # Max Aurora capacity (ACUs)
  publicly_accessible = false         # Keep private (recommended)
  backup_retention    = 30            # Backup retention days
  subnet_type         = "private"     # Use private subnets
}
alb_access_type = "public"            # "public" or "private"
alb_allowed_cidrs = [                 # Security group IP ranges
  "0.0.0.0/0"                        # Internet (change for security)
  # "203.0.113.0/24",                # Your office IPs
  # "198.51.100.0/24"                # Your VPN IPs
]

# WAF IP Allowlisting (recommended for production)
waf_enable_ip_allowlisting = true     # Enable WAF-based IP filtering
waf_allowlist_ipv4_cidrs = [          # Allowed IPv4 CIDR blocks
  "203.0.113.0/24",                   # Office network
  "198.51.100.42/32",                 # Specific IP address
]

# SSL Certificate options
ssl_certificate_arn = "arn:aws:acm:..." # Use existing cert

# OR let Terraform create and validate the certificate automatically (and point DNS to the ALB):
enable_acm_dns_validation = true           # Creates Route53 validation + ALIAS records
hosted_zone_name = "prod.yourcompany.com"  # Hosted zone in this AWS account (required when enabled)

# Enable dual ALB for split-horizon DNS
# Public ALB for internet traffic + Internal ALB for private network traffic
enable_dual_alb = true

# Optional: Use an existing private hosted zone
private_hosted_zone_id = "Z1234567890ABC"

# Optional: Associate with a specific VPC (defaults to service VPC)
private_hosted_zone_vpc_id = "vpc-1234567890abcdef0"

# Optional: Associate with additional VPCs (for VPC peering scenarios)
private_hosted_zone_additional_vpc_ids = [
  "vpc-0987654321fedcba0",  # Peered VPC 1
  "vpc-1111222233334444",   # Peered VPC 2
]
Use Case:
  • Public access required (e.g., ChatGPT, external integrations)
  • Private network access for internal services (stays within VPC/peering)
  • Single domain name (runlayer.example.com) resolves differently based on network context
How it works:
  • Public DNS → Public ALB (internet traffic)
  • Private DNS → Internal ALB (VPC/peered traffic)
  • ECS services register with both ALBs
  • WAF applies only to public ALB
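One way to check the split-horizon behavior described above is to resolve the domain from inside and outside the VPC and see whether the answer is a private address. This is a bash sketch under that assumption; is_private_ipv4 is a hypothetical helper, and it only recognizes the RFC 1918 ranges.

```shell
# Hypothetical helper: succeed if an IPv4 address is in an RFC 1918 private range
# (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16).
is_private_ipv4() {
  local a=${1%%.*} rest=${1#*.} b
  b=${rest%%.*}
  case "$a" in
    10)  return 0 ;;
    172) [ "$b" -ge 16 ] && [ "$b" -le 31 ] ;;
    192) [ "$b" -eq 168 ] ;;
    *)   return 1 ;;
  esac
}

# Example use (run once from inside the VPC and once from the internet):
#   ip=$(dig +short runlayer.example.com | head -n1)
#   is_private_ipv4 "$ip" && echo "internal ALB" || echo "public ALB"
```

From a host inside the VPC or a peered VPC the name would be expected to resolve to private addresses, and from the internet to public ones.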
Enable VPC peering to allow traffic between the Runlayer VPC and other VPCs (for example, a customer’s existing VPC or internal services VPC).
Security Note: Every VPC peering connection must specify peer_owner_id. This validation ensures that only connections from known, trusted AWS accounts are accepted; validated connections are then accepted automatically.
# Accept and configure VPC peering connections
vpc_peering_connections = {
  "customer-vpc" = {
    peering_connection_id = "pcx-0abc123def456"  # VPC peering connection ID from the initiating side
    peer_vpc_cidr         = "172.16.0.0/16"      # CIDR block of the peer VPC
    peer_owner_id         = "123456789012"        # AWS account ID of the peer (REQUIRED)
    peer_region           = "us-east-1"           # AWS region of the peer (optional)
  }
}

# Optional: For dual ALB setup, associate private hosted zone with peered VPC
enable_dual_alb = true
private_hosted_zone_additional_vpc_ids = ["vpc-customer123"]
Security Features:
  • Required peer validation - All connections must specify peer_owner_id
  • Automatic acceptance after validation - Validation is the security control
  • Automatic routing - Routes configured only for validated connections
  • Security group integration - Backend allows traffic from validated peer CIDRs
Prerequisites:
  1. The peer VPC must initiate the peering connection first
  2. You need the peering connection ID (pcx-xxxxx)
  3. You need the peer VPC’s CIDR block
  4. You MUST provide the peer AWS account ID for security validation
Use Cases:
  • Connecting to a customer’s existing VPC for internal API access
  • Multi-VPC architectures with centralized services
  • Hybrid cloud setups with on-premises connectivity
VPC peering only works when the module creates the VPC (not with existing_vpc_id).
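AWS does not allow peering between VPCs whose IPv4 CIDR blocks overlap, so it is worth comparing peer_vpc_cidr against the Runlayer VPC CIDR before configuring a connection. The following bash sketch shows one way to do that check; ip_to_int and cidrs_overlap are hypothetical helper names.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Hypothetical helper: succeed if two IPv4 CIDR blocks share any address.
cidrs_overlap() {
  local ip1=${1%/*} len1=${1#*/} ip2=${2%/*} len2=${2#*/}
  # Two blocks overlap exactly when they match under the shorter prefix.
  local len=$(( len1 < len2 ? len1 : len2 ))
  local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip1") & mask )) -eq $(( $(ip_to_int "$ip2") & mask )) ]
}

# Example use with the values from this guide:
#   cidrs_overlap "10.0.0.0/16" "172.16.0.0/16" || echo "no overlap, peering possible"
```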
services_configurations = {
  "backend" = {
    desired_count = 3                 # Number of instances
    min_capacity  = 2                 # Min for auto-scaling
    max_capacity  = 10                # Max for auto-scaling
    cpu           = 2048              # CPU units (1024 = 1 vCPU)
    memory        = 4096              # Memory in MB

    # Auto-scaling thresholds
    cpu_auto_scalling_target_value    = 70    # Scale at 70% CPU
    memory_auto_scalling_target_value = 80    # Scale at 80% memory
  }

  "frontend" = {
    desired_count = 2
    cpu          = 512
    memory       = 1024
  }
}
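Fargate only accepts certain CPU and memory pairings, so a cpu/memory value that looks reasonable can still be rejected at deploy time. The bash sketch below encodes the Linux Fargate combinations as I understand them from the AWS documentation; fargate_combo_valid is a hypothetical helper, and the table should be checked against the current AWS reference before relying on it.

```shell
# Hypothetical helper: succeed if (cpu units, memory in MiB) is a valid
# Fargate task size. Ranges reflect the AWS-documented Linux combinations.
fargate_combo_valid() {
  local cpu=$1 mem=$2 min max step
  case "$cpu" in
    256)   [ "$mem" -eq 512 ] || [ "$mem" -eq 1024 ] || [ "$mem" -eq 2048 ]; return ;;
    512)   min=1024  max=4096   step=1024 ;;
    1024)  min=2048  max=8192   step=1024 ;;
    2048)  min=4096  max=16384  step=1024 ;;
    4096)  min=8192  max=30720  step=1024 ;;
    8192)  min=16384 max=61440  step=4096 ;;
    16384) min=32768 max=122880 step=8192 ;;
    *)     return 1 ;;
  esac
  [ "$mem" -ge "$min" ] && [ "$mem" -le "$max" ] && [ $(( (mem - min) % step )) -eq 0 ]
}

# Example use with the backend sizing above:
#   fargate_combo_valid 2048 4096 && echo "valid task size"
```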
cidr = "10.0.0.0/16" # VPC CIDR block
region_az = ["us-east-1a", "us-east-1b", "us-east-1c"]
public_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
private_subnets = ["10.0.4.0/24", "10.0.5.0/24", "10.0.6.0/24"]
database_subnets = ["10.0.7.0/24", "10.0.8.0/24", "10.0.9.0/24"]
# Enable CloudWatch monitoring and alerting
enable_monitoring = true              # Create CloudWatch alarms
slack_channel_id = "C1234567890"      # Your #alerts channel ID (enables Slack alerts)
slack_team_id = "T1234567890"         # Your Slack workspace ID
Automatic CloudWatch alarms created for:
  • ECS Services: CPU utilization (80%), Memory utilization (85%)
  • RDS Database: CPU utilization (80%), Connection count (50), Freeable Memory, Disk Queue Depth, Read/Write IOPS, Free Storage Space
  • Redis Cache: CPU utilization (80%), Memory utilization (85%)
  • Load Balancer: Response time (5s), Unhealthy targets, 5XX error count
  • VPC: Flow logs for network traffic analysis and security monitoring
Enhanced monitoring capabilities include:
  • Configurable alarm thresholds for all RDS metrics
  • 365-day log retention for ECS services and prestart containers
  • VPC Flow Logs with customizable traffic type monitoring
  • ALB 5XX error tracking with configurable thresholds
CloudWatch monitoring: comprehensive monitoring with configurable alarms for all infrastructure components.
Advanced RDS monitoring configuration:
rds_alarm_config = {
  FreeableMemory = {
    period    = 300
    threshold = 268435456 # 256MB
    unit      = "Bytes"
  }
  DiskQueueDepth = {
    period    = 300
    threshold = 5
    unit      = "Count"
  }
  WriteIOPS = {
    period    = 300
    threshold = 1000
    unit      = "Count"
  }
  ReadIOPS = {
    period    = 300
    threshold = 1000
    unit      = "Count"
  }
  Storage = {
    period    = 300
    threshold = 107374182400 # 100GB
    unit      = "Bytes"
  }
}

# ALB 5XX error monitoring
alb_5xx_alarm_period    = 300  # 5 minute period
alb_5xx_alarm_threshold = 1    # Alert on any 5XX error
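The FreeableMemory and Storage thresholds above are expressed in bytes. When choosing different values, the conversion can be done with shell arithmetic, as in this small bash sketch (the helper names are illustrative):

```shell
# Illustrative helpers: convert MiB or GiB to bytes for alarm thresholds.
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }

# Example use:
#   mib_to_bytes 256   # FreeableMemory threshold used above
#   gib_to_bytes 100   # Storage threshold used above
```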
# Environment variables
env_vars = {
  ENVIRONMENT = "production"
  COMPANY     = "YourCompany"
}

Prerequisites

1. Install Tools

# AWS CLI
brew install awscli
aws configure

# Terraform (from HashiCorp's official Homebrew tap)
brew tap hashicorp/tap
brew install hashicorp/tap/terraform
terraform version

2. AWS Requirements

  • AWS account with sufficient permissions
  • Domain name you control (required for SSL certificate creation)
  • Adequate service quotas (VPC, RDS, ECS)
The domain name is required because Runlayer creates an SSL certificate automatically. You must be able to validate domain ownership through DNS.

3. SSL Certificate (Enterprise Only)

# Optional: Request certificate in ACM
aws acm request-certificate \
  --domain-name "*.yourcompany.com" \
  --validation-method DNS \
  --region us-east-1

# Get Route53 hosted zone ID
aws route53 list-hosted-zones-by-name \
  --dns-name yourcompany.com

Deployment

1. Get Infrastructure Code

# Clone the infrastructure repository
git clone https://github.com/anysource-AI/runlayer-infra.git
cd runlayer-infra

# OR use the subtree in your main project
cd infra

2. Configure

# Minimal configuration (recommended)
cp minimal.tfvars.example production.tfvars

# OR Enterprise configuration
cp enterprise.tfvars.example production.tfvars

3. Deploy

terraform init
terraform plan -var-file="production.tfvars"
terraform apply -var-file="production.tfvars"
Deployment takes 15-20 minutes. SSL certificate validation may add 5-10 minutes.
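For scripted or CI deployments, one option is to save the plan and apply exactly that plan, using terraform plan's -detailed-exitcode flag to tell "no changes" apart from "changes pending". The bash sketch below only maps the exit status to a message; describe_plan_status is a hypothetical helper, and the Terraform commands are shown as comments.

```shell
# Hypothetical helper: interpret the exit status of
# `terraform plan -detailed-exitcode` (0 = no diff, 1 = error, 2 = diff present).
describe_plan_status() {
  case "$1" in
    0) echo "no changes" ;;
    1) echo "plan failed" ;;
    2) echo "changes pending" ;;
    *) echo "unexpected status $1" ;;
  esac
}

# Example use (not run here):
#   terraform plan -var-file="production.tfvars" -detailed-exitcode -out=tfplan
#   describe_plan_status "$?"
#   terraform apply tfplan
```

Applying the saved tfplan file means the changes that were reviewed are the changes that run.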

4. Automated Database Setup

Database initialization is completely automated.
What happens automatically:
  • Database Connection: Prestart container waits for Aurora PostgreSQL to be ready
  • Schema Migration: Runs alembic upgrade head to apply latest database schema
  • Logging: All setup activity logged to CloudWatch under prestart-logs-[environment]
  • Error Handling: Backend won’t start if database setup fails
Secrets are automatically generated:
  • Database password and secret keys are created securely

5. Update Application Secrets (Optional)

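# Secrets are automatically generated during deployment!
# Database password and secret keys are created securely.

# If you need to update any secrets after deployment, merge the new key into
# the existing value: update-secret replaces the entire secret string.
# (Assumes the secret is stored as a JSON object and that jq is installed.)
SECRET_ID="anysource-production-app-secrets-PROD2024"
CURRENT=$(aws secretsmanager get-secret-value \
  --secret-id "$SECRET_ID" --query SecretString --output text)
aws secretsmanager update-secret \
  --secret-id "$SECRET_ID" \
  --secret-string "$(echo "$CURRENT" | jq '. + {"CUSTOM_API_KEY": "your-custom-value"}')"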

6. Verify Deployment

# Get application URL
terraform output alb_dns_name

# Test health endpoint
curl https://your-domain.com/api/v1/utils/health-check/
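Because certificate validation and ECS task startup can lag behind terraform apply, the health check may fail for the first few minutes. A small retry wrapper, sketched here in bash, can poll until the endpoint responds; retry is a hypothetical helper, and the attempt count and delay are arbitrary.

```shell
# Hypothetical helper: run a command until it succeeds, up to a maximum
# number of attempts, sleeping between attempts.
retry() {
  local attempts=$1 delay=$2 i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    "$@" && return 0
    # Skip the sleep after the final failed attempt.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example use (not run here): poll every 10 seconds, up to 30 times.
#   retry 30 10 curl -fsS https://your-domain.com/api/v1/utils/health-check/
```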

Architecture

Cost Estimation

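| Component | Minimal (USD/month) | Enterprise (USD/month) |
|---|---|---|
| ECS Services | $50-80 | $150-300 |
| Aurora Database | $30-60 | $100-400 |
| ElastiCache | $15-30 | $50-150 |
| Load Balancer | $20-25 | $20-25 |
| Monitoring | $5-10 | $10-25 |
| Other (NAT, Storage) | $10-20 | $30-50 |
| Monthly Total | $130-225 | $360-950 |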
Costs vary by region and usage. Use AWS Pricing Calculator for precise estimates.
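The monthly totals are the sums of the low and high ends of each component range. If you adjust individual line items for your own sizing, the totals can be recomputed with a few lines of bash (sum is an illustrative helper):

```shell
# Illustrative helper: add up a list of integer dollar amounts.
sum() {
  local total=0 x
  for x in "$@"; do
    total=$(( total + x ))
  done
  echo "$total"
}

# Example use with the minimal-configuration figures from the table:
#   echo "low: $(sum 50 30 15 20 5 10), high: $(sum 80 60 30 25 10 20)"
```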

Common Use Cases

Private Enterprise Deployment

alb_access_type = "private"
alb_allowed_cidrs = ["10.0.0.0/8"]  # Corporate network only

# Additional WAF protection for corporate network
waf_enable_ip_allowlisting = true
waf_allowlist_ipv4_cidrs = [
  "10.100.0.0/16",    # Corporate HQ
  "10.200.0.0/16",    # Regional offices
]

database_config = {
  publicly_accessible = false
  backup_retention = 30
}

High Availability Production

database_config = {
  min_capacity = 8
  max_capacity = 64
}
services_configurations = {
  "backend" = {
    desired_count = 4
    max_capacity = 20
  }
}

Development Environment

environment = "development"
database_config = {
  min_capacity = 2
  max_capacity = 4
  backup_retention = 1
}
services_configurations = {
  "backend" = { desired_count = 1 }
  "frontend" = { desired_count = 1 }
}
# Disable monitoring for development
enable_monitoring = false

Production with Monitoring

environment = "production"
enable_monitoring = true
slack_channel_id = "C1234567890"
slack_team_id = "T1234567890"

database_config = {
  min_capacity = 4
  max_capacity = 32
  backup_retention = 30
}
services_configurations = {
  "backend" = {
    desired_count = 3
    max_capacity = 15
  }
}

Troubleshooting

Problem: SSL certificate validation does not complete.
Solution: Ensure DNS validation records are created in Route53:
# Check certificate status
aws acm describe-certificate --certificate-arn your-cert-arn

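# Look up the validation CNAME that ACM expects, then verify that it resolves
aws acm describe-certificate --certificate-arn your-cert-arn \
  --query 'Certificate.DomainValidationOptions[].ResourceRecord'
dig your-validation-record-name CNAME +short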
Problem: Load balancer targets are unhealthy or the application is unreachable.
Solution: Check security groups and target health:
# Check target group health
aws elbv2 describe-target-health --target-group-arn your-tg-arn

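# Check ECS service status (describe-services requires --services, so list them first)
aws ecs list-services --cluster anysource-production
aws ecs describe-services --cluster anysource-production --services your-service-name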
Problem: Costs are higher than expected.
Solution: Optimize resource sizing:
  • Use environment = "development" for testing
  • Reduce database min_capacity and max_capacity
  • Lower service desired_count and CPU/memory
  • Set shorter backup_retention periods
Problem: Monitoring alerts are not being received.
Solution: Check CloudWatch alarm configuration:
# List alarms currently in the ALARM state
aws cloudwatch describe-alarms --state-value ALARM

# Check alarm actions and thresholds
aws cloudwatch describe-alarms --alarm-names "your-alarm-name"
Common issues:
  • CloudWatch alarm thresholds too high
  • Alarm actions disabled
  • Missing CloudWatch permissions

ECS vs EKS: Which to Choose?

| Factor | ECS (This Guide) | EKS |
|---|---|---|
| Complexity | Lower - Simpler to manage | Higher - Kubernetes expertise needed |
| Cost | $130-225/month | $200-800/month |
| Scaling | Auto-scaling with Fargate | More granular control |
| Ecosystem | AWS-specific | Kubernetes ecosystem |
| Best For | Simpler deployments, AWS-native | Complex workloads, multi-cloud |

Next Steps