Overview
Deploy Runlayer on Amazon EKS (Elastic Kubernetes Service) with production-ready infrastructure provisioned by Terraform. This guide covers three deployment scenarios to fit different infrastructure requirements.
Terraform Module Location: infra/aws-helm/terraform-eks/
This deployment creates AWS resources that incur costs. Typical costs range
from $200-800/month depending on configuration and usage.
Deployment Scenarios
Choose the scenario that best fits your infrastructure:
Full Stack (Create Everything)
- New VPC + New EKS cluster + Application infrastructure
- Best for: New deployments, greenfield projects
Existing VPC (Use Your Network)
- Existing VPC + New EKS cluster + Application infrastructure
- Best for: Integrating with existing network infrastructure
Existing EKS (Minimal Infrastructure)
- Existing VPC + Existing EKS + Application infrastructure only
- Best for: Shared clusters, platform team managed EKS
Prerequisites
1. Install Tools
2. AWS Requirements
   - AWS account with administrator access
   - Sufficient service quotas (VPC, EKS, RDS, ElastiCache)
   - IAM permissions to create resources
3. Prepare Secrets
Generate strong secrets for your deployment:
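One way to generate such secrets is to hex-encode bytes from the system's random source. This is a minimal sketch: the names `DB_PASSWORD` and `APP_SECRET_KEY` are hypothetical placeholders, not variables confirmed by the Terraform module, so substitute whatever your terraform.tfvars actually expects.

```bash
# Generate a random hex secret. The default of 32 random bytes
# yields 64 hex characters.
gen_secret() {
  bytes="${1:-32}"
  # Read raw bytes from the kernel random source and hex-encode them.
  head -c "$bytes" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# Hypothetical secret names, for illustration only.
DB_PASSWORD="$(gen_secret 32)"
APP_SECRET_KEY="$(gen_secret 32)"

# Print only the lengths so the secrets never reach the terminal history.
echo "DB_PASSWORD length: ${#DB_PASSWORD}"
echo "APP_SECRET_KEY length: ${#APP_SECRET_KEY}"
```

Hex output avoids characters that need escaping in shell or connection strings, at the cost of a longer string for the same entropy.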
Scenario 1: Full Stack Deployment
Create a complete production-ready environment with a new VPC and EKS cluster.
What Gets Created
- VPC: 10.0.0.0/16 with public/private subnets across 3 AZs
- EKS Cluster: Kubernetes 1.33 with managed node groups
- RDS: Aurora PostgreSQL Serverless v2 (2-16 ACUs)
- Redis: ElastiCache Redis cluster
- IAM Roles: IRSA roles for EBS CSI, ALB Controller, CloudWatch, Application
- Security: KMS encryption, security groups, private subnets
- Monitoring: CloudWatch logs and metrics
Step-by-Step Deployment
1. Get the Terraform module, then set your deployment values in terraform.tfvars.
Architecture
Cost Estimation
| Component | Configuration | Monthly Cost |
|---|---|---|
| EKS Cluster | Control plane | $73 |
| EC2 Nodes | 4x m6i.2xlarge | $400-500 |
| RDS Aurora | 2-16 ACUs | $60-240 |
| ElastiCache | cache.t3.medium | $40-50 |
| NAT Gateway | 3 AZs | $100-120 |
| ALB | Application Load Balancer | $20-25 |
| Data Transfer | Varies by usage | $20-50 |
| CloudWatch | Logs and metrics | $10-30 |
| Total | | $723-1,088 |
Costs vary by region and usage patterns. Use AWS Pricing Calculator for precise estimates.
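The total row is the sum of the low and high ends of each component range. As a quick sanity check when you adjust a line item (for example, fewer nodes), the arithmetic can be sketched like this, using the figures from the table above:

```bash
# Add up a list of whole-dollar amounts.
add_up() {
  total=0
  for n in "$@"; do
    total=$((total + n))
  done
  echo "$total"
}

# Order: EKS, EC2 nodes, Aurora, ElastiCache, NAT Gateway, ALB,
# data transfer, CloudWatch (low end, then high end of each range).
low_total="$(add_up 73 400 60 40 100 20 20 10)"
high_total="$(add_up 73 500 240 50 120 25 50 30)"

echo "Estimated total: \$${low_total}-\$${high_total}/month"
# prints: Estimated total: $723-$1088/month
```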
Scenario 2: Existing VPC Deployment
Deploy an EKS cluster into your existing VPC infrastructure.
Prerequisites
Your existing VPC must have:
- Private Subnets (required):
  - At least 2 private subnets across different AZs
  - NAT Gateway for internet access
  - Sufficient IP address space for pods
- Public Subnets (recommended):
  - At least 2 public subnets across different AZs
  - Internet Gateway attached
- Subnet Tags (for auto-discovery)
- VPC Settings:
  - DNS hostnames enabled
  - DNS resolution enabled
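For the subnet tags in the list above, a common convention is the pair used by the AWS Load Balancer Controller for subnet discovery: `kubernetes.io/role/internal-elb=1` on private subnets and `kubernetes.io/role/elb=1` on public subnets. Whether this module relies on exactly these tags is an assumption, so check the module's variables first. The sketch below prints the `aws ec2 create-tags` commands by default and only runs them when `DRY_RUN=0`; the subnet IDs are placeholders.

```bash
# Set DRY_RUN=0 to call the AWS CLI; by default commands are only printed.
DRY_RUN="${DRY_RUN:-1}"

# tag_subnets ROLE_TAG SUBNET_ID...
# Applies ROLE_TAG=1 to each subnet ID given.
tag_subnets() {
  role_tag="$1"
  shift
  for subnet in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "aws ec2 create-tags --resources $subnet --tags Key=$role_tag,Value=1"
    else
      aws ec2 create-tags --resources "$subnet" --tags "Key=$role_tag,Value=1"
    fi
  done
}

# Placeholder subnet IDs; replace with your own.
tag_subnets kubernetes.io/role/internal-elb subnet-aaaa1111 subnet-bbbb2222
tag_subnets kubernetes.io/role/elb subnet-cccc3333 subnet-dddd4444
```

Reviewing the printed commands before setting `DRY_RUN=0` is a cheap guard against tagging the wrong subnets in a shared VPC.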
Configuration
Tag Your Subnets
If your subnets aren’t tagged, run this script:
What Gets Created
- ✅ EKS Cluster: New Kubernetes cluster in your VPC
- ✅ Node Groups: Managed node groups
- ✅ RDS Database: Aurora PostgreSQL
- ✅ Redis Cache: ElastiCache Redis
- ✅ IAM Roles: All IRSA roles
- ✅ Security Groups: For EKS, RDS, Redis
- ❌ VPC: Uses your existing VPC
- ❌ Subnets: Uses your existing subnets
- ❌ NAT Gateway: Uses your existing NAT Gateway
Cost Savings
By using existing VPC infrastructure:
- Save $100-120/month on NAT Gateway costs (if already provisioned)
- Save $5-10/month on VPC Flow Logs (if already enabled)
- Total Savings: ~$105-130/month
Scenario 3: Existing EKS Cluster
Add application infrastructure (RDS, Redis, IAM roles) to an existing EKS cluster.
When to Use This
- ✅ Platform team manages EKS, application teams manage apps
- ✅ Multiple applications share the same EKS cluster
- ✅ EKS cluster managed outside Terraform
- ✅ You only need application infrastructure
Prerequisites
- Existing EKS Cluster:
  - Kubernetes version 1.19+
  - OIDC provider enabled
- Required Add-ons (must be pre-installed):
  - VPC CNI
  - kube-proxy
  - CoreDNS
  - EBS CSI Driver with IRSA role
  - AWS Load Balancer Controller with IRSA role
- Cluster Information:
  - Cluster name
  - OIDC provider ARN
Get OIDC Provider ARN
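The OIDC provider ARN can be derived from two values: your AWS account ID and the cluster's OIDC issuer URL. The sketch below builds the ARN from those inputs; `my-cluster` and the example IDs are placeholders, and the `arn:aws` partition is an assumption that does not hold in GovCloud or China regions.

```bash
# oidc_provider_arn ACCOUNT_ID ISSUER_URL
# Builds the IAM OIDC provider ARN from the cluster's issuer URL.
oidc_provider_arn() {
  account_id="$1"
  issuer_url="$2"
  # The ARN embeds the issuer URL without its https:// scheme.
  echo "arn:aws:iam::${account_id}:oidc-provider/${issuer_url#https://}"
}

# With AWS credentials configured, the two inputs can be looked up with:
#   ISSUER="$(aws eks describe-cluster --name my-cluster \
#     --query 'cluster.identity.oidc.issuer' --output text)"
#   ACCOUNT="$(aws sts get-caller-identity --query Account --output text)"
#   oidc_provider_arn "$ACCOUNT" "$ISSUER"

# Example with made-up values:
oidc_provider_arn 111122223333 \
  "https://oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
```

If the cluster has no IAM OIDC provider registered yet, the issuer URL still exists but the ARN will not resolve, so confirm it with `aws iam list-open-id-connect-providers`.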
Configuration
What Gets Created
- ✅ Application IRSA Role: IAM role for your application pods with permissions for Bedrock and Secrets Manager
- ✅ RDS Database: Aurora PostgreSQL Serverless v2
- ✅ Redis Cache: ElastiCache Redis
- ✅ Secrets: AWS Secrets Manager secrets
- ✅ Security Groups: For RDS and Redis
What Does NOT Get Created
- ❌ EKS Cluster: Uses your existing cluster
- ❌ Node Groups: Uses your existing nodes
- ❌ Cluster Add-ons: Uses your existing add-ons
- ❌ System IRSA Roles: EBS CSI, ALB Controller, CloudWatch
- ❌ KMS Key: Uses your existing cluster encryption
Cost Savings
By using existing EKS infrastructure:
- Save $73/month on EKS control plane
- Save $400-500/month on EC2 nodes (if shared)
- Save $100-120/month on NAT Gateways (if shared)
- Total Savings: ~$573-693/month
Multi-Application Example
Deploy multiple applications to the same cluster:
Using as a Terraform Module
Reference this module from your own Terraform project:
Security Best Practices
Production Checklist
- Restrict API Access: Use cluster_endpoint_public_access_cidrs to limit access
- Enable Encryption: Set enable_cluster_encryption = true
- Private Endpoints: Consider cluster_endpoint_public_access = false
- Strong Secrets: Use 32+ character random strings
- Deletion Protection: Enable for RDS in production
- Backup Retention: Set appropriate retention periods
- Monitoring: Enable CloudWatch monitoring
- VPC Flow Logs: Enable for network monitoring
- IAM Least Privilege: Review and restrict IAM policies
- Regular Updates: Keep Kubernetes version current
Secrets Management
Never commit secrets to version control! Use one of these approaches:
- Environment Variables
- AWS Secrets Manager
- Terraform Cloud/Enterprise:
  - Store sensitive variables in Terraform Cloud
  - Mark as sensitive
  - Use workspace-specific values
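For the environment-variable approach, Terraform reads any variable named `TF_VAR_<name>` as the value of input variable `<name>`, which keeps secrets out of terraform.tfvars. The helper below is a sketch that also enforces the 32-character minimum from the checklist; `db_password` is a hypothetical variable name and `my/secret-id` a placeholder secret ID.

```bash
# export_tf_var NAME VALUE
# Exports VALUE as TF_VAR_NAME, refusing values under 32 characters.
export_tf_var() {
  name="$1"
  value="$2"
  if [ "${#value}" -lt 32 ]; then
    echo "refusing to export $name: value is shorter than 32 characters" >&2
    return 1
  fi
  export "TF_VAR_${name}=${value}"
}

# The value could instead be fetched from AWS Secrets Manager:
#   aws secretsmanager get-secret-value --secret-id my/secret-id \
#     --query SecretString --output text
# Here a fresh random value is generated for illustration.
export_tf_var db_password "$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')"
```

Variables exported this way live only in the current shell session, so they need to be set again (or loaded by your CI system) before each `terraform plan` or `terraform apply`.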
Troubleshooting
Cluster Creation Fails
Common Causes:
- Insufficient IAM permissions
- Service quota limits reached
- Subnet IP address exhaustion
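To judge whether subnet IP exhaustion is likely, it helps to know how many addresses a subnet really offers: AWS reserves 5 addresses in every subnet, so a block of size /N has 2^(32-N) - 5 usable addresses. A small sketch of that arithmetic:

```bash
# usable_ips PREFIX_LENGTH
# Prints the number of usable addresses in an AWS subnet of that size.
usable_ips() {
  prefix="$1"
  # 2^(32 - prefix) addresses in the block, minus the 5 that AWS reserves.
  echo $(( (1 << (32 - prefix)) - 5 ))
}

for p in 20 24 28; do
  echo "/$p -> $(usable_ips "$p") usable addresses"
done

# Live free-address counts for existing subnets can be listed with:
#   aws ec2 describe-subnets \
#     --query 'Subnets[].[SubnetId,AvailableIpAddressCount]' --output table
```

With the VPC CNI, every pod takes an address from these subnets, so a /24 with 251 usable addresses can fill up quickly on a busy cluster.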
Nodes Not Joining Cluster
Common Causes:
- Security group misconfiguration
- IAM role issues
- Subnet routing problems
Database Connection Failures
Common Causes:
- Security group rules
- Wrong endpoint
- Password mismatch
High Costs
Cost Optimization Tips:
- Right-size node instances:
  - Use smaller instances for development
  - Enable cluster autoscaler
- Optimize database:
  - Lower min_capacity for non-production
  - Reduce backup retention
- Reduce NAT Gateway costs:
  - Use single NAT Gateway for development
  - Consider VPC endpoints for AWS services
- Monitor usage
Maintenance and Updates
Kubernetes Version Upgrades
Backup and Disaster Recovery
Automated Backups:
- RDS: Daily automated backups (configurable retention)
- EKS: Backup using Velero or AWS Backup
Next Steps
- Deploy Application: Deploy the Runlayer application using Helm charts
- ECS Alternative: Consider ECS deployment if you prefer containers without Kubernetes
- Monitoring: Set up comprehensive monitoring and alerting
- SSL Certificates: Configure SSL certificates with ACM or cert-manager
Support
For issues and questions:
- GitHub Issues: anysource-AI/Runlayer
- Email: [email protected]
- Documentation: docs.runlayer.com