Troubleshooting

This guide covers common issues and troubleshooting procedures for Runlayer. For complex issues or enterprise support, contact our technical team.

Quick Diagnostics

System Health Check

Kubernetes (Helm/EKS)

kubectl get pods -n anysource
kubectl get svc -n anysource
kubectl get events -n anysource --sort-by=.metadata.creationTimestamp

ECS (Terraform)

aws ecs list-services --cluster <cluster>
aws ecs list-tasks --cluster <cluster> --service-name <service>
aws ecs describe-services --cluster <cluster> --services <service>

Database Connectivity

# Test PostgreSQL connection
psql "postgresql://<user>:<password>@<rds-endpoint>:5432/<db>"

# Test Redis connection
redis-cli -h <redis-endpoint> -p 6379 ping

Common Issues

Service Won’t Start

Symptoms: Services fail to start or exit immediately.

Solutions:

Kubernetes: Inspect pod status and events
- kubectl describe pod <pod> -n anysource
- kubectl logs <pod> -n anysource
ECS: Inspect stopped tasks and CloudWatch logs
- aws ecs describe-tasks --cluster <cluster> --tasks <task-id>
- Review the task’s CloudWatch log group
Verify configuration values and secrets (Kubernetes Secrets or AWS SSM/Secrets Manager)
Check security groups and network connectivity between services
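
When many pods are running, it can help to narrow the list before running describe and logs on each one. The following is a minimal sketch, assuming the default column layout of kubectl get pods (NAME, READY, STATUS, ...). The name unhealthy_pods is a hypothetical helper, not part of kubectl or Runlayer.

```shell
# unhealthy_pods: read `kubectl get pods --no-headers` output on stdin and
# print NAME, STATUS and READY for pods that are not fully ready.
# Hypothetical helper; assumes the default kubectl column order.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")            # READY column, e.g. "0/1"
    if ($3 == "Completed") next      # finished Jobs are not failures
    if ($3 != "Running" || ready[1] != ready[2]) print $1, $3, $2
  }'
}

# Usage:
#   kubectl get pods -n anysource --no-headers | unhealthy_pods
```

Each pod name it prints is a candidate for the describe and logs commands listed above.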

Database Connection Issues

Symptoms: The application cannot connect to the database.

Solutions:

Verify the database endpoint and credentials
Check security groups / network policies for database access
Review database logs in AWS (RDS logs / CloudWatch)
Validate environment variables or secret values used by the backend
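
A quick way to validate environment variables is to check that each expected one is set and non-empty in the shell or container that runs the backend. The sketch below assumes you already know which variable names your deployment uses. The names in the usage comment are placeholders, not confirmed Runlayer settings, and missing_vars is a hypothetical helper.

```shell
# missing_vars: print the name of every listed variable that is unset or empty.
# Hypothetical helper; pass literal variable names only, since it uses eval.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"         # indirect lookup that works in POSIX sh
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

# Usage (DATABASE_URL and REDIS_URL are placeholder names):
#   missing_vars DATABASE_URL REDIS_URL
```

No output means every listed variable has a value. It does not confirm that the values themselves are correct.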

Performance Issues

Symptoms: Slow response times or high resource usage

Solutions:

Check resource utilization (CloudWatch metrics or Kubernetes metrics)
Review database performance and slow query logs
Verify Redis cache connectivity and hit rate
Inspect application logs for errors and timeouts

ACM Certificate Issues

Symptoms: Terraform fails when looking up a certificate, or ACM certificates remain in PENDING_VALIDATION.

Wildcard lookup fails (default behavior):

The ECS module derives a wildcard domain from your domain (e.g., ecs.staging.runlayer.com → *.staging.runlayer.com) and looks up an existing certificate.

Verify a wildcard certificate exists: aws acm list-certificates --query "CertificateSummaryList[?contains(DomainName, '*')]"
If no wildcard certificate exists, either create one manually or set enable_acm_dns_validation = true to have Terraform create it.
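
To check for the exact wildcard name rather than any wildcard, you can reproduce the derivation locally. This is a sketch that assumes the module replaces only the left-most label, as in the example above. The actual module logic may differ, and wildcard_for is a hypothetical helper name.

```shell
# wildcard_for: replace the left-most DNS label of a domain with "*".
# Assumption: this mirrors the example ecs.staging.runlayer.com -> *.staging.runlayer.com.
wildcard_for() {
  printf '*.%s\n' "${1#*.}"          # strip everything up to the first dot
}

# Usage: list issued certificates whose primary name is the derived wildcard.
#   wc=$(wildcard_for ecs.staging.runlayer.com)
#   aws acm list-certificates --certificate-statuses ISSUED \
#     --query "CertificateSummaryList[?DomainName=='$wc'].CertificateArn" \
#     --output text
```

The query filters on the certificate's primary domain name only, so a wildcard that appears solely as a subject alternative name would not be listed.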

DNS validation fails (when creating new certificates):

Confirm the Route53 hosted zone exists in the AWS account running Terraform.
Ensure hosted_zone_name matches the zone name (e.g., staging.runlayer.com).
Re-run terraform apply so the ACM DNS validation CNAME records are created automatically.
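
A mismatch between the service domain and hosted_zone_name is easy to miss. The following sketch checks that a domain falls inside a zone, tolerating the trailing dot that Route53 adds to zone names. The helper name in_zone is hypothetical.

```shell
# in_zone DOMAIN ZONE: succeed if DOMAIN equals ZONE or is a subdomain of it.
# Trailing dots are ignored because Route53 returns names such as "example.com.".
in_zone() {
  domain=${1%.}
  zone=${2%.}
  case $domain in
    "$zone" | *."$zone") return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
#   in_zone ecs.staging.runlayer.com staging.runlayer.com && echo "zone matches"
#   # List hosted zones, starting at the given name, in the current AWS account:
#   aws route53 list-hosted-zones-by-name --dns-name staging.runlayer.com \
#     --query 'HostedZones[].Name' --output text
```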

Authentication Problems

Symptoms: Users cannot log in or access resources

Solutions:

Verify authentication configuration
Check external identity provider connectivity
Review user permissions and roles
Check JWT token configuration

Log Analysis

Application Logs

Kubernetes (Helm/EKS)

kubectl logs -n anysource deploy/backend -f
kubectl logs -n anysource deploy/webapp -f

ECS (Terraform)

aws logs tail /aws/ecs/<service> --follow
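
Following a live log stream is noisy, so it is often quicker to filter a recent window for likely failures. This is a minimal sketch. The keyword list is an assumption about what the application logs contain, and error_lines is a hypothetical helper name.

```shell
# error_lines: keep only log lines that mention a likely failure.
# The keywords are assumptions; adjust them to the actual log format.
error_lines() {
  grep -iE 'error|exception|timeout|traceback'
}

# Usage:
#   kubectl logs -n anysource deploy/backend --since=1h | error_lines
#   aws logs tail /aws/ecs/<service> --since 1h | error_lines
```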

Database Logs

Review database logs in AWS (RDS logs / CloudWatch).

Enterprise Support

For complex issues, performance optimization, or enterprise-level troubleshooting:

Enterprise Technical Support

Contact our technical team for advanced troubleshooting and 24/7 support

Support Information

When contacting support, please include:

System Information: OS, deployment method, AWS region
Error Messages: Complete error messages and stack traces
Log Files: Relevant application and system logs
Configuration: Sanitized configuration files (remove secrets)
Steps to Reproduce: Detailed steps that led to the issue
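
Before attaching configuration, a first redaction pass can be scripted. The sketch below assumes simple KEY=value or key: value lines and masks values whose key looks sensitive. It is a heuristic (redact_secrets is a hypothetical helper), so review the result by hand. For example, credentials embedded in a connection URL under a differently named key are not caught.

```shell
# redact_secrets: mask the value on lines whose key contains
# PASSWORD, SECRET, TOKEN or KEY (case-insensitive). Heuristic only.
redact_secrets() {
  awk '{
    if (match($0, /[=:]/)) {                   # first "=" or ":" ends the key
      key = substr($0, 1, RSTART - 1)
      if (toupper(key) ~ /PASSWORD|SECRET|TOKEN|KEY/) {
        print substr($0, 1, RSTART) "<redacted>"
        next
      }
    }
    print                                      # everything else passes through
  }'
}

# Usage (file names are placeholders):
#   redact_secrets < values.yaml > values.sanitized.yaml
```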

Escalation Process

Level 1: Basic troubleshooting (this guide)
Level 2: Advanced diagnostics (contact support)
Level 3: Engineering escalation (critical issues)

Preventive Measures

Regular Monitoring: Set up health checks and alerting
Log Rotation: Configure proper log management
Resource Monitoring: Monitor CPU, memory, and disk usage
Backup Verification: Regularly test backup and restore procedures

Contact our support team for comprehensive monitoring setup and proactive issue prevention.
