Troubleshooting
This guide covers common issues and troubleshooting procedures for Runlayer. For complex issues or enterprise support, contact our technical team.
Quick Diagnostics
System Health Check
Kubernetes (Helm/EKS)
Database Connectivity
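As a quick first check, the sketch below tests whether the database port accepts TCP connections from the machine or pod where you run it. It assumes bash and the GNU coreutils `timeout` command are available. The function name `check_tcp` and the endpoint in the usage comment are placeholders, not part of Runlayer. A successful connection only shows that the network path is open; it does not validate credentials.

```shell
# Check TCP reachability of host:port using bash's built-in /dev/tcp.
# Returns 0 if the port accepts a connection within 5 seconds, 1 otherwise.
check_tcp() {
  local host="$1" port="$2"
  if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}

# Usage (placeholder endpoint; 5432 is the PostgreSQL default,
# substitute the port of your database engine):
#   check_tcp <db-endpoint> 5432
```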
Common Issues
Service Won’t Start
Symptoms: Services fail to start or immediately exit
Solutions:
- Kubernetes: Inspect pod status and events
  kubectl describe pod <pod> -n anysource
  kubectl logs <pod> -n anysource
- ECS: Inspect stopped tasks and CloudWatch logs
  aws ecs describe-tasks --cluster <cluster> --tasks <task-id>
- Review the task’s CloudWatch log group
- Verify configuration values and secrets (Kubernetes Secrets or AWS SSM/Secrets Manager)
- Check security groups and network connectivity between services
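The inspection steps above can be wrapped in small helper functions. This is a minimal sketch, not an official Runlayer script: the function names are made up for illustration, and it assumes `kubectl` and the AWS CLI are already configured for the target cluster and account. The `anysource` namespace comes from the commands above.

```shell
# Print events and recent logs for one pod, including logs from the
# previously crashed container when one exists.
triage_pod() {
  local pod="$1" ns="${2:-anysource}"
  kubectl describe pod "$pod" -n "$ns"
  kubectl logs "$pod" -n "$ns" --tail=50
  # --previous fails when the container has never restarted; ignore that case
  kubectl logs "$pod" -n "$ns" --previous --tail=50 || true
}

# Show the stop reason for each stopped task in an ECS cluster.
ecs_stopped_reasons() {
  local cluster="$1" tasks
  tasks=$(aws ecs list-tasks --cluster "$cluster" --desired-status STOPPED \
    --query 'taskArns' --output text)
  if [ -z "$tasks" ] || [ "$tasks" = "None" ]; then
    echo "no stopped tasks found"
    return 0
  fi
  # Word splitting of $tasks is intended: one argument per task ARN
  aws ecs describe-tasks --cluster "$cluster" --tasks $tasks \
    --query 'tasks[].{task:taskArn,reason:stoppedReason}' --output table
}

# Usage:
#   triage_pod <pod>
#   ecs_stopped_reasons <cluster>
```

ECS keeps stopped tasks visible only for a limited time, so run the second helper soon after a failure.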
Database Connection Issues
Symptoms: Application cannot connect to database
Solutions:
- Verify the database endpoint and credentials
- Check security groups / network policies for database access
- Review database logs in AWS (RDS logs / CloudWatch)
- Validate environment variables or secret values used by the backend
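To validate the secret values the backend actually receives on Kubernetes, you can decode a single key from a Secret. The sketch below assumes the `anysource` namespace used elsewhere in this guide. The helper name `secret_value` and the key `DATABASE_URL` in the usage comment are hypothetical, so substitute the names from your deployment. The decoded value is printed to the terminal; avoid running this in shared sessions.

```shell
# Print one decoded key from a Kubernetes Secret.
# Works for simple key names; keys containing dots need jsonpath escaping.
secret_value() {
  local secret="$1" key="$2" ns="${3:-anysource}"
  kubectl get secret "$secret" -n "$ns" -o "jsonpath={.data.$key}" | base64 -d
  echo
}

# Usage (hypothetical secret and key names):
#   secret_value <secret-name> DATABASE_URL
```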
Performance Issues
Symptoms: Slow response times or high resource usage
Solutions:
- Check resource utilization (CloudWatch metrics or Kubernetes metrics)
- Review database performance and slow query logs
- Verify Redis cache connectivity and hit rate
- Inspect application logs for errors and timeouts
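For the Redis check, the hit rate can be computed from the `keyspace_hits` and `keyspace_misses` counters that Redis reports in `INFO stats`. The helper below is a sketch that reads that output on stdin; the name `redis_hit_rate` and the host in the usage comment are placeholders. The counters are cumulative since the last restart or `CONFIG RESETSTAT`, so the result is a long-run average rather than a current rate.

```shell
# Compute the cache hit rate from `redis-cli INFO stats` output on stdin.
redis_hit_rate() {
  # INFO output uses CRLF line endings; strip the carriage returns first
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "no keyspace lookups recorded"; exit }
      printf "hit rate: %.1f%% (%d hits, %d misses)\n", 100 * hits / total, hits, misses
    }'
}

# Usage (placeholder host):
#   redis-cli -h <redis-host> INFO stats | redis_hit_rate
# Pod-level CPU and memory on Kubernetes (requires metrics-server):
#   kubectl top pods -n anysource
```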
ACM Certificate Issues
Symptoms: Terraform fails when looking up the certificate, or ACM certificates remain in PENDING_VALIDATION.
Wildcard lookup fails (default behavior):
The ECS module derives a wildcard domain from your domain (e.g., ecs.staging.runlayer.com → *.staging.runlayer.com) and looks up an existing certificate.
- Verify a wildcard certificate exists:
  aws acm list-certificates --query "CertificateSummaryList[?contains(DomainName, '*')]"
- If no wildcard certificate exists, either create one manually or set enable_acm_dns_validation = true to have Terraform create it.
- Confirm the Route53 hosted zone exists in the AWS account running Terraform.
- Ensure hosted_zone_name matches the zone name (e.g., staging.runlayer.com).
- Re-run terraform apply so the _acme-challenge CNAME records are created automatically.
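To reproduce the lookup by hand, the sketch below derives the wildcard name the same way the example above describes (replace the first label with `*`) and then queries ACM for an exact match. It illustrates the described behavior and is not the ECS module's actual code. The function names are made up, and the AWS CLI is assumed to be configured for the same account and region that Terraform uses.

```shell
# Derive the wildcard domain by replacing the first DNS label with "*".
wildcard_for() {
  printf '*.%s\n' "${1#*.}"
}

# List ARNs of certificates whose domain name equals the derived wildcard.
find_wildcard_cert() {
  local wildcard
  wildcard=$(wildcard_for "$1")
  aws acm list-certificates \
    --query "CertificateSummaryList[?DomainName=='${wildcard}'].CertificateArn" \
    --output text
}

# Show the DNS validation records ACM expects for a certificate, which is
# useful when a certificate stays in PENDING_VALIDATION.
validation_records() {
  aws acm describe-certificate --certificate-arn "$1" \
    --query 'Certificate.DomainValidationOptions[].ResourceRecord' \
    --output table
}

# Usage:
#   find_wildcard_cert ecs.staging.runlayer.com
#   validation_records <certificate-arn>
```

An empty result from `find_wildcard_cert` means no matching certificate exists in that account and region.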
Authentication Problems
Symptoms: Users cannot log in or access resources
Solutions:
- Verify authentication configuration
- Check external identity provider connectivity
- Review user permissions and roles
- Check JWT token configuration
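When checking JWT configuration, it often helps to read the claims in a token a user actually received (for example `iss`, `aud`, and `exp`) and compare them with what the backend expects. The helper below is a generic sketch that decodes the payload segment; the name `jwt_payload` is made up for illustration. It does not verify the signature and says nothing about which claims Runlayer requires. It assumes a `base64` that accepts `-d`, as GNU coreutils does.

```shell
# Decode the payload (second dot-separated segment) of a JWT.
# This does NOT verify the signature; use it for inspection only.
jwt_payload() {
  local seg
  # Convert base64url to standard base64
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 accepts the input
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
  echo
}

# Usage:
#   jwt_payload "<token>"
```

Treat tokens as credentials: avoid pasting them into shared terminals or tickets.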
Log Analysis
Application Logs
Kubernetes (Helm/EKS)
Database Logs
Review database logs in AWS (RDS logs / CloudWatch).
Enterprise Support
For complex issues, performance optimization, or enterprise-level troubleshooting:
Enterprise Technical Support
Contact our technical team for advanced troubleshooting and 24/7 support
Support Information
When contacting support, please include:
- System Information: OS, deployment method, AWS region
- Error Messages: Complete error messages and stack traces
- Log Files: Relevant application and system logs
- Configuration: Sanitized configuration files (remove secrets)
- Steps to Reproduce: Detailed steps that led to the issue
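For the sanitized configuration item, a simple filter can mask values whose key names look sensitive before you attach a file. This is a rough sketch with a hypothetical helper name, `redact_secrets`, and it only handles `key=value` and `key: value` lines. It matches on key names, so it can miss secrets stored under unusual keys or embedded in URLs; review the output manually before sending it.

```shell
# Mask the value of any line whose key contains password, secret, token,
# or api key (case-insensitive). Reads stdin, writes stdout.
redact_secrets() {
  awk '{
    line = $0
    if (match(tolower(line), /(password|secret|token|api_?key)[a-z0-9_]*[ \t]*[=:]/)) {
      # Keep everything up to and including the separator, drop the value
      print substr(line, 1, RSTART + RLENGTH - 1) " [REDACTED]"
    } else {
      print line
    }
  }'
}

# Usage:
#   redact_secrets < values.yaml > values.sanitized.yaml
```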
Escalation Process
- Level 1: Basic troubleshooting (this guide)
- Level 2: Advanced diagnostics (contact support)
- Level 3: Engineering escalation (critical issues)
Preventive Measures
- Regular Monitoring: Set up health checks and alerting
- Log Rotation: Configure proper log management
- Resource Monitoring: Monitor CPU, memory, and disk usage
- Backup Verification: Regularly test backup and restore procedures