# Troubleshooting

> Common issues and troubleshooting procedures for Runlayer

This guide covers common issues and troubleshooting procedures for Runlayer. For complex issues or enterprise support, contact our technical team.

## Quick Diagnostics

### System Health Check

**Kubernetes (Helm/EKS)**

```bash theme={null}
kubectl get pods -n anysource
kubectl get svc -n anysource
kubectl get events -n anysource --sort-by=.metadata.creationTimestamp
```
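To spot failing pods quickly, the pod listing can be filtered by its STATUS column. The following is a minimal sketch; `unhealthy_pods` is a hypothetical helper name, not part of Runlayer, and it only inspects the STATUS text, so a pod that is `Running` but not ready will not be flagged.

```bash
# Print "<pod> <status>" for pods whose STATUS is neither Running nor Completed.
# Reads `kubectl get pods --no-headers` output on stdin (STATUS is column 3).
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (namespace taken from the commands above):
#   kubectl get pods -n anysource --no-headers | unhealthy_pods
```

An empty result means every pod reports `Running` or `Completed`.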

**ECS (Terraform)**

```bash theme={null}
aws ecs list-services --cluster <cluster>
aws ecs list-tasks --cluster <cluster> --service-name <service>
aws ecs describe-services --cluster <cluster> --services <service>
```

### Database Connectivity

```bash theme={null}
# Test PostgreSQL connection
psql "postgresql://<user>:<password>@<rds-endpoint>:5432/<db>"

# Test Redis connection
redis-cli -h <redis-endpoint> -p 6379 ping
```
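When `psql` or `redis-cli` hangs or fails, it helps to separate network reachability from authentication problems. The sketch below checks only whether a TCP connection can be opened. `can_connect` is a hypothetical helper name; it assumes bash (for `/dev/tcp`) and the coreutils `timeout` command are available on the machine running the check.

```bash
# Return 0 if a TCP connection to host:port opens within 5 seconds.
# Relies on bash's built-in /dev/tcp redirection; no data is sent.
can_connect() {
  local host="$1" port="$2"
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage:
#   can_connect <rds-endpoint> 5432   && echo "PostgreSQL port reachable"
#   can_connect <redis-endpoint> 6379 && echo "Redis port reachable"
```

If the port is unreachable, look at security groups and routing first; if it is reachable but the client still fails, the cause is more likely credentials or TLS settings.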

## Common Issues

### Service Won't Start

**Symptoms**: Services fail to start or exit immediately

**Solutions**:

1. **Kubernetes**: Inspect pod status and events
   * `kubectl describe pod <pod> -n anysource`
   * `kubectl logs <pod> -n anysource`
2. **ECS**: Inspect stopped tasks and CloudWatch logs
   * `aws ecs describe-tasks --cluster <cluster> --tasks <task-id>`
   * Review the task's CloudWatch log group
3. Verify configuration values and secrets (Kubernetes Secrets or AWS SSM/Secrets Manager)
4. Check security groups and network connectivity between services
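For ECS, the stop reason and container exit codes usually narrow the cause down faster than reading full logs. The following sketch combines `aws ecs list-tasks` and `aws ecs describe-tasks`; `stopped_task_reasons` is a hypothetical helper name, and the query fields (`stoppedReason`, `exitCode`, `reason`) are standard ECS task attributes.

```bash
# Show why stopped tasks in a cluster exited. $1 = cluster name.
stopped_task_reasons() {
  local cluster="$1" arns
  arns="$(aws ecs list-tasks --cluster "$cluster" --desired-status STOPPED \
            --query 'taskArns' --output text)"
  if [ -z "$arns" ] || [ "$arns" = "None" ]; then
    echo "no stopped tasks found in $cluster"
    return 0
  fi
  # Word splitting on $arns is intentional: one argument per task ARN.
  aws ecs describe-tasks --cluster "$cluster" --tasks $arns \
    --query 'tasks[].{task:taskArn,stopped:stoppedReason,containers:containers[].{name:name,exit:exitCode,reason:reason}}' \
    --output json
}

# Usage:
#   stopped_task_reasons <cluster>
```

On Kubernetes, `kubectl logs <pod> -n anysource --previous` serves a similar purpose by showing output from the last terminated container of a restarting pod.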

### Database Connection Issues

**Symptoms**: Application cannot connect to database

**Solutions**:

1. Verify the database endpoint and credentials
2. Check security groups / network policies for database access
3. Review database logs in AWS (RDS logs / CloudWatch)
4. Validate environment variables or secret values used by the backend

### Performance Issues

**Symptoms**: Slow response times or high resource usage

**Solutions**:

1. Check resource utilization (CloudWatch metrics or Kubernetes metrics)
2. Review database performance and slow query logs
3. Verify Redis cache connectivity and hit rate
4. Inspect application logs for errors and timeouts
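The Redis hit rate mentioned in step 3 can be derived from the `keyspace_hits` and `keyspace_misses` counters in the `INFO stats` section. The following is a minimal sketch; `redis_hit_rate` is a hypothetical helper name, and what counts as an acceptable rate depends on your workload.

```bash
# Compute the cache hit rate from `redis-cli info stats` output on stdin.
# Redis terminates INFO lines with CRLF, so carriage returns are stripped first.
redis_hit_rate() {
  tr -d '\r' | awk -F: '
    $1 == "keyspace_hits"   { hits = $2 }
    $1 == "keyspace_misses" { misses = $2 }
    END {
      total = hits + misses
      if (total == 0) { print "n/a (no lookups recorded)"; exit }
      printf "%.1f%%\n", 100 * hits / total
    }'
}

# Usage:
#   redis-cli -h <redis-endpoint> -p 6379 info stats | redis_hit_rate
```

The counters are cumulative since the last restart or `CONFIG RESETSTAT`, so compare two samples taken some minutes apart when investigating a recent slowdown.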

### ACM Certificate Issues

**Symptoms**: Terraform fails to look up a certificate, or ACM certificates remain in `PENDING_VALIDATION`.

**Wildcard lookup fails** (default behavior):

The ECS module derives a wildcard domain from your domain (e.g., `ecs.staging.runlayer.com` → `*.staging.runlayer.com`) and looks up an existing certificate.

1. Verify a wildcard certificate exists: `aws acm list-certificates --query "CertificateSummaryList[?contains(DomainName, '*')]"`
2. If no wildcard certificate exists, either create one manually or set `enable_acm_dns_validation = true` to have Terraform create it.
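To check for the exact certificate the lookup expects, it can help to reproduce the wildcard name locally. The sketch below follows only the single example given above (replace the leftmost label with `*`); `wildcard_for` is a hypothetical helper, and the module's actual derivation may handle edge cases differently.

```bash
# Replace the leftmost DNS label with "*", mirroring the example above.
wildcard_for() {
  printf '*.%s\n' "${1#*.}"
}

wildcard_for ecs.staging.runlayer.com   # prints *.staging.runlayer.com

# Usage: look up a certificate whose primary domain matches exactly.
#   aws acm list-certificates \
#     --query "CertificateSummaryList[?DomainName=='$(wildcard_for <your-domain>)'].CertificateArn" \
#     --output text
```

Run the lookup in the same region and account as the Terraform deployment, because ACM certificates are regional.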

**DNS validation fails** (when creating new certificates):

1. Confirm the Route53 hosted zone exists in the AWS account running Terraform.
2. Ensure `hosted_zone_name` matches the zone name (e.g., `staging.runlayer.com`).
3. Re-run `terraform apply` so the ACM DNS validation CNAME records are created automatically.
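Both checks can be run from the CLI before re-applying. The following sketch uses `aws route53 list-hosted-zones-by-name` and `aws acm describe-certificate`; the helper names `hosted_zone_exists` and `acm_validation_status` are hypothetical, and the certificate ARN is whatever ACM reports for the pending certificate.

```bash
# Print the hosted zone ID if a zone with exactly this name is visible
# to the current credentials; prints nothing otherwise. $1 = zone name.
hosted_zone_exists() {
  local zone="${1%.}."   # Route 53 returns zone names with a trailing dot
  aws route53 list-hosted-zones-by-name --dns-name "$zone" \
    --query "HostedZones[?Name=='${zone}'].Id" --output text
}

# Show per-domain validation status and the expected DNS record name.
# $1 = certificate ARN.
acm_validation_status() {
  aws acm describe-certificate --certificate-arn "$1" \
    --query 'Certificate.DomainValidationOptions[].{domain:DomainName,status:ValidationStatus,record:ResourceRecord.Name}' \
    --output table
}

# Usage:
#   hosted_zone_exists staging.runlayer.com
#   acm_validation_status <certificate-arn>
```

If the zone lookup prints nothing, Terraform is probably running against a different AWS account or profile than the one that owns the hosted zone.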

### Authentication Problems

**Symptoms**: Users cannot log in or access resources

**Solutions**:

1. Verify authentication configuration
2. Check external identity provider connectivity
3. Review user permissions and roles
4. Check the JWT configuration (for example issuer, audience, signing key, and expiry)
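When a token is rejected, inspecting its claims often shows the mismatch (for example an expired `exp` or an unexpected `iss` or `aud`). The sketch below decodes the payload segment of a JWT. `jwt_payload` is a hypothetical helper; it does not verify the signature, and it assumes a `base64` that accepts `-d` (GNU coreutils and recent macOS).

```bash
# Print the decoded payload (claims) of a JWT passed as $1.
# This only decodes; it does NOT verify the signature.
jwt_payload() {
  local payload
  # Take the second dot-separated segment and convert base64url to base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url omits padding; restore it so base64 -d accepts the input.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do payload="${payload}="; done
  printf '%s' "$payload" | base64 -d
  echo
}

# Usage:
#   jwt_payload "$TOKEN"
```

Treat tokens as secrets: decode them locally rather than pasting them into online tools, and remove them from anything shared with support.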

## Log Analysis

### Application Logs

**Kubernetes (Helm/EKS)**

```bash theme={null}
kubectl logs -n anysource deploy/backend -f
kubectl logs -n anysource deploy/webapp -f
```

**ECS (Terraform)**

```bash theme={null}
aws logs tail /aws/ecs/<service> --follow
```
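Following a live stream is useful during reproduction, but for an incident that already happened it is usually easier to pull a bounded window and filter it. A minimal sketch follows; `recent_errors` is a hypothetical helper, and the keyword list is an assumption because the application's log format is not specified here.

```bash
# Keep only lines mentioning errors, exceptions, or timeouts (case-insensitive)
# and print the last N of them (default 50). Reads log text on stdin.
recent_errors() {
  grep -iE 'error|exception|timeout' | tail -n "${1:-50}"
}

# Usage:
#   kubectl logs -n anysource deploy/backend --since=1h | recent_errors
#   aws logs tail /aws/ecs/<service> --since 1h | recent_errors 100
```

`aws logs tail` also accepts `--filter-pattern` if you prefer to filter on the CloudWatch side.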

### Database Logs

Review database logs in AWS (RDS logs / CloudWatch).

## Enterprise Support

For complex issues, performance optimization, or enterprise-level troubleshooting:

<Card title="Enterprise Technical Support" icon="life-ring" href="mailto:support@runlayer.com">
  Contact our technical team for advanced troubleshooting and 24/7 support
</Card>

## Support Information

When contacting support, please include:

* **System Information**: OS, deployment method, AWS region
* **Error Messages**: Complete error messages and stack traces
* **Log Files**: Relevant application and system logs
* **Configuration**: Sanitized configuration files (remove secrets)
* **Steps to Reproduce**: Detailed steps that led to the issue
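Sanitizing configuration by hand is error-prone, so a small filter can help as a first pass. The following sketch masks passwords embedded in connection URLs and the values of keys whose names contain common secret keywords. `redact_secrets` is a hypothetical helper and the keyword list is an assumption, so always review the output before sending it.

```bash
# Mask likely secrets in text read from stdin.
#  1. scheme://user:password@host  ->  scheme://user:***@host
#  2. KEY=value / key: value where the key contains a secret keyword -> ***
redact_secrets() {
  sed -E \
    -e 's#(://[^:/@[:space:]]+):[^@[:space:]]+@#\1:***@#g' \
    -e 's/((PASSWORD|password|SECRET|secret|TOKEN|token|API_KEY|api_key)[A-Za-z0-9_]*[[:space:]]*[=:][[:space:]]*).*/\1***/'
}

# Usage:
#   redact_secrets < values.yaml > values.sanitized.yaml
```

The filter deliberately over-redacts (any key containing `token`, for instance), which is usually preferable to leaking a credential.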

## Escalation Process

1. **Level 1**: Basic troubleshooting (this guide)
2. **Level 2**: Advanced diagnostics (contact support)
3. **Level 3**: Engineering escalation (critical issues)

## Preventive Measures

* **Regular Monitoring**: Set up health checks and alerting
* **Log Rotation**: Configure proper log management
* **Resource Monitoring**: Monitor CPU, memory, and disk usage
* **Backup Verification**: Regularly test backup and restore procedures

Contact our support team for comprehensive monitoring setup and proactive issue prevention.
