Troubleshooting

This guide covers common issues encountered when developing, deploying, and operating the Lithosphere platform. Solutions are drawn from the infrastructure configuration (infra/README.md) and deployment secrets documentation (SECRETS.md).


Configuration Not Loading

If services are not picking up configuration changes, follow these diagnostic steps:

Check Docker Compose Configuration

# Validate the compose file syntax and merged configuration
docker compose config

# View container logs for a specific service
docker compose logs SERVICE_NAME

# View logs with timestamps and follow mode
docker compose logs -f --timestamps SERVICE_NAME

Validate Prometheus Configuration

# Check Prometheus config syntax inside the container
docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml

Apply Configuration Updates
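
One way to apply a change is sketched below, assuming the stack is run with Docker Compose from the directory that holds the compose file. The helper name `apply_config` is hypothetical. It validates the merged configuration first, recreates the container if its compose definition changed, and then restarts it so that mounted configuration files are re-read.

```shell
# Hypothetical helper: apply configuration changes for one service.
# Usage: apply_config SERVICE_NAME
apply_config() {
  if [ -z "$1" ]; then
    echo "usage: apply_config SERVICE_NAME" >&2
    return 1
  fi
  # Stop early if the compose file itself is invalid
  docker compose config --quiet || return 1
  # Recreate the container if its compose definition or image changed
  docker compose up -d "$1" || return 1
  # Restart so the process re-reads mounted configuration files
  docker compose restart "$1"
}
```

For example, `apply_config prometheus` after editing prometheus.yml. The restart is redundant when `up -d` has already recreated the container, but it is harmless.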


Dashboards Not Appearing in Grafana

If custom dashboards are not visible in Grafana after adding them:

  1. Check that the provisioning path is correct. Dashboard JSON files must be placed in the infra/grafana/dashboards/ directory.

  2. Validate JSON syntax:

  3. Check Grafana logs for provisioning errors:

  4. Restart Grafana to reload provisioned dashboards:
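
The JSON check in step 2 can be sketched as a small helper. It assumes `python3` is available on the host; the function name `validate_dashboards` is hypothetical, and the default directory is the provisioning path named in step 1.

```shell
# Report which dashboard files are valid JSON and which are not.
# Usage: validate_dashboards [DIRECTORY]
validate_dashboards() {
  dir="${1:-infra/grafana/dashboards}"
  status=0
  for f in "$dir"/*.json; do
    # If the glob matched nothing, "$f" is the literal pattern
    [ -e "$f" ] || { echo "no JSON files in $dir" >&2; return 1; }
    if python3 -m json.tool "$f" >/dev/null 2>&1; then
      echo "ok: $f"
    else
      echo "INVALID: $f"
      status=1
    fi
  done
  return "$status"
}
```

For steps 3 and 4, assuming the Compose service is named `grafana`, `docker compose logs grafana` shows provisioning errors and `docker compose restart grafana` reloads the provisioned dashboards.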

Pre-configured dashboards included with the Lithosphere infrastructure:

  • System Overview -- VPS health (CPU, RAM, Disk, Network)

  • API Monitoring -- Request rates, latencies, errors, logs

  • Container Metrics -- Docker container resource usage


Alerts Not Firing

If expected alerts are not being triggered in the monitoring stack:

  1. Check alert rule syntax in the Prometheus UI at http://localhost:9091/alerts.

  2. Verify Alertmanager is configured in prometheus.yml:

  3. Check Alertmanager status and active alerts:

  4. Verify that the alert rule file at infra/prometheus/alerts/lithosphere-alerts.yml exists and contains the rules you expect. The rules cover service health (API down, Indexer down, DB down), performance issues (high latency, error rates), resource usage (CPU, memory, disk), and container issues (restarts, high resource usage).
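
Steps 2 and 3 can be sketched as two helpers. Both function names are hypothetical. The Alertmanager base URL is an assumption (9093 is Alertmanager's default port, and the port mapping in this stack may differ), and the path to prometheus.yml on the host is passed in rather than guessed.

```shell
# Succeeds if the given Prometheus config declares an alertmanagers block.
# Usage: has_alertmanager_config PATH_TO_PROMETHEUS_YML
has_alertmanager_config() {
  grep -Eq '^[[:space:]]*alertmanagers:' "$1"
}

# Print Alertmanager status and currently active alerts via its v2 API.
# Usage: alertmanager_status [BASE_URL]
alertmanager_status() {
  base="${1:-http://localhost:9093}"   # assumed address
  curl -fsS "$base/api/v2/status"
  curl -fsS "$base/api/v2/alerts"
}
```

Rule files can be syntax-checked with `promtool check rules FILE` inside the Prometheus container, in the same way as `promtool check config` above; the path of the rule file inside the container depends on how the compose file mounts it.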


SSH Connection Failed

Cause: The SSH_PRIVATE_KEY GitHub secret is incorrectly formatted or the corresponding public key is not installed on the target server.

Solution:

  1. Verify the SSH_PRIVATE_KEY secret includes the complete key with BEGIN and END lines:

  2. Verify the public key is authorized on the server:

  3. Test the connection manually:

  4. If generating a new key pair:
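
A quick local check for step 1 is sketched below: it confirms that a key file starts with a BEGIN line and ends with an END line before its contents are pasted into the secret. The function name is hypothetical. In the commented commands for steps 2 to 4, `USER`, `HOST`, the key path, and the key comment are all placeholders.

```shell
# Succeeds if FILE starts with a BEGIN line and ends with an END line.
# Usage: check_private_key_format FILE
check_private_key_format() {
  head -n 1 "$1" | grep -Eq '^-----BEGIN [A-Z ]*PRIVATE KEY-----$' &&
  tail -n 1 "$1" | grep -Eq '^-----END [A-Z ]*PRIVATE KEY-----$'
}

# Step 2, on the server: list the authorized public keys
#   cat ~/.ssh/authorized_keys
# Step 3, from your machine: test the connection with the same key
#   ssh -i ~/.ssh/deploy_key USER@HOST
# Step 4: generate a new key pair and install the public half
#   ssh-keygen -t ed25519 -C "deploy" -f ~/.ssh/deploy_key
#   ssh-copy-id -i ~/.ssh/deploy_key.pub USER@HOST
```

The check fails on a key that lost its first or last line during copy and paste, which is the formatting problem described in the cause above.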


Health Check Failed

Cause: The API service did not start correctly or is not responding on the expected port.

Solution:

  1. Check the API server logs:

  2. Verify the API container is running:

  3. Check that required environment variables are set:

    • DATABASE_URL -- PostgreSQL connection string

    • LITHO_RPC_URL -- Blockchain RPC endpoint

    • LITHO_CHAIN_ID -- Chain ID (should be 61)

  4. Test the health endpoint locally:
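
The four steps above can be run together with a helper like the following sketch. The function name is hypothetical. The Compose service name and the health URL are passed as arguments because this guide does not state them; substitute the values used by your deployment.

```shell
# Run the health-check diagnostics for the API service in order.
# Usage: diagnose_api SERVICE_NAME HEALTH_URL
diagnose_api() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: diagnose_api SERVICE_NAME HEALTH_URL" >&2
    return 1
  fi
  # Is the container running?
  docker compose ps "$1"
  # Most recent log output
  docker compose logs --tail 100 "$1"
  # Values of the required variables as seen inside the container
  # (this prints the database connection string, so mind where you run it)
  docker compose exec -T "$1" printenv DATABASE_URL LITHO_RPC_URL LITHO_CHAIN_ID
  # -f makes curl exit non-zero on an HTTP error status
  curl -fsS "$2"
}
```

If `printenv` prints fewer than three lines, at least one of the required variables is not set in the container.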


Build Failed

Cause: Syntax errors in a Dockerfile, missing dependencies, or incompatible base image versions.

Solution:

  1. Check the build output for specific error messages.

  2. Verify Dockerfile syntax for each service.

  3. Ensure all referenced files and directories exist in the build context.

  4. Check that base images can be pulled. Network problems between the build host and Docker Hub are a common cause of pull failures.

  5. Try building a specific service in isolation:
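
For step 5, a minimal sketch assuming the services are built through Docker Compose. The helper name is hypothetical. `--no-cache` forces every layer to be rebuilt, so a stale cached layer cannot hide the failure.

```shell
# Build a single service from scratch.
# Usage: build_service SERVICE_NAME
build_service() {
  if [ -z "$1" ]; then
    echo "usage: build_service SERVICE_NAME" >&2
    return 1
  fi
  docker compose build --no-cache "$1"
}
```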


Manual Deployment Steps

If the automated GitHub Actions deployment fails, deploy manually:
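
The exact manual procedure depends on how the server is set up. The sketch below assumes, without confirmation from this guide, that the server holds a git checkout of the repository and runs the stack with Docker Compose. The function name, the SSH target, and the remote directory are all placeholders.

```shell
# Hypothetical manual deployment over SSH.
# Usage: manual_deploy USER@HOST REMOTE_DIR
manual_deploy() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: manual_deploy USER@HOST REMOTE_DIR" >&2
    return 1
  fi
  # Update the checkout, rebuild and restart the stack, then show its state.
  # REMOTE_DIR is single-quoted for the remote shell, so it must not
  # itself contain a single quote.
  ssh "$1" "cd '$2' && git pull && docker compose up -d --build && docker compose ps"
}
```

Because the remote commands are chained with `&&`, a failed `git pull` stops the deployment before any container is touched.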


Performance Tuning

Reduce Disk Usage

Reduce retention periods for metrics and logs:
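
Before changing retention, it helps to see what Prometheus is currently running with. The sketch below reads the retention flags from Prometheus's HTTP API. The function name is hypothetical, and the default base URL is taken from the Prometheus UI address used earlier in this guide.

```shell
# Print the retention-related flags of the running Prometheus instance.
# Usage: prometheus_retention [BASE_URL]
prometheus_retention() {
  base="${1:-http://localhost:9091}"
  # The flags endpoint returns one JSON object; split it on commas and
  # keep only the retention entries.
  curl -fsS "$base/api/v1/status/flags" | tr ',' '\n' | grep -F 'storage.tsdb.retention'
}
```

Metric retention is set with the Prometheus flags `--storage.tsdb.retention.time` (for example `7d`) and `--storage.tsdb.retention.size`, passed on the command line of the Prometheus service. Log retention depends on the log backend in use, which this guide does not name.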

Reduce Memory Usage

Set memory limits on monitoring containers:
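
A minimal sketch of setting a limit on a running container with `docker update`. The helper name is hypothetical, and the limit value is yours to choose. Setting `--memory-swap` to the same value as `--memory` prevents the container from using swap beyond its memory limit.

```shell
# Apply a memory limit to a running container.
# Usage: limit_memory CONTAINER LIMIT   (LIMIT such as 512m or 1g)
limit_memory() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: limit_memory CONTAINER LIMIT" >&2
    return 1
  fi
  docker update --memory "$2" --memory-swap "$2" "$1"
}
```

`docker update` takes a container name or ID rather than a Compose service name; `docker compose ps -q SERVICE_NAME` prints the ID. The limit is lost when Compose recreates the container, so a permanent limit belongs in the compose file. `docker stats --no-stream` shows current usage and helps in choosing a value.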

Optimize Scrape Intervals

Increase scrape intervals for less-critical metrics:
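
To see which intervals are currently configured, the following sketch lists every `job_name` and `scrape_interval` line in a Prometheus config with its line number. The function name is hypothetical, and the path to prometheus.yml on the host is passed in rather than assumed.

```shell
# List scrape jobs and intervals with line numbers.
# Usage: list_scrape_intervals PATH_TO_PROMETHEUS_YML
list_scrape_intervals() {
  grep -nE '(job_name|scrape_interval):' "$1"
}
```

A job without its own `scrape_interval` inherits the value from the `global` section. After raising an interval, validate the file with `promtool check config` as shown earlier, then restart Prometheus.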


Common Environment Variables

Ensure these are set correctly in the .env file or exported before running Docker Compose:
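
A sketch of a pre-flight check for the .env file follows. It covers only the three variables named under Health Check Failed and the chain ID value stated there; extend the list with whatever else your deployment requires. The function name is hypothetical.

```shell
# Verify that required variables are present and non-empty in an env file.
# Usage: check_env_file [PATH]   (defaults to .env)
check_env_file() {
  env_file="${1:-.env}"
  missing=0
  for name in DATABASE_URL LITHO_RPC_URL LITHO_CHAIN_ID; do
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "missing or empty: $name"
      missing=1
    fi
  done
  # This guide states that the chain ID should be 61
  if ! grep -Eq '^LITHO_CHAIN_ID="?61"?$' "$env_file"; then
    echo "LITHO_CHAIN_ID is not 61"
    missing=1
  fi
  return "$missing"
}
```

The function prints nothing and returns 0 when everything is in place, so it can gate a deployment: `check_env_file .env && docker compose up -d`.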


Getting Help

If you are unable to resolve an issue using this guide:

  1. Check the CI/CD Guide for pipeline-specific issues.

  2. Review the infrastructure README at Makulu/infra/README.md for detailed monitoring configuration.

  3. Review the secrets documentation at .github/SECRETS.md for deployment credential setup.

  4. Open an issue on the GitHub repository with:

    • The error message or unexpected behavior

    • Steps to reproduce

    • Environment details (Local, Devnet, Staging, or Mainnet)

    • Relevant log output
