Active Health Check

An active health check is a mechanism in which Caddy proactively and periodically sends a request to each backend to verify that it is healthy and ready to receive traffic. Unlike a passive health check, which waits for errors from real requests, an active health check can detect a problem before users feel its impact.

Think of an active health check as a “security guard” who routinely knocks on every backend's door to make sure it responds. If a backend does not answer, or answers with an error, Caddy takes it out of rotation. All of this happens in the background, transparent to users.

How Active Health Checks Work

A background goroutine in Caddy runs continuously:

    ┌─────────────────────────────────────────┐
    │  Every health_interval (e.g. 10s)       │
    │                                         │
    │  GET /health → backend-1:3000           │
    │    Status 200 → ✓ Healthy               │
    │    Status 500 → ✗ Unhealthy             │
    │    Timeout    → ✗ Unhealthy             │
    │                                         │
    │  GET /health → backend-2:3000           │
    │    Status 200 → ✓ Healthy               │
    │                                         │
    │  GET /health → backend-3:3000           │
    │    Timeout    → ✗ Unhealthy             │
    │    (out of rotation until it recovers)  │
    └─────────────────────────────────────────┘

Timeline:
  T=0s    All backends healthy → normal traffic
  T=30s   backend-3 times out → marked unhealthy
  T=30s   Traffic shifts to backend-1 & 2
  T=40s   backend-3 still not responding
  T=50s   backend-3 responds 200 again
  T=50s   backend-3 rejoins the rotation
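
The timeline can be replayed with a small simulation. This is only an illustrative sketch, not Caddy's implementation: the function name `rotation_after_round` and the rule that only status 200 counts as healthy are assumptions made for the example.

```python
from typing import Dict, List, Optional

HEALTHY_STATUS = 200  # assumption: only 200 counts as healthy in this sketch


def rotation_after_round(probe_results: Dict[str, Optional[int]]) -> List[str]:
    """Return the backends that stay in rotation after one probe round.

    probe_results maps a backend address to the HTTP status it returned,
    or to None when the probe timed out.
    """
    return [
        backend
        for backend, status in probe_results.items()
        if status == HEALTHY_STATUS
    ]


# One entry per probe round, keyed by the second at which it ran.
timeline = {
    0: {"backend-1:3000": 200, "backend-2:3000": 200, "backend-3:3000": 200},
    30: {"backend-1:3000": 200, "backend-2:3000": 200, "backend-3:3000": None},
    50: {"backend-1:3000": 200, "backend-2:3000": 200, "backend-3:3000": 200},
}

for second, results in timeline.items():
    print(f"T={second}s rotation: {rotation_after_round(results)}")
```

Each round is evaluated independently here, which is why backend-3 rejoins as soon as a probe succeeds again.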

Basic Configuration

example.com {
    reverse_proxy backend-1:3000 backend-2:3000 backend-3:3000 {
        # Endpoint to probe
        health_uri /health

        # How often to check (default: 30s once health_uri is set)
        health_interval 10s

        # Timeout per health check request
        health_timeout 5s

        # Status code considered healthy (default: any 2xx)
        health_status 200
    }
}

All Health Check Options

example.com {
    reverse_proxy backend-1:3000 backend-2:3000 {
        # ── Endpoint ──────────────────────────────────────────────
        health_uri /health          # Path to probe
        health_port 9090            # Probe a port other than the serving port (optional)
                                    # Useful when the health endpoint lives on an admin port

        # ── Timing ────────────────────────────────────────────────
        health_interval 10s         # Time between checks
        health_timeout  5s          # Timeout per request

        # ── Response Validation ───────────────────────────────────
        health_status 200           # Status considered healthy: an exact code (200)
                                    # or a class such as 2xx (the default)

        # Require the response body to match a substring or regular expression
        health_body "\"status\":\"ok\""
        # Regex form: health_body `"status"\s*:\s*"(ok|healthy)"`

        # ── Custom Headers ────────────────────────────────────────
        health_headers {
            Accept "application/json"
            X-Health-Check "caddy"
            Authorization "Bearer {env.HEALTH_CHECK_TOKEN}"
        }

        # ── Method ───────────────────────────────────────────────
        # Default: GET
        # Newer Caddy releases document a health_method option for other methods;
        # on versions without it, expose a health path that accepts GET
    }
}
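
To make the difference between an exact status and a status class concrete, here is a minimal sketch of the matching rule. The helper `status_matches` is hypothetical and only mirrors how the health_status option is described above; it is not taken from Caddy's source.

```python
def status_matches(expected: str, actual: int) -> bool:
    """Check an HTTP status against an exact code ("200") or a class ("2xx")."""
    expected = expected.strip().lower()
    # A class such as "2xx" matches any status with the same leading digit.
    if len(expected) == 3 and expected.endswith("xx") and expected[0].isdigit():
        return actual // 100 == int(expected[0])
    # Otherwise only an exact numeric match counts.
    return expected.isdigit() and int(expected) == actual


print(status_matches("200", 204))  # exact code: 204 is not accepted
print(status_matches("2xx", 204))  # class: any 2xx status is accepted
```

Pinning health_status to 200 is therefore stricter than the class form: a backend answering 204 would pass with 2xx but fail with 200.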

Health Check on a Separate Port

Some architectures separate the health check port from the main serving port, for example:

Port 3000 → User traffic (production)
Port 9090 → Health check, metrics, admin (internal only)

Advantages:
- The health endpoint is not reachable from the internet, provided the internal port is not exposed publicly
- It can return more detailed information with less risk of leaking it to outsiders
- It does not add entries to the user traffic logs

example.com {
    reverse_proxy backend-1:3000 backend-2:3000 {
        # Caddy sends health checks to port 9090
        # but forwards user traffic to port 3000
        health_uri  /health
        health_port 9090
        health_interval 15s
    }
}
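
The effect of health_port can be pictured as swapping only the port of the probe URL while the upstream address used for user traffic stays the same. The sketch below assumes plain HTTP and "host:port" upstream addresses; `probe_url` is a hypothetical helper, not a Caddy API.

```python
from typing import Optional


def probe_url(upstream: str, health_uri: str, health_port: Optional[int] = None) -> str:
    """Build the URL an active health check would request for one upstream.

    upstream is a "host:port" address; health_port, when given,
    replaces the port for the probe only.
    """
    host, _, port = upstream.rpartition(":")
    if health_port is not None:
        port = str(health_port)
    return f"http://{host}:{port}{health_uri}"


print(probe_url("backend-1:3000", "/health"))        # probe on the serving port
print(probe_url("backend-1:3000", "/health", 9090))  # probe on the internal port
```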

Building an Informative Health Endpoint

A good health endpoint does more than return 200. It should verify the critical components:

// Node.js/Express: a comprehensive health endpoint
const express = require('express');
const { Pool } = require('pg');
const Redis = require('ioredis');

const app = express();
const db = new Pool({ connectionString: process.env.DATABASE_URL });
const redis = new Redis(process.env.REDIS_URL);

app.get('/health', async (req, res) => {
    const startTime = Date.now();
    const health = {
        status: 'ok',
        timestamp: new Date().toISOString(),
        version: process.env.APP_VERSION || 'unknown',
        uptime: process.uptime(),
        checks: {}
    };

    // Check 1: Database connectivity
    try {
        const dbStart = Date.now();
        await db.query('SELECT 1');
        health.checks.database = {
            status: 'ok',
            latency_ms: Date.now() - dbStart
        };
    } catch (err) {
        health.checks.database = { status: 'error', error: err.message };
        health.status = 'degraded';
    }

    // Check 2: Redis connectivity
    try {
        const redisStart = Date.now();
        await redis.ping();
        health.checks.redis = {
            status: 'ok',
            latency_ms: Date.now() - redisStart
        };
    } catch (err) {
        health.checks.redis = { status: 'error', error: err.message };
        // Redis error = warning, not critical (depends on your app)
    }

    // Check 3: Memory usage
    const memUsage = process.memoryUsage();
    const memUsedMB = Math.round(memUsage.heapUsed / 1024 / 1024);
    const memTotalMB = Math.round(memUsage.heapTotal / 1024 / 1024);
    health.checks.memory = {
        status: memUsedMB < 900 ? 'ok' : 'warning',  // Warn when heap usage reaches 900 MB
        used_mb: memUsedMB,
        total_mb: memTotalMB
    };

    health.total_latency_ms = Date.now() - startTime;

    // Return 200 only if all critical checks are ok
    const statusCode = health.status === 'ok' ? 200 : 503;
    res.status(statusCode).json(health);
});

// Health check for Caddy: simpler, returns only the status
app.get('/health/simple', async (req, res) => {
    try {
        await db.query('SELECT 1');
        res.status(200).json({ status: 'ok' });
    } catch {
        res.status(503).json({ status: 'error' });
    }
});

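# Python/FastAPI: health endpoint
import os
import time

import asyncpg
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

# Connection string read from the environment, as in the Node.js example
DATABASE_URL = os.environ["DATABASE_URL"]

@app.get("/health")
async def health_check():
    health = {"status": "ok", "checks": {}}

    # Check database connectivity and measure query latency
    try:
        conn = await asyncpg.connect(DATABASE_URL)
        try:
            start = time.time()
            await conn.fetchval("SELECT 1")
            latency_ms = round((time.time() - start) * 1000, 2)
        finally:
            await conn.close()  # always release the connection
        health["checks"]["database"] = {"status": "ok", "latency_ms": latency_ms}
    except Exception as e:
        health["checks"]["database"] = {"status": "error", "error": str(e)}
        health["status"] = "degraded"

    # Serialize as real JSON; str(health) would emit a Python repr with single quotes,
    # which a health_body match on "status":"ok" would never find
    status_code = 200 if health["status"] == "ok" else 503
    return JSONResponse(content=health, status_code=status_code)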
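
Both examples follow the same rule: only failures of critical dependencies turn the response into a 503. That rule can be isolated in a small function. This is a sketch under the assumption that each check reduces to pass or fail; `aggregate` and its parameters are names invented for this illustration.

```python
from typing import Dict, Set, Tuple


def aggregate(checks: Dict[str, bool], critical: Set[str]) -> Tuple[int, str]:
    """Map individual check results to an HTTP status and an overall label.

    checks maps a check name to True (passed) or False (failed).
    A critical check that failed, or is missing entirely, yields 503.
    Failures of non-critical checks leave the response at 200.
    """
    failed_critical = [name for name in critical if not checks.get(name, False)]
    if failed_critical:
        return 503, "degraded"
    return 200, "ok"


# Redis down but database up: still served, as in the Node.js example.
print(aggregate({"database": True, "redis": False}, critical={"database"}))
# Database down: the backend should leave the rotation.
print(aggregate({"database": False, "redis": True}, critical={"database"}))
```

Which dependencies count as critical depends on the application; a backend that cannot serve anything without its cache would list that cache as critical too.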

Validating the Response Body

Sometimes a 200 status code alone is not enough, and you want to confirm that the response contains the right data:

example.com {
    reverse_proxy backend-1:3000 backend-2:3000 {
        health_uri  /health
        health_status 200
        
        # Make sure the body contains "status":"ok"
        # If the backend returns 200 but the body indicates an error,
        # Caddy marks it as unhealthy
        health_body "\"status\":\"ok\""
    }
}

Responses considered HEALTHY:
  HTTP/1.1 200 OK
  {"status":"ok","uptime":3600}      ✓ Status 200 + body contains "status":"ok"

Responses considered UNHEALTHY:
  HTTP/1.1 200 OK
  {"status":"degraded","error":"db"}  ✗ Status 200 but body does not contain "status":"ok"

  HTTP/1.1 503 Service Unavailable
  {"status":"error"}                  ✗ Status 503
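
The three cases above can be checked with a short sketch that combines the status comparison with a regular-expression search on the body. It uses Python's `re` module as a stand-in; Caddy is written in Go, so its regex syntax may differ for more complex patterns. The function `response_is_healthy` and the constant names are assumptions for this example.

```python
import re

STRICT = r'"status":"ok"'                       # literal match, no whitespace allowed
TOLERANT = r'"status"\s*:\s*"(ok|healthy)"'     # tolerant regex form


def response_is_healthy(status: int, body: str, body_pattern: str,
                        expected_status: int = 200) -> bool:
    """Healthy only if the status matches and the body contains the pattern."""
    if status != expected_status:
        return False
    return re.search(body_pattern, body) is not None


print(response_is_healthy(200, '{"status":"ok","uptime":3600}', STRICT))
print(response_is_healthy(200, '{"status":"degraded","error":"db"}', STRICT))
print(response_is_healthy(503, '{"status":"error"}', STRICT))
# A serializer that emits a space after the colon defeats the literal pattern.
print(response_is_healthy(200, '{"status": "ok"}', STRICT))
print(response_is_healthy(200, '{"status": "ok"}', TOLERANT))
```

The last two calls show why the tolerant pattern is the safer choice when the backend's JSON formatting is not under your control.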

Integration with the Deployment Pipeline

Health checks can be used in a deployment pipeline to support zero-downtime deploys:

#!/bin/bash
# deploy.sh: zero-downtime deployment script
set -euo pipefail

NEW_VERSION="${1:?usage: deploy.sh VERSION}"
BACKEND_HOST="backend-2"
BACKEND_PORT=3000
CADDY_ADMIN="http://localhost:2019"

echo "Deploying version $NEW_VERSION to $BACKEND_HOST..."

# Step 1: Deploy the new version to backend-2
# (once the old container stops, its health check fails and Caddy stops routing to it;
#  docker rm frees the container name so docker run can reuse it)
ssh "$BACKEND_HOST" "
    docker pull myapp:$NEW_VERSION
    docker stop myapp || true
    docker rm myapp || true
    docker run -d --name myapp -p $BACKEND_PORT:$BACKEND_PORT myapp:$NEW_VERSION
"

# Step 2: Wait until backend-2 is ready (health check)
echo "Waiting for backend-2 to be healthy..."
MAX_WAIT=60
WAITED=0
while [ "$WAITED" -lt "$MAX_WAIT" ]; do
    # curl prints 000 when the connection fails; "|| true" keeps set -e from aborting
    STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
        "http://$BACKEND_HOST:$BACKEND_PORT/health" || true)
    if [ "$STATUS" = "200" ]; then
        echo "backend-2 is healthy!"
        break
    fi
    sleep 5
    WAITED=$((WAITED + 5))
    echo "  Still waiting... ($WAITED/$MAX_WAIT seconds)"
done

if [ "$WAITED" -ge "$MAX_WAIT" ]; then
    echo "ERROR: backend-2 failed to become healthy in ${MAX_WAIT}s"
    exit 1
fi

# Step 3: Caddy automatically includes backend-2 again once its health check passes
echo "Caddy will automatically include backend-2 in rotation."
echo "Deployment complete!"
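
The waiting loop in Step 2 is the core of the script, and it can be expressed as a reusable function. The following is a minimal sketch rather than a drop-in replacement: `wait_until_healthy` and `http_probe` are hypothetical names, and the probe and sleep functions are injectable so the loop can be exercised without a real backend.

```python
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True only when the URL answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, and timeouts
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    max_wait: float = 60.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll probe() until it succeeds or max_wait seconds have been slept."""
    waited = 0.0
    while waited < max_wait:
        if probe():
            return True
        sleep(interval)
        waited += interval
    return False


# Example with a fake probe that succeeds on the third attempt.
attempts = iter([False, False, True])
print(wait_until_healthy(lambda: next(attempts), sleep=lambda seconds: None))
```

In a real pipeline the probe would be something like `lambda: http_probe("http://backend-2:3000/health")`, with the host and port taken from the script above.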

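Monitoring Health Check Status

# Show the status of every upstream
# Note: the fields returned depend on the Caddy version; some releases
# omit "healthy" and report only address, num_requests, and fails
curl -s http://localhost:2019/reverse_proxy/upstreams | jq '
  .[] | {
    address: .address,
    healthy: .healthy,
    total_requests: .num_requests,
    failures: .fails
  }
'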

# Alert when any upstream looks unhealthy
# (matches healthy == false where that field exists, or a non-zero fails count)
check_upstreams() {
    UNHEALTHY=$(curl -s http://localhost:2019/reverse_proxy/upstreams | \
        jq '[.[] | select(.healthy == false or .fails > 0)] | length')

    if [ "${UNHEALTHY:-0}" -gt 0 ]; then
        echo "ALERT: $UNHEALTHY upstream(s) are unhealthy!"
        curl -s http://localhost:2019/reverse_proxy/upstreams | \
            jq '.[] | select(.healthy == false or .fails > 0) | .address'
        # Send the alert to Slack, PagerDuty, etc.
    fi
}

# Call the function so the script does something when cron runs it
check_upstreams

# Run every minute via cron
# */1 * * * * /usr/local/bin/check_upstreams.sh
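
The same filtering can be done in a monitoring script that already has the JSON in hand. This sketch assumes the response is a JSON array of objects with the fields the jq commands read (`address`, `fails`, and, on versions that report it, `healthy`); the function name `unhealthy_upstreams` and the sample payload are made up for illustration.

```python
import json
from typing import Any, Dict, List


def unhealthy_upstreams(payload: str) -> List[str]:
    """Return the addresses of upstreams that look unhealthy.

    An upstream is flagged when "healthy" is explicitly false, or when
    its "fails" counter is above zero. Missing fields are tolerated.
    """
    upstreams: List[Dict[str, Any]] = json.loads(payload)
    return [
        upstream["address"]
        for upstream in upstreams
        if upstream.get("healthy") is False or upstream.get("fails", 0) > 0
    ]


# Hypothetical sample payload, shaped like the fields used above.
sample = """[
  {"address": "backend-1:3000", "healthy": true,  "num_requests": 4, "fails": 0},
  {"address": "backend-3:3000", "healthy": false, "num_requests": 0, "fails": 2}
]"""
print(unhealthy_upstreams(sample))
```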

Summary

Active health checks send periodic requests to every backend, which is far more proactive than passive health checks that wait for errors from real traffic.
Minimal configuration: health_uri /health + health_interval 10s + health_timeout 5s already gives good protection.
Use a separate health_port if you want the health endpoint on an internal port that is not exposed publicly.
Add health_body to validate the response content, which is useful when a backend returns 200 while in a degraded state.
A good health endpoint should check the database, cache, and other critical dependencies, not just “whether the process is running”.
Use health checks in the deployment pipeline: wait for the new backend to become healthy before sending traffic. Caddy automatically includes the backend once its health check passes.
