Monitoring

Health Checks

Three health endpoints are available without authentication:

GET /health

Returns ok with HTTP 200 when the server is running. Use this for:

  • Load balancer health checks.
  • Container orchestrator liveness probes.

# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10

GET /health/ready

Executes a lightweight DuckDB query (SELECT 1 FROM events_all LIMIT 0) to verify the database is operational. Returns:

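  • 200 OK: database is ready and accepting queries.
  • 503 Service Unavailable: database is not ready.

Use this endpoint as a readiness probe, not a liveness probe: a failed readiness check removes the instance from rotation, whereas a failed liveness check restarts the container.

# Kubernetes readiness probe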
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
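
The same readiness check can be run from a script, for example as a smoke test after a deployment. The following is a minimal sketch, assuming the server is reachable at http://localhost:8000 as in the probe examples; the function names classify_readiness and check_ready are hypothetical and not part of Mallard Metrics.

```python
import urllib.error
import urllib.request


def classify_readiness(status: int) -> str:
    """Map the documented /health/ready status codes to a label."""
    if status == 200:
        return "ready"
    if status == 503:
        return "not_ready"
    return "unexpected"


def check_ready(base_url: str = "http://localhost:8000", timeout: float = 2.0) -> str:
    """Call GET /health/ready and classify the outcome.

    base_url is an assumption taken from the probe examples (port 8000).
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health/ready", timeout=timeout) as resp:
            return classify_readiness(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses, including 503.
        return classify_readiness(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the server is unreachable.
        return "unreachable"
```

A caller would typically retry while check_ready() returns "not_ready" and give up on "unreachable" or "unexpected".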

GET /health/detailed

Returns a JSON object with component-level status. See Health & Metrics API for the full schema.


Prometheus Metrics

GET /metrics returns metrics in the Prometheus text exposition format (Content-Type: text/plain; version=0.0.4).

If MALLARD_METRICS_TOKEN is set, this endpoint requires Authorization: Bearer <token>.
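
For ad hoc checks outside Prometheus, the bearer token can be attached by hand. This is a minimal sketch using only the Python standard library; build_metrics_request is a hypothetical helper, and the base URL is an assumption taken from the examples on this page.

```python
import urllib.request
from typing import Optional


def build_metrics_request(base_url: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET /metrics request, adding the bearer token only when one is given."""
    req = urllib.request.Request(f"{base_url}/metrics")
    if token:
        # Matches the documented header: Authorization: Bearer <token>
        req.add_header("Authorization", f"Bearer {token}")
    return req
```

Passing the result to urllib.request.urlopen() performs the scrape; without a token the request is sent unauthenticated, which only succeeds when MALLARD_METRICS_TOKEN is not set on the server.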

Gauges

| Metric | Type | Description |
|---|---|---|
| mallard_buffered_events | gauge | Events in memory, not yet flushed to Parquet |
| mallard_cache_entries | gauge | Cached query results in memory |
| mallard_auth_configured | gauge | 1 if admin password is set, 0 otherwise |
| mallard_geoip_loaded | gauge | 1 if GeoIP database loaded successfully, 0 otherwise |
| mallard_filter_bots | gauge | 1 if bot filtering is active, 0 otherwise |
| mallard_behavioral_extension | gauge | 1 if behavioral extension loaded, 0 otherwise |

Counters

| Metric | Type | Description |
|---|---|---|
| mallard_events_ingested_total | counter | Total events accepted through POST /api/event |
| mallard_flush_failures_total | counter | Total buffer flush failures |
| mallard_rate_limit_rejections_total | counter | Total requests rejected by the per-site rate limiter |
| mallard_login_failures_total | counter | Total failed login attempts |
| mallard_cache_hits_total | counter | Total query cache hits |
| mallard_cache_misses_total | counter | Total query cache misses |

Prometheus Scrape Configuration

scrape_configs:
  - job_name: mallard_metrics
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: /metrics
    scrape_interval: 30s
    # If MALLARD_METRICS_TOKEN is set:
    authorization:
      credentials: your-metrics-token

Example Output

# HELP mallard_buffered_events Number of events in the in-memory buffer
# TYPE mallard_buffered_events gauge
mallard_buffered_events 42

# HELP mallard_cache_entries Number of cached query results
# TYPE mallard_cache_entries gauge
mallard_cache_entries 3

# HELP mallard_behavioral_extension Whether behavioral extension is loaded
# TYPE mallard_behavioral_extension gauge
mallard_behavioral_extension 1

# HELP mallard_events_ingested_total Total events ingested
# TYPE mallard_events_ingested_total counter
mallard_events_ingested_total 158432

# HELP mallard_cache_hits_total Total query cache hits
# TYPE mallard_cache_hits_total counter
mallard_cache_hits_total 9871

# HELP mallard_cache_misses_total Total query cache misses
# TYPE mallard_cache_misses_total counter
mallard_cache_misses_total 1204
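
Output in this shape is easy to read from a script. The sketch below parses a subset of the example output into a dictionary and derives the lifetime cache hit ratio. It assumes label-free samples, as in the example; it is not a full parser for the Prometheus text format, and parse_metrics is a hypothetical helper name.

```python
from typing import Dict

# A subset of the example output above.
SAMPLE = """\
# HELP mallard_buffered_events Number of events in the in-memory buffer
# TYPE mallard_buffered_events gauge
mallard_buffered_events 42

# HELP mallard_cache_hits_total Total query cache hits
# TYPE mallard_cache_hits_total counter
mallard_cache_hits_total 9871

# HELP mallard_cache_misses_total Total query cache misses
# TYPE mallard_cache_misses_total counter
mallard_cache_misses_total 1204
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Return {metric_name: value}, skipping blank lines and # HELP / # TYPE comments."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Assumes "name value" with no labels; an optional timestamp is ignored.
        name, value = line.split()[:2]
        samples[name] = float(value)
    return samples


metrics = parse_metrics(SAMPLE)
hits = metrics["mallard_cache_hits_total"]
misses = metrics["mallard_cache_misses_total"]
# Ratio over the whole process lifetime, not over a recent window.
lifetime_hit_ratio = hits / (hits + misses)
```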

Grafana Dashboard

A minimal Grafana panel configuration for key metrics:

{
  "panels": [
    {
      "title": "Ingestion Rate",
      "targets": [{"expr": "rate(mallard_events_ingested_total[5m])"}]
    },
    {
      "title": "Buffered Events",
      "targets": [{"expr": "mallard_buffered_events"}]
    },
    {
      "title": "Cache Hit Rate",
      "targets": [{"expr": "rate(mallard_cache_hits_total[5m]) / (rate(mallard_cache_hits_total[5m]) + rate(mallard_cache_misses_total[5m]))"}]
    },
    {
      "title": "Rate Limit Rejections",
      "targets": [{"expr": "rate(mallard_rate_limit_rejections_total[5m])"}]
    }
  ]
}
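
The Cache Hit Rate expression divides the hit rate by the combined hit and miss rate. Because both rates cover the same window, the window length cancels and the result equals the ratio of counter increases between two scrapes. The sketch below shows that arithmetic in simplified form: it skips the extrapolation that Prometheus rate() performs and treats a counter reset as "no data". The function name and the earlier snapshot values are hypothetical.

```python
from typing import Mapping, Optional

HITS = "mallard_cache_hits_total"
MISSES = "mallard_cache_misses_total"


def windowed_hit_ratio(prev: Mapping[str, float], curr: Mapping[str, float]) -> Optional[float]:
    """Cache hit ratio between two scrapes, or None if it cannot be computed."""
    hit_delta = curr[HITS] - prev[HITS]
    miss_delta = curr[MISSES] - prev[MISSES]
    if hit_delta < 0 or miss_delta < 0:
        # A counter went backwards, e.g. after a restart: skip this window.
        return None
    total = hit_delta + miss_delta
    if total == 0:
        # No queries in the window, so the ratio is undefined.
        return None
    return hit_delta / total


# Hypothetical earlier scrape, followed by the values from the example output.
earlier = {HITS: 9000.0, MISSES: 1000.0}
later = {HITS: 9871.0, MISSES: 1204.0}
ratio = windowed_hit_ratio(earlier, later)
```

The None case matters for dashboards too: with no cache traffic in the window, the PromQL expression divides zero by zero and the panel shows no value.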

Structured Logging

Mallard Metrics uses the Rust tracing crate for structured logging. Two formats are supported:

Text (default)

Human-readable output with timestamps, log levels, and structured fields:

2024-01-15T10:00:00.123Z  INFO mallard_metrics: Starting Mallard Metrics host="0.0.0.0" port=8000
2024-01-15T10:00:00.456Z  INFO mallard_metrics: Behavioral extension loaded
2024-01-15T10:00:00.457Z  INFO mallard_metrics: Listening addr="0.0.0.0:8000"

JSON

Set MALLARD_LOG_FORMAT=json for machine-parseable output compatible with log aggregators (Loki, Elasticsearch, Splunk):

{"timestamp":"2024-01-15T10:00:00.123Z","level":"INFO","fields":{"message":"Flushed events to Parquet","count":42},"target":"mallard_metrics::ingest::buffer","request_id":"a3f2c1d8-..."}

Every log line emitted during a request carries a request_id field matching the X-Request-ID response header, enabling end-to-end log correlation.
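
With JSON output, correlating a request amounts to filtering on that field. The following is a minimal sketch, assuming one JSON object per line with a top-level request_id as in the example above; lines_for_request is a hypothetical helper, and the request IDs used with it are placeholders.

```python
import json
from typing import Iterable, Iterator


def lines_for_request(log_lines: Iterable[str], request_id: str) -> Iterator[dict]:
    """Yield parsed log records whose request_id matches the given value."""
    for raw in log_lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            # Skip anything that is not JSON, e.g. text-format lines mixed in.
            continue
        if isinstance(record, dict) and record.get("request_id") == request_id:
            yield record


# Placeholder log lines in the documented shape.
example_logs = [
    '{"level":"INFO","fields":{"message":"Flushed events to Parquet","count":42},"request_id":"req-1"}',
    '{"level":"INFO","fields":{"message":"Listening"},"request_id":"req-2"}',
    "not json",
]
matched = list(lines_for_request(example_logs, "req-1"))
```

In practice the request ID would come from the X-Request-ID header of the response being investigated.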

Log Level Control

Use the RUST_LOG environment variable (standard tracing-subscriber env-filter syntax):

RUST_LOG=mallard_metrics=debug,tower_http=info

Default: mallard_metrics=info,tower_http=info


Alerting Recommendations

| Alert | Condition | Severity |
|---|---|---|
| Server down | up{job="mallard_metrics"} == 0 | Critical |
| Large event buffer | mallard_buffered_events > 5000 | Warning |
| High flush failures | increase(mallard_flush_failures_total[5m]) > 0 | Warning |
| Auth not configured | mallard_auth_configured == 0 | Warning |
| High rate limit rejections | rate(mallard_rate_limit_rejections_total[5m]) > 10 | Info |
| Low cache hit rate | rate(mallard_cache_hits_total[5m]) / (rate(mallard_cache_hits_total[5m]) + rate(mallard_cache_misses_total[5m])) < 0.5 | Info |
| GeoIP not loaded | mallard_geoip_loaded == 0 | Info |
| Behavioral extension missing | mallard_behavioral_extension == 0 | Info |
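
To make the gauge-based thresholds concrete, the sketch below evaluates them against a single metrics snapshot. It is illustrative only: in a real deployment these conditions would normally live in Prometheus alerting rules, and the rate- and increase-based rows need more than one scrape, so they are left out. The names GAUGE_ALERTS and firing_alerts are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (alert name, metric, condition) for the gauge-based rows of the table above.
GAUGE_ALERTS: List[Tuple[str, str, Callable[[float], bool]]] = [
    ("Large event buffer", "mallard_buffered_events", lambda v: v > 5000),
    ("Auth not configured", "mallard_auth_configured", lambda v: v == 0),
    ("GeoIP not loaded", "mallard_geoip_loaded", lambda v: v == 0),
    ("Behavioral extension missing", "mallard_behavioral_extension", lambda v: v == 0),
]


def firing_alerts(metrics: Dict[str, float]) -> List[str]:
    """Return the names of alerts whose condition holds for this snapshot.

    Metrics missing from the snapshot are skipped rather than treated as 0.
    """
    return [
        name
        for name, metric, condition in GAUGE_ALERTS
        if metric in metrics and condition(metrics[metric])
    ]


# Hypothetical snapshot: oversized buffer, auth configured, GeoIP missing.
snapshot = {
    "mallard_buffered_events": 6000.0,
    "mallard_auth_configured": 1.0,
    "mallard_geoip_loaded": 0.0,
}
firing = firing_alerts(snapshot)
```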