Monitoring

Health Checks

Three health endpoints are available without authentication:

`GET /health`

Returns ok with HTTP 200 when the server is running. Use this for:

Load balancer health checks.
Container orchestrator liveness probes.

# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 10

`GET /health/ready`

Executes a lightweight DuckDB query (SELECT 1 FROM events_all LIMIT 0) to verify the database is operational. Returns:

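200 OK: the database is ready and accepting queries.
503 Service Unavailable: the database is not ready.

Use this endpoint as a readiness probe, not a liveness probe: a 503 should take the instance out of rotation until the database recovers, not restart the container.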

# Kubernetes readiness probe
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 15
  failureThreshold: 3
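
The difference between the two probes can be sketched in Python. This is a minimal illustration, not part of Mallard Metrics: the helper names `fetch_status` and `classify_probe` and the returned action strings are hypothetical, and only the paths and status codes come from the endpoint descriptions above.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status code for a GET request to `url`.

    A connection failure (server not running) raises urllib.error.URLError,
    which is deliberately left to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def classify_probe(path: str, status: int) -> str:
    """Map a health endpoint response to the action a probe implies.

    Hypothetical helper: the action names are illustrative only.
    """
    if path == "/health":
        # Liveness: a non-200 answer means the process should be restarted.
        return "alive" if status == 200 else "restart"
    if path == "/health/ready":
        # Readiness: 503 means "stop routing traffic", not "restart".
        if status == 200:
            return "route-traffic"
        if status == 503:
            return "hold-traffic"
        return "unknown"
    raise ValueError(f"unsupported health path: {path}")
```

Against a local instance, `classify_probe("/health/ready", fetch_status("http://localhost:8000/health/ready"))` would report whether traffic should be routed to it.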

`GET /health/detailed`

Returns a JSON object with component-level status. See Health & Metrics API for the full schema.

Prometheus Metrics

GET /metrics returns metrics in the Prometheus text exposition format (Content-Type: text/plain; version=0.0.4).

If MALLARD_METRICS_TOKEN is set, this endpoint requires Authorization: Bearer <token>.
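
For ad hoc checks outside Prometheus, the request can be built by hand. The sketch below uses only the Python standard library; the function name `build_metrics_request` is hypothetical, and the base URL assumes the server listens on port 8000 as in the probe examples.

```python
import urllib.request
from typing import Optional


def build_metrics_request(
    base_url: str, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request for /metrics, adding a bearer token if one is given."""
    request = urllib.request.Request(base_url.rstrip("/") + "/metrics")
    if token:
        # Required only when MALLARD_METRICS_TOKEN is set on the server.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Passing the result to `urllib.request.urlopen(request, timeout=5)` and reading the response body yields the metrics text.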

Gauges

Metric	Type	Description
`mallard_buffered_events`	gauge	Events in memory, not yet flushed to Parquet
`mallard_cache_entries`	gauge	Cached query results in memory
`mallard_auth_configured`	gauge	`1` if admin password is set, `0` otherwise
`mallard_geoip_loaded`	gauge	`1` if GeoIP database loaded successfully, `0` otherwise
`mallard_filter_bots`	gauge	`1` if bot filtering is active, `0` otherwise
`mallard_behavioral_extension`	gauge	`1` if behavioral extension loaded, `0` otherwise

Counters

Metric	Type	Description
`mallard_events_ingested_total`	counter	Total events accepted through `POST /api/event`
`mallard_flush_failures_total`	counter	Total buffer flush failures
`mallard_rate_limit_rejections_total`	counter	Total requests rejected by the per-site rate limiter
`mallard_login_failures_total`	counter	Total failed login attempts
`mallard_cache_hits_total`	counter	Total query cache hits
`mallard_cache_misses_total`	counter	Total query cache misses

Prometheus Scrape Configuration

scrape_configs:
  - job_name: mallard_metrics
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: /metrics
    scrape_interval: 30s
    # Only needed if MALLARD_METRICS_TOKEN is set on the server:
    authorization:
      type: Bearer
      credentials: your-metrics-token

Example Output

# HELP mallard_buffered_events Number of events in the in-memory buffer
# TYPE mallard_buffered_events gauge
mallard_buffered_events 42

# HELP mallard_cache_entries Number of cached query results
# TYPE mallard_cache_entries gauge
mallard_cache_entries 3

# HELP mallard_behavioral_extension Whether behavioral extension is loaded
# TYPE mallard_behavioral_extension gauge
mallard_behavioral_extension 1

# HELP mallard_events_ingested_total Total events ingested
# TYPE mallard_events_ingested_total counter
mallard_events_ingested_total 158432

# HELP mallard_cache_hits_total Total query cache hits
# TYPE mallard_cache_hits_total counter
mallard_cache_hits_total 9871

# HELP mallard_cache_misses_total Total query cache misses
# TYPE mallard_cache_misses_total counter
mallard_cache_misses_total 1204
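
Output in this shape is easy to read programmatically. The following is a minimal sketch, not a full parser: the function name `parse_prometheus_text` is hypothetical, and it handles only label-free samples like those shown above (a metric name, a value, and an optional timestamp).

```python
from typing import Dict


def parse_prometheus_text(body: str) -> Dict[str, float]:
    """Parse label-free samples from Prometheus text-format output.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    Samples with labels are not supported by this sketch.
    """
    samples: Dict[str, float] = {}
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue  # malformed line: no value present
        # parts[0] is the metric name, parts[1] the value;
        # an optional trailing timestamp is ignored.
        samples[parts[0]] = float(parts[1])
    return samples
```

Applied to the example output, the resulting dictionary maps `mallard_buffered_events` to `42.0` and `mallard_cache_entries` to `3.0`.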

Grafana Dashboard

A minimal Grafana panel configuration for key metrics:

{
  "panels": [
    {
      "title": "Ingestion Rate",
      "targets": [{"expr": "rate(mallard_events_ingested_total[5m])"}]
    },
    {
      "title": "Buffered Events",
      "targets": [{"expr": "mallard_buffered_events"}]
    },
    {
      "title": "Cache Hit Rate",
      "targets": [{"expr": "rate(mallard_cache_hits_total[5m]) / (rate(mallard_cache_hits_total[5m]) + rate(mallard_cache_misses_total[5m]))"}]
    },
    {
      "title": "Rate Limit Rejections",
      "targets": [{"expr": "rate(mallard_rate_limit_rejections_total[5m])"}]
    }
  ]
}
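
The Cache Hit Rate expression divides the hit rate by the combined hit and miss rate over the same window, which is approximately the ratio of the counter increases between two scrapes. A minimal Python sketch of that arithmetic follows; the function name `cache_hit_rate` is hypothetical and, unlike PromQL's `rate()`, it does not compensate for counter resets.

```python
from typing import Optional


def cache_hit_rate(
    hits_before: float,
    misses_before: float,
    hits_after: float,
    misses_after: float,
) -> Optional[float]:
    """Return the fraction of cache lookups that were hits between two scrapes.

    Returns None when no lookups happened in the window (or a counter
    went backwards), where the PromQL expression would not give a
    meaningful ratio either.
    """
    hit_delta = hits_after - hits_before
    miss_delta = misses_after - misses_before
    total = hit_delta + miss_delta
    if hit_delta < 0 or miss_delta < 0 or total <= 0:
        return None
    return hit_delta / total
```

For example, 90 new hits and 10 new misses between two scrapes give a hit rate of 0.9.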

Structured Logging

Mallard Metrics uses the `tracing` crate for structured logging. Two formats are supported:

Text (default)

Human-readable output with timestamps, log levels, and structured fields:

2024-01-15T10:00:00.123Z  INFO mallard_metrics: Starting Mallard Metrics host="0.0.0.0" port=8000
2024-01-15T10:00:00.456Z  INFO mallard_metrics: Behavioral extension loaded
2024-01-15T10:00:00.457Z  INFO mallard_metrics: Listening addr="0.0.0.0:8000"

JSON

Set MALLARD_LOG_FORMAT=json for machine-parseable output compatible with log aggregators (Loki, Elasticsearch, Splunk):

{"timestamp":"2024-01-15T10:00:00.123Z","level":"INFO","fields":{"message":"Flushed events to Parquet","count":42},"target":"mallard_metrics::ingest::buffer","request_id":"a3f2c1d8-..."}

Every log line emitted during a request carries a request_id field matching the X-Request-ID response header, enabling end-to-end log correlation.
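
As an illustration of that correlation, JSON log lines can be grouped by `request_id` with a few lines of Python. This is a sketch under the assumption that every line has the shape shown above (`fields.message` and a top-level `request_id`); the function name `group_by_request_id` is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_request_id(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect log messages per request_id from JSON-formatted log lines.

    Lines that are not JSON objects, or that carry no request_id
    (for example startup messages), are skipped.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        request_id = record.get("request_id")
        if request_id is None:
            continue
        fields = record.get("fields")
        message = fields.get("message", "") if isinstance(fields, dict) else ""
        grouped[request_id].append(message)
    return dict(grouped)
```

Looking up the value of an `X-Request-ID` response header in the returned dictionary gives the messages logged while that request was handled.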

Log Level Control

Use the RUST_LOG environment variable (standard tracing-subscriber env-filter syntax):

RUST_LOG=mallard_metrics=debug,tower_http=info

Default: mallard_metrics=info,tower_http=info

Alerting Recommendations

Alert	Condition	Severity
Server down	`up{job="mallard_metrics"} == 0`	Critical
Large event buffer	`mallard_buffered_events > 5000`	Warning
High flush failures	`increase(mallard_flush_failures_total[5m]) > 0`	Warning
Auth not configured	`mallard_auth_configured == 0`	Warning
High rate limit rejections	`rate(mallard_rate_limit_rejections_total[5m]) > 10`	Info
Low cache hit rate	`rate(mallard_cache_hits_total[5m]) / (rate(mallard_cache_hits_total[5m]) + rate(mallard_cache_misses_total[5m])) < 0.5`	Info
GeoIP not loaded	`mallard_geoip_loaded == 0`	Info
Behavioral extension missing	`mallard_behavioral_extension == 0`	Info
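
The gauge-based conditions in the table can be read as simple threshold checks. The Python sketch below evaluates them against a dictionary of metric values; the names `GAUGE_ALERTS` and `firing_alerts` are hypothetical, and in production these conditions would normally live in Prometheus alerting rules, not in a script.

```python
from typing import Callable, Dict, List, Tuple

# (alert name, metric, condition, severity), copied from the gauge rows above.
GAUGE_ALERTS: List[Tuple[str, str, Callable[[float], bool], str]] = [
    ("Large event buffer", "mallard_buffered_events", lambda v: v > 5000, "Warning"),
    ("Auth not configured", "mallard_auth_configured", lambda v: v == 0, "Warning"),
    ("GeoIP not loaded", "mallard_geoip_loaded", lambda v: v == 0, "Info"),
    ("Behavioral extension missing", "mallard_behavioral_extension", lambda v: v == 0, "Info"),
]


def firing_alerts(samples: Dict[str, float]) -> List[Tuple[str, str]]:
    """Return (alert name, severity) for every gauge condition that holds.

    Metrics missing from `samples` are ignored, not treated as firing.
    """
    firing: List[Tuple[str, str]] = []
    for alert, metric, condition, severity in GAUGE_ALERTS:
        value = samples.get(metric)
        if value is not None and condition(value):
            firing.append((alert, severity))
    return firing
```

The counter-based rows (flush failures, rate limit rejections, cache hit rate) depend on change over time and need two or more scrapes, so they are left out of this sketch.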