Mallard Metrics

Self-hosted, privacy-focused web analytics powered by DuckDB and the behavioral extension.

Single binary. Single process. Zero external dependencies.


What is Mallard Metrics?

Mallard Metrics is a lightweight, GDPR/CCPA-compliant alternative to cloud analytics platforms. It runs entirely on your infrastructure, stores no personally identifiable information, and requires no cookies or consent banners.

It is built in Rust for predictable, low resource usage. The embedded DuckDB database, combined with the behavioral extension, provides SQL-native behavioral analytics: funnels, retention cohorts, session analysis, sequence matching, and flow analysis. No third-party services are involved.

[Image: Mallard Metrics dashboard]

Core Properties

| Property | Value |
|---|---|
| Language | Rust (MSRV 1.94.0) |
| Web framework | Axum 0.8.x |
| Database | DuckDB (disk-based, embedded, in-process) |
| Analytics | behavioral extension (loaded at runtime) |
| Storage | Date-partitioned Parquet files (ZSTD-compressed) |
| Frontend | Preact + HTM (no build step, embedded in binary) |
| Deployment | Static musl binary, FROM scratch Docker image |
| Tests | 333 passing (262 unit + 71 integration) |

Key Features

Privacy by Design

  • No cookies — Visitor identification uses a daily-rotating HMAC-SHA256 hash of IP + User-Agent + daily salt.
  • No PII storage — IP addresses are hashed and discarded; they are never written to disk.
  • Daily salt rotation — Visitor IDs change every 24 hours, preventing long-term tracking.
  • Privacy-preserving — Pseudonymous visitor IDs; no cookies; no raw IP storage. See Security & Privacy for details.
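
The visitor-ID scheme described above can be sketched in a few lines. This is an illustrative Python sketch, not the project's Rust code: the `DailySalt` helper, the use of the salt as the HMAC key, and the NUL field separator are all assumptions made for the example.

```python
import datetime
import hashlib
import hmac
import secrets


class DailySalt:
    """Hypothetical helper: holds a random salt and replaces it when the date changes."""

    def __init__(self) -> None:
        self._day = None
        self._salt = b""

    def for_day(self, day: datetime.date) -> bytes:
        if day != self._day:
            # New day: discard the old salt so earlier IDs cannot be recomputed.
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt


def visitor_id(ip: str, user_agent: str, salt: bytes) -> str:
    """Return a hex HMAC-SHA256 digest; the raw IP is used only as input."""
    # Assumption: the salt is the HMAC key, and a NUL byte separates the
    # fields so that different (ip, user_agent) pairs give different messages.
    message = ip.encode("utf-8") + b"\x00" + user_agent.encode("utf-8")
    return hmac.new(salt, message, hashlib.sha256).hexdigest()
```

With this shape, the same IP and User-Agent map to the same ID for as long as the salt is unchanged, and to an unrelated ID once the salt rotates, which is what prevents linking visits across days.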

Single Binary Deployment

  • One process handles ingestion, storage, querying, authentication, and the dashboard.
  • DuckDB is embedded — no separate database to install or operate.
  • FROM scratch Docker image: the binary is the only file in the container.
  • WAL-based durability: disk-backed DuckDB preserves hot events through crashes.

Analytical Power

| Category | Capabilities |
|---|---|
| Core metrics | Unique visitors, pageviews, bounce rate, pages/session |
| Breakdowns | Pages, referrers, browsers, OS, devices, countries |
| Time-series | Hourly and daily aggregations |
| Funnel analysis | Multi-step conversion funnels via window_funnel() |
| Retention cohorts | Weekly retention grids via retention() |
| Session analytics | Duration, depth via sessionize() |
| Sequence matching | Behavioral patterns via sequence_match() |
| Flow analysis | Next-page navigation via sequence_next_node() |
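
To make concrete what a multi-step conversion funnel measures, here is a simplified plain-Python sketch of the idea. It is not the behavioral extension's implementation and does not show the real signature of window_funnel(); the function names, the greedy matching rule, and the event shape (timestamp, page) are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def funnel_depth(events, steps, window):
    """How many funnel steps one visitor completed, in order, within `window`.

    events: list of (timestamp, page) tuples for a single visitor
    steps:  ordered list of pages that define the funnel
    window: timedelta measured from the matched first step
    """
    if not steps:
        return 0
    ordered = sorted(events)
    best = 0
    for i, (start_ts, page) in enumerate(ordered):
        if page != steps[0]:
            continue  # a funnel attempt must begin at the first step
        depth = 1
        for ts, later_page in ordered[i + 1:]:
            if ts - start_ts > window:
                break  # outside the time window for this attempt
            if depth < len(steps) and later_page == steps[depth]:
                depth += 1
        best = max(best, depth)
    return best


def funnel_counts(events_by_visitor, steps, window):
    """Number of visitors who reached each step (hypothetical aggregation)."""
    counts = [0] * len(steps)
    for events in events_by_visitor.values():
        for i in range(funnel_depth(events, steps, window)):
            counts[i] += 1
    return counts
```

A visitor who views "/" and "/pricing" within the window but reaches "/signup" only afterwards counts toward the first two steps and not the third.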

Production Ready

  • Argon2id authentication — Password-protected dashboard with cryptographic session tokens.
  • API key management — Programmatic access with SHA-256 hashed keys (mm_ prefix), disk-persisted.
  • Rate limiting — Per-site token-bucket rate limiter on the ingestion endpoint.
  • Query caching — TTL-based in-memory cache for analytics queries.
  • Bot filtering — Automatic filtering of known bot User-Agents.
  • GeoIP — MaxMind GeoLite2 integration with graceful fallback.
  • Data retention — Configurable automatic cleanup of old Parquet partitions.
  • Graceful shutdown — Buffered events are flushed before process exit.
  • Prometheus metrics — GET /metrics for scraping, with counters for ingestion, auth, cache, and rate limiting.
  • OWASP security headers — Including HSTS, CSP, Permissions-Policy, and X-Request-ID.
  • CSRF protection — Origin/Referer validation on all state-mutating endpoints.
  • Brute-force protection — Per-IP login lockout with configurable threshold and lockout duration.
  • GDPR-friendly mode — Single MALLARD_GDPR_MODE=true toggle strips referrers, rounds timestamps, reduces GeoIP precision, and enables the Art. 17 data-erasure API.
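
As an illustration of the per-site token-bucket rate limiting mentioned in the list above, here is a minimal Python sketch. The class names, the capacity and refill parameters, and the explicit `now` argument are assumptions for the example; the project's actual limiter and its configuration may differ.

```python
import time


class TokenBucket:
    """One bucket: holds up to `capacity` tokens, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = now

    def allow(self, now: float) -> bool:
        # Add the tokens earned since the last call, capped at capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # each accepted request spends one token
            return True
        return False


class SiteRateLimiter:
    """Keeps an independent bucket per site ID (hypothetical wrapper)."""

    def __init__(self, capacity: float, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.buckets = {}

    def allow(self, site_id: str, now=None) -> bool:
        # `now` can be injected for testing; otherwise use a monotonic clock.
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(site_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_sec, now)
            self.buckets[site_id] = bucket
        return bucket.allow(now)
```

Because each site has its own bucket, a burst of events for one site exhausts only that site's tokens and leaves ingestion for other sites unaffected.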

When Should You Use Mallard Metrics?

Mallard Metrics is a good fit when you:

  • Want full control over your analytics data on your own server.
  • Need GDPR/CCPA compliance without third-party data processors.
  • Are running a small-to-medium website and want low operational overhead.
  • Need advanced behavioral analytics (funnels, retention, sequences) without a SaaS subscription.
  • Want to demonstrate the power of DuckDB's behavioral extension in a real-world production context.

It is not designed for:

  • Multi-region distributed analytics at very high volume (millions of events/minute).
  • Real-time dashboards with sub-second latency requirements.
  • Replacing a full data warehouse.

Project Status

Mallard Metrics is actively developed and production-ready. See GitHub for the latest releases and issue tracker.

The behavioral extension powering advanced analytics is developed at github.com/tomtom215/duckdb-behavioral.