NebulaDNS is a modern, observable, API-first authoritative DNS server written in safe Rust. It replaces 2001-era daemons (TinyDNS, djbdns) and 40-year-old C codebases (BIND) with a single small binary that ships with metrics, a control plane, and a Kubernetes operator โ so the next failure is detected in seconds, not by customers.
We ran djbdns 1.05 (released 2001) behind a CDN. It sends AXFR responses with QDCOUNT=0, which BIND 9.18+ rejects as FORMERR. Our downstream secondaries silently upgraded to BIND 9.18, and our zone-transfer redundancy eroded invisibly for 13 months until only one working agent remained.
No /metrics, no API, no post-deploy verification, no way to see that half the fleet had stopped transferring. Engineers had to SSH into six servers to reconstruct state during the incident.
NebulaDNS is the direct answer. Every failure mode that caused those incidents is a first-class feature here: propagation verification, peer software fingerprinting, atomic versioned deploys, deterministic SOA serials, and an API for every operator action.
Every operator action โ create zone, edit record, rotate TSIG key, trigger rollover, roll back โ is a single authenticated REST/gRPC call. The CLI and UI are thin clients.
Prometheus /metrics is always-on with zero hot-path cost. Enum-typed labels cap cardinality at compile time. Structured JSON logs, OTLP traces.
A deploy isn't "done" until every declared downstream secondary reports the new SOA serial. The propagation gate is built in โ the single feature missing from every other auth DNS server.
The server records version.bind CHAOS responses from every declared secondary. Silent upgrades can't erode your redundancy invisibly.
#![forbid(unsafe_code)] across every crate. Continuous fuzzing on the wire codec, zone parser, and DNSSEC signer.
Deterministic allocation means no p99.9 hiccups. Lock-free zone store served through arc-swap; readers never block.
Zone data is content-addressed and deployed atomically. Roll forward and rollback are one-liners. No partial reads. No "edit the file and SIGHUP".
Auto-generated, monotonic serials. Retry-with-same-serial is impossible by construction โ no more wedged recoveries.
Helm chart, operator, and CRDs (Zone, Record, Secondary, TsigKey, DeployGate). GitOps-ready from day one. Drop-in replacement for CoreDNS as cluster DNS.
Strict RFC 1034/1035/5936/1995/1996/8945/4034-5/7766/7858/8484/9250. Errors are explicit (QdCountMismatch) โ never silently malformed.
The dashboard shows, at a glance, which secondaries last transferred each zone and when. No more silent 13-month erosion.
systemd unit with ProtectSystem=strict, CAP_NET_BIND_SERVICE only, seccomp syscall filter, read-only rootfs. Non-root container, distroless base, โค 25 MB compressed.
NebulaDNS + Prometheus + Grafana, with a real example zone and a dig-based smoke container.
git clone https://github.com/bwalia/nebuladns
cd nebuladns
docker compose -f deploy/docker/compose/docker-compose.yml \
up --build -d
# Real DNS over UDP and TCP
dig @127.0.0.1 -p 5353 www.example.com A +short
# 192.0.2.10
# 192.0.2.11
# Dashboards
open http://127.0.0.1:3000 # Grafana
open http://127.0.0.1:9091 # Prometheus
StatefulSet, Service, PDB, NetworkPolicy, ServiceMonitor, PrometheusRule, and a helm test smoke pod, ready to helm install.
helm install nebuladns \
oci://ghcr.io/nebuladns/charts/nebuladns \
--namespace nebuladns --create-namespace \
--values values.yaml
helm test nebuladns -n nebuladns --logs
kubectl -n nebuladns port-forward svc/nebuladns 9090:9090
curl http://127.0.0.1:9090/metrics | head
Hardened systemd unit with watchdog + sd_notify. Ansible role with atomic symlink swap, smoke test, and rollback.
cd deploy/ansible
ansible-galaxy collection install -r requirements.yml
ansible-playbook -i inventory.ini playbook.yml \
-e nebuladns_version=v0.1.0 \
-e @environments/staging.yml
# Rollback? One command, a few seconds per host.
ansible-playbook -i inventory.ini rollback.yml
cargo build --release --bin nebuladns --bin nebulactl
./target/release/nebuladns --config config/nebuladns.demo.toml &
dig @127.0.0.1 -p 15353 www.example.com A +short
./target/release/nebulactl health
| Capability | TinyDNS | BIND 9 | Knot | NSD | PowerDNS | CoreDNS | NebulaDNS |
|---|---|---|---|---|---|---|---|
| Full REST API | โ | partial (rndc) | partial (knotc) | โ | partial | โ | โ |
Native Prometheus /metrics | โ | XML/JSON only | module | module | module | โ | โ always-on |
| Built-in propagation gate | โ | โ | โ | โ | โ | โ | โ |
| Kubernetes operator + CRDs | โ | โ | โ | โ | โ | โ | โ |
| Online DNSSEC signing | โ | โ | โ | โ (offline) | โ | partial | โ (HSM/KMS) |
| Memory-safe language | โ (C) | โ (C) | โ (C) | โ (C) | โ (C++) | โ ๏ธ (Go, GC) | โ (Rust, no GC) |
| Peer software fingerprinting | โ | โ | โ | โ | โ | โ | โ |
| Atomic versioned zone store | โ | โ | โ | โ | SQL-backed | โ | โ content-addressed |
| React dashboard included | โ | โ | โ | โ | third-party | โ | โฌ v1.0 (M9) |
โ
= shipped today ยท โฌ
= planned for v1.0 GA ยท โ ๏ธ = with caveats. See the full competitive analysis in PROJECT_PROMPT.md ยง2.5.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ React Web UI (v1.0) โ
โโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโ
โ HTTPS (OpenAPI)
โโโโโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโ
โ Control Plane API โ
โ axum (REST) + tonic (gRPC) โ
โ AuthN: mTLS + OIDC AuthZ: RBAC โ
โโโโโโโโโฌโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโ
โ โ
โโโโโโโโโโโโโโโโผโโโ โโโโโโโโโผโโโโโโโโโโโโโ
โ Zone Manager โ โ Propagation โ
โ validate โ โ Verifier โ
โ sign (DNSSEC) โ โ polls secondaries โ
โ atomic commit โ โ confirms SOA โ
โโโโโโโโโโฌโโโโโโโโโ โโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Zone Store (content-addressed) โ
โ sled / redb / pluggable โ
โโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โโโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ DNS Data Plane โ
โ tokio + io_uring (Linux โฅ 5.19) โ
โ UDP ยท TCP ยท DoT ยท DoH ยท DoQ โ
โ AXFR / IXFR / NOTIFY / TSIG โ
โ DNSSEC online signer โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Observability spine: tracing โ OpenTelemetry โ Prometheus + Loki + Tempo
Hand-rolled zero-copy wire codec. Lock-free zone snapshots via arc-swap. No allocations on the hot path in steady state.
OpenAPI spec is generated from code via utoipa. Every write is idempotent, dry-run-able, and audit-logged.
Content-addressed, versioned. Rollback is a one-liner. Writes are atomic; readers never block. Optional S3-backed continuous backup.
Metrics are always-on with a compile-time cardinality budget. The design contract: zero hot-path allocation, โค 1% CPU overhead, โค 1 ยตs added to p99 latency. The full catalogue is in the planning prompt; here's a slice of what's live today.
# Wire / query pipeline
nebula_dns_queries_total{proto,qtype,rcode}
nebula_dns_query_duration_seconds_bucket{proto,qtype,le}
nebula_dns_dropped_total{reason}
nebula_dns_formerr_total{peer,direction,reason}
# Zone transfer / NOTIFY (M3)
nebula_axfr_attempts_total{peer,zone,direction,result}
nebula_axfr_last_success_timestamp_seconds{peer,zone}
nebula_peer_version_info{peer,software,version} # the 1326 signal
# Propagation verifier (M6)
nebula_zone_current_soa_serial{zone}
nebula_secondary_observed_soa_serial{zone,peer}
nebula_zone_propagation_lag_seconds{zone,peer}
nebula_zone_propagation_converged{zone} 0|1
# Runtime
nebula_build_info{version,commit,rustc,target} 1
nebula_process_resident_memory_bytes
nebula_memory_hot_path_allocations_total # MUST stay at 0
Workspace, CI, /livez, /readyz, /metrics, systemd unit, distroless container, Helm skeleton, fuzz harness.
Full RFC 1035 message codec, name compression, RR types (A/AAAA/NS/CNAME/SOA/MX/TXT/PTR/SRV/CAA), TOML zone loader, UDP + TCP listeners, real dig answering end-to-end.
Wildcards, CNAME chasing, glue, RFC 2308 negative responses, EDNS negotiation, response-rate-limiting.
AXFR, IXFR, NOTIFY, TSIG + daily interop matrix against BIND / Knot / NSD / PowerDNS / CoreDNS.
Online signing (Ed25519 / ECDSA / RSA), automatic KSK/ZSK rollover (RFC 6781), HSM (PKCS#11) and AWS KMS backends.
The feature that makes incidents 1273 and 1326 impossible to recur.
Raft in-region, async log-ship cross-region, active-active with CRDTs, explicit failover gates.
First-class Zone / Record / Secondary / TsigKey / DeployGate resources. ExternalDNS provider. CoreDNS cluster-DNS drop-in.
Full milestone schedule and SLOs: PROJECT_PROMPT.md ยง15.
Every tagged commit produces a GitHub Release with static Linux (musl) and macOS
binaries for nebuladns and
nebulactl, per-archive
SHA-256 checksums, and auto-generated
release notes grouped by PR label. Releases are immutable โ a rollback is a new
version, never a rewritten tag.
Static binaries for x86_64 and aarch64 โ Linux (musl) and macOS. Verify with the SHA256SUMS file shipped with every release.
The release runbook. How to cut a version, what the workflow does, how to verify an artifact, how to roll back โ and a dedicated section for AI coding agents.
Read RELEASE.md โPushing a v*.*.* tag re-runs the test gate against the exact SHA, cross-compiles four targets, smoke-tests each binary, and publishes the Release with auto-generated notes.
VERSION=v0.2.0
TARGET=x86_64-unknown-linux-musl
gh release download "$VERSION" \
--pattern "nebuladns-${VERSION}-${TARGET}.tar.gz" \
--pattern "nebuladns-${VERSION}-${TARGET}.tar.gz.sha256"
shasum -a 256 -c "nebuladns-${VERSION}-${TARGET}.tar.gz.sha256"
# nebuladns-v0.2.0-x86_64-unknown-linux-musl.tar.gz: OK
tar -xzf "nebuladns-${VERSION}-${TARGET}.tar.gz"
./nebuladns-${VERSION}-${TARGET}/nebuladns --print-default-config | head
NebulaDNS is pre-1.0 and moving fast. If the compose rig doesn't spin up on your
machine in under two minutes, or the Helm chart lands a pod that isn't answering
real dig queries within thirty seconds,
that's a bug and we want the trace.