Performance Overview

OpenShield-XDP is designed for high-throughput edge filtering with sub-microsecond per-packet overhead on the normal (passthrough) path. This page documents the performance targets, measurement methodology, and design decisions behind the numbers.

Design Targets

| Metric | Target | Notes |
|---|---|---|
| Normal path latency (passthrough) | ~300–500 ns | Packet passes all checks; no mitigation triggered |
| Attack path latency (drop) | ~1–2 µs | Full pipeline with ban insertion and event emission |
| Single-core load at 1M PPS | ~50–70% CPU utilization | On modern Xeon/EPYC cores |
| Bloom filter savings | ~35–70 ns/packet | Whitelist HASH lookup skipped when the filter returns negative |
| Per-IP stats update | ~50–80 ns | LRU_HASH lookup + increment + BPF_ANY update |
| Ban insertion | ~200–300 ns | LRU_HASH insert + ringbuf event emission |
| Config read | ~10–20 ns | Single-entry ARRAY map; usually resident in L1 cache |

Measurement Methodology

All latency figures are estimated from a combination of:

  • Instruction counts from bpftool prog profile, multiplied by estimated cycles per instruction
  • Direct measurement via bpf_ktime_get_ns() instrumentation (for non-production builds)
  • Synthetic benchmarking with pktgen / hping3 at line rate
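
The counter-based and timestamp-based methods reduce to simple arithmetic. You can divide counted cycles by the number of program runs, or multiply an instruction count by an assumed cycles-per-instruction (CPI) figure. Either way, you then convert cycles to nanoseconds at the core clock. The sketch below is a hypothetical userspace helper and is not part of OpenShield-XDP. The function names, the CPI value, and the 3.0 GHz clock in the example are all assumptions.

```python
"""Turn raw BPF profiling numbers into per-packet nanosecond estimates.

Userspace arithmetic only. Inputs are assumed to come from counters such as
run_cnt / cycles / instructions (e.g. as reported by `bpftool prog profile`)
or from end-minus-start bpf_ktime_get_ns() samples. Names are illustrative.
"""


def cycles_to_ns(cycles: float, ghz: float) -> float:
    """At f GHz the core executes f cycles per nanosecond."""
    return cycles / ghz


def ns_per_run_from_cycles(total_cycles: int, run_cnt: int, ghz: float) -> float:
    """Average cycles per program invocation, converted to ns."""
    if run_cnt <= 0:
        raise ValueError("run_cnt must be positive")
    return cycles_to_ns(total_cycles / run_cnt, ghz)


def ns_from_instructions(instructions: int, cycles_per_insn: float, ghz: float) -> float:
    """Static estimate: instruction count times an assumed CPI."""
    return cycles_to_ns(instructions * cycles_per_insn, ghz)


def mean_ns(deltas_ns: list[int]) -> float:
    """Mean of bpf_ktime_get_ns() deltas collected by an instrumented build."""
    if not deltas_ns:
        raise ValueError("need at least one sample")
    return sum(deltas_ns) / len(deltas_ns)


if __name__ == "__main__":
    # Hypothetical inputs: 900,000 cycles over 1,000 runs on a 3.0 GHz core.
    print(ns_per_run_from_cycles(900_000, 1_000, 3.0))  # 300.0
```

The CPI-based estimate ignores cache misses, so treat it as a lower bound next to the measured figures.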

Measurement Caveats

Actual latency depends on CPU model, kernel version, Spectre/Meltdown mitigations, and NIC driver. The figures above are for a kernel 6.6+ system with mitigations=off.

Pipeline Cost Breakdown

The normal path spends most of its time in rate scoring and connection tracking. Both involve LRU map lookups, which are the most expensive operations that path performs.

Map Operation Costs

BPF map type strongly influences latency:

| Map Type | Lookup Cost | Update Cost | Notes |
|---|---|---|---|
| ARRAY | ~10 ns | ~10 ns | Direct indexed access, no hashing. Used for config, baseline, prof. |
| PERCPU_ARRAY | ~15 ns | ~15 ns | Per-CPU sub-arrays, no lock. Used for global_stats, panic_bucket, prof. |
| HASH (small) | ~50–80 ns | ~80–120 ns | Hash computation + collision chain walk. Used for whitelist. |
| LRU_HASH (large) | ~80–150 ns | ~150–300 ns | HASH costs plus LRU list maintenance. Used for ip_stats, ban. |
| LPM_TRIE | ~100–200 ns | ~200–400 ns | Prefix matching. Used for subnet bans. |
| RINGBUF (reserve+submit) | N/A | ~200–500 ns | Memory reservation + commit. Used for events. |

Key insight: ARRAY lookups are roughly 8–15× cheaper than LRU_HASH lookups (~10 ns vs ~80–150 ns). This is why config_map and baseline_map use ARRAY: they are accessed on every packet.
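
Both the ARRAY vs LRU_HASH ratio and a rough normal-path total can be derived from the table by summing cost ranges. In this sketch, the cost ranges are copied from the table. The operation sequence in NORMAL_PATH is a hypothetical example and not OpenShield-XDP's actual pipeline: which maps the normal path touches, and how many times, is assumed.

```python
"""Rough per-packet cost from the map-cost table (userspace arithmetic only).

COSTS holds the table's (low, high) nanosecond ranges. NORMAL_PATH is a
hypothetical operation sequence, not OpenShield-XDP's actual pipeline.
"""

COSTS = {
    "array_lookup": (10, 10),
    "percpu_array_lookup": (15, 15),
    "hash_lookup": (50, 80),
    "lru_hash_lookup": (80, 150),
    "lru_hash_update": (150, 300),
    "lpm_trie_lookup": (100, 200),
    "ringbuf_submit": (200, 500),
}

# Hypothetical normal path: two ARRAY reads, one per-CPU counter,
# a whitelist HASH lookup, and two LRU lookups.
NORMAL_PATH = [
    "array_lookup",         # config_map
    "array_lookup",         # baseline_map
    "percpu_array_lookup",  # e.g. a global_stats counter (assumed)
    "hash_lookup",          # whitelist, no Bloom filter in front
    "lru_hash_lookup",      # rate scoring (assumed ip_stats)
    "lru_hash_lookup",      # connection tracking (assumed separate map)
]


def estimate(ops: list[str]) -> tuple[int, int]:
    """Sum the low and high bounds over a sequence of map operations."""
    low = sum(COSTS[op][0] for op in ops)
    high = sum(COSTS[op][1] for op in ops)
    return low, high


def lookup_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Cost ratio between two operations at the low and high ends."""
    return COSTS[expensive][0] / COSTS[cheap][0], COSTS[expensive][1] / COSTS[cheap][1]


if __name__ == "__main__":
    print(estimate(NORMAL_PATH))                            # (245, 415)
    print(lookup_ratio("lru_hash_lookup", "array_lookup"))  # (8.0, 15.0)
```

Under these assumptions, the two LRU lookups account for 160–300 ns of the 245–415 ns total. That matches the observation that rate scoring and connection tracking dominate the normal path.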

Bloom Filter Impact

The Bloom filter accelerates whitelist-negative packets:

  • Without Bloom: Every packet does a HASH lookup (50–80 ns), even if the whitelist is empty
  • With Bloom (empty whitelist): Bloom reads bloom_map[idx] (ARRAY, ~10 ns), finds 0 → ~10 ns total, saves 40–70 ns
  • With Bloom (populated, non-whitelisted IP): Bloom returns negative in ~15 ns → ~35–65 ns saved
  • False positive cost: Bloom positive → full HASH lookup (same as without Bloom) + ~15 ns Bloom overhead
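
The whitelist fast path described above can be modeled in userspace. This sketch makes several assumptions:

- a plain bit array (the kernel side would keep the bits in an ARRAY map);
- k = 3 bit positions derived by double hashing from one BLAKE2b digest;
- IPv4 addresses stored as 32-bit integers.

The filter size, hash scheme, and function names are illustrative and not OpenShield-XDP's implementation. A Python set stands in for the whitelist HASH map. The false-positive estimate uses the standard (1 − e^(−kn/m))^k approximation.

```python
"""Userspace model of a Bloom filter placed in front of a whitelist lookup.

Assumptions: m-bit array, k positions via double hashing of a BLAKE2b digest,
IPv4 addresses as 32-bit ints. A set stands in for the HASH map.
"""
import hashlib
import math


class BloomFilter:
    def __init__(self, m_bits: int = 1 << 16, k: int = 3):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, ip: int) -> list[int]:
        # Double hashing: position_i = (h1 + i * h2) mod m. Forcing h2 odd
        # keeps the k positions distinct when m is a power of two.
        digest = hashlib.blake2b(ip.to_bytes(4, "big"), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, ip: int) -> None:
        for p in self._positions(ip):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, ip: int) -> bool:
        # A single zero bit proves the IP was never added (no false negatives).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(ip))


def is_whitelisted(ip: int, bloom: BloomFilter, whitelist: set[int]) -> bool:
    """Cheap Bloom check first; only a Bloom positive pays for the exact lookup."""
    if not bloom.might_contain(ip):
        return False
    return ip in whitelist


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation for n inserted keys: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def expected_cost_ns(bloom_ns: float, hash_ns: float, positive_rate: float) -> float:
    """Mean per-packet cost when positive_rate of packets also need the HASH lookup."""
    return bloom_ns + positive_rate * hash_ns


if __name__ == "__main__":
    wl = {0x0A000001, 0x0A000002}  # 10.0.0.1, 10.0.0.2
    bf = BloomFilter()
    for ip in wl:
        bf.add(ip)
    print(is_whitelisted(0x0A000001, bf, wl))  # True
    print(false_positive_rate(len(wl), bf.m, bf.k))
```

With a false-positive rate near zero, expected_cost_ns stays close to the Bloom check alone. Whitelisted IPs (true positives) still pay both the Bloom check and the HASH lookup.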

Single-Core Behavior

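At 1 million packets per second on a single core:

| Component | CPU Time | % of Core |
|---|---|---|
| Packet processing (XDP) | ~300 ns/pkt | 30% |
| Interrupt handling (NAPI) | ~100 ns/pkt | 10% |
| NIC driver overhead | ~80 ns/pkt | 8% |
| Kernel stack (for XDP_PASS) | ~60 ns/pkt | 6% |
| Total | ~540 ns/pkt | ~54% |

At 1M PPS, a single core handles all packets at ~50–70% utilization. At this rate, every 100 ns of per-packet CPU time costs 10% of a core. Higher rates need multi-core RSS (Receive Side Scaling) to distribute the load. At 10M PPS, the same ~540 ns/packet adds up to about 5.4 cores' worth of CPU time.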
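
Per-core utilization is per-packet CPU time multiplied by packet rate. This minimal sketch applies that formula to the component costs in the table, so the budget can be rechecked at any rate. The dictionary keys are illustrative names, and the numbers are the table's estimates rather than measurements.

```python
"""Per-core utilization from per-packet CPU time and packet rate.

utilization = ns_per_packet * packets_per_second / 1e9
(1.0 means one fully busy core; values above 1.0 need more cores.)
"""

NS_PER_SECOND = 1_000_000_000

# Per-packet CPU time taken from the single-core table (estimates, in ns).
COMPONENT_NS = {
    "xdp_program": 300,
    "napi_interrupts": 100,
    "nic_driver": 80,
    "kernel_stack_xdp_pass": 60,
}


def utilization(ns_per_packet: float, pps: float) -> float:
    """Fraction of one core consumed at the given packet rate."""
    return ns_per_packet * pps / NS_PER_SECOND


def per_packet_budget_ns(pps: float, target_util: float = 1.0) -> float:
    """CPU time one core can spend per packet while staying at target_util."""
    return target_util * NS_PER_SECOND / pps


if __name__ == "__main__":
    total = sum(COMPONENT_NS.values())
    print(total)                             # 540
    print(utilization(total, 1_000_000))     # 0.54
    print(utilization(total, 10_000_000))    # 5.4
    print(per_packet_budget_ns(10_000_000))  # 100.0
```

At 1M PPS, the 540 ns total is 54% of a core. One core running at 10M PPS would have only 100 ns per packet for everything, including driver and NAPI work.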

Scaling Factors

| Factor | Impact |
|---|---|
| RSS queue count | Roughly linear scaling: 4 queues ≈ 4× throughput |
| XDP mode | Native ≈ 2× faster than generic; offload is NIC-dependent |
| Whitelist size | Minimal impact (HASH lookup is O(1) on average) |
| IP stats map size | LRU eviction overhead increases slightly with size (log factor) |
| Number of active IPs | Ban map growth affects LRU maintenance; 50K entries ≈ 10% overhead |
| Feature toggles | Checking each boolean toggle costs about one well-predicted if branch (negligible) |
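
The RSS row can be turned into a rough capacity estimate. This sketch assumes ideal linear scaling with one core per queue, as the RSS queue count row suggests, plus a per-core utilization ceiling you choose. The function names and the 70% ceiling in the example are assumptions. Uneven flow hashing or contention on shared maps will make real scaling fall short of this model.

```python
"""Idealized RSS capacity model: throughput scales linearly with queues."""
import math

NS_PER_SECOND = 1_000_000_000


def max_pps_per_core(ns_per_packet: float, max_util: float = 1.0) -> float:
    """Packets per second one core can absorb while staying at max_util."""
    return max_util * NS_PER_SECOND / ns_per_packet


def aggregate_pps(queues: int, ns_per_packet: float, max_util: float = 1.0) -> float:
    """Linear model: each RSS queue is served by its own core."""
    return queues * max_pps_per_core(ns_per_packet, max_util)


def queues_needed(target_pps: float, ns_per_packet: float, max_util: float = 1.0) -> int:
    """Smallest queue count whose linear capacity covers target_pps."""
    return math.ceil(target_pps / max_pps_per_core(ns_per_packet, max_util))


if __name__ == "__main__":
    # ~540 ns/packet of CPU time, cores capped at 70% utilization.
    print(queues_needed(10_000_000, 540, max_util=0.7))  # 8
```

Because the model is linear by construction, the queue count it returns is a floor. If measured scaling is below linear, more queues will be needed.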