Skip to content

Performance

Throughput Benchmarks:

  • Single CPU core: 10M+ packets/second
  • 4 CPU cores: 40M+ packets/second
  • 10 Gbps line rate: Sustained with <5% CPU usage

Latency Measurements:

  • XDP program execution: <1µs
  • Blocklist lookup (HashMap): 50-100ns
  • Stats increment (PerCpuArray): ~10ns

Kernel (eBPF Maps):

  • Blocklist (10,000 IPs): ~1MB
  • Statistics (per-CPU): 16 bytes × 2 × num_cpus
  • Total: <2MB

Userspace (Rust):

  • Mitigation node: ~50MB Resident Set Size (RSS)
  • Orchestrator: ~80MB RSS (includes Machine Learning models)

Problem: Atomic counters create cache line contention across Central Processing Unit (CPU) cores.

Solution: Each CPU maintains independent counters:

// Kernel: Simple increment, no atomics
let counter = unsafe { STATS.get(0) };
if let Some(c) = counter {
*c += 1; // No lock, no atomic operation
}
// Userspace: Aggregate totals
let total: u64 = per_cpu_values.iter().sum();

Benefit: 10x faster than atomic counters in multi-core benchmarks.

Why not Array?

  • IPv4 address space: 4 billion addresses
  • Array would waste ~4GB of memory
  • HashMap uses ~1MB for 10,000 IPs

Hash Function: Linux kernel SipHash (cryptographically secure, DoS-resistant)

Collision Handling: Open addressing with linear probing

iptables DROP path:

  1. Packet reaches driver → allocate sk_buff (1000+ bytes)
  2. Traverse netfilter hooks (multiple table lookups)
  3. Evaluate iptables rules (linear scan)
  4. Drop packet → free sk_buff

XDP DROP path:

  1. Packet reaches driver
  2. Execute XDP program (single HashMap lookup)
  3. Return XDP_DROP

Savings: ~1,000 CPU cycles per dropped packet

Using pktgen (Linux built-in packet generator):

Terminal window
# Load kernel module
sudo modprobe pktgen
# Configure packet generation
cat > /tmp/pktgen.sh << 'EOF'
echo "add_device eth0" > /proc/net/pktgen/kpktgend_0
echo "dst 192.168.100.100" > /proc/net/pktgen/eth0
echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth0
echo "count 10000000" > /proc/net/pktgen/eth0
echo "pkt_size 64" > /proc/net/pktgen/eth0
echo "start" > /proc/net/pktgen/pgctrl
EOF
sudo bash /tmp/pktgen.sh
Terminal window
# Watch real-time statistics
watch -n 1 'curl -s http://localhost:9090/api/v1/stats | jq .'
# Expected output (10M pps):
# {
# "packets_passed": 10000000,
# "packets_dropped": 0
# }
# Check CPU usage
top -p $(pgrep mitigation-node)
# Expected: 3-5% CPU at 1Gbps, 30-40% at 10Gbps
Traffic RateCPU Usage (4 cores)LatencyDrop Rate
1 Gbps (1M pps)3-5%<500ns0%
5 Gbps (5M pps)15-20%<800ns0%
10 Gbps (10M pps)30-40%<1µs0%
40 Gbps (40M pps)95%+<5µs<0.1%

Edit secbeat-ebpf/src/main.rs:

#[map]
static BLOCKLIST: HashMap<u32, u8> =
HashMap::with_max_entries(50000, 0); // Increase from 10k to 50k

Rebuild:

Terminal window
cargo xtask build-ebpf --release
cargo build --release

Pin XDP processing to specific CPU cores:

Terminal window
# Pin to CPUs 0-3 (isolate from other processes)
# Mode is set via config file [platform].mode = "syn"
sudo taskset -c 0-3 ./mitigation-node
# Verify affinity
ps -eLo pid,tid,psr,comm | grep mitigation

Distribute packet processing across multiple CPU cores:

Terminal window
# Check current RSS queue count
ethtool -l eth0
# Expected output:
# Combined: 4
# Increase to 8 queues (if NIC supports)
sudo ethtool -L eth0 combined 8
# Verify
ethtool -S eth0 | grep rx_queue

Optimize for network performance:

Terminal window
# Edit /etc/default/grub
GRUB_CMDLINE_LINUX="isolcpus=0-3 nohz_full=0-3 rcu_nocbs=0-3"
# Update grub
sudo update-grub
sudo reboot

This isolates CPUs 0-3 for XDP processing (no scheduler interrupts).

  • Linux Only: XDP requires Linux kernel 5.15+
  • Driver Support: Not all Network Interface Cards support native XDP
    • Check: ethtool -k eth0 | grep xdp
    • Fallback: Generic XDP (slower, but works on all NICs)
  • Docker-in-Docker: Cannot load XDP (lacks kernel access)
  • Native Docker: Works with --privileged and host network
  • LXC: Full kernel access, recommended for XDP

eBPF verifier enforces limits:

  • Stack size: 512 bytes per program
  • Map size: Kernel memory limits (check ulimit -l)
  • Program complexity: ~1M instructions max
Terminal window
# Record CPU samples during test
sudo perf record -F 99 -p $(pgrep mitigation-node) -g -- sleep 30
# Generate flamegraph
sudo perf script | stackcollapse-perf.pl | flamegraph.pl > flamegraph.svg
# Open in browser
firefox flamegraph.svg
Terminal window
# View program execution stats
sudo bpftool prog show id 42
# Expected output:
# 42: xdp name secbeat_xdp tag a1b2c3d4e5f6g7h8
# loaded_at 2025-11-24T01:00:00+0000
# run_time_ns 1234567890
# run_cnt 10000000

Calculate average latency: run_time_ns / run_cnt = 123ns per packet