feat(mix): cover traffic with constant rate and spam protection toggle#3807

Draft
chaitanyaprem wants to merge 8 commits into poc/mix-spam-protection from feat/cover-traffic

@chaitanyaprem chaitanyaprem commented Apr 9, 2026

Summary

  • Implement cover traffic with constant rate as per spec (nim-libp2p ConstantRateCoverTraffic)
  • Add mix-user-message-limit and mix-disable-spam-protection CLI flags for flexible testing
  • Fix option_shims.nim double-evaluation bug that caused UnpackDefect crash in ping — template expanded await expression twice, racing two separate network calls
  • Add check_cover_traffic.sh metrics monitoring script for simulation validation
  • Update simulation README with instructions for running without DoS protection

How to Test

See simulations/mixnet/README.md for full setup and testing instructions.

Note: Running with DoS protection enabled requires better hardware. RLN proof generation is computationally expensive — running 5 nodes locally with proofs enabled can overwhelm a laptop and cause nodes to crash or become unresponsive. For DoS-protected testing, use a machine with sufficient CPU/RAM or reduce the number of nodes.

Validation Report

Cover Traffic Metrics (after ~60s, all 5 nodes stable)

| Metric | Node 1 | Node 2 | Node 3 | Node 4 | Node 5 |
| --- | --- | --- | --- | --- | --- |
| mix_cover_emitted_total (on_demand) | 6 | 5 | 5 | 5 | 5 |
| mix_cover_received_total | 6 | 2 | 1 | 3 | 3 |
| mix_messages_forwarded (Cover) | 6 | 5 | 5 | 5 | 5 |
| mix_messages_forwarded (Intermediate) | 6 | 9 | 7 | 7 | 9 |
| mix_messages_recvd (Exit) | 6 | 2 | 1 | 3 | 3 |
| mix_messages_recvd (Intermediate) | 10 | 11 | 7 | 11 | 10 |
| SLOT_EXHAUSTED errors | 4 | 2 | 0 | 4 | 1 |

Key Observations

  1. All 5 nodes emit cover traffic: mix_cover_emitted_total increases every epoch
  2. Cover traffic is received back: mix_cover_received_total shows cover packets returning to origin nodes via the 3-hop mix path
  3. Messages forwarded: both Cover and Intermediate forwarding metrics are active across all nodes
  4. SLOT_EXHAUSTED is expected: with 2 slots/epoch, forwarding slots run out whenever a node receives more traffic than its budget allows
  5. No crashes: all 5 nodes stayed running for the full test
  6. No real errors: log "errors" were only startup dial failures (connection refused before all nodes started)
  7. Message sending works: alice published "hello from alice" through the mix network successfully

…ption_shims crash

- Add cover traffic support with constant rate as per spec
- Add mix-user-message-limit and mix-disable-spam-protection CLI flags
- Fix option_shims.nim double-evaluation bug causing UnpackDefect crash
  in ping (template expanded await expression twice, racing two calls)
- Reduce default rate limit to 2 msgs/epoch for simulation testing
- Add check_cover_traffic.sh metrics monitoring script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chaitanyaprem chaitanyaprem changed the base branch from dos-protection to poc/mix-spam-protection April 9, 2026 17:50
@chaitanyaprem chaitanyaprem marked this pull request as draft April 9, 2026 18:02
github-actions Bot commented Apr 9, 2026

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3807

Built from 34de568

@chaitanyaprem
Contributor Author

Cover Traffic + RLN Spam Protection Simulation Report

Date: 2026-04-10 | Duration: ~10 minutes

System Configuration

| Component | Value |
| --- | --- |
| CPU | Apple M2 Pro |
| Cores | 10 |
| Memory | 32 GB |
| Architecture | arm64 (macOS Darwin 25.3.0) |

Simulation Configuration

| Parameter | Value |
| --- | --- |
| Nodes | 5 mix nodes (1 bootstrap + 4) |
| RLN Spam Protection | Enabled (mix-disable-spam-protection=false) |
| User Message Limit (R) | 2 slots per epoch |
| Epoch Duration (P) | 10 seconds |
| Mix Path Length (L) | 3 hops |
| Cover Traffic Precomputation | Disabled (default) |
| Tree Members | 7 (5 mix nodes + 2 chat clients) |
| Log Level | TRACE |

System Resource Usage

| Metric | Value |
| --- | --- |
| Memory per node | ~55-67 MB RSS |
| Steady-state CPU (all 5 nodes) | 1-5% total |
| Peak CPU (burst during epoch proof gen) | ~142% total |
| Load average | 1.65 - 2.84 |

CPU usage is bursty — each node generates 1-2 RLN proofs per 10s epoch (~200ms of work), then idles. Spikes are brief and non-disruptive on this hardware.

RLN Proof Generation Times

206 proofs collected across all 5 nodes:

| Stat | Value |
| --- | --- |
| Min | 25ms |
| Max | 605ms |
| Average | 112.9ms |
| Median | 71.5ms |
| P95 | 340ms |

Distribution:

  0-49ms  |  75 | ###########################################################################
 50-99ms  |  49 | #################################################
100-149ms |  35 | ###################################
150-199ms |  14 | ##############
200-299ms |  17 | #################
300-499ms |  13 | #############
500ms+    |   3 | ###

60% of proofs complete under 100ms. Tail latency (300ms+) occurs when multiple nodes generate proofs simultaneously at epoch boundaries (single-threaded chronos async per process).
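The bucket shares quoted above can be recomputed directly from the histogram. The following is a small Python sketch of that arithmetic only; the `buckets` dictionary and the `share` helper are illustrative names, not part of the simulation tooling.

```python
# Bucket counts copied from the histogram above (206 proofs in total).
buckets = {
    "0-49ms": 75,
    "50-99ms": 49,
    "100-149ms": 35,
    "150-199ms": 14,
    "200-299ms": 17,
    "300-499ms": 13,
    "500ms+": 3,
}


def share(labels):
    """Return the percentage of all proofs that fall in the given buckets."""
    total = sum(buckets.values())
    return 100.0 * sum(buckets[label] for label in labels) / total


under_100ms = share(["0-49ms", "50-99ms"])
tail_300ms_plus = share(["300-499ms", "500ms+"])

print(f"under 100ms:     {under_100ms:.1f}%")      # 60.2%
print(f"300ms and above: {tail_300ms_plus:.1f}%")  # 7.8%
```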

Cover Traffic Metrics (Final Snapshot)

| Node | Port | Emitted | Received | Slot Exhausted (fwd) | Forwarded (Cover) | Forwarded (Intermediate) |
| --- | --- | --- | --- | --- | --- | --- |
| 0 (bootstrap) | 8009 | 23 | 11 | 5 | 23 | 56 |
| 1 | 8010 | 31 | 24 | 10 | 31 | 52 |
| 2 | 8011 | 31 | 27 | 2 | 31 | 44 |
| 3 | 8012 | 31 | 29 | 5 | 31 | 50 |
| 4 | 8013 | 31 | 26 | 4 | 31 | 49 |

  • All 5 nodes actively emitting and receiving cover traffic
  • Receive rate is ~77-94% of emitted on Nodes 1-4 and ~48% on Node 0 (losses due to slot exhaustion at intermediate hops)
  • Node 0 has fewer emissions (started earlier, different epoch alignment)
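The per-node receive ratio can be derived from the Emitted and Received columns of the final snapshot. This is a minimal Python sketch of that calculation; `snapshot` and `receive_ratio` are illustrative names, and the numbers are copied from the table.

```python
# (emitted, received) per node, copied from the final snapshot table.
snapshot = {
    0: (23, 11),
    1: (31, 24),
    2: (31, 27),
    3: (31, 29),
    4: (31, 26),
}


def receive_ratio(emitted, received):
    """Fraction of a node's own cover packets that looped back to it."""
    return received / emitted


ratios = {node: receive_ratio(*counts) for node, counts in snapshot.items()}

for node, ratio in ratios.items():
    print(f"node {node}: {ratio:.0%} of emitted cover packets received back")
```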

Errors Observed

| Error | Total Count | Affected Nodes | Notes |
| --- | --- | --- | --- |
| SPAM_PROOF_GEN_FAILED (Cover) | 8 | Node 0 | Early epoch proofs before tree sync |
| SPAM_PROOF_GEN_FAILED (Intermediate) | 4 | Nodes 0, 2, 3 | Rate limit exhausted during forwarding: "Message id (2) is not within user_message_limit (2)" |
| SLOT_EXHAUSTED (forward) | 26 | All nodes | Expected: R=2 is too low for both cover + forwarding |

The SPAM_PROOF_GEN_FAILED errors during forwarding occur because the R-budget (2 slots) is shared across cover origination AND forwarding. When a node uses both slots for its own cover traffic, it has none left to generate proofs for forwarded packets. This is expected with R=2; higher limits (e.g., R=10+) would provide headroom for both.
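To illustrate the shared budget, here is a toy Python model of a per-epoch slot counter. It is only a sketch under the assumption that cover origination and forwarding draw from the same R slots; `EpochSlotBudget` and its methods are hypothetical names and do not correspond to the nim-libp2p or RLN plugin API.

```python
class EpochSlotBudget:
    """Hypothetical per-epoch budget of R slots shared by cover and forwarding."""

    def __init__(self, user_message_limit):
        self.limit = user_message_limit
        self.used = 0

    def try_claim(self):
        """Claim one slot; return False once all R slots of the epoch are used."""
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

    def next_epoch(self):
        """Reset the counter at the epoch boundary."""
        self.used = 0


budget = EpochSlotBudget(user_message_limit=2)
cover_ok = [budget.try_claim(), budget.try_claim()]  # two cover packets in one epoch
forward_ok = budget.try_claim()                      # a forwarded packet arrives afterwards

print(cover_ok, forward_ok)  # [True, True] False
```

With R=2 the third claim fails, which corresponds to the forwarding-side failures in the table; the budget only recovers after `next_epoch()`.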

Conclusion

Cover traffic with RLN spam protection works correctly on consumer hardware (M2 Pro). Resource usage is manageable with R=2 (~5% steady CPU for 5 nodes). The main limitation is the low rate limit causing slot exhaustion during forwarding — this is a configuration constraint, not a bug.

@chaitanyaprem
Contributor Author

Addendum: Rate Limit Analysis & Recommended Configuration

The SPAM_PROOF_GEN_FAILED and SLOT_EXHAUSTED errors observed in the test above are caused by a rate limit R that is not a multiple of (1+L), which makes the number of cover packets per epoch fractional.

Root Cause:

With R=2 (user message limit) and L=3 (path length):

  • Emission interval = (1+L) × P / R = 4 × 10 / 2 = 20s
  • Target cover packets per epoch = R / (1+L) = 2 / 4 = 0.5 (not an integer)

Because 0.5 is fractional, epoch boundary jitter causes some epochs to see 2 cover emissions and others 0. When 2 land in the same epoch, all slots are consumed by cover traffic, leaving none for forwarding — triggering proof generation failures at intermediate hops.

Fix:

R must be a multiple of (1+L) = 4 for clean integer cover emissions per epoch:

| R | Cover/epoch | Interval | Slots for forwarding | Proofs/min/node |
| --- | --- | --- | --- | --- |
| 4 | 1 | 10s | 3 | ~6 |
| 8 | 2 | 5s | 6 | ~12 |
| 12 | 3 | 3.3s | 9 | ~18 |

Recommended: R=4 — exactly 1 cover packet per epoch, 3 remaining slots for forwarding, no fractional math or boundary jitter issues. This is also less CPU load than R=2 in practice (fewer spurious double-emissions).

The config files (config.toml through config4.toml) and setup_credentials.nim should be updated from mix-user-message-limit=2 to mix-user-message-limit=4 before final merge.
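The formulas from this addendum can be checked with a few lines of Python. This is a sketch of the arithmetic only; `cover_schedule` and `is_clean_limit` are hypothetical helper names, and a check like `is_clean_limit` is just one possible way to reject a misconfigured limit, not something the PR implements.

```python
from fractions import Fraction


def cover_schedule(user_message_limit, path_length=3, epoch_seconds=10):
    """Return (cover packets per epoch, emission interval in seconds).

    Uses the formulas from the addendum: R / (1+L) and (1+L) * P / R.
    """
    divisor = 1 + path_length  # the (1+L) term
    cover_per_epoch = Fraction(user_message_limit, divisor)
    interval_seconds = Fraction(divisor * epoch_seconds, user_message_limit)
    return cover_per_epoch, interval_seconds


def is_clean_limit(user_message_limit, path_length=3):
    """True when R is a positive multiple of (1+L)."""
    return user_message_limit > 0 and user_message_limit % (1 + path_length) == 0


for r in (2, 4, 8, 12):
    cover, interval = cover_schedule(r)
    status = "clean" if is_clean_limit(r) else "fractional"
    print(f"R={r}: cover/epoch={cover}, interval={float(interval):.1f}s, "
          f"left for forwarding={float(r - cover):g} ({status})")
```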

Defensive nil check for mixRlnSpamProtection when spam protection
is disabled, preventing potential crash if the guard in start()
is ever bypassed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@chaitanyaprem
Contributor Author

Cover Traffic Precomputation + RLN Spam Protection Simulation Report

Date: 2026-04-10 | Duration: ~4 minutes | Config change: enablePrecomputation=true in ConstantRateCoverTraffic.new()

System Configuration

| Component | Value |
| --- | --- |
| CPU | Apple M2 Pro (10 cores) |
| Memory | 32 GB |
| Architecture | arm64 (macOS) |

Simulation Configuration

| Parameter | Value |
| --- | --- |
| Nodes | 5 mix nodes |
| RLN Spam Protection | Enabled |
| User Message Limit (R) | 2 slots per epoch |
| Epoch Duration (P) | 10 seconds |
| Mix Path Length (L) | 3 hops |
| Cover Traffic Precomputation | Enabled |

Key Finding: Precomputation Produces 0 Packets with R=2

With enablePrecomputation=true, the precompute loop runs but builds zero packets because:

targetCount = totalSlots / (1 + pathLength) = 2 / 4 = 0 (integer division)

All cover traffic falls through to on-demand building. The mix_cover_precomputed_total metric remained at 0.0 on all nodes and mix_cover_emitted{type="prebuilt"} was never incremented. All emissions were {type="on_demand"}.

For precomputation to be effective, R must be ≥ 4 (so R / (1+L) ≥ 1).
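The effect of the integer division is easy to reproduce. The Python sketch below mirrors the `targetCount` formula quoted above; `precompute_target` is an illustrative name, and `//` stands in for the integer division described in the report.

```python
def precompute_target(total_slots, path_length):
    """Number of cover packets to precompute per epoch (integer division)."""
    return total_slots // (1 + path_length)


for total_slots in (2, 3, 4, 8):
    target = precompute_target(total_slots, path_length=3)
    print(f"R={total_slots}: targetCount={target}")
# R=2 and R=3 give 0, R=4 gives 1, R=8 gives 2
```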

System Resource Usage

| Metric | Value |
| --- | --- |
| Memory per node | ~70-75 MB RSS |
| Steady-state CPU (all 5 nodes) | 0.4-6.7% total |
| Load average | 1.54 - 2.37 |

No measurable difference from the non-precomputation run — expected since precomputation built 0 packets.

RLN Proof Generation Times (124 proofs)

| Stat | Value |
| --- | --- |
| Min | 24ms |
| Max | 590ms |
| Average | 116.0ms |
| Median | 79ms |
| P95 | 322ms |

Distribution:

  0-49ms  |  42 | ##########################################
 50-99ms  |  32 | ################################
100-149ms |  16 | ################
150-199ms |  14 | ##############
200-299ms |  12 | ############
300-499ms |   6 | ######
500ms+    |   2 | ##

Consistent with the previous run (avg 112.9ms vs 116.0ms).

Cover Traffic Metrics (Final — 240s)

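| Node | Prebuilt | On-Demand | Received | Precomputed | Slot Exhausted | Proof Fail (Intermediate) |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 16 | 16 | 0 | 0 | 1 |
| 1 | 0 | 16 | 14 | 0 | 1 | 1 |
| 2 | 0 | 15 | 13 | 0 | 3 | 0 |
| 3 | 0 | 16 | 12 | 0 | 3 | 0 |
| 4 | 0 | 15 | 8 | 0 | 5 | 1 |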

Conclusion

Precomputation with R=2 is a no-op due to integer division yielding 0 target packets. The precompute loop activates but produces nothing, falling back entirely to on-demand packet building. To test precomputation meaningfully, R should be set to ≥4 (yielding ≥1 precomputed packet per epoch). All other behavior (emission, forwarding, slot exhaustion, proof generation) is consistent with the non-precomputation run.

@chaitanyaprem
Contributor Author

Proof Token Reuse Validation — R=4 with Precomputation + RLN

Date: 2026-04-10 | Duration: ~3 minutes | Feature: Opaque proof token reuse for discarded precomputed cover packets

What Changed

When a precomputed cover packet is discarded (because a forwarded/originated packet claims its slot), the RLN messageId used for that proof is now reclaimed and reused instead of being wasted. This prevents premature rate limit exhaustion.

Implementation uses opaque proof tokens — SpamProtection.generateProof() returns a ProofResult(proof, token). The token is stored in the CoverPacket and returned via reclaimProofToken() when the packet is discarded. The RLN plugin decodes the token internally, recovering the messageId for reuse.
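The reclaim-and-reuse idea can be sketched as a small message ID allocator. The Python below is an illustration only: `MessageIdPool` and its methods are hypothetical, the opaque token is modeled as the bare messageId, and nothing here reflects the actual `ProofResult` or `reclaimProofToken()` signatures in nim-libp2p or the RLN plugin.

```python
class MessageIdPool:
    """Hypothetical per-epoch allocator of RLN message IDs with reclaim support."""

    def __init__(self, user_message_limit):
        self.limit = user_message_limit
        self.next_id = 0
        self.free_ids = []

    def acquire(self):
        """Return a message ID for a new proof, or None if the budget is spent."""
        if self.free_ids:
            return self.free_ids.pop()  # reuse an ID freed by a discarded packet
        if self.next_id < self.limit:
            message_id = self.next_id
            self.next_id += 1
            return message_id
        return None

    def reclaim(self, message_id):
        """Return the ID of a discarded precomputed cover packet to the pool."""
        self.free_ids.append(message_id)

    def new_epoch(self):
        """IDs start again from 0 at every epoch boundary."""
        self.next_id = 0
        self.free_ids.clear()


pool = MessageIdPool(user_message_limit=4)
first_four = [pool.acquire() for _ in range(4)]  # [0, 1, 2, 3]
exhausted = pool.acquire()                       # None: budget spent
pool.reclaim(1)                                  # cover packet with ID 1 discarded
reused = pool.acquire()                          # 1: available again for forwarding
print(first_four, exhausted, reused)
```

Without the `reclaim` step, the ID of a discarded cover packet would stay consumed until the next epoch, which is the premature exhaustion described above.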

Results: Zero Proof Failures ✅

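| Node | Prebuilt | On-Demand | Received | Precomputed | Exhausted | Proof Fail (Cover) | Proof Fail (Fwd) | Reclaimed Tokens |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 11 | 8 | 17 | 20 | 2 | 0 | 0 | 9 |
| 1 | 13 | 7 | 17 | 20 | 2 | 0 | 0 | 7 |
| 2 | 9 | 10 | 17 | 20 | 3 | 0 | 0 | 11 |
| 3 | 12 | 7 | 17 | 20 | 1 | 0 | 0 | 8 |
| 4 | 12 | 8 | 20 | 20 | 1 | 0 | 0 | 8 |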

43 total proof tokens reclaimed and reused across all nodes. Zero SPAM_PROOF_GEN_FAILED errors. Zero "Message id not within user_message_limit" errors.

Comparison: Before vs After Proof Token Reuse

| Metric | R=4 Without Reuse | R=4 With Reuse |
| --- | --- | --- |
| Proof failures (forwarding) | 4 | 0 |
| Slot exhaustions | 5 | 9 (more forwarding now succeeds) |
| CPU (steady state) | 18.5% | 6.1% |
| Proof gen avg | 117ms | 107ms |
| Reclaimed tokens | N/A | 43 |

MessageId Reuse Trace (Node 1)

22:08:20 generateProof messageId=0 reused=true   ← reusing freed ID
22:08:20 generateProof messageId=1 reused=true
22:08:21 Reclaimed proof token messageId=1        ← cover packet discarded, ID returned
22:08:21 generateProof messageId=1 reused=true    ← immediately reused for forwarded packet
22:08:21 generateProof messageId=2 reused=true
22:08:30 generateProof messageId=0 reused=true    ← new epoch, IDs reset

System Resources

| Metric | Value |
| --- | --- |
| Memory per node | ~56-59 MB RSS |
| Steady-state CPU | 6.1% total |
| Load average | 2.60 - 3.24 |

Configuration

| Parameter | Value |
| --- | --- |
| Nodes | 5 |
| RLN Spam Protection | Enabled |
| R (user message limit) | 4 |
| Epoch Duration | 10s |
| Path Length | 3 |
| Precomputation | Enabled (same-epoch) |
| Proof Token Reuse | Enabled |

- nim-libp2p: adds ProofResult, reclaimProofToken, same-epoch precomp
- mix-rln-spam-protection-plugin: implements messageId reuse from
  discarded cover packets

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@chaitanyaprem
Contributor Author

Cover Rate Fraction — Simulation Comparison (f=0.7 vs f=1.0)

Date: 2026-04-23 | Feature: cover_rate_fraction config (default f = 0.7, per spec §7)

Tracks upstream PR vacp2p/nim-libp2p#2322. Vendor bump in this branch: a9d5f3ff.

Setup (identical for both runs)

| Parameter | Value |
| --- | --- |
| Nodes | 5 (1 bootstrap + 4) |
| R (mix-user-message-limit) | 4 |
| P (epoch) | 10s |
| L (path length) | 3 |
| RLN spam protection | Enabled |
| Precomputation | Disabled |
| Duration | ~26 epochs (~260s) |

Per-node totals over the run

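| Metric | Node 0 | Node 1 | Node 2 | Node 3 | Node 4 |
| --- | --- | --- | --- | --- | --- |
| f = 0.7 (new default) | | | | | |
| Cover emitted | 13 | 13 | 13 | 13 | 13 |
| Intermediate forwarded | 26 | 28 | 32 | 20 | 24 |
| Slot-rejected (forward) | 0 | 0 | 3 | 0 | 0 |
| SLOT_EXHAUSTED | 0 | 0 | 0 | 0 | 0 |
| SPAM_PROOF_GEN_FAILED | 0 | 0 | 0 | 0 | 0 |
| f = 1.0 (prior default, for comparison) | | | | | |
| Cover emitted | 27 | 27 | 27 | 27 | 27 |
| Intermediate forwarded | 52 | 55 | 54 | 50 | 46 |
| Slot-rejected (forward) | 0 | 3 | 3 | 0 | 0 |
| SLOT_EXHAUSTED | 0 | 3 | 3 | 0 | 0 |
| SPAM_PROOF_GEN_FAILED | 1 | 2 | 2 | 0 | 0 |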

Per-node per-epoch averages

| Metric (avg/epoch/node) | f = 0.7 | f = 1.0 | Δ |
| --- | --- | --- | --- |
| Cover emitted | 0.5 | 1.04 | ≈ ½ |
| Intermediate forwarded | ~1.0 | ~2.0 | ≈ ½ |
| Total outgoing | ~1.5 | ~3.0 | ≈ ½ |
| % of R=4 budget used | ~37% | ~75% | |
| Emission interval (logged) | 20s | 10s | |
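The per-epoch averages follow from the per-node totals and the ~26-epoch run length. The following Python sketch recomputes them; the `totals` layout and helper names are illustrative, and the figures are copied from the per-node totals table.

```python
EPOCHS = 26  # run length from the setup table (~26 epochs)
R = 4        # mix-user-message-limit used in both runs

# Per-node totals copied from the table of totals over the run.
totals = {
    "f=0.7": {"cover": [13, 13, 13, 13, 13], "forwarded": [26, 28, 32, 20, 24]},
    "f=1.0": {"cover": [27, 27, 27, 27, 27], "forwarded": [52, 55, 54, 50, 46]},
}


def per_epoch_average(per_node_totals, epochs=EPOCHS):
    """Average count per node per epoch."""
    return sum(per_node_totals) / len(per_node_totals) / epochs


def summarize(run):
    """Return (cover, forwarded, total outgoing, % of R budget) per node per epoch."""
    cover = per_epoch_average(run["cover"])
    forwarded = per_epoch_average(run["forwarded"])
    outgoing = cover + forwarded
    return cover, forwarded, outgoing, 100.0 * outgoing / R


for name, run in totals.items():
    cover, forwarded, outgoing, budget_pct = summarize(run)
    print(f"{name}: cover={cover:.2f} forwarded={forwarded:.2f} "
          f"outgoing={outgoing:.2f} budget={budget_pct:.1f}%")
```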

Observations

  1. ~50% less traffic per node per epoch under f=0.7. floor(0.7 × 4) = 2 halves the scaled-slot count at R=4, doubling emission interval from 10s → 20s. Intermediate forwarding also halves because neighbors emit half as much cover.

  2. Budget utilization drops 75% → 37%. Meaningful headroom gain — in f=1.0 at R=4, nodes ran near capacity; under f=0.7 they're under half-used.

  3. All error classes vanish under f=0.7. The SPAM_PROOF_GEN_FAILED (5 total, 3 nodes) and SLOT_EXHAUSTED (6 total, 2 nodes) errors from the f=1.0 run disappear entirely — exactly the effect spec §10.5 predicts for the cover_rate_fraction headroom.

  4. Some slot-rejected-forwards still occur at f=0.7 (3 on one node). Natural traffic burstiness briefly exceeding the remaining 2 slots, not cover starving forwards.

  5. Caveat — R=4 floor rounding amplifies the effect. At R=4, floor(0.7 × 4) = 2, so f=0.7 effectively behaves as f=0.5. At larger R (e.g., R=100) the reduction would be a cleaner ~30%. Prod deployments with bigger R won't see the 50% drop — they'll see the nominal 30%.
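The floor-rounding caveat in observation 5 can be made concrete with a short calculation. This Python sketch assumes the scaled slot count is floor(f × R) and that the emission interval is (1+L) × P divided by that count, as the observations describe; the function names are illustrative, and exact fractions are used to avoid floating-point surprises in the floor.

```python
import math
from fractions import Fraction


def scaled_slots(user_message_limit, cover_rate_fraction):
    """Slots available for cover scheduling: floor(f * R)."""
    return math.floor(cover_rate_fraction * user_message_limit)


def effective_fraction(user_message_limit, cover_rate_fraction):
    """Fraction of R actually used after floor rounding."""
    return Fraction(scaled_slots(user_message_limit, cover_rate_fraction),
                    user_message_limit)


def emission_interval(user_message_limit, cover_rate_fraction,
                      path_length=3, epoch_seconds=10):
    """Seconds between cover emissions, or None if no slots remain."""
    slots = scaled_slots(user_message_limit, cover_rate_fraction)
    if slots == 0:
        return None
    return Fraction((1 + path_length) * epoch_seconds, slots)


f = Fraction(7, 10)  # cover_rate_fraction = 0.7
for r in (4, 10, 20, 100):
    print(f"R={r}: scaled slots={scaled_slots(r, f)}, "
          f"effective f={float(effective_fraction(r, f)):.2f}, "
          f"interval={float(emission_interval(r, f)):.2f}s")
```

At R=4 the effective fraction comes out at 0.5 with a 20s interval, matching the logged value, while larger limits in this sketch land on the nominal 0.7.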

Vs. prior proof-token-reuse report on this PR

Prior run (R=4, f=1.0 implicit, precomputation enabled, 3 min; see the proof-token-reuse comment above): ~1 emission/epoch/node, zero proof failures (thanks to token reuse).

Current f=0.7 (precomputation disabled, 4.3 min): 0.5 emissions/epoch/node, zero proof failures. Qualitative behavior (loop-back healthy, no failures) preserved; quantitative drop of ~50% per-node traffic as expected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- rename ExtendedKademliaDiscoveryParams -> ExtendedServiceDiscoveryParams
- switch sink from textlines[file] to textlines to work around a chronicles
  compile-time macro-eval bug under Nim 2.2.4
Contributor

@jm-clius jm-clius left a comment

Good progress. Feel free to raise a review once this is ready for merging. One concern is complexity in configuration - if we have restrictions on certain configuration items (e.g. user message rate limit should be a multiple of 4), we should refactor our configuration to prevent misconfiguration.

Contributor

This is interesting. Any reason why these functions were removed? Shouldn't we adapt to whatever is considered the most idiomatic, or use a different library?

Contributor Author

Forgot to mention that this was a temporary hack, done to avoid having to make changes in the rest of the logos delivery code.
I think the original issue came from libp2p migrating to a newer version, which was causing issues in code not related to the changes in this PR.

This file should be removed before merging, as I assume this will get fixed by some other PR in master.

- DefaultUserMessageLimit = 100'u64 # Network-wide default rate limit
- SpammerUserMessageLimit = 3'u64 # Lower limit for spammer testing
+ DefaultUserMessageLimit = 4'u64 # R=4 slots per 10s epoch
+ SpammerUserMessageLimit = 3'u64 # Higher limit for spammer testing
Contributor

Truly a nitpick, but I'm trying to parse the configuration here and elsewhere, and I don't understand the change of "Lower" to "Higher" here. 😅

Contributor Author

Ah, this was something I added initially to simulate a spammer, but I later realized that there is no way to simulate one: zerokit itself seems to check whether a user has crossed their own message limit and, if so, doesn't generate proofs. So right now this is effectively dead code; I will try to remove it.

if userMessageLimit.isSome():
  spamProtectionConfig.userMessageLimit = userMessageLimit.get()
# rlnResourcesPath left empty to use bundled resources (via "tree_height_/" placeholder)
let totalSlots = userMessageLimit.get(2)
Contributor

Why use the magic number 2 here and not the default configuration? (Besides, IIUC from your previous comments, this will result in fractional cover messages per epoch.)

Contributor Author

Oh, this is not right.
You are right and we should use default config. Will fix it.

@chaitanyaprem
Contributor Author

> Good progress. Feel free to raise a review once this is ready for merging. One concern is complexity in configuration - if we have restrictions on certain configuration items (e.g. user message rate limit should be a multiple of 4), we should refactor our configuration to prevent misconfiguration.

Ah, right, but that is just for the simulation environment. Once we migrate this to the on-chain registry, this config would not be required, or rather would be fetched from there.
Another reason I had to add a config such as mix-user-message-limit is that when we run the simulation locally, we can't use the full rate limit of 100 msgs/epoch, as the system can't handle that much RLN proof generation, especially in a 5-node simulation.
I'm not sure there is a way around this other than labelling it as advanced config that is only useful for simulation. Similarly, I added the mix-disable-spam-protection config to allow testing with spam protection disabled.
Ideally these would be deployment-level configs and only the defaults would be used, I guess.
