[WIP] ETW receiver with control path and session#2930
a67d25d to ce46830
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@            Coverage Diff            @@
##             main    #2930    +/-   ##
=========================================
- Coverage   86.03%   86.02%   -0.01%
=========================================
  Files         720      720
  Lines      273264   273264
=========================================
- Hits       235095   235080     -15
- Misses      37645    37660     +15
  Partials      524      524
    *ext.keyword_mut() = keywords.unwrap_or(0);
}

wide_event.add_callback({
It looks like you have changed the callback method registration code in the last commit. I'm okay with the refactor from set_raw_event_callback to per-provider wide_event registrations.
However, these changes replaced the single round-robin counter with one counter per provider, which can underutilize cores at startup. Each callback closure now captures its own local_next. In the long-run steady state both schemes give uniform per-core load. The regression is at startup: every provider's counter is 0, so the first event of every provider lands on core 0, the second on core 1, and so on. With M providers all stepping in lock-step from 0, the early cores are oversubscribed and the late cores stay idle until each provider's counter has walked past them.
Example: 4 cores, 2 providers each emitting 3 events (A1, B1, A2, B2, A3, B3):
per-provider counters:          shared counter:
A1 -> A=0 -> core 0             A1 -> 0 -> core 0
B1 -> B=0 -> core 0             B1 -> 1 -> core 1
A2 -> A=1 -> core 1             A2 -> 2 -> core 2
B2 -> B=1 -> core 1             B2 -> 3 -> core 3
A3 -> A=2 -> core 2             A3 -> 4 -> core 0
B3 -> B=2 -> core 2             B3 -> 5 -> core 1
result: 2, 2, 2, 0              result: 2, 2, 1, 1
The same pattern recurs any time multiple providers' counters happen to be equal mod num_txs and they all burst together. This happens most reliably at session startup, when every counter is exactly 0. The original set_raw_event_callback design had one shared counter, so the distribution was uniform.
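To make the example above easy to check, here is a minimal simulation of the two schemes. It is only a sketch: the function names `per_provider_counters` and `shared_counter` are hypothetical and not part of the PR, and providers are modelled as plain indices (A = 0, B = 1) rather than real ETW providers.

```rust
/// Per-provider counters: every provider walks the cores from its own counter.
/// `events` holds the provider index of each event, in arrival order.
fn per_provider_counters(events: &[usize], providers: usize, cores: usize) -> Vec<usize> {
    let mut next = vec![0usize; providers];
    let mut load = vec![0usize; cores];
    for &p in events {
        load[next[p] % cores] += 1;
        next[p] = next[p].wrapping_add(1);
    }
    load
}

/// Shared counter: one counter advances on every event, whatever the provider.
fn shared_counter(events: &[usize], cores: usize) -> Vec<usize> {
    let mut next = 0usize;
    let mut load = vec![0usize; cores];
    for _ in events {
        load[next % cores] += 1;
        next = next.wrapping_add(1);
    }
    load
}

fn main() {
    // A1, B1, A2, B2, A3, B3 with A = 0 and B = 1.
    let events = [0, 1, 0, 1, 0, 1];
    println!("per-provider: {:?}", per_provider_counters(&events, 2, 4));
    println!("shared:       {:?}", shared_counter(&events, 4));
}
```

With the six events from the example, the per-provider scheme leaves core 3 idle, while the shared counter touches every core.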
We can fix this by sharing the counter across callbacks. one_collect requires one Event registration per provider, so we can't collapse to a single closure, but we can share the counter (and the Vec<Sender>) across closures via Rc<Cell<usize>> / Rc<Vec<...>>. All callbacks dispatch on the single ProcessTrace thread, so Cell is safe: no atomics, no locking.
use std::cell::Cell;
use std::rc::Rc;

let next: Rc<Cell<usize>> = Rc::new(Cell::new(0));
let txs: Rc<Vec<mpsc::Sender<EtwEventData>>> = Rc::new(txs);

for (guid, level, keywords) in &resolved_providers {
    let mut wide_event = one_collect::event::Event::new(0, "otap_wide".into());
    {
        let ext = wide_event.extension_mut();
        *ext.provider_mut() = *guid;
        *ext.level_mut() = *level;
        *ext.keyword_mut() = keywords.unwrap_or(0);
    }

    let ancillary = ancillary.clone();
    let next = Rc::clone(&next);
    let txs = Rc::clone(&txs);

    wide_event.add_callback(move |_event_data| {
        let anc = ancillary.borrow();
        let data = EtwEventData {
            provider_id: anc.provider().to_bytes(),
            timestamp: anc.time(),
            process_id: anc.pid(),
            thread_id: anc.tid(),
            // TODO: populate event_id/opcode/level/keywords/version once
            // WindowsEventExtension exposes EVENT_DESCRIPTOR.
            event_id: 0, opcode: 0, version: 0, level: 0, keywords: 0,
        };
        drop(anc);

        let i = next.get();
        next.set(i.wrapping_add(1));
        let _ = txs[i % txs.len()].try_send(data);
        Ok(())
    });

    session.add_event(wide_event, None);
}

Also collapses N Vec<Sender> clones to N refcount handles, as a side benefit.
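For anyone who wants to try the sharing pattern outside the receiver, the following self-contained sketch reproduces it with the standard library only. It rests on several assumptions: `std::sync::mpsc` channels and `u32` payloads stand in for the real senders and `EtwEventData`, boxed closures stand in for the one_collect callbacks, and `make_callbacks` and `dispatch_counts` are hypothetical names that do not exist in the PR.

```rust
use std::cell::Cell;
use std::rc::Rc;
use std::sync::mpsc;

/// Builds one closure per provider. All closures share a single round-robin
/// counter and a single sender list through `Rc`, so no per-closure clones
/// of the `Vec` are made.
fn make_callbacks(providers: usize, txs: Vec<mpsc::Sender<u32>>) -> Vec<Box<dyn FnMut(u32)>> {
    let next = Rc::new(Cell::new(0usize));
    let txs = Rc::new(txs);
    (0..providers)
        .map(|_| {
            let next = Rc::clone(&next);
            let txs = Rc::clone(&txs);
            Box::new(move |event: u32| {
                let i = next.get();
                next.set(i.wrapping_add(1));
                // A failed send is ignored here, as in the snippet above.
                let _ = txs[i % txs.len()].send(event);
            }) as Box<dyn FnMut(u32)>
        })
        .collect()
}

/// Fires every callback once per round on the current thread and returns
/// how many events each per-core channel received.
fn dispatch_counts(providers: usize, cores: usize, rounds: usize) -> Vec<usize> {
    let (txs, rxs): (Vec<_>, Vec<_>) = (0..cores).map(|_| mpsc::channel::<u32>()).unzip();
    let mut callbacks = make_callbacks(providers, txs);
    for round in 0..rounds {
        for cb in callbacks.iter_mut() {
            cb(round as u32);
        }
    }
    drop(callbacks); // releases the last handles to the senders
    rxs.iter().map(|rx| rx.try_iter().count()).collect()
}

fn main() {
    // 2 providers, 4 cores, 3 events each: the same shape as the example.
    println!("{:?}", dispatch_counts(2, 4, 3));
}
```

Because `Rc` and `Cell` are not `Send`, the compiler rejects any attempt to move these closures to another thread, which matches the single-dispatch-thread assumption the comment relies on.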
//!
//! Instead of using the low-level `set_raw_event_callback` (which bypasses
//! `one_collect`'s event routing), we use the **provider-wide event** mechanism
//! via [`EtwSession::add_wide_event`]. This registers a catch-all handler for
This doc comment refers to EtwSession::add_wide_event, but that API is not called anywhere in the code.
Change Summary
ETW receiver initial PR
What issue does this PR close?
How are these changes tested?
Validated with cargo check, cargo run, etc.
Local validation with the ETW console YAML config:
cargo run --features etw-receiver -- -c configs/etw-console.yaml
Are there any user-facing changes?
N/A