feat: add Lean Ethereum consensus pipeline by ilitteri · Pull Request #19 · lambdaclass/ethereum-package

ilitteri · 2026-05-13T22:13:30Z

Summary

Add Lean Ethereum consensus clients as cl_type: values inside the standard participants: block. Operators pick a Lean client the same way they pick lighthouse/teku/prysm/...; if the Lean client supports Engine API (today only ethlambda does, via lambdaclass/ethlambda#367) it can be paired with any EL — otherwise set el_type: none and the package skips the Eth1 side for that participant.

This replaces the original lean_participants: top-level block; that schema is removed and the canonical args files are migrated.

Wired clients (8 total):

| `cl_type:` | Default image | EL pairing today |
| --- | --- | --- |
| `ethlambda` | `ghcr.io/lambdaclass/ethlambda:latest` | ✅ via Engine API (lambdaclass/ethlambda#367) |
| `ream` | `ghcr.io/reamlabs/ream:latest` | ❌ no Engine API yet; use `el_type: none` |
| `zeam` | `blockblaz/zeam:latest` | ❌ no Engine API yet |
| `qlean` | `qdrvm/qlean-mini:latest` | ❌ no Engine API yet |
| `lantern` | `piertwo/lantern:latest` | ❌ no Engine API yet |
| `gean` | `ghcr.io/geanlabs/gean:latest` | ❌ no Engine API yet |
| `lean_grandine` | `sifrai/lean:latest` | ❌ no Engine API yet |
| `lean_lighthouse` | `hopinheimer/lighthouse:latest` | ❌ no Engine API yet (devnet3 single-key layout; won't peer with devnet4 clients until a dual-key image ships) |

The lean_grandine / lean_lighthouse prefixes disambiguate from the standard CL Grandine / Lighthouse types of the same name (different binaries from different repos).

Plus a Prometheus + Grafana stack (mirrors lean-quickstart's docker-compose-metrics.yaml) that scrapes every Lean node's /metrics and serves the upstream Lean client dashboard, host-published on :3000 / :9090.

How to run — Lean-only localnet (7 of the 8 clients; lean_lighthouse omitted)

Save the args file below to e.g. lean-only.yaml:

# All-Lean devnet4 localnet. Every participant has `el_type: none`
# so the package skips the Eth1 EL/CL pipeline for it. lean_lighthouse
# is omitted because the published `hopinheimer/lighthouse:latest`
# image is still on the single-key (devnet3) GENESIS_VALIDATORS layout
# and won't reach consensus with the devnet4 nodes. Add it back when
# a dual-key image ships.
participants:
  - el_type: none
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet4
    count: 1
    validator_count: 1
    is_aggregator: true
  - el_type: none
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet4
    count: 1
    validator_count: 1
    is_aggregator: false
  - el_type: none
    cl_type: ream
    cl_image: ghcr.io/reamlabs/ream:latest-devnet4
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: none
    cl_type: zeam
    cl_image: blockblaz/zeam:devnet4
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: none
    cl_type: qlean
    cl_image: qdrvm/qlean-mini:devnet-4-amd64
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: none
    cl_type: lantern
    cl_image: piertwo/lantern:v0.0.4
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: none
    cl_type: lean_grandine
    cl_image: sifrai/lean:devnet-4
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: none
    cl_type: gean
    cl_image: ghcr.io/geanlabs/gean:devnet4
    count: 2
    validator_count: 1
    is_aggregator: false

lean_network_params:
  genesis_delay: 180
  active_epoch: 18
  attestation_committee_count: 1

additional_services: []

This is the file checked in at .github/tests/lean-devnet4.yaml.

Run with Kurtosis 1.18.2+:

kurtosis run --enclave lean \
  github.com/lambdaclass/ethereum-package@feat/lean-consensus-clients \
  --args-file lean-only.yaml

The package boots:
- 14 Lean nodes (2× each: ethlambda, ream, zeam, qlean, lantern, lean_grandine, gean), full-mesh peering
- 1 Prometheus scraping every node + itself
- 1 Grafana with anonymous admin + the upstream client dashboard pre-loaded

Access:

# All ports (host-published):
kurtosis enclave inspect lean

# Direct URLs (host's hostname, bound on 0.0.0.0):
# Grafana:    http://<host>:3000/d/lean-ethereum-clients-dashboard/lean-ethereum-clients-dashboard
# Prometheus: http://<host>:9090

# Service logs (any client):
kurtosis service logs lean lean-ethlambda-0 -f

Smaller setups: drop nodes you don't need from participants:. A 2-node ethlambda smoke test runs in ~3 min (.github/tests/lean-smoke.yaml); the 14-node setup above takes ~10–12 min including hash-sig keygen.

Teardown:

```
kurtosis enclave rm -f lean
```

How to run — 2× ethrex + ethlambda paired localnet

Two ethrex ELs each paired with one ethlambda via the Engine API. One ethlambda is the aggregator, the other isn't.

Save the args file below to e.g. ethlambda-el-pair-2node.yaml:

participants:
  - el_type: ethrex
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 1
    validator_count: 1
    is_aggregator: true
  - el_type: ethrex
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 1
    validator_count: 1
    is_aggregator: false

additional_services: []

This is the file checked in at .github/tests/ethlambda-el-pair-2node.yaml. A single-pair variant lives at .github/tests/ethlambda-el-pair.yaml.

(optional) Build an ethlambda image for local testing, instead of using the published devnet6 image:

git clone --branch engine-api-integration https://github.com/lambdaclass/ethlambda
cd ethlambda
docker build -t ghcr.io/lambdaclass/ethlambda:devnet6 .

Run with Kurtosis 1.18.2+:

kurtosis run --enclave ethlambda-pair-2node \
  github.com/lambdaclass/ethereum-package@feat/lean-consensus-clients \
  --args-file ethlambda-el-pair-2node.yaml

The package boots:
- 2× ethrex EL services (el-1-ethrex-ethlambda, el-2-ethrex-ethlambda)
- 2× ethlambda CL services (lean-ethlambda-0 = aggregator, lean-ethlambda-1 = peer)
- 1× Prometheus + 1× Grafana scraping every Lean node

Verify the chain is advancing on both pairs:

# Lean head_slot from both ethlambdas:
p0=$(kurtosis port print ethlambda-pair-2node lean-ethlambda-0 metrics | sed 's|.*:||')
p1=$(kurtosis port print ethlambda-pair-2node lean-ethlambda-1 metrics | sed 's|.*:||')
curl -s "http://127.0.0.1:${p0}/metrics" | grep '^lean_head_slot '
curl -s "http://127.0.0.1:${p1}/metrics" | grep '^lean_head_slot '

# Justification / finalization from ethlambda_0's fork choice:
api=$(kurtosis port print ethlambda-pair-2node lean-ethlambda-0 http | sed 's|.*:||')
curl -s "http://127.0.0.1:${api}/lean/v0/fork_choice" \
  | jq '{justified:.justified.slot, finalized:.finalized.slot, validator_count:.validator_count}'

# Both ethrex tips should be at the same block hash:
rpc1=$(kurtosis port print ethlambda-pair-2node el-1-ethrex-ethlambda rpc | sed 's|.*:||')
rpc2=$(kurtosis port print ethlambda-pair-2node el-2-ethrex-ethlambda rpc | sed 's|.*:||')
for rpc in $rpc1 $rpc2; do
  curl -s "http://127.0.0.1:${rpc}" -X POST -H 'Content-Type: application/json' \
    -d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","id":1,"params":["latest",false]}' \
    | jq -c '{block:.result.number, hash:.result.hash}'
done

Expected (after ~30 s): both lean_head_slot values match and advance by 1 every 4 s; justified/finalized advance slot-by-slot; both ethrex latest blocks report identical hashes.
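
The `grep '^lean_head_slot '` comparison can also be scripted. The following is a minimal sketch, assuming the metric is exposed as an unlabelled `lean_head_slot <value>` line in Prometheus text format (which is what the grep pattern implies). `parse_gauge` and `heads_in_sync` are hypothetical helpers, the sample payloads are made up for illustration, and the one-slot tolerance is an assumption to absorb the delay between two consecutive `curl` calls.

```python
def parse_gauge(metrics_text, name):
    """Return the value of an unlabelled Prometheus gauge, or None if absent."""
    for line in metrics_text.splitlines():
        # Skip HELP/TYPE comments and blank lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # Match the bare metric name only, like grep '^lean_head_slot '.
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def heads_in_sync(payloads, name="lean_head_slot", tolerance=1):
    """True if every payload reports the metric and values differ by <= tolerance."""
    heads = [parse_gauge(p, name) for p in payloads]
    if any(h is None for h in heads):
        return False
    return max(heads) - min(heads) <= tolerance


# Illustrative payloads, not captured from a real run.
node0 = "# TYPE lean_head_slot gauge\nlean_head_slot 42\nlean_head_slot_total 7\n"
node1 = "# TYPE lean_head_slot gauge\nlean_head_slot 42\n"
print(heads_in_sync([node0, node1]))
```

Feeding it the two `curl` outputs taken a few seconds apart should show both values increasing while staying within the tolerance.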

Teardown:

kurtosis enclave rm -f ethlambda-pair-2node

How to run — cross-client EL × ethlambda experiment (6 ELs × 2 each)

12 EL+ethlambda pairs, one ethlambda aggregator. Exercises the Engine API surface across six EL clients: ethrex, nethermind, geth, erigon, nimbus-eth1, besu.

Save the args file below to e.g. ethlambda-el-all-clients.yaml:

participants:
  - el_type: ethrex
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 1
    validator_count: 1
    is_aggregator: true
  - el_type: ethrex
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 1
    validator_count: 1
    is_aggregator: false
  - el_type: nethermind
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: geth
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: erigon
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: nimbus
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 2
    validator_count: 1
    is_aggregator: false
  - el_type: besu
    cl_type: ethlambda
    cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
    count: 2
    validator_count: 1
    is_aggregator: false

lean_network_params:
  # 180s lets the 12 EL containers fully warm up before the Lean
  # chain starts. With the 60s default, several proposers missed
  # their slot during EL JIT/cache warm-up, and the resulting gaps
  # broke 3SF-mini's `delta ≤ 5` finalization rule.
  genesis_delay: 180

additional_services: []

This is the file checked in at .github/tests/ethlambda-el-all-clients.yaml.

Build the ethlambda image from lambdaclass/ethlambda#367 ("feat: integrate with ethrex over the Engine API"), as described in the previous section.

Run with Kurtosis 1.18.2+:

kurtosis run --enclave ethlambda-all-els \
  github.com/lambdaclass/ethereum-package@feat/lean-consensus-clients \
  --args-file ethlambda-el-all-clients.yaml

The package boots in ~6 min:
- 12 EL services (2× of ethrex, nethermind, geth, erigon, nimbus-eth1, besu)
- 12 ethlambda CL services (lean-ethlambda-0 = aggregator, 1..11 = peers)
- 1× Prometheus + 1× Grafana

Verify all 12 ethlambdas are in sync and finalization is advancing:

# All 12 heads match
for n in 0 1 2 3 4 5 6 7 8 9 10 11; do
  port=$(kurtosis port print ethlambda-all-els "lean-ethlambda-${n}" metrics | sed 's|.*:||')
  head=$(curl -s "http://127.0.0.1:${port}/metrics" | grep '^lean_head_slot ' | awk '{print $2}')
  echo "ethlambda_${n} head=${head}"
done

# Justified/finalized advancing
api=$(kurtosis port print ethlambda-all-els lean-ethlambda-0 http | sed 's|.*:||')
curl -s "http://127.0.0.1:${api}/lean/v0/fork_choice" \
  | jq '{justified:.justified.slot, finalized:.finalized.slot, validator_count:.validator_count}'

Expected: every ethlambda at the same head_slot, justified = head - 1, finalized = head - 2, validator_count = 12. All 12 EL chains will report identical block hashes because the Lean libp2p mesh picks a single canonical chain.
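
The expected relationships can be checked mechanically once the head slots and the fork-choice JSON have been collected. This is a sketch under stated assumptions: the field names (`justified.slot`, `finalized.slot`, `validator_count`) are taken from the jq filter above, `check_expected` is a hypothetical helper, and the sample values are illustrative rather than captured from a run.

```python
def check_expected(heads, fork_choice, expected_validators=12):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(set(heads)) != 1:
        problems.append("head slots differ: %s" % sorted(set(heads)))
    head = max(heads)
    justified = fork_choice["justified"]["slot"]
    finalized = fork_choice["finalized"]["slot"]
    if justified != head - 1:
        problems.append("justified=%d, expected %d" % (justified, head - 1))
    if finalized != head - 2:
        problems.append("finalized=%d, expected %d" % (finalized, head - 2))
    if fork_choice["validator_count"] != expected_validators:
        problems.append(
            "validator_count=%d, expected %d"
            % (fork_choice["validator_count"], expected_validators)
        )
    return problems


# Illustrative data for a healthy 12-node network at head slot 100.
healthy = {"justified": {"slot": 99}, "finalized": {"slot": 98}, "validator_count": 12}
print(check_expected([100] * 12, healthy))
```

Because the head slots and the fork-choice snapshot are fetched at slightly different moments, an occasional off-by-one report from a live network is not necessarily a failure; re-sampling is the simplest remedy.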

Teardown:

kurtosis enclave rm -f ethlambda-all-els

Note on additional_services: [dora] (and similar Eth1 beacon-API consumers): they currently can't run alongside a Lean-only CL set because Lean clients don't expose an Eth1 Beacon API. ethlambda Beacon-API compatibility stubs are tracked separately.

Note on reth / ethereumjs: omitted from this experiment. ethereumjs fails the package's 2-minute TCP port-check on its WS endpoint and rolls back the entire EL launch batch; reth may or may not work — untested in this configuration.

What's in the box

| Path | Purpose |
| --- | --- |
| `src/package_io/constants.star` | Adds 8 Lean cl_types to `CL_TYPE` and the `LEAN_CL_TYPES` set the dispatcher checks |
| `src/package_io/input_parser.star` | Lean defaults in `DEFAULT_CL_IMAGES`, `is_aggregator` participant field, relaxed Eth1 bootnode + Fulu/PeerDAS guards for Lean-only networks |
| `src/cl/cl_launcher.star` | Skips Lean cl_types in the standard dispatcher (None cl_context) |
| `src/participant_network.star` | Skips VC / snooper / metrics-exporter for Lean participants |
| `src/shared_utils/shared_utils.star` | `get_client_names` falls back to cl_type when cl_context is None |
| `src/dora/dora_launcher.star` | Skips Lean participants when building beacon-endpoint list (dora itself still requires ≥1 Eth1 beacon endpoint to start) |
| `main.star` | Builds Lean records from `participants:` with a Lean cl_type and hands them to the Lean launcher |
| `src/lean/lean_launcher.star` | 3-phase orchestrator (P2P keys → genesis → start) with optional EL pairing (queries EL genesis hash, passes jwt_file through) |
| `src/lean/ethlambda/ethlambda_launcher.star` | Per-client CLI translation; accepts optional EL params and stages JWT into the container |
| `src/lean/{ream,zeam,qlean,lantern,grandine,lighthouse,gean}/<client>_launcher.star` | Other Lean clients (no EL pairing today) |
| `src/lean/metrics/metrics_launcher.star` | Prometheus + Grafana stack with the upstream Lean client dashboard pre-loaded |
| `src/prelaunch_data_generator/lean_genesis/` | hash-sig-cli + eth-beacon-genesis leanchain + Python post-process |
| `src/el/ethrex/ethrex_launcher.star` | Enables the `admin` HTTP API namespace so `admin_nodeInfo` works (was hanging) |
| `.github/tests/lean-devnet4.yaml` | 14-node Lean-only canonical args |
| `.github/tests/lean-smoke.yaml` | 2-node ethlambda smoke test (Lean-only) |
| `.github/tests/ethlambda-el-pair.yaml` | Single EL+ethlambda smoke test |
| `.github/tests/ethlambda-el-pair-2node.yaml` | 2× EL+ethlambda pairs (one aggregator + one not) |
| `.github/tests/ethlambda-el-all-clients.yaml` | 6 ELs × 2 each paired with ethlambda (12 pairs, one aggregator) |
| `docs/architecture.md` | "Lean Ethereum participants" section rewritten for the cl_type-based shape |

Validation

12-pair cross-EL devnet (ethlambda-el-all-clients.yaml) — ethrex, nethermind, geth, erigon, nimbus-eth1, besu × 2 each paired with ethlambda. All 12 ethlambdas finalize slot-by-slot via 3SF-mini's delta ≤ 5 rule; every EL chain converges on identical block hashes. Local Kurtosis 1.18.2 on macOS / arm64 and Debian 13 / amd64 (ethrex-office-4).
2-pair ethrex+ethlambda (ethlambda-el-pair-2node.yaml) — finalization advancing on both pairs, ethrex tips identical.
Single ethrex+ethlambda pair (ethlambda-el-pair.yaml) — ethrex's per-slot log (Fork choice updated includes payload attributes / Requested payload with id=...) confirms the Engine API loop.
14-node Lean-only (lean-devnet4.yaml) — all peers connect cross-client, chain advances; finalization stalls because the single aggregator's leanVM XMSS proof exceeds the 750 ms slot-aggregation deadline at 14-validator scale (knob lives in ethlambda; see "Known limitations"). On ethrex-office-3 (Debian 13 / amd64).

Architectural notes

EL pairing only on ethlambda today: the other 7 Lean clients don't implement Engine API yet. The convention is el_type: none for non-ethlambda Lean cl_types.
First-participant guard: EL_TYPE.none on the first participant is allowed when every participant has a Lean cl_type (no Eth1 bootnode mesh exists). Mixed networks (some Lean, some standard Eth1 CL) still need a non-none first participant.
EL genesis block hash: read from the EL via eth_getBlockByNumber 0x0 after the EL is up. Used to seed ethlambda's state.latest_execution_payload_header.block_hash so the very first FCU carries a head the EL recognizes.
Process detachment: Lean client binaries are launched via setsid -f (not nohup ... &) so that Kurtosis exec returns immediately. The shell in the Lean client images is dash, which doesn't understand disown.
genesis_delay tuning: 60s is enough for a 2–4 node devnet; 180s is recommended for ≥10 EL containers booting in parallel, otherwise some ELs miss their first engine_getPayloadV5 deadline and the resulting empty slots break 3SF-mini's delta ≤ 5 finalization rule.
YAML 1.1 hex int trap: PyYAML parses unquoted 0x... tokens as ints, dropping leading zeros from XMSS pubkey hex. The Lean genesis post-process loader uses yaml.BaseLoader to keep them as strings.
zeam scratch image: no /bin/sh — handled by injecting a static busybox binary as a Kurtosis files artifact.
lean_lighthouse caveat: the published hopinheimer/lighthouse:latest rejects devnet4's dual-key GENESIS_VALIDATORS layout. Launcher is wired in, but the client won't reach consensus until upstream ships a dual-key image.
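
The first-participant guard from the notes above reduces to a small predicate. The sketch below is written in plain Python rather than Starlark and is not the package's actual code: `first_participant_ok` is a hypothetical name, and the participant dicts only carry the two keys needed for the check. The `LEAN_CL_TYPES` values are the eight `cl_type:` names from the wired-clients table.

```python
LEAN_CL_TYPES = {
    "ethlambda", "ream", "zeam", "qlean",
    "lantern", "gean", "lean_grandine", "lean_lighthouse",
}


def first_participant_ok(participants):
    """An EL-less first participant is only acceptable on an all-Lean network."""
    if not participants:
        return True  # nothing to validate
    first_has_el = participants[0]["el_type"] != "none"
    all_lean = all(p["cl_type"] in LEAN_CL_TYPES for p in participants)
    return first_has_el or all_lean


lean_only = [
    {"el_type": "none", "cl_type": "ethlambda"},
    {"el_type": "none", "cl_type": "ream"},
]
mixed_bad = [
    {"el_type": "none", "cl_type": "ream"},
    {"el_type": "geth", "cl_type": "lighthouse"},
]
print(first_participant_ok(lean_only), first_participant_ok(mixed_bad))
```

The second case is rejected because a standard Eth1 CL is present, so a bootnode mesh exists and the first participant has to supply an EL.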

Known limitations

dora doesn't work on Lean-only networks — it requires ≥1 Eth1 beacon endpoint to start, and Lean clients don't expose one. ethlambda Beacon API compatibility stubs would unblock dora (and assertoor, forky, …); tracked separately.
14-node Lean-only finalization stalls with the current single-aggregator + 750ms slot-aggregation deadline. The aggregator processes ~4 sigs/slot in the 750 ms window vs the 10–14 it would need for justification; the deadline is hardcoded in ethlambda. Cross-client lag (some Lean clients fall behind the head) compounds the issue.
ethereumjs is excluded from the all-ELs args file: its container fails the package's 2-minute TCP port-check on the WS endpoint, which rolls back the entire EL launch batch.
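
A back-of-the-envelope view of the 14-node stall described above. Two assumptions are baked in and neither comes from the ethlambda source: justification is taken to need a 2/3 supermajority of validators (which matches the lower bound of the "10 to 14" figure), and aggregation cost is taken to grow linearly with the number of signatures.

```python
def supermajority(n_validators):
    """Smallest vote count v with 3 * v >= 2 * n (assumed 2/3 threshold)."""
    return (2 * n_validators + 2) // 3


def required_window_ms(n_validators, observed_sigs, window_ms):
    """Window needed to reach the threshold, assuming linear per-signature cost."""
    per_sig_ms = window_ms / observed_sigs
    return supermajority(n_validators) * per_sig_ms


print(supermajority(14))               # 10
print(required_window_ms(14, 4, 750))  # 1875.0
```

Under these assumptions the aggregator would need roughly 2.5 times the current 750 ms window to justify a slot at 14 validators, which is consistent with the stall reported in the validation notes.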

Adding a new Lean client

The contract for adding a new Lean client lives in the module docstrings (same convention the existing src/cl/* launchers follow). Touch points:

Add an entry to LEAN_TYPE in src/package_io/constants.star, add the same value to CL_TYPE and LEAN_CL_TYPES, plus a default :latest image in DEFAULT_CL_IMAGES / DEFAULT_CL_IMAGES_MINIMAL.
Create src/lean/<client>/<client>_launcher.star exporting initialize(plan, node, p2p_keys_artifact, hash_sig_artifact) and start(plan, node, service, genesis_artifact, hash_sig_artifact). The ethlambda launcher is the cleanest reference; zeam shows the scratch-image variant.
Wire the new launcher into _launcher_for(...) in src/lean/lean_launcher.star.
Add a participant entry to .github/tests/lean-devnet4.yaml (or a new args file) showing the canonical image override.

Lean Ethereum is a redesign of Ethereum consensus with no EL pairing, no Engine API, no JWT, and post-quantum (XMSS / hash-sig) validator signatures. Lean clients are standalone consensus nodes talking only to each other over libp2p QUIC. Trying to express Lean clients as participants[].cl_type with el_type: none would force is_lean() branches throughout the EL/CL pipeline, validator-keystore generator, MEV-boost flow, snooper, etc. This change adds a parallel top-level lean_participants: pipeline so the existing EL/CL flow is untouched and Lean concerns are isolated under src/lean/ and src/prelaunch_data_generator/lean_genesis/.

The Lean pipeline runs in three phases:

1. Allocate per-node libp2p P2P keys via openssl.
2. Stand up placeholder services so Kurtosis assigns IPs, then run hash-sig-cli (XMSS keypairs) and eth-beacon-genesis leanchain (config.yaml + validators.yaml + nodes.yaml + genesis.{ssz,json}) against the live IPs. Post-process injects GENESIS_VALIDATORS into config.yaml and renders annotated_validators.yaml mapping node names to validator indices and attester/proposer privkey file basenames.
3. Re-add each placeholder with force_update=True so the genesis + hash-sig artifacts are mounted and the real client binary runs. Kurtosis preserves the IP because the service name and ports don't change, keeping the ENRs we just embedded valid.

ethlambda is fully wired as the first concrete client. ream and zeam ship with stub launchers translating to their CLI surface from blockblaz/lean-quickstart client-cmds/*.sh. docs/lean-consensus.md covers the architecture and docs/lean-adding-a-new-client.md is the contract for adding a new Lean client (5 touch points). V1 still requires at least one EL/CL participants[] entry because several downstream consumers (tx-fuzz target, dora, etc.) assume all_el_contexts[0] exists; lean-only mode is a follow-up.

Three bug fixes surfaced by running kurtosis against a minimal lean_participants config:

1. Starlark doesn't support implicit string-literal concatenation. Two adjacent string literals across lines parsed cleanly under black (used by kurtosis lint) but failed the Starlark interpreter. Use explicit "+" in the input_parser fail() and lean_genesis_generator fail().
2. sanity_check rejected lean_participants and lean_network_params as unknown root keys. Register both in ADDITIONAL_CATEGORY_PARAMS so the catch-all root validator accepts them. Per-entry validation stays in the Lean input parser (DEFAULT_LEAN_IMAGES + parse_lean_participants).
3. Mount overlap: GENESIS_MOUNT (/network-configs) and HASH_SIG_MOUNT (/network-configs/hash-sig-keys) cannot both be Kurtosis file artifact mountpoints since Kurtosis forbids nested mounts. Bundle hash-sig keys into the same artifact during the genesis post-process step and drop the separate HASH_SIG_MOUNT entry from each client's ServiceConfig.

Validated up to the point where the existing EL/CL pipeline starts image pulls; Kurtosis CLI v1.18.1 hangs there independently of these changes (upstream issue, fixed in v1.18.2).

Lean consensus is fully standalone (no Engine API, no EL counterpart), so running a Lean network alongside an Eth1 EL/CL pair just to satisfy downstream consumers is the wrong contract. Detect the lean-only case (participants: [] && lean_participants: [...]) early in main.star and short-circuit straight into lean_launcher.launch; the Eth1 EL/CL flow is skipped entirely.

Guards added to input_parser for the two EL/CL preconditions that crashed with `participants: []`:

- Fulu/PeerDAS validation only runs when at least one EL/CL participant is configured.
- First-participant-must-have-EL check only runs when participants[0] actually exists.

Several pipeline-runtime fixes surfaced while iterating against `kurtosis run`:

1. `plan.run_sh(...).output` is a Kurtosis runtime future, not a Starlark string. Don't try to index it in Starlark (dict lookups, etc.). The P2P key generator stops returning a value dict and just exports the artifact; the validator-config render reads keys inside its own shell.
2. Kurtosis service names must match RFC 1035, so the per-node `<client>_<idx>` (lean-quickstart convention) is translated to `lean-<client>-<idx>` for the service name. The internal node_name keeps the underscore for --node-id / validator-config compatibility.
3. add_service rejects calling the same name twice (force_update or not). Swap the placeholder-then-replace pattern for the pq-devnet-package style: add_service once with all IP-independent mounts (P2P + hash-sig keys), then plan.exec each text genesis file in via `cat <<EOF` and start the binary as a nohup background process.
4. ServiceConfig refuses min_cpu/max_cpu/min_memory/max_memory of 0; only set those kwargs when the participant configured them.
5. lean_participants / lean_network_params have to be added to sanity_check.ADDITIONAL_CATEGORY_PARAMS and to the post-parse struct in input_parser, otherwise the existing root-key validator rejects them and main.star sees missing struct attrs.
6. eth-beacon-genesis leanchain works with IPs as Kurtosis futures via render_templates (template engine resolves the future correctly); embedding the future inside a raw shell heredoc trips sh because the {{kurtosis:...}} text reaches sh before substitution. The validator-config render is now two-stage: render with `__PRIVKEY_<node>__` placeholders + real IPs, then sed-substitute privkeys from the keys artifact.
7. hash-sig-cli binary lives at /usr/local/bin/hashsig in the blockblaz/hash-sig-cli:latest image, not `hash-sig-cli`.
8. The post-process step (GENESIS_VALIDATORS injection + annotated_validators.yaml render) is now a Python script rendered as a separate artifact and executed by a minimal shell wrapper; busybox sh in common yq/alpine images choked on heredocs with embedded interpreters. PyYAML installed via pip (apk's py3-yaml targets alpine's system python, not the python:3-alpine bundled one).
9. PyYAML 1.1 parses unquoted 0x-hex tokens as int; the post-process script normalises both int and str forms before string ops.
10. Lean and EL genesis pipelines share no artifacts; the hash-sig keys are bundled into the same `lean-genesis-data` artifact as config.yaml et al. so a single Kurtosis files mount covers `/network-configs` + `/network-configs/hash-sig-keys`. Kurtosis forbids overlapping mounts.

Validated end-to-end with kurtosis run: two `lean-ethlambda-{0,1}` services come up in lean-only mode, peer over QUIC, exchange status messages, expose `GET /lean/v0/health` returning HTTP 200, and serve `lean_*` Prometheus metrics.
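
The `<client>_<idx>` to `lean-<client>-<idx>` translation described above is small enough to sketch. This is plain Python, not the package's Starlark, and `service_name_for` is a hypothetical name. The pattern encodes the RFC 1035 label shape (letter first, letter or digit last, hyphens inside, at most 63 characters); restricting it to lowercase is an assumption about what Kurtosis accepts.

```python
import re

# RFC 1035-style label, lowercase only (assumed to mirror Kurtosis' check).
_LABEL = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")


def service_name_for(node_name):
    """Map a lean-quickstart node name to a Kurtosis-safe service name."""
    name = "lean-" + node_name.replace("_", "-")
    if not _LABEL.match(name):
        raise ValueError("not a valid service name: %r" % name)
    return name


print(service_name_for("ethlambda_0"))  # lean-ethlambda-0
```

The underscore form stays in use as the internal node name, so only the service name goes through this mapping.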

The hash-sig-cli manifest writes attester_key_pubkey_hex / proposer_key_pubkey_hex as unquoted `0x...` tokens. PyYAML's default loader (and yaml.safe_load) parse those as YAML 1.1 ints, which silently drops leading zeros. Reformatting via `format(value, "x")` then emits an odd-length hex string, and every Lean client correctly rejects the resulting config.yaml ("pubkey is not valid hex" / "odd number of digits at line ..."). Switch the loader to yaml.BaseLoader so every scalar stays a Python str; _as_hex retains its int fallback in case a future manifest version writes the field differently. Validated with a 4-node ethlambda+ream multi-client devnet: peer_count=3 on every node, cross-client status exchange confirmed.
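
The leading-zero loss is easy to reproduce without PyYAML. The sketch below only mimics the round trip (hex token resolved to an int, then formatted back); `yaml11_int_roundtrip` is a hypothetical helper and the 3-byte value is illustrative, not a real XMSS public key.

```python
def yaml11_int_roundtrip(token):
    """Mimic a loader resolving `0x...` to an int, then format(value, "x")."""
    value = int(token, 16)     # what an int-resolving loader hands back
    return format(value, "x")  # leading zeros are gone at this point


pubkey = "0x0a1b2c"  # illustrative value only
lossy = yaml11_int_roundtrip(pubkey)
print(lossy, len(lossy))         # a1b2c 5

# Keeping the scalar as a string preserves every digit.
print(bytes.fromhex(pubkey[2:]))  # b'\n\x1b,'

try:
    bytes.fromhex(lossy)
except ValueError as err:
    print("rejected:", err)
```

An odd-length result like this is the kind of string the clients reject, which is why loading every scalar as a string sidesteps the problem.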

blockblaz/zeam:devnet4 is a scratch image: only /app/zig-out/bin/zeam exists, no /bin/sh, /bin/tail, /usr/bin/touch, or even /var/log. The existing placeholder-then-plan.exec lifecycle therefore couldn't run for zeam. This commit adds a static busybox binary as a Kurtosis files artifact, mounts it at /usr/local/bin/, and routes every shell-needing step through busybox sh + dispatched applets (busybox mkdir / touch / tail / cat / nohup).

Three additional zeam-specific fixups surfaced while iterating:

1. The placeholder cmd mkdir -p's $(dirname /var/log/<svc>.log) before touch; /var/log doesn't exist in scratch.
2. The start phase mkdir -p's /data and copies the per-node <node>.key from /node-keys into /network-configs (lean-quickstart's zeam contract reads --node-key relative to --custom-genesis).
3. --validator-config is set to the literal `genesis_bootnode` sentinel rather than a YAML file path. zeam's --validator-config accepts either a *directory* of per-node validator configs or the sentinel; pointing it at a single YAML file triggers a NotDir failure during "build node start options".

Validated end-to-end with a 6-node devnet (2x ethlambda + 2x ream + 2x zeam): every node is RUNNING; lean-zeam-0 reports "Connected Peers: 5", builds the genesis state, and prints the fork-choice tree. Cross-client peering between ethlambda <-> ream <-> zeam confirmed via status request/response exchange in all three clients' logs.
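
To make the applet dispatch concrete, this sketch expands the placeholder command for one node. The format string is the one used in zeam_launcher.star; the busybox path follows the mount point named above, and the log path follows the `/var/log/<svc>.log` shape with `lean-zeam-0` as an example service. `placeholder_cmd` is a hypothetical wrapper for illustration.

```python
BUSYBOX = "/usr/local/bin/busybox"


def placeholder_cmd(log_path, busybox=BUSYBOX):
    """Build the keep-alive command, routing every applet through busybox."""
    return (
        "{0} mkdir -p $({0} dirname {1}) && {0} touch {1} && {0} tail -f {1}"
    ).format(busybox, log_path)


print(placeholder_cmd("/var/log/lean-zeam-0.log"))
```

Every tool name in the expanded command is prefixed with the busybox path, so nothing depends on /bin or /usr/bin existing in the scratch image.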

XMSS keypair generation is CPU-bound and scales with num-validators * 2^active_epoch. On slower hosts (and with the default active_epoch=18 == 2^18 epochs per key) the default 180s plan.run_sh timeout fires mid-generation, killing the run with "exec request timed out". Bump the wait to 30m so it has headroom on shared/remote hosts. The step is idempotent in the kurtosis artifact sense - re-runs of the same package args reuse the artifact. Surfaced while bringing up the 6-node ethlambda+ream+zeam devnet on ethrex-mainnet-test-1.
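
A rough feel for how that scaling plays out, as a sketch only: `keygen_work_units` is a hypothetical helper that evaluates the stated `num-validators * 2^active_epoch` product. It is a relative cost figure, not a time estimate, and it ignores per-host speed and any per-validator key multiplicity.

```python
def keygen_work_units(num_validators, active_epoch):
    """Relative keygen cost: num_validators * 2**active_epoch."""
    return num_validators * (1 << active_epoch)


smoke = keygen_work_units(2, 18)   # 2-node smoke test
full = keygen_work_units(14, 18)   # 14-node devnet4 args
print(smoke, full, full // smoke)  # 524288 3670016 7
```

Lowering `active_epoch` by one halves the figure, so it is the more effective knob when a host keeps hitting the timeout.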

Mirror lean-quickstart's docker-compose-metrics.yaml stack inside the Kurtosis enclave: one `lean-prometheus` (prom v3.8.0) scraping every lean-<client>-<idx>:5054/metrics target plus its own /metrics, and one `lean-grafana` (12.3.2) provisioning the Prometheus datasource and the upstream Lean client dashboard at port 3000. Inside the enclave we resolve scrape targets by service DNS name, so no `host.docker.internal` workaround is needed. The dashboard JSON (client-dashboard.json) is vendored under src/lean/metrics/grafana/dashboards/ at the same upstream commit as lean-quickstart. Gated on `lean_network_params.metrics_enabled` (default true), so any operator who wants a metrics-free run can set it false. Anonymous admin login is enabled (matches lean-quickstart) — there's no admin/admin prompt to navigate through. Validated end-to-end against a 6-node ethlambda+ream+zeam devnet: all 7 scrape targets (6 clients + prometheus self) report `up`, Grafana health endpoint returns 200, dashboard "Lean Ethereum Clients Dashboard" loads at /d/lean-ethereum-clients-dashboard.
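
The scrape layout described above can be pictured as a small config builder. This is an illustrative sketch in Python, not the package's Starlark template: the job names are hypothetical, and `localhost:9090` for the self-scrape assumes Prometheus' usual listen port, which matches the published `:9090`.

```python
import json


def scrape_targets(service_names, metrics_port=5054):
    """One host:port target per Lean service, resolved by enclave DNS name."""
    return ["%s:%d" % (name, metrics_port) for name in service_names]


def prometheus_config(service_names):
    """Minimal scrape_configs structure: all Lean nodes plus Prometheus itself."""
    return {
        "scrape_configs": [
            {
                "job_name": "lean-clients",  # hypothetical job name
                "metrics_path": "/metrics",
                "static_configs": [{"targets": scrape_targets(service_names)}],
            },
            {
                "job_name": "prometheus",    # hypothetical job name
                "static_configs": [{"targets": ["localhost:9090"]}],
            },
        ]
    }


services = ["lean-%s-%d" % (c, i) for c in ("ethlambda", "ream", "zeam") for i in (0, 1)]
print(json.dumps(prometheus_config(services), indent=2))
```

For the 6-node devnet this yields six client targets plus the self-scrape, matching the seven `up` targets reported in the validation.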

Default `ports={}` publishes on a random host port bound to 127.0.0.1. `public_ports={}` lets us pin the host-side number (and Docker binds those on 0.0.0.0 by default), so dashboards have stable URLs and can be reached as http://<server>:3000 / :9090 without an SSH tunnel. If 3000 or 9090 are already in use on the host the run will fail at service start - operators can avoid this by not enabling metrics on shared hosts, or by adding an override via lean_network_params (a follow-up).

Wires the remaining devnet4 Lean clients into the pipeline. Each launcher follows the same placeholder-then-plan.exec pattern as ethlambda/ream, translating its CLI surface from the matching client-cmds/<client>-cmd.sh in blockblaz/lean-quickstart. All 5 images ship with a working /bin/sh + busybox applets, so none need the static-busybox injection that zeam required.

Per-client notes:

- qlean reads --node-key from /node-keys; uses the libp2p multiaddr listen-addr form.
- lantern reads everything by explicit path (validator-registry, validator-keys, validator-config, hash-sig-key-dir, nodes-path).
- grandine binary is /usr/local/bin/lean_client (image ENTRYPOINT alias).
- lighthouse needs genesis.json staged as well (in addition to the text bundle) for its lean_node subcommand.
- gean follows lean-quickstart's convention of looking up --node-key inside --custom-network-config-dir, so the launcher copies the per-node libp2p secret into the genesis mount before starting the binary.

Dispatcher in src/lean/lean_launcher.star routes by LEAN_TYPE.

Two regressions surfaced during the 9-client deploy on the office host:

- lantern's binary lives at /opt/lantern/bin/lantern, not /usr/local/bin/lantern_cli. The image's ENTRYPOINT script (lantern-entrypoint.sh) forwards to /opt/lantern/bin/lantern, but since we bypass the entrypoint we have to point at the real binary.
- The published hopinheimer/lighthouse:latest lean_node subcommand does not accept --api-port or --is-aggregator. Drop those flags; document that lighthouse always runs as a non-aggregator under this image and exposes only its metrics endpoint (no HTTP API).

Pinning the in-tree defaults to devnet4 makes them rot the moment a new devnet generation ships - operators who don't override `lean_image:` would silently keep getting a stale tag. Switch every default to the client's `:latest` tag (all of them publish one) so the package itself is forward-compatible. Devnet-specific runs (e.g. the current devnet4 deployment) belong in the args file: each participant sets `lean_image: <repo>:devnet4` explicitly. The PR description carries the canonical devnet4 args example.

The earlier wording described Lean consensus as architecturally standalone ("no EL pairing", "no Engine API", "fully standalone"). That isn't the long-term picture: Lean clients are designed to pair with EL clients in the regular EL+CL devnet shape; they just don't implement Engine API yet, so present-day devnets are client-only. Reframe both code comments and docs around that distinction:

- main.star and lean_launcher.star comments now say "no Engine API yet" / "until Engine API ships" instead of declaring Lean architecturally EL-less.
- docs/lean-consensus.md introduces Lean as "client-only today, EL pairing later" and notes the motivation for landing it in this package is exactly to be ready for the EL+Lean shape when it ships.
- The why-a-parallel-pipeline table is retitled "Lean (today)" and adds a follow-up paragraph on how the two pipelines compose once Engine API lands.

Two more comment blocks (constants.star LEAN_TYPE intro and the Lean parsing section header in input_parser.star) still described Lean as "standalone" / "no Engine API" without the temporal qualifier. Bring both in line with the wording used in docs/lean-consensus.md and the launcher modules: Lean is client-only today because Engine API isn't implemented yet, EL+Lean pairing arrives when it does.

The two Lean prose docs (docs/lean-consensus.md and docs/lean-adding-a-new-client.md) don't match the repo's documentation shape: the existing docs/ has exactly one prose doc (architecture.md), and per-feature configs live as YAML args files under .github/tests/.

- Delete docs/lean-consensus.md and docs/lean-adding-a-new-client.md.
- Append a "Lean Ethereum participants" section to docs/architecture.md so the architectural overview lands in the file that already serves that purpose.
- Add .github/tests/lean-devnet4.yaml as the canonical 14-node args example (mirrors how bal-devnet-0.yaml / fulu.yaml / etc. document network-shape configs).
- Add .github/tests/lean-smoke.yaml as the minimal 2-node smoke test.

The per-client contract that the deleted "adding a new client" guide described is now the module docstrings on each src/lean/<client>/<client>_launcher.star, matching the existing src/cl/*/<client>_launcher.star convention.

MegaRedHand · 2026-05-18T22:51:18Z

+    return plan.run_sh(
+        run="mkdir -p /out && cp /bin/busybox /out/busybox",
+        image="busybox:musl",
+        store=[StoreSpec(src="/out", name="lean-busybox")],
+        description="Extracting static busybox for zeam scratch image",
+    ).files_artifacts[0]


Zeam busybox artifact collides when count > 1 — _busybox_artifact() is called from initialize() per-node and stores with the fixed name "lean-busybox". lean-devnet4.yaml runs zeam with count: 2, so the second initialize attempts to register the same artifact name and Kurtosis rejects duplicate artifact names within an enclave.

Call site (per-node, line 54):

ethereum-package/src/lean/zeam/zeam_launcher.star

Lines 39 to 78 in 1052920

    def _busybox_artifact(plan):
        # Extract the static busybox binary out of busybox:musl. Once exported
        # as a Kurtosis files artifact it can be mounted into any scratch
        # container as `/usr/local/bin/busybox` so we have a working shell to
        # run plan.exec scripts against.
        return plan.run_sh(
            run="mkdir -p /out && cp /bin/busybox /out/busybox",
            image="busybox:musl",
            store=[StoreSpec(src="/out", name="lean-busybox")],
            description="Extracting static busybox for zeam scratch image",
        ).files_artifacts[0]


    def initialize(plan, node, p2p_keys_artifact, hash_sig_artifact):
        busybox_artifact = _busybox_artifact(plan)
        cfg_kwargs = lean_shared.common_cfg_kwargs(node)
        cfg_kwargs.update(
            {
                "image": node["image"],
                # Override the zeam entrypoint with busybox sh; the real zeam
                # binary is invoked later via plan.exec.
                "entrypoint": [BUSYBOX, "sh", "-c"],
                "cmd": [
                    # zeam's scratch image has no /usr/bin/touch, /bin/tail, and
                    # not even /var/log. Every applet has to be dispatched through
                    # busybox; mkdir -p creates /var/log on first touch.
                    "{0} mkdir -p $({0} dirname {1}) && {0} touch {1} && {0} tail -f {1}".format(
                        BUSYBOX,
                        lean_shared.lean_log_file_path(node["service_name"]),
                    )
                ],
                "files": {
                    NODE_KEY_MOUNT: p2p_keys_artifact,
                    HASH_SIG_MOUNT: hash_sig_artifact,
                    BUSYBOX_MOUNT: busybox_artifact,
                },
            }
        )
        return plan.add_service(node["service_name"], ServiceConfig(**cfg_kwargs))

Loop that invokes it per node:

ethereum-package/src/lean/lean_launcher.star

Lines 160 to 170 in 1052920

    # Phase 1: initialise placeholder services so Kurtosis assigns IPs.
    services = []
    for node in expanded:
        launcher = _launcher_for(node["lean_type"])
        service = launcher.initialize(
            plan,
            node,
            keys_artifact,
            hash_sig_artifact,
        )
        services.append((node, service))

Devnet config that triggers it (count: 2):

ethereum-package/.github/tests/lean-devnet4.yaml

Lines 27 to 31 in 1052920

    - lean_type: zeam
      lean_image: blockblaz/zeam:devnet4
      count: 2
      validator_count: 1
      is_aggregator: false

Fix: hoist _busybox_artifact(plan) into launch() (like the p2p/hash-sig artifacts) and pass it as a parameter to initialize().
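A minimal sketch of the hoisting suggested above, written as plain Python so it runs outside Kurtosis. `FakePlan` is a hypothetical stand-in that only mimics the duplicate-name rejection described in this comment, and `initialize_per_node`, `initialize_hoisted` and `launch_hoisted` are illustrative names rather than functions from the package.

```python
class FakePlan:
    """Hypothetical stand-in for the Kurtosis plan: rejects duplicate artifact names."""

    def __init__(self):
        self.artifacts = set()

    def store_artifact(self, name):
        if name in self.artifacts:
            raise ValueError("files artifact name already exists: " + name)
        self.artifacts.add(name)
        return name


def busybox_artifact(plan):
    # Mirrors _busybox_artifact(): always stores under the same fixed name.
    return plan.store_artifact("lean-busybox")


def initialize_per_node(plan, node):
    # Current shape: every node extracts (and re-registers) the artifact.
    return (node, busybox_artifact(plan))


def initialize_hoisted(plan, node, busybox):
    # Suggested shape: the artifact arrives as a parameter.
    return (node, busybox)


def launch_hoisted(plan, nodes):
    # The artifact is created once, before the per-node loop.
    busybox = busybox_artifact(plan)
    return [initialize_hoisted(plan, node, busybox) for node in nodes]
```

With two zeam nodes, the per-node variant fails on the second call, while the hoisted variant registers the artifact once and shares it.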

`el_admin_node_info.get_enode_enr_for_node` can discover the ENR/enode via `admin_nodeInfo`. ethrex defaults to `eth,net,web3` only; without this flag the kurtosis startup polls admin_nodeInfo forever and never hands the el_context to downstream CL launchers. The change is a no-op for existing setups that didn't reach the poll (it only widens the public HTTP API surface inside the test enclave).

`nohup ... &`. Kurtosis `exec` waits for its docker-exec FDs to close, and `& disown` is a bash-ism that the Lean client images' /bin/sh (dash on Debian-slim) doesn't recognise — so the backgrounded ethlambda process kept the exec connection open and the kurtosis run hung at the start step for the next Lean node. `setsid -f` forks into a new session and exits the parent shell immediately, releasing the FDs and letting the kurtosis run progress to the next node. The zeam launcher keeps the busybox-prefixed `nohup ... &` pattern because its scratch-based image has no setsid; the busybox build detaches via the injected `< /dev/null` redirect (handled separately in that launcher).
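As an illustration of the `setsid -f` pattern described above, the following sketch builds a command string of that shape. The helper name, the client command and the log path are placeholders (the launcher's real command line is not shown here), and the stdin/stdout redirects are an assumption about how the FDs would be released.

```python
import shlex


def detached_start_cmd(client_cmd, log_path):
    """Build a command that starts client_cmd in a new session via `setsid -f`."""
    # Redirect stdout/stderr to the log and stdin from /dev/null so the
    # detached process holds none of the exec connection's FDs.
    inner = "{0} >> {1} 2>&1 < /dev/null".format(client_cmd, shlex.quote(log_path))
    # `setsid -f` forks, so the invoking shell returns immediately.
    return "setsid -f sh -c {0}".format(shlex.quote(inner))


# Placeholder binary and flag, for illustration only.
cmd = detached_start_cmd("/usr/local/bin/client --placeholder", "/var/log/lean.log")
```

Only POSIX `sh` syntax is used inside the quoted payload, which matters because the images' `/bin/sh` is not bash.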

the standard `participants:` block, and remove the parallel `lean_participants:` schema entirely.

The new shape collapses Lean and EL+CL into one input surface:

    participants:
      - el_type: ethrex
        cl_type: ethlambda
        is_aggregator: true
      - el_type: none
        cl_type: ream

ethlambda is the only Lean client that implements Engine API today (lambdaclass/ethlambda#367); when paired with an EL (`el_type` != none) the Lean launcher reads the EL's genesis block hash via `eth_getBlockByNumber 0x0` after the EL is up, stages the network JWT into the ethlambda container, and adds the three Engine API flags (`--execution-endpoint`, `--execution-jwt-secret`, `--execution-genesis-block-hash`) to its CLI. The other seven Lean clients run client-only: `el_type: none` skips EL launch entirely (the package already supports this for `consensoor` etc.).

CL_TYPE additions: ethlambda, ream, zeam, qlean, lantern, gean, lean_grandine, lean_lighthouse. The last two are prefixed because `grandine` and `lighthouse` already exist in CL_TYPE for the Eth1 CLs of the same name (different binaries from different repos).

`LEAN_CL_TYPES` is the set the cl_launcher dispatcher checks to decide whether to skip a participant (Lean cl_types are launched by src/lean/lean_launcher.star, not the standard CL launchers); main.star then builds a Lean record per such participant and hands the list to the Lean launcher with the network jwt_file attached.

Other plumbing changes that fall out of this:

- `is_aggregator` is now a first-class per-participant field. Ignored on non-Lean cl_types.
- The "first participant cannot have el_type=none without bootnodoor" guard is relaxed when every participant has a Lean cl_type: Lean uses its own libp2p QUIC mesh and doesn't need an Eth1 bootnode.
- The Fulu/PeerDAS validation skips Lean cl_types (they don't speak PeerDAS).
- The VC / remote-signer / snooper / metrics-exporter pipeline is skipped for Lean cl_types in participant_network.star: Lean validators live inside the consensus binary, not a separate VC.
- shared_utils.get_client_names is None-safe: when cl_context is None (Lean participants), it falls back to the cl_type string from the participant config so downstream consumers (validator-ranges, dora, etc.) still get a usable row name.

`lean_network_params:` stays as a separate config block for Lean-only knobs (`active_epoch`, `attestation_committee_count`, `num_validator_keys_per_node`, `metrics_enabled`, ...). `parse_lean_participants` and `DEFAULT_LEAN_IMAGES` are deleted; the DEFAULT_CL_IMAGES table now carries the Lean defaults too.

Args files migrated:

- `.github/tests/lean-devnet4.yaml`: every entry moved to `participants:` with `el_type: none`.
- `.github/tests/lean-smoke.yaml`: same shape, two ethlambda nodes.
- `.github/tests/ethlambda-el-pair.yaml`: new, single ethrex+ethlambda pair.
- `.github/tests/ethlambda-el-pair-2node.yaml`: new, two pairs with one aggregator + one non-aggregator on the Lean side.

Validated locally: the 2-node ethrex+ethlambda pair finalizes slot-by-slot, and both ethrex ELs converge on identical block hashes via the Lean libp2p mesh between the two ethlambdas.
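To make the EL-pairing step concrete, here is a rough sketch of how the three Engine API flags might be assembled. The flag names come from the commit message; the helper name, the `--flag=value` form and the sample values are assumptions, not the launcher's actual code.

```python
def engine_api_flags(el_type, engine_url, jwt_path, genesis_block_hash):
    """Return the extra CLI flags for a Lean node; empty when it runs client-only."""
    if el_type == "none":
        # Client-only participant: no EL was launched, so nothing to add.
        return []
    return [
        "--execution-endpoint=" + engine_url,
        "--execution-jwt-secret=" + jwt_path,
        "--execution-genesis-block-hash=" + genesis_block_hash,
    ]


# Placeholder values for illustration only.
paired = engine_api_flags("ethrex", "http://el-1:8551", "/jwt/jwtsecret", "0x" + "ab" * 32)
client_only = engine_api_flags("none", "", "", "")
```

Keeping the `el_type: none` case as an empty list lets the same launcher path serve both paired and client-only participants.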

participant has `el_type: none` (an all-Lean deployment) the EL launcher appends nothing, so `all_el_contexts[0].ip_addr` crashed at startup with "index 0 out of range: empty list". `fuzz_target` is only consumed by additional services that talk to an Eth1 EL (tx-fuzz, rakoon, broadcaster, custom_flood); leaving it empty is correct for Lean-only — those services aren't enabled there and the remaining additional-service handlers guard their own EL needs.
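The guard described here can be sketched as follows. `ElContext`, `rpc_port_num` and the URL format are assumptions made for the example; only `all_el_contexts[0].ip_addr` and the empty-`fuzz_target` behaviour come from the commit message.

```python
from collections import namedtuple

# Hypothetical minimal EL context; the real one carries more fields.
ElContext = namedtuple("ElContext", ["ip_addr", "rpc_port_num"])


def pick_fuzz_target(all_el_contexts):
    """Return the first EL's RPC URL, or "" when no EL was launched."""
    if len(all_el_contexts) == 0:
        # All-Lean deployment: indexing [0] here is what used to crash.
        return ""
    first = all_el_contexts[0]
    return "http://{0}:{1}".format(first.ip_addr, first.rpc_port_num)
```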

for Lean clients whose `lean_type` itself carries the `lean_` prefix. With the phase-2 disambiguation, `LEAN_TYPE.grandine = "lean_grandine"` and `LEAN_TYPE.lighthouse = "lean_lighthouse"`; the previous service name formatter produced `lean-lean_grandine-2`, which Kurtosis rejects per RFC 1035 ("only lowercase alphanumeric and `-` characters"). The node name (used inside the Lean genesis / validator config) keeps the underscore, since that's the convention lean-quickstart writes and the clients parse.
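One way such a formatter could look, as a sketch: replace underscores with hyphens in the service name only. The function name and the `lean-<type>-<index>` layout are inferred from the rejected example above, and the regular expression is a simplified RFC 1035 label check written for this illustration.

```python
import re

# Simplified RFC 1035 label: starts with a letter, then lowercase letters,
# digits or "-", not ending in "-", at most 63 characters.
RFC1035_LABEL = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")


def lean_service_name(lean_type, index):
    """Build a service name with no underscores."""
    return "lean-{0}-{1}".format(lean_type.replace("_", "-"), index)


def is_valid_service_name(name):
    return RFC1035_LABEL.match(name) is not None
```

For `lean_grandine` and index 2 this yields `lean-lean-grandine-2`, which passes the check, whereas `lean-lean_grandine-2` does not.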

node paired with ethlambda via Engine API (16 EL+CL pairs total, one aggregator). Dora is added as an additional service to give the EL side a beacon-explorer UI. Also gate dora's launcher loop on Lean cl_types — `cl_client` is None for Lean participants and dora's `new_cl_client_info(cl_client.beacon_http_url, ...)` was crashing before reaching the `el_type == none` skip. Same shape the other downstream pipelines (VC / snooper / metrics-exporter) already have.

doesn't open TCP 8546 within Kurtosis's 2-minute port-check timeout, which fails the parallel start batch and rolls back every other EL. The other 7 EL clients (geth, nethermind, besu, reth, erigon, nimbus, ethrex) are unaffected. Re-add when the ethereumjs image is fixed.

Narrow the all-ELs ethlambda experiment to 6 EL clients: ethrex, nethermind, geth, erigon, nimbus-eth1, besu. reth is dropped from the experiment too; ethereumjs continues to be excluded because of its 2-minute TCP port-check timeout rolling back the whole batch.

parser expands `count: N` into N separate `participants:` entries (input_parser.star:1260), each carrying the original `count` attribute. main.star's synthesis was reading that `count` and propagating it into the Lean record, so the Lean launcher's own count expansion multiplied N×N — a `count: 2` ream participant ended up running 4 ream containers. Hardcode `count: 1` in the synthesized Lean record so the Lean launcher gets one node per already-expanded participant entry.
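The N×N multiplication is easy to reproduce in isolation. The sketch below models both expansion passes with plain dictionaries; `expand_by_count`, `synthesize_lean_record` and `lean_containers` are illustrative names, not the package's functions.

```python
def expand_by_count(entries):
    """Expand each entry into `count` copies; every copy keeps the original count."""
    expanded = []
    for entry in entries:
        for _ in range(entry.get("count", 1)):
            expanded.append(dict(entry))
    return expanded


def synthesize_lean_record(participant, propagate_count):
    """Build the Lean record for one already-expanded participant."""
    # propagate_count=True models the bug; False models the hardcoded count of 1.
    count = participant.get("count", 1) if propagate_count else 1
    return {"lean_type": participant["cl_type"], "count": count}


def lean_containers(participants, propagate_count):
    parsed = expand_by_count(participants)  # input parser expansion
    records = [synthesize_lean_record(p, propagate_count) for p in parsed]
    return len(expand_by_count(records))  # Lean launcher expansion
```

For a single `count: 2` ream participant, propagating the count gives four containers and hardcoding it to 1 gives two.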

beacon endpoint to start, and Lean cl_types don't expose one — every participant in this experiment is a Lean cl_type, so dora's config template rendered zero endpoints and the container exited with "missing beacon node endpoints (need at least 1)". The rest of the devnet runs fine without dora; re-enable once ethlambda ships Beacon API compatibility stubs.

12 EL containers booting in parallel, several were too slow to answer engine_getPayloadV5 within the Lean 4s slot window during EL warm-up; the resulting empty slots broke 3SF-mini's delta-bounded finalization rule. 180s lets the ELs warm up before slot 0 starts.

ilitteri added 7 commits May 13, 2026 16:19

ilitteri changed the title ~~Add Lean Ethereum consensus pipeline (ethlambda + ream + zeam + Prometheus/Grafana)~~ feat: add Lean Ethereum consensus pipeline (ethlambda + ream + zeam + Prometheus/Grafana) May 13, 2026

ilitteri added 7 commits May 13, 2026 19:19

ilitteri changed the title ~~feat: add Lean Ethereum consensus pipeline (ethlambda + ream + zeam + Prometheus/Grafana)~~ feat: add Lean Ethereum consensus pipeline May 14, 2026

MegaRedHand reviewed May 19, 2026

ilitteri added 11 commits May 19, 2026 22:21

Narrow the all-ELs ethlambda experiment to 6 EL clients (ac90c38)

ilitteri wants to merge 25 commits into main from feat/lean-consensus-clients
