feat: add Lean Ethereum consensus pipeline #19

Open
ilitteri wants to merge 25 commits into main from feat/lean-consensus-clients

ilitteri commented May 13, 2026

Summary

Add Lean Ethereum consensus clients as cl_type: values inside the standard participants: block. Operators pick a Lean client the same way they pick lighthouse/teku/prysm/...; if the Lean client supports Engine API (today only ethlambda does, via lambdaclass/ethlambda#367) it can be paired with any EL — otherwise set el_type: none and the package skips the Eth1 side for that participant.

This replaces the original lean_participants: top-level block; that schema is removed and the canonical args files are migrated.

Wired clients (8 total):

| cl_type | Default image | EL pairing today |
|---|---|---|
| ethlambda | ghcr.io/lambdaclass/ethlambda:latest | ✅ via Engine API (lambdaclass/ethlambda#367) |
| ream | ghcr.io/reamlabs/ream:latest | ❌ no Engine API yet; use el_type: none |
| zeam | blockblaz/zeam:latest | ❌ no Engine API yet |
| qlean | qdrvm/qlean-mini:latest | ❌ no Engine API yet |
| lantern | piertwo/lantern:latest | ❌ no Engine API yet |
| gean | ghcr.io/geanlabs/gean:latest | ❌ no Engine API yet |
| lean_grandine | sifrai/lean:latest | ❌ no Engine API yet |
| lean_lighthouse | hopinheimer/lighthouse:latest | ❌ no Engine API yet (devnet3 single-key layout; won't peer with devnet4 clients until a dual-key image ships) |

The lean_grandine / lean_lighthouse prefixes disambiguate from the standard CL Grandine / Lighthouse types of the same name (different binaries from different repos).
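
The pairing rule from the table above can be sketched as a small pre-flight check on an args file. This is a hypothetical helper and not part of the package: the function name, the two set names, and the message text are assumptions, and only the cl_type values and the `el_type: none` convention come from this description.

```python
# Hypothetical pre-flight check for the participants list of an args file.
LEAN_CL_TYPES = {
    "ethlambda", "ream", "zeam", "qlean", "lantern",
    "gean", "lean_grandine", "lean_lighthouse",
}
# Per the table above, only ethlambda supports the Engine API today.
ENGINE_API_CAPABLE = {"ethlambda"}


def check_pairing(participants):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    for i, p in enumerate(participants):
        cl = p.get("cl_type")
        # Assumption: a missing el_type is treated as "not none",
        # because the package would fall back to its default EL.
        el = p.get("el_type")
        if cl in LEAN_CL_TYPES and cl not in ENGINE_API_CAPABLE and el != "none":
            problems.append(
                f"participants[{i}]: {cl} has no Engine API yet; set el_type: none"
            )
    return problems


print(check_pairing([{"el_type": "geth", "cl_type": "zeam"}]))
```

Standard CL types such as lighthouse pass through unchecked, since the rule only concerns Lean cl_types.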

Plus a Prometheus + Grafana stack (mirrors lean-quickstart's docker-compose-metrics.yaml) that scrapes every Lean node's /metrics and serves the upstream Lean client dashboard, host-published on :3000 / :9090.

How to run — Lean-only localnet (7 of the 8 clients; lean_lighthouse omitted)

  1. Save the args file below to e.g. lean-only.yaml:

    # All-Lean devnet4 localnet. Every participant has `el_type: none`
    # so the package skips the Eth1 EL/CL pipeline for it. lean_lighthouse
    # is omitted because the published `hopinheimer/lighthouse:latest`
    # image is still on the single-key (devnet3) GENESIS_VALIDATORS layout
    # and won't reach consensus with the devnet4 nodes. Add it back when
    # a dual-key image ships.
    participants:
      - el_type: none
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet4
        count: 1
        validator_count: 1
        is_aggregator: true
      - el_type: none
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet4
        count: 1
        validator_count: 1
        is_aggregator: false
      - el_type: none
        cl_type: ream
        cl_image: ghcr.io/reamlabs/ream:latest-devnet4
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: none
        cl_type: zeam
        cl_image: blockblaz/zeam:devnet4
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: none
        cl_type: qlean
        cl_image: qdrvm/qlean-mini:devnet-4-amd64
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: none
        cl_type: lantern
        cl_image: piertwo/lantern:v0.0.4
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: none
        cl_type: lean_grandine
        cl_image: sifrai/lean:devnet-4
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: none
        cl_type: gean
        cl_image: ghcr.io/geanlabs/gean:devnet4
        count: 2
        validator_count: 1
        is_aggregator: false
    
    lean_network_params:
      genesis_delay: 180
      active_epoch: 18
      attestation_committee_count: 1
    
    additional_services: []

    This is the file checked in at .github/tests/lean-devnet4.yaml.

  2. Run with Kurtosis 1.18.2+:

    kurtosis run --enclave lean \
      github.com/lambdaclass/ethereum-package@feat/lean-consensus-clients \
      --args-file lean-only.yaml
  3. The package boots:

    • 14 Lean nodes (2× each: ethlambda, ream, zeam, qlean, lantern, lean_grandine, gean), full-mesh peering
    • 1 Prometheus scraping every node + itself
    • 1 Grafana with anonymous admin + the upstream client dashboard pre-loaded
  4. Access:

    # All ports (host-published):
    kurtosis enclave inspect lean
    
    # Direct URLs (host's hostname, bound on 0.0.0.0):
    # Grafana:    http://<host>:3000/d/lean-ethereum-clients-dashboard/lean-ethereum-clients-dashboard
    # Prometheus: http://<host>:9090
    
    # Service logs (any client):
    kurtosis service logs lean lean-ethlambda-0 -f
  5. Smaller setups: drop nodes you don't need from participants:. A 2-node ethlambda smoke test runs in ~3 min (.github/tests/lean-smoke.yaml); the 14-node setup above takes ~10–12 min including hash-sig keygen.

  6. Teardown:

    kurtosis enclave rm -f lean
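
As a sanity check on the "14 Lean nodes" figure, the node count is the sum of the `count` fields in `participants:`, and Prometheus scrapes one extra target (itself). A minimal sketch, assuming `count` defaults to 1 when omitted; the function names are illustrative only.

```python
def lean_node_count(participants):
    """Sum the per-entry `count` fields (assumed to default to 1)."""
    return sum(p.get("count", 1) for p in participants)


def scrape_target_count(participants):
    """Every Lean node plus Prometheus scraping itself."""
    return lean_node_count(participants) + 1


# cl_type / count pairs copied from the lean-only.yaml example above.
example = [
    {"cl_type": "ethlambda", "count": 1},
    {"cl_type": "ethlambda", "count": 1},
    {"cl_type": "ream", "count": 2},
    {"cl_type": "zeam", "count": 2},
    {"cl_type": "qlean", "count": 2},
    {"cl_type": "lantern", "count": 2},
    {"cl_type": "lean_grandine", "count": 2},
    {"cl_type": "gean", "count": 2},
]

print(lean_node_count(example))      # 14
print(scrape_target_count(example))  # 15
```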

How to run — 2× ethrex + ethlambda paired localnet

Two ethrex ELs each paired with one ethlambda via the Engine API. One ethlambda is the aggregator, the other isn't.

  1. Save the args file below to e.g. ethlambda-el-pair-2node.yaml:

    participants:
      - el_type: ethrex
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 1
        validator_count: 1
        is_aggregator: true
      - el_type: ethrex
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 1
        validator_count: 1
        is_aggregator: false
    
    additional_services: []

    This is the file checked in at .github/tests/ethlambda-el-pair-2node.yaml. A single-pair variant lives at .github/tests/ethlambda-el-pair.yaml.

  2. (optional) Build an ethlambda image for local testing, instead of using the published devnet6 image:

    git clone --branch engine-api-integration https://github.com/lambdaclass/ethlambda
    cd ethlambda
    docker build -t ghcr.io/lambdaclass/ethlambda:devnet6 .
  3. Run with Kurtosis 1.18.2+:

    kurtosis run --enclave ethlambda-pair-2node \
      github.com/lambdaclass/ethereum-package@feat/lean-consensus-clients \
      --args-file ethlambda-el-pair-2node.yaml
  4. The package boots:

    • 2× ethrex EL services (el-1-ethrex-ethlambda, el-2-ethrex-ethlambda)
    • 2× ethlambda CL services (lean-ethlambda-0 = aggregator, lean-ethlambda-1 = peer)
    • 1× Prometheus + 1× Grafana scraping every Lean node
  5. Verify the chain is advancing on both pairs:

    # Lean head_slot from both ethlambdas:
    p0=$(kurtosis port print ethlambda-pair-2node lean-ethlambda-0 metrics | sed 's|.*:||')
    p1=$(kurtosis port print ethlambda-pair-2node lean-ethlambda-1 metrics | sed 's|.*:||')
    curl -s "http://127.0.0.1:${p0}/metrics" | grep '^lean_head_slot '
    curl -s "http://127.0.0.1:${p1}/metrics" | grep '^lean_head_slot '
    
    # Justification / finalization from ethlambda_0's fork choice:
    api=$(kurtosis port print ethlambda-pair-2node lean-ethlambda-0 http | sed 's|.*:||')
    curl -s "http://127.0.0.1:${api}/lean/v0/fork_choice" \
      | jq '{justified:.justified.slot, finalized:.finalized.slot, validator_count:.validator_count}'
    
    # Both ethrex tips should be at the same block hash:
    rpc1=$(kurtosis port print ethlambda-pair-2node el-1-ethrex-ethlambda rpc | sed 's|.*:||')
    rpc2=$(kurtosis port print ethlambda-pair-2node el-2-ethrex-ethlambda rpc | sed 's|.*:||')
    for rpc in $rpc1 $rpc2; do
      curl -s "http://127.0.0.1:${rpc}" -X POST -H 'Content-Type: application/json' \
        -d '{"jsonrpc":"2.0","method":"eth_getBlockByNumber","id":1,"params":["latest",false]}' \
        | jq -c '{block:.result.number, hash:.result.hash}'
    done

    Expected (after ~30 s): both lean_head_slot values match and advance by 1 every 4 s; justified/finalized advance slot-by-slot; both ethrex latest blocks report identical hashes.

  6. Teardown:

    kurtosis enclave rm -f ethlambda-pair-2node
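
The head comparison in step 5 can also be scripted. The sketch below parses the Prometheus text output that the `curl .../metrics` calls return and checks that all nodes report the same `lean_head_slot`. The function names are hypothetical; fetching the text is left to the caller.

```python
def head_slot(metrics_text):
    """Extract the lean_head_slot sample from Prometheus text exposition."""
    for line in metrics_text.splitlines():
        # Same filter as `grep '^lean_head_slot '` in the shell snippet.
        if line.startswith("lean_head_slot "):
            # Second field is the sample value; it may be rendered as a float.
            return int(float(line.split()[1]))
    return None


def heads_in_sync(texts):
    """True if every node exposes the metric and all values are equal."""
    slots = [head_slot(t) for t in texts]
    return None not in slots and len(set(slots)) == 1


# Made-up payloads for illustration.
node0 = "# TYPE lean_head_slot gauge\nlean_head_slot 42\n"
node1 = "# TYPE lean_head_slot gauge\nlean_head_slot 42\n"
print(heads_in_sync([node0, node1]))
```

Because the nodes are polled one after another, two healthy nodes can differ by one slot if a slot boundary falls between the requests; retrying once is a reasonable way to rule that out.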

How to run — cross-client EL × ethlambda experiment (6 ELs × 2 each)

12 EL+ethlambda pairs, one ethlambda aggregator. Exercises the Engine API surface across six EL clients: ethrex, nethermind, geth, erigon, nimbus-eth1, besu.

  1. Save the args file below to e.g. ethlambda-el-all-clients.yaml:

    participants:
      - el_type: ethrex
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 1
        validator_count: 1
        is_aggregator: true
      - el_type: ethrex
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 1
        validator_count: 1
        is_aggregator: false
      - el_type: nethermind
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: geth
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: erigon
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: nimbus
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 2
        validator_count: 1
        is_aggregator: false
      - el_type: besu
        cl_type: ethlambda
        cl_image: ghcr.io/lambdaclass/ethlambda:devnet6
        count: 2
        validator_count: 1
        is_aggregator: false
    
    lean_network_params:
      # 180s lets the 12 EL containers fully warm up before the Lean
      # chain starts. With the 60s default, several proposers missed
      # their slot during EL JIT/cache warm-up, and the resulting gaps
      # broke 3SF-mini's `delta ≤ 5` finalization rule.
      genesis_delay: 180
    
    additional_services: []

    This is the file checked in at .github/tests/ethlambda-el-all-clients.yaml.

  2. Build the ethlambda image from lambdaclass/ethlambda#367 ("feat: integrate with ethrex over the Engine API"); see step 2 of the previous section.

  3. Run with Kurtosis 1.18.2+:

    kurtosis run --enclave ethlambda-all-els \
      github.com/lambdaclass/ethereum-package@feat/lean-consensus-clients \
      --args-file ethlambda-el-all-clients.yaml
  4. The package boots in ~6 min:

    • 12 EL services (2× of ethrex, nethermind, geth, erigon, nimbus-eth1, besu)
    • 12 ethlambda CL services (lean-ethlambda-0 = aggregator, 1..11 = peers)
    • 1× Prometheus + 1× Grafana
  5. Verify all 12 ethlambdas are in sync and finalization is advancing:

    # All 12 heads match
    for n in 0 1 2 3 4 5 6 7 8 9 10 11; do
      port=$(kurtosis port print ethlambda-all-els "lean-ethlambda-${n}" metrics | sed 's|.*:||')
      head=$(curl -s "http://127.0.0.1:${port}/metrics" | grep '^lean_head_slot ' | awk '{print $2}')
      echo "ethlambda_${n} head=${head}"
    done
    
    # Justified/finalized advancing
    api=$(kurtosis port print ethlambda-all-els lean-ethlambda-0 http | sed 's|.*:||')
    curl -s "http://127.0.0.1:${api}/lean/v0/fork_choice" \
      | jq '{justified:.justified.slot, finalized:.finalized.slot, validator_count:.validator_count}'

    Expected: every ethlambda at the same head_slot, justified = head - 1, finalized = head - 2, validator_count = 12. All 12 EL chains will report identical block hashes because the Lean libp2p mesh picks a single canonical chain.

  6. Teardown:

    kurtosis enclave rm -f ethlambda-all-els
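
The expectations from step 5 (justified = head - 1, finalized = head - 2, validator_count = 12) can be checked mechanically against the `/lean/v0/fork_choice` response. A minimal sketch: the function name is hypothetical, the JSON field names are the ones used in the `jq` filter above, and the sample values are made up.

```python
def check_fork_choice(head, fork_choice, expected_validators):
    """Compare a fork_choice response against the expected steady state."""
    justified = fork_choice["justified"]["slot"]
    finalized = fork_choice["finalized"]["slot"]
    return {
        "justified_ok": justified == head - 1,
        "finalized_ok": finalized == head - 2,
        "validators_ok": fork_choice["validator_count"] == expected_validators,
    }


# Made-up response for illustration.
sample = {"justified": {"slot": 99}, "finalized": {"slot": 98}, "validator_count": 12}
print(check_fork_choice(100, sample, 12))
```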

Note on additional_services: [dora] (and similar Eth1 beacon-API consumers): they currently can't run alongside a Lean-only CL set because Lean clients don't expose an Eth1 Beacon API. ethlambda Beacon-API compatibility stubs are tracked separately.

Note on reth / ethereumjs: omitted from this experiment. ethereumjs fails the package's 2-minute TCP port-check on its WS endpoint and rolls back the entire EL launch batch; reth may or may not work — untested in this configuration.

What's in the box

| Path | Purpose |
|---|---|
| src/package_io/constants.star | Adds 8 Lean cl_types to CL_TYPE and the LEAN_CL_TYPES set the dispatcher checks |
| src/package_io/input_parser.star | Lean defaults in DEFAULT_CL_IMAGES, is_aggregator participant field, relaxed Eth1 bootnode + Fulu/PeerDAS guards for Lean-only networks |
| src/cl/cl_launcher.star | Skips Lean cl_types in the standard dispatcher (None cl_context) |
| src/participant_network.star | Skips VC / snooper / metrics-exporter for Lean participants |
| src/shared_utils/shared_utils.star | get_client_names falls back to cl_type when cl_context is None |
| src/dora/dora_launcher.star | Skips Lean participants when building beacon-endpoint list (dora itself still requires ≥1 Eth1 beacon endpoint to start) |
| main.star | Builds Lean records from participants: with a Lean cl_type and hands them to the Lean launcher |
| src/lean/lean_launcher.star | 3-phase orchestrator (P2P keys → genesis → start) with optional EL pairing (queries EL genesis hash, passes jwt_file through) |
| src/lean/ethlambda/ethlambda_launcher.star | Per-client CLI translation; accepts optional EL params and stages JWT into the container |
| src/lean/{ream,zeam,qlean,lantern,grandine,lighthouse,gean}/<client>_launcher.star | Other Lean clients (no EL pairing today) |
| src/lean/metrics/metrics_launcher.star | Prometheus + Grafana stack with the upstream Lean client dashboard pre-loaded |
| src/prelaunch_data_generator/lean_genesis/ | hash-sig-cli + eth-beacon-genesis leanchain + Python post-process |
| src/el/ethrex/ethrex_launcher.star | Enables the admin HTTP API namespace so admin_nodeInfo works (was hanging) |
| .github/tests/lean-devnet4.yaml | 14-node Lean-only canonical args |
| .github/tests/lean-smoke.yaml | 2-node ethlambda smoke test (Lean-only) |
| .github/tests/ethlambda-el-pair.yaml | Single EL+ethlambda smoke test |
| .github/tests/ethlambda-el-pair-2node.yaml | 2× EL+ethlambda pairs (one aggregator + one not) |
| .github/tests/ethlambda-el-all-clients.yaml | 6 ELs × 2 each paired with ethlambda (12 pairs, one aggregator) |
| docs/architecture.md | "Lean Ethereum participants" section rewritten for the cl_type-based shape |

Validation

  • 12-pair cross-EL devnet (ethlambda-el-all-clients.yaml) — ethrex, nethermind, geth, erigon, nimbus-eth1, besu × 2 each paired with ethlambda. All 12 ethlambdas finalize slot-by-slot via 3SF-mini's delta ≤ 5 rule; every EL chain converges on identical block hashes. Local Kurtosis 1.18.2 on macOS / arm64 and Debian 13 / amd64 (ethrex-office-4).
  • 2-pair ethrex+ethlambda (ethlambda-el-pair-2node.yaml) — finalization advancing on both pairs, ethrex tips identical.
  • Single ethrex+ethlambda pair (ethlambda-el-pair.yaml) — ethrex's per-slot log (Fork choice updated includes payload attributes / Requested payload with id=...) confirms the Engine API loop.
  • 14-node Lean-only (lean-devnet4.yaml) — all peers connect cross-client, chain advances; finalization stalls because the single aggregator's leanVM XMSS proof exceeds the 750 ms slot-aggregation deadline at 14-validator scale (knob lives in ethlambda; see "Known limitations"). On ethrex-office-3 (Debian 13 / amd64).

Architectural notes

  • EL pairing only on ethlambda today: the other 7 Lean clients don't implement Engine API yet. The convention is el_type: none for non-ethlambda Lean cl_types.
  • First-participant guard: EL_TYPE.none on the first participant is allowed when every participant has a Lean cl_type (no Eth1 bootnode mesh exists). Mixed networks (some Lean, some standard Eth1 CL) still need a non-none first participant.
  • EL genesis block hash: read from the EL via eth_getBlockByNumber 0x0 after the EL is up. Used to seed ethlambda's state.latest_execution_payload_header.block_hash so the very first FCU carries a head the EL recognizes.
  • Process detachment: Lean client binaries are launched via setsid -f (not nohup ... &) so Kurtosis exec returns immediately. The shell in the Lean client images is dash, which has no disown builtin.
  • genesis_delay tuning: 60s is enough for a 2–4 node devnet; 180s is recommended for ≥10 EL containers booting in parallel, otherwise some ELs miss their first engine_getPayloadV5 deadline and the resulting empty slots break 3SF-mini's delta ≤ 5 finalization rule.
  • YAML 1.1 hex int trap: PyYAML parses unquoted 0x... tokens as ints, dropping leading zeros from XMSS pubkey hex. The Lean genesis post-process loader uses yaml.BaseLoader to keep them as strings.
  • zeam scratch image: no /bin/sh — handled by injecting a static busybox binary as a Kurtosis files artifact.
  • lean_lighthouse caveat: the published hopinheimer/lighthouse:latest rejects devnet4's dual-key GENESIS_VALIDATORS layout. Launcher is wired in, but the client won't reach consensus until upstream ships a dual-key image.
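
The first-participant guard described above can be expressed as a single predicate. This is a sketch of the rule as stated in the note, not the actual input_parser.star code; the function name and the handling of an empty list are assumptions.

```python
LEAN_CL_TYPES = {
    "ethlambda", "ream", "zeam", "qlean", "lantern",
    "gean", "lean_grandine", "lean_lighthouse",
}


def first_participant_ok(participants):
    """el_type: none on participants[0] is allowed only on all-Lean networks."""
    if not participants:
        return True  # Assumption: nothing to check on an empty list.
    if participants[0].get("el_type") != "none":
        return True  # A real EL on the first participant is always fine.
    return all(p.get("cl_type") in LEAN_CL_TYPES for p in participants)


all_lean = [
    {"el_type": "none", "cl_type": "ream"},
    {"el_type": "none", "cl_type": "zeam"},
]
mixed = [
    {"el_type": "none", "cl_type": "ream"},
    {"el_type": "geth", "cl_type": "lighthouse"},
]
print(first_participant_ok(all_lean), first_participant_ok(mixed))
```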

Known limitations

  • dora doesn't work on Lean-only networks — it requires ≥1 Eth1 beacon endpoint to start, and Lean clients don't expose one. ethlambda Beacon API compatibility stubs would unblock dora (and assertoor, forky, …); tracked separately.
  • 14-node Lean-only finalization stalls with the current single-aggregator + 750 ms slot-aggregation deadline. The aggregator processes ~4 sigs/slot in the 750 ms window vs the 10–14 it would need for justification; the deadline is hardcoded in ethlambda. Cross-client lag (some Lean clients fall behind the head) compounds the issue.
  • ethereumjs is excluded from the all-ELs args file: its container fails the package's 2-minute TCP port-check on the WS endpoint, which rolls back the entire EL launch batch.
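
To make the aggregation shortfall concrete: the lower bound of "10–14" matches a two-thirds supermajority of 14 validators. The sketch below assumes that justification needs votes from at least 2/3 of validators; that threshold is an assumption of this example and is not stated in the PR. The function names are hypothetical.

```python
def justification_threshold(validator_count):
    """Smallest vote count v with 3*v >= 2*validator_count (assumed rule)."""
    return (2 * validator_count + 2) // 3


def aggregation_shortfall(validator_count, sigs_per_slot):
    """How many signatures per slot the aggregator is missing, if any."""
    return max(0, justification_threshold(validator_count) - sigs_per_slot)


print(justification_threshold(14))   # 10
print(aggregation_shortfall(14, 4))  # 6
```

Under the same assumption a 2-validator network needs both votes, which fits the observation that the small ethlambda-only setups finalize.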

Adding a new Lean client

The contract for adding a new Lean client lives in the module docstrings (same convention the existing src/cl/* launchers follow). Touch points:

  1. Add an entry to LEAN_TYPE in src/package_io/constants.star, add the same value to CL_TYPE and LEAN_CL_TYPES, plus a default :latest image in DEFAULT_CL_IMAGES / DEFAULT_CL_IMAGES_MINIMAL.
  2. Create src/lean/<client>/<client>_launcher.star exporting initialize(plan, node, p2p_keys_artifact, hash_sig_artifact) and start(plan, node, service, genesis_artifact, hash_sig_artifact). The ethlambda launcher is the cleanest reference; zeam shows the scratch-image variant.
  3. Wire the new launcher into _launcher_for(...) in src/lean/lean_launcher.star.
  4. Add a participant entry to .github/tests/lean-devnet4.yaml (or a new args file) showing the canonical image override.
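
For orientation, the launcher contract and the dispatcher lookup can be modeled in a few lines. This is a Python stand-in, not the Starlark in src/lean/: the two method signatures are the ones listed in step 2, while the class name, the registry dict, the "example" key, and the return values are hypothetical.

```python
class ExampleLauncher:
    """Stand-in for a src/lean/<client>/<client>_launcher.star module."""

    def initialize(self, plan, node, p2p_keys_artifact, hash_sig_artifact):
        # A real launcher would add the service with its IP-independent mounts.
        return {"node": node, "phase": "initialized"}

    def start(self, plan, node, service, genesis_artifact, hash_sig_artifact):
        # A real launcher would translate `node` into the client's CLI flags.
        return {"node": node, "phase": "started"}


# Hypothetical registry playing the role of _launcher_for(...).
LAUNCHERS = {"example": ExampleLauncher()}


def launcher_for(cl_type):
    """Look up the launcher for a Lean cl_type, failing loudly if unknown."""
    if cl_type not in LAUNCHERS:
        raise ValueError(f"unsupported Lean cl_type: {cl_type}")
    return LAUNCHERS[cl_type]


print(launcher_for("example").initialize(None, "example_0", None, None))
```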

ilitteri added 7 commits May 13, 2026 16:19
Lean Ethereum is a redesign of Ethereum consensus with no EL pairing,
no Engine API, no JWT, and post-quantum (XMSS / hash-sig) validator
signatures. Lean clients are standalone consensus nodes talking only
to each other over libp2p QUIC.

Trying to express Lean clients as participants[].cl_type with
el_type: none would force is_lean() branches throughout the EL/CL
pipeline, validator-keystore generator, MEV-boost flow, snooper, etc.
This change adds a parallel top-level lean_participants: pipeline so
the existing EL/CL flow is untouched and Lean concerns are isolated
under src/lean/ and src/prelaunch_data_generator/lean_genesis/.

The Lean pipeline runs in three phases:

  1. Allocate per-node libp2p P2P keys via openssl.
  2. Stand up placeholder services so Kurtosis assigns IPs, then run
     hash-sig-cli (XMSS keypairs) and eth-beacon-genesis leanchain
     (config.yaml + validators.yaml + nodes.yaml + genesis.{ssz,json})
     against the live IPs. Post-process injects GENESIS_VALIDATORS
     into config.yaml and renders annotated_validators.yaml mapping
     node names to validator indices and attester/proposer privkey
     file basenames.
  3. Re-add each placeholder with force_update=True so the genesis +
     hash-sig artifacts are mounted and the real client binary runs.
     Kurtosis preserves the IP because the service name and ports
     don't change, keeping the ENRs we just embedded valid.

ethlambda is fully wired as the first concrete client. ream and zeam
ship with stub launchers translating to their CLI surface from
blockblaz/lean-quickstart client-cmds/*.sh. docs/lean-consensus.md
covers the architecture and docs/lean-adding-a-new-client.md is the
contract for adding a new Lean client (5 touch points).

V1 still requires at least one EL/CL participants[] entry because
several downstream consumers (tx-fuzz target, dora, etc.) assume
all_el_contexts[0] exists; lean-only mode is a follow-up.

Three bug fixes surfaced by running kurtosis against a minimal
lean_participants config:

1. Starlark doesn't support implicit string-literal concatenation. Two
   adjacent string literals across lines parsed cleanly under black
   (used by kurtosis lint) but failed the Starlark interpreter. Use
   explicit "+" in the input_parser fail() and lean_genesis_generator
   fail().
2. sanity_check rejected lean_participants and lean_network_params as
   unknown root keys. Register both in ADDITIONAL_CATEGORY_PARAMS so
   the catch-all root validator accepts them. Per-entry validation
   stays in the Lean input parser (DEFAULT_LEAN_IMAGES + parse_lean_participants).
3. Mount overlap: GENESIS_MOUNT (/network-configs) and HASH_SIG_MOUNT
   (/network-configs/hash-sig-keys) cannot both be Kurtosis file
   artifact mountpoints since Kurtosis forbids nested mounts. Bundle
   hash-sig keys into the same artifact during the genesis post-process
   step and drop the separate HASH_SIG_MOUNT entry from each client's
   ServiceConfig.

Validated up to the point where the existing EL/CL pipeline starts
image pulls; Kurtosis CLI v1.18.1 hangs there independently of these
changes (upstream issue, fixed in v1.18.2).

Lean consensus is fully standalone (no Engine API, no EL counterpart), so
running a Lean network alongside an Eth1 EL/CL pair just to satisfy
downstream consumers is the wrong contract. Detect the lean-only case
(participants: [] && lean_participants: [...]) early in main.star and
short-circuit straight into lean_launcher.launch — the Eth1 EL/CL flow
is skipped entirely. Guards added to input_parser for the two EL/CL
preconditions that crashed with `participants: []`:
  * Fulu/PeerDAS validation only runs when at least one EL/CL participant
    is configured.
  * First-participant-must-have-EL check only runs when participants[0]
    actually exists.

Several pipeline-runtime fixes surfaced while iterating against
`kurtosis run`:

  1. `plan.run_sh(...).output` is a Kurtosis runtime future, not a
     Starlark string. Don't try to index it in Starlark (dict lookups,
     etc.). The P2P key generator stops returning a value dict and
     just exports the artifact; the validator-config render reads keys
     inside its own shell.
  2. Kurtosis service names must match RFC 1035, so the per-node
     `<client>_<idx>` (lean-quickstart convention) is translated to
     `lean-<client>-<idx>` for the service name. The internal
     node_name keeps the underscore for --node-id / validator-config
     compatibility.
  3. add_service rejects calling the same name twice (force_update or
     not). Swap the placeholder-then-replace pattern for the
     pq-devnet-package style: add_service once with all
     IP-independent mounts (P2P + hash-sig keys), then plan.exec
     each text genesis file in via `cat <<EOF` and start the binary
     as a nohup background process.
  4. ServiceConfig refuses min_cpu/max_cpu/min_memory/max_memory of
     0; only set those kwargs when the participant configured them.
  5. lean_participants / lean_network_params have to be added to
     sanity_check.ADDITIONAL_CATEGORY_PARAMS and to the post-parse
     struct in input_parser, otherwise the existing root-key
     validator rejects them and main.star sees missing struct attrs.
  6. eth-beacon-genesis leanchain works with IPs as Kurtosis futures
     via render_templates (template engine resolves the future
     correctly); embedding the future inside a raw shell heredoc
     trips sh because the {{kurtosis:...}} text reaches sh before
     substitution. The validator-config render is now two-stage:
     render with `__PRIVKEY_<node>__` placeholders + real IPs, then
     sed-substitute privkeys from the keys artifact.
  7. hash-sig-cli binary lives at /usr/local/bin/hashsig in the
     blockblaz/hash-sig-cli:latest image, not `hash-sig-cli`.
  8. The post-process step (GENESIS_VALIDATORS injection +
     annotated_validators.yaml render) is now a Python script
     rendered as a separate artifact and executed by a minimal
     shell wrapper — busybox sh in common yq/alpine images choked
     on heredocs with embedded interpreters. PyYAML installed via
     pip (apk's py3-yaml targets alpine's system python, not the
     python:3-alpine bundled one).
  9. PyYAML's YAML 1.1 resolver parses unquoted 0x-hex tokens as int;
     the post-process script normalises both int and str forms before
     string ops.
 10. Lean and EL genesis pipelines share no artifacts; the hash-sig
     keys are bundled into the same `lean-genesis-data` artifact as
     config.yaml et al. so a single Kurtosis files mount covers
     `/network-configs` + `/network-configs/hash-sig-keys`. Kurtosis
     forbids overlapping mounts.
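
The name translation from item 2 is simple enough to show directly. A minimal sketch, assuming every underscore in the node name maps to a hyphen and that names are lowercase; the function name and the exact regular expression are illustrative.

```python
import re

# RFC 1035 label, lowercase form: starts with a letter, ends with a letter
# or digit, letters/digits/hyphens in between, at most 63 characters.
_RFC1035_LABEL = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")


def service_name(node_name):
    """Translate `<client>_<idx>` into `lean-<client>-<idx>`."""
    name = "lean-" + node_name.replace("_", "-")
    if not _RFC1035_LABEL.match(name):
        raise ValueError(f"not a valid RFC 1035 label: {name!r}")
    return name


print(service_name("ethlambda_0"))  # lean-ethlambda-0
```

The internal node name is left untouched, matching the note that `--node-id` and the validator config keep the underscore form.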

Validated end-to-end with kurtosis run: two `lean-ethlambda-{0,1}`
services come up in lean-only mode, peer over QUIC, exchange status
messages, expose `GET /lean/v0/health` returning HTTP 200, and serve
`lean_*` Prometheus metrics.

The hash-sig-cli manifest writes attester_key_pubkey_hex /
proposer_key_pubkey_hex as unquoted `0x...` tokens. PyYAML's default
loader (and yaml.safe_load) parse those as YAML 1.1 ints, which silently
drops leading zeros. Reformatting via `format(value, "x")` then emits an
odd-length hex string, and every Lean client correctly rejects the
resulting config.yaml ("pubkey is not valid hex" / "odd number of
digits at line ...").

Switch the loader to yaml.BaseLoader so every scalar stays a Python
str; _as_hex retains its int fallback in case a future manifest version
writes the field differently.
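
The failure mode is easy to reproduce without PyYAML. In the sketch below, `int(token, 16)` stands in for what a YAML 1.1 loader does to an unquoted `0x...` scalar, and the string path stands in for `yaml.BaseLoader`. The function names and the pubkey value are made up for illustration.

```python
def naive_roundtrip(token):
    """Mimic a YAML 1.1 loader: the unquoted 0x token becomes an int."""
    value = int(token, 16)     # what the default loader hands back
    return format(value, "x")  # leading zeros are gone


def keep_as_string(token):
    """Mimic yaml.BaseLoader: the scalar stays a str."""
    return token[2:] if token.startswith("0x") else token


pubkey = "0x0abc12"  # made-up value with a leading zero nibble
print(naive_roundtrip(pubkey))  # abc12  (odd length, rejected as hex bytes)
print(keep_as_string(pubkey))   # 0abc12
```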

Validated with a 4-node ethlambda+ream multi-client devnet:
peer_count=3 on every node, cross-client status exchange confirmed.

blockblaz/zeam:devnet4 is a scratch image — only /app/zig-out/bin/zeam
exists, no /bin/sh, /bin/tail, /usr/bin/touch, or even /var/log. The
existing placeholder-then-plan.exec lifecycle therefore couldn't run
for zeam. This commit adds a static busybox binary as a Kurtosis files
artifact, mounts it at /usr/local/bin/, and routes every shell-needing
step through busybox sh + dispatched applets (busybox mkdir / touch /
tail / cat / nohup).

Three additional zeam-specific fixups surfaced while iterating:

  1. The placeholder cmd mkdir -p's $(dirname /var/log/<svc>.log)
     before touch — /var/log doesn't exist in scratch.
  2. The start phase mkdir -p's /data and copies the per-node
     <node>.key from /node-keys into /network-configs (lean-quickstart's
     zeam contract reads --node-key relative to --custom-genesis).
  3. --validator-config is set to the literal `genesis_bootnode`
     sentinel rather than a YAML file path. zeam's --validator-config
     accepts either a *directory* of per-node validator configs or
     the sentinel; pointing it at a single YAML file triggers a
     NotDir failure during "build node start options".

Validated end-to-end with a 6-node devnet (2x ethlambda + 2x ream +
2x zeam): every node is RUNNING; lean-zeam-0 reports "Connected
Peers: 5", builds the genesis state, and prints the fork-choice
tree. Cross-client peering between ethlambda <-> ream <-> zeam
confirmed via status request/response exchange in all three clients'
logs.

XMSS keypair generation is CPU-bound and scales with num-validators *
2^active_epoch. On slower hosts (and with the default active_epoch=18
== 2^18 epochs per key) the default 180s plan.run_sh timeout fires
mid-generation, killing the run with "exec request timed out". Bump
the wait to 30m so it has headroom on shared/remote hosts. The step
is idempotent in the kurtosis artifact sense - re-runs of the same
package args reuse the artifact.
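
The scaling claim can be turned into a rough relative cost. This sketch only evaluates the stated formula (num-validators * 2^active_epoch); the result is a unitless work estimate, not a time prediction, and the function name is hypothetical.

```python
def keygen_work_units(num_validators, active_epoch):
    """Relative cost model from the text: validators * 2^active_epoch."""
    return num_validators * (1 << active_epoch)


print(keygen_work_units(2, 18))   # 524288
print(keygen_work_units(14, 18))  # 3670016
```

Going from the 2-node smoke test to the 14-node devnet multiplies the estimate by 7, which helps explain why the default 180s timeout is too tight on slower hosts.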

Surfaced while bringing up the 6-node ethlambda+ream+zeam devnet on
ethrex-mainnet-test-1.

Mirror lean-quickstart's docker-compose-metrics.yaml stack inside the
Kurtosis enclave: one `lean-prometheus` (prom v3.8.0) scraping every
lean-<client>-<idx>:5054/metrics target plus its own /metrics, and one
`lean-grafana` (12.3.2) provisioning the Prometheus datasource and the
upstream Lean client dashboard at port 3000.

Inside the enclave we resolve scrape targets by service DNS name, so
no `host.docker.internal` workaround is needed. The dashboard JSON
(client-dashboard.json) is vendored under
src/lean/metrics/grafana/dashboards/ at the same upstream commit as
lean-quickstart.

Gated on `lean_network_params.metrics_enabled` (default true), so any
operator who wants a metrics-free run can set it false. Anonymous
admin login is enabled (matches lean-quickstart) — there's no
admin/admin prompt to navigate through.

Validated end-to-end against a 6-node ethlambda+ream+zeam devnet: all
7 scrape targets (6 clients + prometheus self) report `up`, Grafana
health endpoint returns 200, dashboard "Lean Ethereum Clients
Dashboard" loads at /d/lean-ethereum-clients-dashboard.

ilitteri changed the title from "Add Lean Ethereum consensus pipeline (ethlambda + ream + zeam + Prometheus/Grafana)" to "feat: add Lean Ethereum consensus pipeline (ethlambda + ream + zeam + Prometheus/Grafana)" May 13, 2026
ilitteri added 7 commits May 13, 2026 19:19
Default `ports={}` publishes on a random host port bound to 127.0.0.1.
`public_ports={}` lets us pin the host-side number (and Docker binds
those on 0.0.0.0 by default), so dashboards have stable URLs and can
be reached as http://<server>:3000 / :9090 without an SSH tunnel.

If 3000 or 9090 are already in use on the host the run will fail at
service start - operators can avoid this by not enabling metrics on
shared hosts, or by adding an override via lean_network_params (a
follow-up).

Wires the remaining devnet4 Lean clients into the pipeline. Each
launcher follows the same placeholder-then-plan.exec pattern as
ethlambda/ream, translating its CLI surface from the matching
client-cmds/<client>-cmd.sh in blockblaz/lean-quickstart. All 5
images ship with a working /bin/sh + busybox applets, so none need
the static-busybox injection that zeam required.

Per-client notes:
- qlean reads --node-key from /node-keys; uses the libp2p multiaddr
  listen-addr form.
- lantern reads everything by explicit path (validator-registry,
  validator-keys, validator-config, hash-sig-key-dir, nodes-path).
- grandine binary is /usr/local/bin/lean_client (image ENTRYPOINT
  alias).
- lighthouse needs genesis.json staged (in addition to the
  text bundle) for its lean_node subcommand.
- gean follows lean-quickstart's convention of looking up
  --node-key inside --custom-network-config-dir, so the launcher
  cps the per-node libp2p secret into the genesis mount before
  starting the binary.

Dispatcher in src/lean/lean_launcher.star routes by LEAN_TYPE.

Two regressions surfaced during the 9-client deploy on the office host:

- lantern's binary lives at /opt/lantern/bin/lantern, not
  /usr/local/bin/lantern_cli. The image's ENTRYPOINT script
  (lantern-entrypoint.sh) forwards to /opt/lantern/bin/lantern, but
  since we bypass the entrypoint we have to point at the real binary.
- The published hopinheimer/lighthouse:latest lean_node subcommand
  does not accept --api-port or --is-aggregator. Drop those flags;
  document that lighthouse always runs as a non-aggregator under this
  image and exposes only its metrics endpoint (no HTTP API).

Pinning the in-tree defaults to devnet4 makes them rot the moment a
new devnet generation ships - operators who don't override
`lean_image:` would silently keep getting a stale tag. Switch every
default to the client's `:latest` tag (all of them publish one) so
the package itself is forward-compatible.

Devnet-specific runs (e.g. the current devnet4 deployment) belong
in the args file: each participant sets `lean_image: <repo>:devnet4`
explicitly. The PR description carries the canonical devnet4 args
example.
The earlier wording described Lean consensus as architecturally
standalone ("no EL pairing", "no Engine API", "fully standalone"). That
isn't the long-term picture: Lean clients are designed to pair with EL
clients in the regular EL+CL devnet shape; they just don't implement
Engine API yet, so present-day devnets are client-only.

Reframe both code comments and docs around that distinction:

- main.star and lean_launcher.star comments now say "no Engine API yet"
  / "until Engine API ships" instead of declaring Lean architecturally
  EL-less.
- docs/lean-consensus.md introduces Lean as "client-only today, EL
  pairing later" and notes the motivation for landing it in this
  package is exactly to be ready for the EL+Lean shape when it ships.
- The why-a-parallel-pipeline table is retitled "Lean (today)" and adds
  a follow-up paragraph on how the two pipelines compose once Engine
  API lands.
Two more comment blocks (constants.star LEAN_TYPE intro and the Lean
parsing section header in input_parser.star) still described Lean as
"standalone" / "no Engine API" without the temporal qualifier. Bring
both in line with the wording used in docs/lean-consensus.md and the
launcher modules: Lean is client-only today because Engine API isn't
implemented yet, EL+Lean pairing arrives when it does.
The two Lean prose docs (docs/lean-consensus.md and
docs/lean-adding-a-new-client.md) don't match the repo's documentation
shape — the existing docs/ has exactly one prose doc (architecture.md),
and per-feature configs live as YAML args files under .github/tests/.

  - Delete docs/lean-consensus.md and docs/lean-adding-a-new-client.md.
  - Append a "Lean Ethereum participants" section to docs/architecture.md
    so the architectural overview lands in the file that already serves
    that purpose.
  - Add .github/tests/lean-devnet4.yaml as the canonical 14-node args
    example (mirrors how bal-devnet-0.yaml / fulu.yaml / etc. document
    network-shape configs).
  - Add .github/tests/lean-smoke.yaml as the minimal 2-node smoke test.

The per-client contract that the deleted "adding a new client" guide
described is now the module docstrings on each src/lean/<client>/<client>_launcher.star,
matching the existing src/cl/*/<client>_launcher.star convention.
@ilitteri changed the title from "feat: add Lean Ethereum consensus pipeline (ethlambda + ream + zeam + Prometheus/Grafana)" to "feat: add Lean Ethereum consensus pipeline" on May 14, 2026
Comment on lines +45 to +50
    return plan.run_sh(
        run="mkdir -p /out && cp /bin/busybox /out/busybox",
        image="busybox:musl",
        store=[StoreSpec(src="/out", name="lean-busybox")],
        description="Extracting static busybox for zeam scratch image",
    ).files_artifacts[0]

Zeam busybox artifact collides when count > 1 — _busybox_artifact() is called from initialize() per-node and stores with the fixed name "lean-busybox". lean-devnet4.yaml runs zeam with count: 2, so the second initialize attempts to register the same artifact name and Kurtosis rejects duplicate artifact names within an enclave.

Call site (per-node, line 54):

def _busybox_artifact(plan):
    # Extract the static busybox binary out of busybox:musl. Once exported
    # as a Kurtosis files artifact it can be mounted into any scratch
    # container as `/usr/local/bin/busybox` so we have a working shell to
    # run plan.exec scripts against.
    return plan.run_sh(
        run="mkdir -p /out && cp /bin/busybox /out/busybox",
        image="busybox:musl",
        store=[StoreSpec(src="/out", name="lean-busybox")],
        description="Extracting static busybox for zeam scratch image",
    ).files_artifacts[0]

def initialize(plan, node, p2p_keys_artifact, hash_sig_artifact):
    busybox_artifact = _busybox_artifact(plan)
    cfg_kwargs = lean_shared.common_cfg_kwargs(node)
    cfg_kwargs.update(
        {
            "image": node["image"],
            # Override the zeam entrypoint with busybox sh; the real zeam
            # binary is invoked later via plan.exec.
            "entrypoint": [BUSYBOX, "sh", "-c"],
            "cmd": [
                # zeam's scratch image has no /usr/bin/touch, /bin/tail, and
                # not even /var/log. Every applet has to be dispatched through
                # busybox; mkdir -p creates /var/log on first touch.
                "{0} mkdir -p $({0} dirname {1}) && {0} touch {1} && {0} tail -f {1}".format(
                    BUSYBOX,
                    lean_shared.lean_log_file_path(node["service_name"]),
                )
            ],
            "files": {
                NODE_KEY_MOUNT: p2p_keys_artifact,
                HASH_SIG_MOUNT: hash_sig_artifact,
                BUSYBOX_MOUNT: busybox_artifact,
            },
        }
    )
    return plan.add_service(node["service_name"], ServiceConfig(**cfg_kwargs))

Loop that invokes it per node:

# Phase 1: initialise placeholder services so Kurtosis assigns IPs.
services = []
for node in expanded:
    launcher = _launcher_for(node["lean_type"])
    service = launcher.initialize(
        plan,
        node,
        keys_artifact,
        hash_sig_artifact,
    )
    services.append((node, service))

Devnet config that triggers it (count: 2):

- lean_type: zeam
  lean_image: blockblaz/zeam:devnet4
  count: 2
  validator_count: 1
  is_aggregator: false

Fix: hoist _busybox_artifact(plan) into launch() (like the p2p/hash-sig artifacts) and pass it as a parameter to initialize().
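
The suggested fix can be sketched in plain Python. This is a minimal illustration, not the package's code: `FakePlan` is a hypothetical stand-in for the Kurtosis plan that rejects duplicate artifact names (the behaviour this review describes), and the function names below are invented for the example.

```python
class FakePlan:
    """Hypothetical stand-in for the Kurtosis plan; rejects duplicate artifact names."""

    def __init__(self):
        self.artifacts = set()

    def store_artifact(self, name):
        if name in self.artifacts:
            raise ValueError("duplicate artifact name: " + name)
        self.artifacts.add(name)
        return name


def busybox_artifact(plan):
    # Fixed artifact name, mirroring the "lean-busybox" name in the snippet above.
    return plan.store_artifact("lean-busybox")


def initialize_per_node(plan, node):
    # Current shape: every node registers the artifact again.
    return {"name": node, "busybox": busybox_artifact(plan)}


def initialize_shared(plan, node, busybox):
    # Proposed shape: the artifact is created by the caller and passed in.
    return {"name": node, "busybox": busybox}


def launch_hoisted(plan, nodes):
    busybox = busybox_artifact(plan)  # registered once, before the per-node loop
    return [initialize_shared(plan, node, busybox) for node in nodes]
```

With two zeam nodes, `launch_hoisted` registers the artifact once and both services share it, while calling `initialize_per_node` twice on the same plan raises on the second call.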

Comment thread src/lean/ethlambda/ethlambda_launcher.star
ilitteri added 11 commits May 19, 2026 22:21
`el_admin_node_info.get_enode_enr_for_node` can discover the ENR/enode
via `admin_nodeInfo`. ethrex defaults to `eth,net,web3` only; without
this flag the kurtosis startup polls admin_nodeInfo forever and never
hands the el_context to downstream CL launchers.

The change is a no-op for existing setups that didn't reach the poll
(it only widens the public HTTP API surface inside the test enclave).
`nohup ... &`. Kurtosis `exec` waits for its docker-exec FDs to close,
and `& disown` is a bash-ism that the Lean client images' /bin/sh
(dash on Debian-slim) doesn't recognise — so the backgrounded ethlambda
process kept the exec connection open and the kurtosis run hung at
the start step for the next Lean node. `setsid -f` forks into a new
session and exits the parent shell immediately, releasing the FDs and
letting the kurtosis run progress to the next node.

The zeam launcher keeps the busybox-prefixed `nohup ... &` pattern
because its scratch-based image has no setsid; the busybox build
detaches via the injected `< /dev/null` redirect (handled separately
in that launcher).
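
The underlying mechanism (a caller that waits for inherited file descriptors to close) can be reproduced outside Kurtosis. The sketch below is an assumption-laden stand-in: it uses Python's `subprocess` in place of Kurtosis `exec`, a POSIX `sh`, and `sleep 1` in place of the Lean client. It does not use `setsid`; it only shows that a backgrounded child holding the caller's pipes delays the return, and that redirecting the child's stdio avoids it.

```python
import subprocess
import time


def timed_run(script):
    """Run `script` under sh with piped stdout/stderr and return elapsed seconds."""
    start = time.monotonic()
    # capture_output=True makes the caller read both pipes until EOF,
    # which only arrives once every process holding them has exited.
    subprocess.run(["sh", "-c", script], capture_output=True, check=True)
    return time.monotonic() - start


# The background child inherits the pipes, so the caller waits for it to exit.
HOLDS_FDS = "sleep 1 &"
# Redirecting the child's stdio releases the pipes as soon as the shell exits.
RELEASES_FDS = "sleep 1 </dev/null >/dev/null 2>&1 &"
```

Calling `timed_run(HOLDS_FDS)` takes about as long as the sleep, while `timed_run(RELEASES_FDS)` returns almost immediately.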
the standard `participants:` block, and remove the parallel
`lean_participants:` schema entirely. The new shape collapses Lean and
EL+CL into one input surface:

  participants:
    - el_type: ethrex
      cl_type: ethlambda
      is_aggregator: true
    - el_type: none
      cl_type: ream

ethlambda is the only Lean client that implements Engine API today
(lambdaclass/ethlambda#367); when paired with an EL (`el_type` != none)
the Lean launcher reads the EL's genesis block hash via
`eth_getBlockByNumber 0x0` after the EL is up, stages the network JWT
into the ethlambda container, and adds the three Engine API flags
(`--execution-endpoint`, `--execution-jwt-secret`,
`--execution-genesis-block-hash`) to its CLI. The other seven Lean
clients run client-only — `el_type: none` skips EL launch entirely
(the package already supports this for `consensoor` etc.).
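
A rough Python sketch of the pairing logic described above. The three flag names and the `eth_getBlockByNumber 0x0` call come from this commit message; the function names, the `--flag value` argument form, and the example URL and path are assumptions made for illustration.

```python
import json


def genesis_block_request():
    """JSON-RPC body asking the EL for block 0 (without full transactions)."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_getBlockByNumber",
            "params": ["0x0", False],
        }
    )


def engine_api_flags(el_type, engine_url, jwt_path, genesis_hash):
    """Return the extra ethlambda CLI flags, or [] for a client-only node."""
    if el_type == "none":
        return []
    return [
        "--execution-endpoint", engine_url,
        "--execution-jwt-secret", jwt_path,
        "--execution-genesis-block-hash", genesis_hash,
    ]
```

For example, `engine_api_flags("ethrex", "http://el-1:8551", "/jwt/jwtsecret", "0xabc")` yields six argv entries, and `engine_api_flags("none", "", "", "")` yields none, so the client-only launch path stays unchanged.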

CL_TYPE additions: ethlambda, ream, zeam, qlean, lantern, gean,
lean_grandine, lean_lighthouse. The last two are prefixed because
`grandine` and `lighthouse` already exist in CL_TYPE for the Eth1 CLs
of the same name (different binaries from different repos).
`LEAN_CL_TYPES` is the set the cl_launcher dispatcher checks to
decide whether to skip a participant (Lean cl_types are launched by
src/lean/lean_launcher.star, not the standard CL launchers); main.star
then builds a Lean record per such participant and hands the list to
the Lean launcher with the network jwt_file attached.

Other plumbing changes that fall out of this:

- `is_aggregator` is now a first-class per-participant field. Ignored
  on non-Lean cl_types.
- The "first participant cannot have el_type=none without bootnodoor"
  guard is relaxed when every participant has a Lean cl_type — Lean
  uses its own libp2p QUIC mesh and doesn't need an Eth1 bootnode.
- The Fulu/PeerDAS validation skips Lean cl_types (they don't speak
  PeerDAS).
- The VC / remote-signer / snooper / metrics-exporter pipeline is
  skipped for Lean cl_types in participant_network.star — Lean
  validators live inside the consensus binary, not a separate VC.
- shared_utils.get_client_names is None-safe: when cl_context is None
  (Lean participants), it falls back to the cl_type string from the
  participant config so downstream consumers (validator-ranges, dora,
  etc.) still get a usable row name.
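
The skip-and-fallback behaviour in the list above might look roughly like this in Python. The set contents are the eight cl_types named in this message; `split_participants`, `client_name`, and the `cl_context` dictionary shape are hypothetical simplifications, not the package's actual signatures.

```python
LEAN_CL_TYPES = {
    "ethlambda", "ream", "zeam", "qlean", "lantern", "gean",
    "lean_grandine", "lean_lighthouse",
}


def split_participants(participants):
    """Partition participants into (standard, lean) lists by cl_type."""
    standard, lean = [], []
    for participant in participants:
        if participant["cl_type"] in LEAN_CL_TYPES:
            lean.append(participant)
        else:
            standard.append(participant)
    return standard, lean


def client_name(participant, cl_context):
    """None-safe row name: fall back to cl_type when no cl_context exists."""
    if cl_context is None:
        return participant["cl_type"]
    return cl_context["client_name"]
```

A participant with `cl_type: ream` lands in the Lean list and, since it has no `cl_context`, is labelled `ream` by `client_name`.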

`lean_network_params:` stays as a separate config block for Lean-only
knobs (`active_epoch`, `attestation_committee_count`,
`num_validator_keys_per_node`, `metrics_enabled`, ...).
`parse_lean_participants` and `DEFAULT_LEAN_IMAGES` are deleted; the
DEFAULT_CL_IMAGES table now carries the Lean defaults too.

Args files migrated:
- `.github/tests/lean-devnet4.yaml` — every entry moved to
  `participants:` with `el_type: none`.
- `.github/tests/lean-smoke.yaml` — same shape, two ethlambda nodes.
- `.github/tests/ethlambda-el-pair.yaml` — new, single ethrex+ethlambda
  pair.
- `.github/tests/ethlambda-el-pair-2node.yaml` — new, two pairs with
  one aggregator + one non-aggregator on the Lean side.

Validated locally: the 2-node ethrex+ethlambda pair finalizes
slot-by-slot, both ethrex ELs converge on identical block hashes via
the Lean libp2p mesh between the two ethlambdas.
participant has `el_type: none` (an all-Lean deployment) the EL launcher
appends nothing, so `all_el_contexts[0].ip_addr` crashed at startup with
"index 0 out of range: empty list".

`fuzz_target` is only consumed by additional services that talk to an
Eth1 EL (tx-fuzz, rakoon, broadcaster, custom_flood); leaving it empty
is correct for Lean-only — those services aren't enabled there and the
remaining additional-service handlers guard their own EL needs.
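
A minimal sketch of the guard, assuming EL contexts are plain dictionaries with an `ip_addr` key; the function name, URL format, and default port are illustrative assumptions rather than the package's real values.

```python
def fuzz_target_for(all_el_contexts, rpc_port=8545):
    """Return the first EL's RPC URL, or "" when no EL was launched."""
    if not all_el_contexts:
        # All-Lean deployment: nothing to index, so leave the target empty.
        return ""
    return "http://{0}:{1}".format(all_el_contexts[0]["ip_addr"], rpc_port)
```

With an empty list this returns `""` instead of raising the "index 0 out of range" error quoted above.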
for Lean clients whose `lean_type` itself contains an underscore. With the
phase-2 disambiguation, `LEAN_TYPE.grandine = "lean_grandine"` and
`LEAN_TYPE.lighthouse = "lean_lighthouse"`; the previous service name
formatter produced `lean-lean_grandine-2`, which Kurtosis rejects per
RFC 1035 ("only lowercase alphanumeric and `-` characters"). The node
name (used inside the Lean genesis / validator config) keeps the
underscore — that's the convention lean-quickstart writes and the
clients parse.
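
One way to express the split between service name and node name, as a sketch only: the formatter below swaps underscores for hyphens in the service name and leaves the `lean_type` untouched. The exact output format of the real formatter is not stated here, so `service_name` and its `lean-<type>-<index>` shape are assumptions; the regular expression encodes the RFC 1035 label rule.

```python
import re

# RFC 1035 label: starts with a letter, ends alphanumeric,
# contains only lowercase letters, digits and "-", at most 63 characters.
RFC1035_LABEL = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")


def service_name(lean_type, index):
    """Build a hyphen-only service name from a lean_type and node index."""
    return "lean-{0}-{1}".format(lean_type.replace("_", "-"), index)
```

Under this assumption `lean_grandine` at index 2 becomes `lean-lean-grandine-2`, which passes the label check, while the rejected `lean-lean_grandine-2` does not.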
node paired with ethlambda via Engine API (16 EL+CL pairs total, one
aggregator). Dora is added as an additional service to give the EL side
a beacon-explorer UI.

Also gate dora's launcher loop on Lean cl_types — `cl_client` is None
for Lean participants and dora's `new_cl_client_info(cl_client.beacon_http_url, ...)`
was crashing before reaching the `el_type == none` skip. Same shape the
other downstream pipelines (VC / snooper / metrics-exporter) already
have.
doesn't open TCP 8546 within Kurtosis's 2-minute port-check timeout,
which fails the parallel start batch and rolls back every other EL.
The other 7 EL clients (geth, nethermind, besu, reth, erigon, nimbus,
ethrex) are unaffected. Re-add when the ethereumjs image is fixed.
nethermind, geth, erigon, nimbus-eth1, besu. reth dropped from the
experiment too; ethereumjs continues to be excluded because of its
2-minute TCP port-check timeout rolling back the whole batch.
parser expands `count: N` into N separate `participants:` entries
(input_parser.star:1260), each carrying the original `count` attribute.
main.star's synthesis was reading that `count` and propagating it into
the Lean record, so the Lean launcher's own count expansion multiplied
N×N — a `count: 2` ream participant ended up running 4 ream containers.

Hardcode `count: 1` in the synthesized Lean record so the Lean
launcher gets one node per already-expanded participant entry.
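
The double expansion is easy to reproduce in isolation. The following Python sketch uses hypothetical helper names (`expand`, `synthesize_lean_records`) and simplified records; it only illustrates why propagating `count` multiplies the node total and why pinning it to 1 fixes that.

```python
def expand(entries):
    """Expand each entry into `count` single-node entries."""
    out = []
    for entry in entries:
        for _ in range(entry.get("count", 1)):
            out.append(dict(entry))  # each copy still carries the original `count`
    return out


def synthesize_lean_records(expanded, fix=True):
    """Build Lean records from already-expanded participants."""
    records = []
    for participant in expanded:
        # The fix pins count to 1; the old behaviour propagated it.
        count = 1 if fix else participant.get("count", 1)
        records.append({"lean_type": participant["cl_type"], "count": count})
    return records


participants = [{"cl_type": "ream", "count": 2}]
parser_expanded = expand(participants)  # first expansion, done by the parser
buggy = expand(synthesize_lean_records(parser_expanded, fix=False))
fixed = expand(synthesize_lean_records(parser_expanded, fix=True))
print(len(buggy), len(fixed))  # prints "4 2"
```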
beacon endpoint to start, and Lean cl_types don't expose one — every
participant in this experiment is a Lean cl_type, so dora's config
template rendered zero endpoints and the container exited with
"missing beacon node endpoints (need at least 1)". The rest of the
devnet runs fine without dora; re-enable once ethlambda ships Beacon
API compatibility stubs.
12 EL containers booting in parallel, several were too slow to answer
engine_getPayloadV5 within the Lean 4s slot window during EL warm-up;
the resulting empty slots broke 3SF-mini's delta-bounded finalization
rule. 180s lets the ELs warm up before slot 0 starts.