feat: add Lean Ethereum consensus pipeline #19
Lean Ethereum is a redesign of Ethereum consensus with no EL pairing,
no Engine API, no JWT, and post-quantum (XMSS / hash-sig) validator
signatures. Lean clients are standalone consensus nodes talking only
to each other over libp2p QUIC.
Trying to express Lean clients as participants[].cl_type with
el_type: none would force is_lean() branches throughout the EL/CL
pipeline, validator-keystore generator, MEV-boost flow, snooper, etc.
This change adds a parallel top-level lean_participants: pipeline so
the existing EL/CL flow is untouched and Lean concerns are isolated
under src/lean/ and src/prelaunch_data_generator/lean_genesis/.
The Lean pipeline runs in three phases:
1. Allocate per-node libp2p P2P keys via openssl.
2. Stand up placeholder services so Kurtosis assigns IPs, then run
hash-sig-cli (XMSS keypairs) and eth-beacon-genesis leanchain
(config.yaml + validators.yaml + nodes.yaml + genesis.{ssz,json})
against the live IPs. Post-process injects GENESIS_VALIDATORS
into config.yaml and renders annotated_validators.yaml mapping
node names to validator indices and attester/proposer privkey
file basenames.
3. Re-add each placeholder with force_update=True so the genesis +
hash-sig artifacts are mounted and the real client binary runs.
Kurtosis preserves the IP because the service name and ports
don't change, keeping the ENRs we just embedded valid.
ethlambda is fully wired as the first concrete client. ream and zeam
ship with stub launchers translating to their CLI surface from
blockblaz/lean-quickstart client-cmds/*.sh. docs/lean-consensus.md
covers the architecture and docs/lean-adding-a-new-client.md is the
contract for adding a new Lean client (5 touch points).
V1 still requires at least one EL/CL participants[] entry because
several downstream consumers (tx-fuzz target, dora, etc.) assume
all_el_contexts[0] exists; lean-only mode is a follow-up.
Three bug fixes surfaced by running kurtosis against a minimal
lean_participants config:
1. Starlark doesn't support implicit string-literal concatenation.
Two adjacent string literals across lines parsed cleanly under
black (used by kurtosis lint) but failed the Starlark interpreter.
Use explicit "+" in the input_parser fail() and
lean_genesis_generator fail().
2. sanity_check rejected lean_participants and lean_network_params
as unknown root keys. Register both in ADDITIONAL_CATEGORY_PARAMS
so the catch-all root validator accepts them. Per-entry validation
stays in the Lean input parser (DEFAULT_LEAN_IMAGES +
parse_lean_participants).
3. Mount overlap: GENESIS_MOUNT (/network-configs) and HASH_SIG_MOUNT
(/network-configs/hash-sig-keys) cannot both be Kurtosis file
artifact mountpoints since Kurtosis forbids nested mounts. Bundle
hash-sig keys into the same artifact during the genesis
post-process step and drop the separate HASH_SIG_MOUNT entry from
each client's ServiceConfig.
Validated up to the point where the existing EL/CL pipeline starts
image pulls; Kurtosis CLI v1.18.1 hangs there independently of these
changes (upstream issue, fixed in v1.18.2).
Lean consensus is fully standalone (no Engine API, no EL counterpart), so
running a Lean network alongside an Eth1 EL/CL pair just to satisfy
downstream consumers is the wrong contract. Detect the lean-only case
(participants: [] && lean_participants: [...]) early in main.star and
short-circuit straight into lean_launcher.launch — the Eth1 EL/CL flow
is skipped entirely. Guards added to input_parser for the two EL/CL
preconditions that crashed with `participants: []`:
* Fulu/PeerDAS validation only runs when at least one EL/CL participant
is configured.
* First-participant-must-have-EL check only runs when participants[0]
actually exists.
Several pipeline-runtime fixes surfaced while iterating against
`kurtosis run`:
1. `plan.run_sh(...).output` is a Kurtosis runtime future, not a
Starlark string. Don't try to index it in Starlark (dict lookups,
etc.). The P2P key generator stops returning a value dict and
just exports the artifact; the validator-config render reads keys
inside its own shell.
2. Kurtosis service names must match RFC 1035, so the per-node
`<client>_<idx>` (lean-quickstart convention) is translated to
`lean-<client>-<idx>` for the service name. The internal
node_name keeps the underscore for --node-id / validator-config
compatibility.
3. add_service rejects calling the same name twice (force_update or
not). Swap the placeholder-then-replace pattern for the
pq-devnet-package style: add_service once with all
IP-independent mounts (P2P + hash-sig keys), then plan.exec
each text genesis file in via `cat <<EOF` and start the binary
as a nohup background process.
4. ServiceConfig refuses min_cpu/max_cpu/min_memory/max_memory of
0; only set those kwargs when the participant configured them.
5. lean_participants / lean_network_params have to be added to
sanity_check.ADDITIONAL_CATEGORY_PARAMS and to the post-parse
struct in input_parser, otherwise the existing root-key
validator rejects them and main.star sees missing struct attrs.
6. eth-beacon-genesis leanchain works with IPs as Kurtosis futures
via render_templates (template engine resolves the future
correctly); embedding the future inside a raw shell heredoc
trips sh because the {{kurtosis:...}} text reaches sh before
substitution. The validator-config render is now two-stage:
render with `__PRIVKEY_<node>__` placeholders + real IPs, then
sed-substitute privkeys from the keys artifact.
7. hash-sig-cli binary lives at /usr/local/bin/hashsig in the
blockblaz/hash-sig-cli:latest image, not `hash-sig-cli`.
8. The post-process step (GENESIS_VALIDATORS injection +
annotated_validators.yaml render) is now a Python script
rendered as a separate artifact and executed by a minimal
shell wrapper — busybox sh in common yq/alpine images choked
on heredocs with embedded interpreters. PyYAML installed via
pip (apk's py3-yaml targets alpine's system python, not the
python:3-alpine bundled one).
9. PyYAML follows YAML 1.1 and parses unquoted 0x-hex tokens as int;
the post-process script normalises both int and str forms before
string ops.
10. Lean and EL genesis pipelines share no artifacts; the hash-sig
keys are bundled into the same `lean-genesis-data` artifact as
config.yaml et al. so a single Kurtosis files mount covers
`/network-configs` + `/network-configs/hash-sig-keys`. Kurtosis
forbids overlapping mounts.
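The two-stage render from item 6 can be sketched in plain Python. This is only an illustration: the YAML field names, the `render_stage_one` and `substitute_privkeys` helpers, the IPs, and the sample keys are all hypothetical, and the package itself does stage one with `render_templates` and stage two with `sed`.

```python
import re

# Matches the per-node placeholder, e.g. __PRIVKEY_ethlambda_0__.
PLACEHOLDER = re.compile(r"__PRIVKEY_(\w+?)__")


def render_stage_one(nodes):
    """Stage 1: emit config text with real IPs and one placeholder per node."""
    lines = []
    for name, ip in nodes:
        lines.append("- name: " + name)
        lines.append("  ip: " + ip)
        lines.append("  privkey: __PRIVKEY_" + name + "__")
    return "\n".join(lines) + "\n"


def substitute_privkeys(text, keys):
    """Stage 2: replace every placeholder with the key for that node."""
    def lookup(match):
        node = match.group(1)
        if node not in keys:
            raise KeyError("no P2P key for node " + node)
        return keys[node]
    return PLACEHOLDER.sub(lookup, text)


# Hypothetical inputs: node names follow the <client>_<idx> convention.
nodes = [("ethlambda_0", "172.16.0.10"), ("ream_0", "172.16.0.11")]
keys = {"ethlambda_0": "aa" * 32, "ream_0": "bb" * 32}
rendered = substitute_privkeys(render_stage_one(nodes), keys)
print(rendered)
```

Keeping the secrets out of stage one means the template step only ever sees values it can resolve itself (the IPs), while the key material is read where the keys artifact is actually mounted.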
Validated end-to-end with kurtosis run: two `lean-ethlambda-{0,1}`
services come up in lean-only mode, peer over QUIC, exchange status
messages, expose `GET /lean/v0/health` returning HTTP 200, and serve
`lean_*` Prometheus metrics.
The hash-sig-cli manifest writes attester_key_pubkey_hex /
proposer_key_pubkey_hex as unquoted `0x...` tokens. PyYAML's default
loader (and yaml.safe_load) parse those as YAML 1.1 ints, which silently
drops leading zeros. Reformatting via `format(value, "x")` then emits an
odd-length hex string, and every Lean client correctly rejects the
resulting config.yaml ("pubkey is not valid hex" / "odd number of
digits at line ...").
Switch the loader to yaml.BaseLoader so every scalar stays a Python
str; _as_hex retains its int fallback in case a future manifest version
writes the field differently.
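The failure mode can be reproduced without PyYAML, since the loader's int conversion behaves like `int(token, 16)`. The sketch below is an assumption-laden stand-in, not the script's actual `_as_hex`: the function name `as_hex`, the sample token, and the choice to return bare digits without a `0x` prefix are all illustrative.

```python
def as_hex(value):
    """Return bare lowercase hex digits from either a str or an int scalar."""
    if isinstance(value, int):
        digits = format(value, "x")
        # An int has already lost its leading zeros; padding to an even
        # length is a best effort, not a full recovery.
        return digits if len(digits) % 2 == 0 else "0" + digits
    text = str(value).strip().lower()
    return text[2:] if text.startswith("0x") else text


token = "0x0abc"             # unquoted token as written in the manifest
as_int = int(token, 16)      # what a YAML 1.1 int resolver would produce
print(format(as_int, "x"))   # abc  (odd length, leading zero gone)
print(as_hex(token))         # 0abc (string form keeps every digit)
print(as_hex(as_int))        # 0abc (int fallback, padded to even length)
```

The int fallback cannot restore whole zero bytes (`0x00ab` becomes `ab`, which is even-length but still wrong), which is why keeping the scalar as a string in the loader is the real fix and the fallback is only a safety net.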
Validated with a 4-node ethlambda+ream multi-client devnet:
peer_count=3 on every node, cross-client status exchange confirmed.
blockblaz/zeam:devnet4 is a scratch image — only /app/zig-out/bin/zeam
exists, no /bin/sh, /bin/tail, /usr/bin/touch, or even /var/log. The
existing placeholder-then-plan.exec lifecycle therefore couldn't run
for zeam. This commit adds a static busybox binary as a Kurtosis files
artifact, mounts it at /usr/local/bin/, and routes every shell-needing
step through busybox sh + dispatched applets (busybox mkdir / touch /
tail / cat / nohup).
Three additional zeam-specific fixups surfaced while iterating:
1. The placeholder cmd mkdir -p's $(dirname /var/log/<svc>.log)
before touch — /var/log doesn't exist in scratch.
2. The start phase mkdir -p's /data and copies the per-node
<node>.key from /node-keys into /network-configs (lean-quickstart's
zeam contract reads --node-key relative to --custom-genesis).
3. --validator-config is set to the literal `genesis_bootnode`
sentinel rather than a YAML file path. zeam's --validator-config
accepts either a *directory* of per-node validator configs or
the sentinel; pointing it at a single YAML file triggers a
NotDir failure during "build node start options".
Validated end-to-end with a 6-node devnet (2x ethlambda + 2x ream +
2x zeam): every node is RUNNING; lean-zeam-0 reports "Connected
Peers: 5", builds the genesis state, and prints the fork-choice
tree. Cross-client peering between ethlambda <-> ream <-> zeam
confirmed via status request/response exchange in all three clients'
logs.
XMSS keypair generation is CPU-bound and scales with num-validators * 2^active_epoch. On slower hosts (and with the default active_epoch=18 == 2^18 epochs per key) the default 180s plan.run_sh timeout fires mid-generation, killing the run with "exec request timed out". Bump the wait to 30m so it has headroom on shared/remote hosts. The step is idempotent in the kurtosis artifact sense - re-runs of the same package args reuse the artifact. Surfaced while bringing up the 6-node ethlambda+ream+zeam devnet on ethrex-mainnet-test-1.
Mirror lean-quickstart's docker-compose-metrics.yaml stack inside the
Kurtosis enclave: one `lean-prometheus` (prom v3.8.0) scraping every
lean-<client>-<idx>:5054/metrics target plus its own /metrics, and one
`lean-grafana` (12.3.2) provisioning the Prometheus datasource and the
upstream Lean client dashboard at port 3000.
Inside the enclave we resolve scrape targets by service DNS name, so no
`host.docker.internal` workaround is needed. The dashboard JSON
(client-dashboard.json) is vendored under
src/lean/metrics/grafana/dashboards/ at the same upstream commit as
lean-quickstart.
Gated on `lean_network_params.metrics_enabled` (default true), so any
operator who wants a metrics-free run can set it false. Anonymous admin
login is enabled (matches lean-quickstart), so there's no admin/admin
prompt to navigate through.
Validated end-to-end against a 6-node ethlambda+ream+zeam devnet: all 7
scrape targets (6 clients + prometheus self) report `up`, Grafana health
endpoint returns 200, dashboard "Lean Ethereum Clients Dashboard" loads
at /d/lean-ethereum-clients-dashboard.
Default `ports={}` publishes on a random host port bound to 127.0.0.1.
`public_ports={}` lets us pin the host-side number (and Docker binds
those on 0.0.0.0 by default), so dashboards have stable URLs and can
be reached as http://<server>:3000 / :9090 without an SSH tunnel.
If 3000 or 9090 are already in use on the host the run will fail at
service start - operators can avoid this by not enabling metrics on
shared hosts, or by adding an override via lean_network_params (a
follow-up).
Wires the remaining devnet4 Lean clients into the pipeline. Each
launcher follows the same placeholder-then-plan.exec pattern as
ethlambda/ream, translating its CLI surface from the matching
client-cmds/<client>-cmd.sh in blockblaz/lean-quickstart. All 5 images
ship with a working /bin/sh + busybox applets, so none need the
static-busybox injection that zeam required.
Per-client notes:
- qlean reads --node-key from /node-keys; uses the libp2p multiaddr
listen-addr form.
- lantern reads everything by explicit path (validator-registry,
validator-keys, validator-config, hash-sig-key-dir, nodes-path).
- grandine binary is /usr/local/bin/lean_client (image ENTRYPOINT
alias).
- lighthouse needs both genesis.json staged (in addition to the text
bundle) for its lean_node subcommand.
- gean follows lean-quickstart's convention of looking up --node-key
inside --custom-network-config-dir, so the launcher cps the per-node
libp2p secret into the genesis mount before starting the binary.
Dispatcher in src/lean/lean_launcher.star routes by LEAN_TYPE.
Two regressions surfaced during the 9-client deploy on the office host:
- lantern's binary lives at /opt/lantern/bin/lantern, not
/usr/local/bin/lantern_cli. The image's ENTRYPOINT script
(lantern-entrypoint.sh) forwards to /opt/lantern/bin/lantern, but
since we bypass the entrypoint we have to point at the real binary.
- The published hopinheimer/lighthouse:latest lean_node subcommand does
not accept --api-port or --is-aggregator. Drop those flags; document
that lighthouse always runs as a non-aggregator under this image and
exposes only its metrics endpoint (no HTTP API).
Pinning the in-tree defaults to devnet4 makes them rot the moment a new devnet generation ships - operators who don't override `lean_image:` would silently keep getting a stale tag. Switch every default to the client's `:latest` tag (all of them publish one) so the package itself is forward-compatible. Devnet-specific runs (e.g. the current devnet4 deployment) belong in the args file: each participant sets `lean_image: <repo>:devnet4` explicitly. The PR description carries the canonical devnet4 args example.
The earlier wording described Lean consensus as architecturally
standalone ("no EL pairing", "no Engine API", "fully standalone"). That
isn't the long-term picture: Lean clients are designed to pair with EL
clients in the regular EL+CL devnet shape; they just don't implement
Engine API yet, so present-day devnets are client-only.
Reframe both code comments and docs around that distinction:
- main.star and lean_launcher.star comments now say "no Engine API yet"
/ "until Engine API ships" instead of declaring Lean architecturally
EL-less.
- docs/lean-consensus.md introduces Lean as "client-only today, EL
pairing later" and notes the motivation for landing it in this
package is exactly to be ready for the EL+Lean shape when it ships.
- The why-a-parallel-pipeline table is retitled "Lean (today)" and adds
a follow-up paragraph on how the two pipelines compose once Engine
API lands.
Two more comment blocks (constants.star LEAN_TYPE intro and the Lean parsing section header in input_parser.star) still described Lean as "standalone" / "no Engine API" without the temporal qualifier. Bring both in line with the wording used in docs/lean-consensus.md and the launcher modules: Lean is client-only today because Engine API isn't implemented yet, EL+Lean pairing arrives when it does.
The two Lean prose docs (docs/lean-consensus.md and
docs/lean-adding-a-new-client.md) don't match the repo's documentation
shape — the existing docs/ has exactly one prose doc (architecture.md),
and per-feature configs live as YAML args files under .github/tests/.
- Delete docs/lean-consensus.md and docs/lean-adding-a-new-client.md.
- Append a "Lean Ethereum participants" section to docs/architecture.md
so the architectural overview lands in the file that already serves
that purpose.
- Add .github/tests/lean-devnet4.yaml as the canonical 14-node args
example (mirrors how bal-devnet-0.yaml / fulu.yaml / etc. document
network-shape configs).
- Add .github/tests/lean-smoke.yaml as the minimal 2-node smoke test.
The per-client contract that the deleted "adding a new client" guide
described is now the module docstrings on each src/lean/<client>/<client>_launcher.star,
matching the existing src/cl/*/<client>_launcher.star convention.
    return plan.run_sh(
        run="mkdir -p /out && cp /bin/busybox /out/busybox",
        image="busybox:musl",
        store=[StoreSpec(src="/out", name="lean-busybox")],
        description="Extracting static busybox for zeam scratch image",
    ).files_artifacts[0]
Zeam busybox artifact collides when count > 1 — _busybox_artifact() is called from initialize() per-node and stores with the fixed name "lean-busybox". lean-devnet4.yaml runs zeam with count: 2, so the second initialize attempts to register the same artifact name and Kurtosis rejects duplicate artifact names within an enclave.
Call site (per-node, line 54): ethereum-package/src/lean/zeam/zeam_launcher.star, lines 39 to 78 in 1052920
Loop that invokes it per node: ethereum-package/src/lean/lean_launcher.star, lines 160 to 170 in 1052920
Devnet config that triggers it (count: 2): ethereum-package/.github/tests/lean-devnet4.yaml, lines 27 to 31 in 1052920
Fix: hoist _busybox_artifact(plan) into launch() (like the p2p/hash-sig artifacts) and pass it as a parameter to initialize().
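The shape of the suggested fix can be modelled in plain Python. This is a toy simulation, not Kurtosis code: the `Enclave` class, its `store` method, and the dict returned by `initialize` are hypothetical stand-ins for the enclave's duplicate-name check and the launcher's per-node setup.

```python
class Enclave:
    """Toy stand-in for an enclave that rejects duplicate artifact names."""

    def __init__(self):
        self.artifacts = set()

    def store(self, name):
        if name in self.artifacts:
            raise ValueError("duplicate artifact name: " + name)
        self.artifacts.add(name)
        return name


def initialize(node, busybox):
    # Per-node setup only consumes the artifact handle it is given.
    return {"node": node, "files": {"/usr/local/bin/": busybox}}


def launch_per_node(enclave, nodes):
    # Reported shape: every node stores the same fixed artifact name.
    return [initialize(n, enclave.store("lean-busybox")) for n in nodes]


def launch_hoisted(enclave, nodes):
    # Suggested shape: store once in launch(), pass the handle to each node.
    busybox = enclave.store("lean-busybox")
    return [initialize(n, busybox) for n in nodes]
```

With two zeam nodes, `launch_per_node` fails on the second `store` call, while `launch_hoisted` stores the artifact once and mounts the same handle on both services.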
`el_admin_node_info.get_enode_enr_for_node` can discover the ENR/enode via `admin_nodeInfo`. ethrex defaults to `eth,net,web3` only; without this flag the kurtosis startup polls admin_nodeInfo forever and never hands the el_context to downstream CL launchers. The change is a no-op for existing setups that didn't reach the poll (it only widens the public HTTP API surface inside the test enclave).
`nohup ... &`. Kurtosis `exec` waits for its docker-exec FDs to close, and `& disown` is a bash-ism that the Lean client images' /bin/sh (dash on Debian-slim) doesn't recognise — so the backgrounded ethlambda process kept the exec connection open and the kurtosis run hung at the start step for the next Lean node. `setsid -f` forks into a new session and exits the parent shell immediately, releasing the FDs and letting the kurtosis run progress to the next node. The zeam launcher keeps the busybox-prefixed `nohup ... &` pattern because its scratch-based image has no setsid; the busybox build detaches via the injected `< /dev/null` redirect (handled separately in that launcher).
the standard `participants:` block, and remove the parallel
`lean_participants:` schema entirely. The new shape collapses Lean and
EL+CL into one input surface:
participants:
- el_type: ethrex
cl_type: ethlambda
is_aggregator: true
- el_type: none
cl_type: ream
ethlambda is the only Lean client that implements Engine API today
(lambdaclass/ethlambda#367); when paired with an EL (`el_type` != none)
the Lean launcher reads the EL's genesis block hash via
`eth_getBlockByNumber 0x0` after the EL is up, stages the network JWT
into the ethlambda container, and adds the three Engine API flags
(`--execution-endpoint`, `--execution-jwt-secret`,
`--execution-genesis-block-hash`) to its CLI. The other seven Lean
clients run client-only — `el_type: none` skips EL launch entirely
(the package already supports this for `consensoor` etc.).
CL_TYPE additions: ethlambda, ream, zeam, qlean, lantern, gean,
lean_grandine, lean_lighthouse. The last two are prefixed because
`grandine` and `lighthouse` already exist in CL_TYPE for the Eth1 CLs
of the same name (different binaries from different repos).
`LEAN_CL_TYPES` is the set the cl_launcher dispatcher checks to
decide whether to skip a participant (Lean cl_types are launched by
src/lean/lean_launcher.star, not the standard CL launchers); main.star
then builds a Lean record per such participant and hands the list to
the Lean launcher with the network jwt_file attached.
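A minimal Python sketch of that dispatch, assuming participants are plain dicts. The `split_participants` helper and the record field names are hypothetical; only the eight `LEAN_CL_TYPES` values and the idea of attaching the network `jwt_file` come from the description above.

```python
LEAN_CL_TYPES = frozenset([
    "ethlambda", "ream", "zeam", "qlean", "lantern", "gean",
    "lean_grandine", "lean_lighthouse",
])


def split_participants(participants, jwt_file):
    """Separate standard CL participants from Lean ones.

    Returns (standard, lean_launch_args). Standard entries go to the usual
    CL launchers; Lean entries become one record each for the Lean launcher.
    """
    standard = []
    lean_records = []
    for index, participant in enumerate(participants):
        if participant["cl_type"] in LEAN_CL_TYPES:
            lean_records.append({
                "lean_type": participant["cl_type"],
                "el_type": participant["el_type"],
                "is_aggregator": participant.get("is_aggregator", False),
                "participant_index": index,
            })
        else:
            standard.append(participant)
    return standard, {"records": lean_records, "jwt_file": jwt_file}
```

For example, a list holding one `lighthouse` and one `lean_lighthouse` participant yields one standard entry and one Lean record, which is the reason the two Lean types carry the `lean_` prefix.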
Other plumbing changes that fall out of this:
- `is_aggregator` is now a first-class per-participant field. Ignored
on non-Lean cl_types.
- The "first participant cannot have el_type=none without bootnodoor"
guard is relaxed when every participant has a Lean cl_type — Lean
uses its own libp2p QUIC mesh and doesn't need an Eth1 bootnode.
- The Fulu/PeerDAS validation skips Lean cl_types (they don't speak
PeerDAS).
- The VC / remote-signer / snooper / metrics-exporter pipeline is
skipped for Lean cl_types in participant_network.star — Lean
validators live inside the consensus binary, not a separate VC.
- shared_utils.get_client_names is None-safe: when cl_context is None
(Lean participants), it falls back to the cl_type string from the
participant config so downstream consumers (validator-ranges, dora,
etc.) still get a usable row name.
`lean_network_params:` stays as a separate config block for Lean-only
knobs (`active_epoch`, `attestation_committee_count`,
`num_validator_keys_per_node`, `metrics_enabled`, ...).
`parse_lean_participants` and `DEFAULT_LEAN_IMAGES` are deleted; the
DEFAULT_CL_IMAGES table now carries the Lean defaults too.
Args files migrated:
- `.github/tests/lean-devnet4.yaml` — every entry moved to
`participants:` with `el_type: none`.
- `.github/tests/lean-smoke.yaml` — same shape, two ethlambda nodes.
- `.github/tests/ethlambda-el-pair.yaml` — new, single ethrex+ethlambda
pair.
- `.github/tests/ethlambda-el-pair-2node.yaml` — new, two pairs with
one aggregator + one non-aggregator on the Lean side.
Validated locally: the 2-node ethrex+ethlambda pair finalizes
slot-by-slot, both ethrex ELs converge on identical block hashes via
the Lean libp2p mesh between the two ethlambdas.
participant has `el_type: none` (an all-Lean deployment) the EL launcher appends nothing, so `all_el_contexts[0].ip_addr` crashed at startup with "index 0 out of range: empty list". `fuzz_target` is only consumed by additional services that talk to an Eth1 EL (tx-fuzz, rakoon, broadcaster, custom_flood); leaving it empty is correct for Lean-only — those services aren't enabled there and the remaining additional-service handlers guard their own EL needs.
for Lean clients whose `lean_type` is itself `lean_`-prefixed. With the
phase-2 disambiguation, `LEAN_TYPE.grandine = "lean_grandine"` and
`LEAN_TYPE.lighthouse = "lean_lighthouse"`; the previous service name
formatter produced `lean-lean_grandine-2`, which Kurtosis rejects per
RFC 1035 ("only lowercase alphanumeric and `-` characters"). The node
name (used inside the Lean genesis / validator config) keeps the
underscore — that's the convention lean-quickstart writes and the
clients parse.
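One way to build a compliant service name is sketched below in Python. It is an assumption, not the package's formatter: the `service_name` helper simply maps underscores to hyphens and validates the result against a single RFC 1035 label, whereas the real fix may derive a different (for example shorter) name.

```python
import re

# One DNS label: starts with a letter, ends alphanumeric, at most 63 chars.
RFC1035_LABEL = re.compile(r"^[a-z]([-a-z0-9]{0,61}[a-z0-9])?$")


def service_name(lean_type, index):
    """Build a service name with no underscores and check it is a valid label."""
    name = "lean-{}-{}".format(lean_type.replace("_", "-"), index)
    if not RFC1035_LABEL.match(name):
        raise ValueError("not an RFC 1035 label: " + name)
    return name


print(service_name("ethlambda", 0))       # lean-ethlambda-0
print(service_name("lean_grandine", 2))   # lean-lean-grandine-2
print(bool(RFC1035_LABEL.match("lean-lean_grandine-2")))  # False
```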
node paired with ethlambda via Engine API (16 EL+CL pairs total, one aggregator). Dora is added as an additional service to give the EL side a beacon-explorer UI. Also gate dora's launcher loop on Lean cl_types — `cl_client` is None for Lean participants and dora's `new_cl_client_info(cl_client.beacon_http_url, ...)` was crashing before reaching the `el_type == none` skip. Same shape the other downstream pipelines (VC / snooper / metrics-exporter) already have.
doesn't open TCP 8546 within Kurtosis's 2-minute port-check timeout, which fails the parallel start batch and rolls back every other EL. The other 7 EL clients (geth, nethermind, besu, reth, erigon, nimbus, ethrex) are unaffected. Re-add when the ethereumjs image is fixed.
nethermind, geth, erigon, nimbus-eth1, besu. reth dropped from the experiment too; ethereumjs continues to be excluded because of its 2-minute TCP port-check timeout rolling back the whole batch.
parser expands `count: N` into N separate `participants:` entries (input_parser.star:1260), each carrying the original `count` attribute. main.star's synthesis was reading that `count` and propagating it into the Lean record, so the Lean launcher's own count expansion multiplied N×N — a `count: 2` ream participant ended up running 4 ream containers. Hardcode `count: 1` in the synthesized Lean record so the Lean launcher gets one node per already-expanded participant entry.
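The double expansion is easy to reproduce in a few lines of Python. The helper names and the record shape below are hypothetical; the sketch only models "the parser expands `count`, then the Lean launcher expands it again".

```python
def expand_counts(entries):
    """Turn each `count: N` entry into N entries; each copy keeps `count`."""
    expanded = []
    for entry in entries:
        for _ in range(entry.get("count", 1)):
            expanded.append(dict(entry))
    return expanded


def to_lean_records(participants, propagate_count):
    """Synthesize one Lean record per already-expanded participant."""
    return [
        {
            "lean_type": p["cl_type"],
            # Fixed behaviour hardcodes 1; the bug copied the original count.
            "count": p.get("count", 1) if propagate_count else 1,
        }
        for p in participants
    ]


parsed = expand_counts([{"cl_type": "ream", "count": 2}])
buggy = expand_counts(to_lean_records(parsed, propagate_count=True))
fixed = expand_counts(to_lean_records(parsed, propagate_count=False))
print(len(buggy), len(fixed))  # 4 2
```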
beacon endpoint to start, and Lean cl_types don't expose one — every participant in this experiment is a Lean cl_type, so dora's config template rendered zero endpoints and the container exited with "missing beacon node endpoints (need at least 1)". The rest of the devnet runs fine without dora; re-enable once ethlambda ships Beacon API compatibility stubs.
12 EL containers booting in parallel, several were too slow to answer engine_getPayloadV5 within the Lean 4s slot window during EL warm-up; the resulting empty slots broke 3SF-mini's delta-bounded finalization rule. 180s lets the ELs warm up before slot 0 starts.
Summary
Add Lean Ethereum consensus clients as `cl_type:` values inside the standard `participants:` block. Operators pick a Lean client the same way they pick lighthouse/teku/prysm/...; if the Lean client supports Engine API (today only `ethlambda` does, via lambdaclass/ethlambda#367) it can be paired with any EL; otherwise set `el_type: none` and the package skips the Eth1 side for that participant.
This replaces the original `lean_participants:` top-level block; that schema is removed and the canonical args files are migrated.
Wired clients (8 total), by `cl_type:` and default image:
- `ethlambda`: `ghcr.io/lambdaclass/ethlambda:latest`
- `ream`: `ghcr.io/reamlabs/ream:latest` (`el_type: none`)
- `zeam`: `blockblaz/zeam:latest`
- `qlean`: `qdrvm/qlean-mini:latest`
- `lantern`: `piertwo/lantern:latest`
- `gean`: `ghcr.io/geanlabs/gean:latest`
- `lean_grandine`: `sifrai/lean:latest`
- `lean_lighthouse`: `hopinheimer/lighthouse:latest`
The `lean_grandine` / `lean_lighthouse` prefixes disambiguate from the standard CL Grandine / Lighthouse types of the same name (different binaries from different repos).
Plus a Prometheus + Grafana stack (mirrors lean-quickstart's `docker-compose-metrics.yaml`) that scrapes every Lean node's `/metrics` and serves the upstream Lean client dashboard, host-published on `:3000` / `:9090`.
How to run: Lean-only localnet (all 8 clients)
Save the args file below to e.g. `lean-only.yaml`:
This is the file checked in at `.github/tests/lean-devnet4.yaml`.
Run with Kurtosis 1.18.2+:
The package boots:
Access:
Smaller setups: drop nodes you don't need from `participants:`. A 2-node ethlambda smoke test runs in ~3 min (`.github/tests/lean-smoke.yaml`); the 14-node setup above takes ~10–12 min including hash-sig keygen.
Teardown:
How to run: 2× ethrex + ethlambda paired localnet
Two ethrex ELs each paired with one ethlambda via the Engine API. One ethlambda is the aggregator, the other isn't.
Save the args file below to e.g. `ethlambda-el-pair-2node.yaml`:
This is the file checked in at `.github/tests/ethlambda-el-pair-2node.yaml`. A single-pair variant lives at `.github/tests/ethlambda-el-pair.yaml`.
(optional) Build an ethlambda image for local testing, instead of using the published `devnet6` image:
Run with Kurtosis 1.18.2+:
The package boots:
- `el-1-ethrex-ethlambda`, `el-2-ethrex-ethlambda`
- `lean-ethlambda-0` = aggregator, `lean-ethlambda-1` = peer
Verify the chain is advancing on both pairs:
Expected (after ~30 s): both `lean_head_slot` values match and advance by 1 every 4 s; `justified` / `finalized` advance slot-by-slot; both ethrex `latest` blocks report identical hashes.
Teardown:
How to run: cross-client EL × ethlambda experiment (6 ELs × 2 each)
12 EL+ethlambda pairs, one ethlambda aggregator. Exercises the Engine API surface across every major EL client: ethrex, nethermind, geth, erigon, nimbus-eth1, besu.
Save the args file below to e.g. `ethlambda-el-all-clients.yaml`:
This is the file checked in at `.github/tests/ethlambda-el-all-clients.yaml`.
Build the ethlambda image from lambdaclass/ethlambda#367 ("feat: integrate with ethrex over the Engine API"; see the previous section).
Run with Kurtosis 1.18.2+:
The package boots in ~6 min:
- `ethrex`, `nethermind`, `geth`, `erigon`, `nimbus-eth1`, `besu`
- `lean-ethlambda-0` = aggregator, `1..11` = peers
Verify all 12 ethlambdas are in sync and finalization is advancing:
Expected: every ethlambda at the same head_slot, `justified = head - 1`, `finalized = head - 2`, `validator_count = 12`. All 12 EL chains will report identical block hashes because the Lean libp2p mesh picks a single canonical chain.
Teardown:
Note on `additional_services: [dora]` (and similar Eth1 beacon-API consumers): they currently can't run alongside a Lean-only CL set because Lean clients don't expose an Eth1 Beacon API. ethlambda Beacon-API compatibility stubs are tracked separately.
Note on `reth` / `ethereumjs`: omitted from this experiment. `ethereumjs` fails the package's 2-minute TCP port-check on its WS endpoint and rolls back the entire EL launch batch; `reth` may or may not work, as it is untested in this configuration.
What's in the box
- `src/package_io/constants.star`: `CL_TYPE` and the `LEAN_CL_TYPES` set the dispatcher checks
- `src/package_io/input_parser.star`: `DEFAULT_CL_IMAGES`, `is_aggregator` participant field, relaxed Eth1 bootnode + Fulu/PeerDAS guards for Lean-only networks
- `src/cl/cl_launcher.star`
- `src/participant_network.star`
- `src/shared_utils/shared_utils.star`: `get_client_names` falls back to cl_type when cl_context is None
- `src/dora/dora_launcher.star`
- `main.star`: builds Lean records from `participants:` with a Lean cl_type and hands them to the Lean launcher
- `src/lean/lean_launcher.star`
- `src/lean/ethlambda/ethlambda_launcher.star`
- `src/lean/{ream,zeam,qlean,lantern,grandine,lighthouse,gean}/<client>_launcher.star`
- `src/lean/metrics/metrics_launcher.star`
- `src/prelaunch_data_generator/lean_genesis/`
- `src/el/ethrex/ethrex_launcher.star`: enables the `admin` HTTP API namespace so `admin_nodeInfo` works (was hanging)
- `.github/tests/lean-devnet4.yaml`
- `.github/tests/lean-smoke.yaml`
- `.github/tests/ethlambda-el-pair.yaml`
- `.github/tests/ethlambda-el-pair-2node.yaml`
- `.github/tests/ethlambda-el-all-clients.yaml`
- `docs/architecture.md`
Validation
- `ethlambda-el-all-clients.yaml`: `ethrex`, `nethermind`, `geth`, `erigon`, `nimbus-eth1`, `besu` × 2 each paired with ethlambda. All 12 ethlambdas finalize slot-by-slot via 3SF-mini's `delta ≤ 5` rule; every EL chain converges on identical block hashes. Local Kurtosis 1.18.2 on macOS / arm64 and Debian 13 / amd64 (`ethrex-office-4`).
- `ethlambda-el-pair-2node.yaml`: finalization advancing on both pairs, ethrex tips identical.
- `ethlambda-el-pair.yaml`: ethrex's per-slot log (`Fork choice updated includes payload attributes` / `Requested payload with id=...`) confirms the Engine API loop.
- `lean-devnet4.yaml`: all peers connect cross-client, chain advances; finalization stalls because the single aggregator's leanVM XMSS proof exceeds the 750 ms slot-aggregation deadline at 14-validator scale (knob lives in ethlambda; see "Known limitations"). On `ethrex-office-3` (Debian 13 / amd64).
Architectural notes
- Set `el_type: none` for non-ethlambda Lean cl_types. `EL_TYPE.none` on the first participant is allowed when every participant has a Lean cl_type (no Eth1 bootnode mesh exists). Mixed networks (some Lean, some standard Eth1 CL) still need a non-`none` first participant.
- The EL genesis block hash is read via `eth_getBlockByNumber 0x0` after the EL is up. Used to seed ethlambda's `state.latest_execution_payload_header.block_hash` so the very first FCU carries a head the EL recognizes.
- Lean clients are started with `setsid -f` (not `nohup ... &`) so Kurtosis `exec` returns immediately; the Lean clients run dash, which doesn't understand `disown`.
- `genesis_delay` tuning: 60s is enough for a 2–4 node devnet; 180s is recommended for ≥10 EL containers booting in parallel, otherwise some ELs miss their first `engine_getPayloadV5` deadline and the resulting empty slots break `3SF-mini`'s `delta ≤ 5` finalization rule.
- PyYAML parses unquoted `0x...` tokens as ints, dropping leading zeros from XMSS pubkey hex. The Lean genesis post-process loader uses `yaml.BaseLoader` to keep them as strings.
- zeam's scratch image has no `/bin/sh`; handled by injecting a static busybox binary as a Kurtosis files artifact.
- `hopinheimer/lighthouse:latest` rejects devnet4's dual-key `GENESIS_VALIDATORS` layout. Launcher is wired in, but the client won't reach consensus until upstream ships a dual-key image.
Known limitations
- `ethereumjs` is excluded from the all-ELs args file: its container fails the package's 2-minute TCP port-check on the WS endpoint, which rolls back the entire EL launch batch.
Adding a new Lean client
The contract for adding a new Lean client lives in the module docstrings (same convention the existing `src/cl/*` launchers follow). Touch points:
- `LEAN_TYPE` in `src/package_io/constants.star`, add the same value to `CL_TYPE` and `LEAN_CL_TYPES`, plus a default `:latest` image in `DEFAULT_CL_IMAGES` / `DEFAULT_CL_IMAGES_MINIMAL`.
- `src/lean/<client>/<client>_launcher.star` exporting `initialize(plan, node, p2p_keys_artifact, hash_sig_artifact)` and `start(plan, node, service, genesis_artifact, hash_sig_artifact)`. The ethlambda launcher is the cleanest reference; zeam shows the scratch-image variant.
- `_launcher_for(...)` in `src/lean/lean_launcher.star`.
- `.github/tests/lean-devnet4.yaml` (or a new args file) showing the canonical image override.