nethtest: add --engineTest, --stateTest, --jsonout, --workers flags by spencer-tb · Pull Request #11035 · NethermindEth/nethermind

spencer-tb · 2026-04-04T11:55:53Z

Summary

Add --engineTest flag to nethtest for direct engine fixture execution, along with --stateTest, --jsonout, --workers, and --run flags. This enables running blockchain_test_engine fixtures without requiring Hive or full client startup.

`nethtest --engineTest`

Engine tests go through the real IEngineRpcModule via IJsonRpcService.SendRequestAsync("engine_newPayloadVX"), exercising the full Engine API code path including version validation, payload processing, and forkchoice state management. This is the same path that Hive consume engine exercises, but without Docker/client startup overhead.

`--workers` flag

Adds parallel file processing via Parallel.ForEachAsync to all three runners. This doesn't do much right now unless we make it more lightweight. WIP.

Other changes

--stateTest flag — explicitly run as state test (previously the implicit default; now required)
--jsonout flag — JSON array output for blocktest/enginetest (statetest already outputs JSON)
--run alias for --filter — matching geth/erigon naming convention
Replaced NUnit Assert.That in RunNewPayloads with explicit exceptions for structured error reporting

Benchmarks

Tested against EEST v5.3.0 stable fixtures.

For reference, Hive runs of the same test suite on the same Nethermind version:

consume engine: 16h 39m
consume rlp: >24h

nethtest --engineTest — exercises the same engine code paths as consume engine (40,519 tests):

Workers	Time	Speedup vs serial	vs Hive consume engine
1	3m10s	1x	~316x
8	2m58s	1.1x	~337x

nethtest --blockTest — exercises the same execution paths as consume rlp (43,912 tests):

Workers	Time	Speedup vs serial	vs Hive consume rlp
1	1m47s	1x	>800x
8	57s	1.9x	>1,500x

nethtest --stateTest (40,553 tests):

Workers	Time	Speedup vs serial
1	1m42s	1x
8	1m12s	1.4x

Hive parity

All 7 Hive consume engine failures for Nethermind on v5.3.0 are exception mapper issues in EELS (correct INVALID status returned, but error message string not mapped in NethermindExceptionMapper). The direct runner correctly reports these as passing since the execution result is correct. To fully test the mapper changes this will need to be integrated into consume direct via EELS.

Usage

# Engine test with parallel workers
nethtest --engineTest --workers 8 --input /path/to/blockchain_tests_engine/

# Block test with JSON output
nethtest --blockTest --jsonout --workers 8 --input /path/to/blockchain_tests/

# State test with regex filter
nethtest --stateTest --run "eip4844" --input /path/to/state_tests/

# No flag -> prints usage
nethtest --input foo.json
# "Please specify one of: --stateTest, --blockTest, or --engineTest"

Related: ethereum/go-ethereum#34650, erigontech/erigon#20315, ethereum/execution-specs#2650

…ests Parse all fixture files upfront and flatten to individual test cases before distributing to workers, instead of parallelizing by file. Nethermind full suite: 2m48s with 8 workers (was 2m58s file-level).

- Pre-warm thread pool capped at CPU count (fixes starvation at w>=4) - Parallel file parsing via Parallel.For - UTF-8 byte span deserialization (File.ReadAllBytes + ReadOnlySpan<byte>) - Pre-allocated results array instead of ConcurrentBag + sort - Lightweight BlockchainTestsRunner constructor for parallel path - Compiled Regex for filter matching - Pre-initialize KZG once upfront Prague engine tests: 24.1s → 18.5s at w=8 (23% faster). Full suite: 2m48s → 1m45s at w=4 (37% faster).

…r field, hide timeInMs

…irect - Add ForkName to BlockchainTest, use raw network string (e.g. CancunToPragueAtTime15k) - Add lastBlockHash, lastPayloadStatus to JSON output for block/engine tests - Remove stateRoot from block/engine results (only state tests) - Report validation error in `error` even on pass for negative tests: engine tests via PayloadStatus.ValidationError, block tests via BlockRemoved event and ValidateSuggestedBlock, state tests via TransactionResult.ErrorDescription - Suggest invalid blocks (blockHeader=null) to actually test rejection - Enables EELS consume direct exception mapping for all test types

… hardcoded string

New `validate` CLI command for running EEST fixtures directly against client EVM binaries, replacing Hive for execution correctness testing. Usage: validate health # health check all clients validate engine --client geth # engine tests validate state --client besu # state tests validate block --client nethermind # block tests Features: - 7 clients: geth, besu, nethermind, erigon, reth, ethrex, nimbus - Per-type Pydantic result models: StateTestResult, BlockTestResult, EngineTestResult with type-specific fields - Exception matching: maps client error strings to EEST exception types via ExceptionMapper, verifies correct exception for every invalid test (--no-exception-check to disable) - Cross-validation: lastBlockHash against fixture, lastPayloadStatus (VALID/INVALID) for engine tests - validate.toml config for client binary paths with per-type overrides (state-bin, block-bin, engine-bin) - Auto bin-workers and xdist tuning per client - Bundled Frontier sanity fixtures for health checks - Shared validate_helpers.py for validation logic Client binary PRs: - geth: ethereum/go-ethereum#34650 - erigon: erigontech/erigon#20315 - besu: besu-eth/besu#10184 - nethermind: NethermindEth/nethermind#11035 - reth: paradigmxyz/reth#23361 - ethrex: lambdaclass/ethrex#6445 - nimbus: status-im/nimbus-eth1#4101 - revm: bluealloy/revm#3544 Tracking issue: ethereum#2319

kamilchodola · 2026-04-28T15:04:28Z

@claude review

claude · 2026-04-28T15:04:53Z

Claude finished @kamilchodola's task in 6m 43s —— View job

Code Review

Read repo rules and coding guidelines
Analyze PR diff and changed files
Review Program.cs — CLI parsing and runner orchestration
Review BlockchainTestsRunner.cs — engine test execution
Review BlockchainTestBase.cs — engine payload handling
Review JsonToEthereumTest.cs — test deserialization
Review StateTestRunner.cs, EthereumTestResult.cs, FileTestsSource.cs
Compile findings and post review

Summary

This PR adds --engineTest, --stateTest, --jsonout, --workers, and --run flags to the nethtest runner, enabling direct execution of Engine API fixtures without Hive overhead. The benchmarks (>300× faster than Hive) are compelling. The core engine path through IJsonRpcService is a clean design. However, there are several correctness and safety issues, two of which are high severity.

Counts: 0 Critical / 3 High / 5 Medium / 4 Low

High

H1 — Thread-safety race on `DifficultyCalculator.Wrapped`

Ethereum.Test.Base/BlockchainTestBase.cs:109

DifficultyCalculator is a static property on BlockchainTestBase. When --workers > 1, multiple parallel invocations of RunTest each assign DifficultyCalculator.Wrapped = new EthashDifficultyCalculator(specProvider) concurrently, then read back through the same static reference when blocks are processed. Two concurrent block tests from different forks can cross-contaminate each other's spec, silently producing wrong pass/fail results.

This only affects --blockTest with parallel workers today, but will produce wrong outcomes without any exception. The engine path doesn't call DifficultyCalculator so it is safe for now.

Fix: make DifficultyCalculator an instance field on BlockchainTestBase, since each test already uses a separate container.

H2 — Null dereference when `loadedTest` is neither `FailedToLoadTest` nor `BlockchainTest`

Nethermind.Test.Runner/BlockchainTestsRunner.cs:86-88

BlockchainTest test = loadedTest as BlockchainTest;   // null if unexpected type
if (filter is not null && test.Name is not null && ...)  // NRE here

LoadTests<EthereumTest>() can return any EthereumTest subtype. If it's neither FailedToLoadTest nor BlockchainTest, test is null and test.Name throws NullReferenceException. The preferred fix is:

if (loadedTest is not BlockchainTest test) continue;

H3 — Broad silent-continue masks real RPC errors in `RunNewPayloads`

Ethereum.Test.Base/BlockchainTestBase.cs:369-374 (inline comment posted)

When an invalid payload is expected (validationError != null) and an RPC-level error occurs, the code silently continues without checking what kind of error occurred. An internal server error, deserialization exception, or wrong-version error would be treated identically to an expected INVALID status, causing a false positive (test "passes" when the engine returned an unexpected error).

Medium

M1 — Dead code / copy-paste in parallel block-test path

Nethermind.Test.Runner/Program.cs:213 (inline comment posted)

item.file.Contains("chain") ? chainId : chainId — both branches are identical. Should be simplified to just chainId, or the intended conditional should be implemented.

M2 — Exit code 0 on usage error

Nethermind.Test.Runner/Program.cs:106 (inline comment posted)

When no test type flag is provided, the tool exits with success. CI scripts cannot distinguish a misconfiguration from a successful run. Should return 1.

M3 — `ParseBlockchainTestFile` is dead code

Nethermind.Test.Runner/Program.cs:227 (inline comment posted)

This 30-line method is defined but never called. The code inside it also adds null entries to a List<BlockchainTest> which would cause NREs if wired up. Should be removed.

M4 — `int.Parse` on untrusted fixture data without error handling

Ethereum.Test.Base/BlockchainTestBase.cs:351,358,359

int.Parse on ForkChoiceUpdatedVersion / NewPayloadVersion throws FormatException on malformed fixtures with no context. Use int.TryParse with a descriptive error message.

M5 — Pervasive `var` usage violates coding style

Nethermind.Test.Runner/Program.cs throughout

The coding style rule says: "Avoid var — spell out types." Nearly every local in the new code uses var, including simple cases like var source = new TestsSourceLoader(...), var result = ..., var bag = new ConcurrentBag<...>().

Low

L1 — Suboptimal `as` + null check pattern

Nethermind.Test.Runner/BlockchainTestsRunner.cs:74

if (loadedTest as FailedToLoadTest is not null) → prefer if (loadedTest is FailedToLoadTest).

L2 — EIP-3155 error value embedded in JSON without encoding

Nethermind.Test.Runner/StateTestRunner.cs:63 (inline comment posted)

$",\"error\":\"{entry.Error}\"" — if entry.Error contains ", \, or control characters, the output is invalid JSON.

L3 — Filter regex compiled per test in `BlockchainTestsRunner`

Nethermind.Test.Runner/BlockchainTestsRunner.cs:88

Regex.Match(test.Name, $"^({filter})") allocates a new regex parse tree on every test. Since filter is known at construction time, compile it once to a field.

L4 — `_testsSource!` null-forgiving in `RunTestsAsync`

Nethermind.Test.Runner/BlockchainTestsRunner.cs:71

The lightweight constructor explicitly sets _testsSource = null, so _testsSource!.LoadTests<>() will throw NullReferenceException if called on that instance. A guard with a clear InvalidOperationException message would be more helpful.

Other observations

The PR description notes 7 Hive consume engine failures that are masked here because the direct runner only checks payload status, not the ValidationError string mapping. Worth a code comment so future contributors don't treat 100% pass as full Hive parity.
BlockchainTestBase.cs retains using NUnit.Framework; and Assert.That/Assert.Fail calls. These throw AssertionException/SuccessException, which the parallel wrapper in Program.cs catches as generic Exception. Error messages will contain NUnit boilerplate — not a blocker but worth a follow-up.

claude · 2026-04-28T15:10:16Z

+                try
+                {
+                    var source = new TestsSourceLoader(new LoadBlockchainTestFileStrategy(), item.file);
+                    var runner = new BlockchainTestsRunner(source, filter, item.file.Contains("chain") ? chainId : chainId, trace: false, traceMemory, traceStack, jsonOutput: true, suppressOutput: true);


[Medium] Dead ternary — both branches are identical

item.file.Contains("chain") ? chainId : chainId always evaluates to chainId. This looks like an unfinished conditional (perhaps meant to use a Gnosis chain ID for gnosis fixtures). Either simplify to just chainId, or complete the intended logic.

Suggested change

var runner = new BlockchainTestsRunner(source, filter, item.file.Contains("chain") ? chainId : chainId, trace: false, traceMemory, traceStack, jsonOutput: true, suppressOutput: true);

var runner = new BlockchainTestsRunner(source, filter, chainId, trace: false, traceMemory, traceStack, jsonOutput: true, suppressOutput: true);

claude · 2026-04-28T15:10:27Z

+        if (testTypeCount != 1)
+        {
+            Console.WriteLine("Please specify one of: --stateTest, --blockTest, or --engineTest");
+            return 0;


[Medium] Returns exit code 0 on usage error

Callers (CI scripts, wrappers) cannot distinguish a misconfiguration from a successful empty run. This should return a non-zero exit code to signal the error.

Suggested change

return 0;

return 1;

claude · 2026-04-28T15:10:34Z

+                var stackJson = string.Join(",", entry.Stack.Select(s => $"\"{s}\""));
+                Console.Error.Write($"{{\"pc\":{entry.Pc},\"op\":{entry.Operation},\"gas\":\"0x{entry.Gas:x}\",\"gasCost\":\"0x{entry.GasCost:x}\",\"stack\":[{stackJson}],\"depth\":{entry.Depth},\"memSize\":{entry.MemSize}");
+                if (!string.IsNullOrEmpty(entry.Error))
+                    Console.Error.Write($",\"error\":\"{entry.Error}\"");


[Low] Error string embedded in JSON without encoding

If entry.Error contains ", \, or control characters, the output is malformed JSON. Use JsonSerializer.Serialize to safely encode the value:

Suggested change

Console.Error.Write($",\"error\":\"{entry.Error}\"");

Console.Error.Write($",\"error\":{System.Text.Json.JsonSerializer.Serialize(entry.Error)}");

claude · 2026-04-28T15:10:50Z

+                if (validationError is null)
+                    throw new Exception(
+                        $"engine_newPayloadV{newPayloadVersion} unexpected RPC error: {errorResponse.Error?.Code} {errorResponse.Error?.Message}");
                continue;


[High] Silent continue masks real RPC errors when invalid payload is expected

When validationError != null (test expects rejection), any RPC-level error is silently skipped. An internal server error, deserialization failure, or version mismatch is treated identically to an expected INVALID response, causing a false-positive pass.

Consider at minimum throwing on unexpected error codes (e.g. -32601 method not found, -32603 internal error) even when a validation error is expected:

if (npResponse is JsonRpcErrorResponse errorResponse) { if (validationError is null) throw new Exception( $"engine_newPayloadV{newPayloadVersion} unexpected RPC error: {errorResponse.Error?.Code} {errorResponse.Error?.Message}"); // RPC-level errors like method-not-found or internal-error are not valid rejections. // Only -32602 (InvalidParams) or engine-specific codes map to expected invalid-payload behaviour. if (errorResponse.Error?.Code is not (ErrorCodes.InvalidParams or MergeErrorCodes.UnsupportedFork)) throw new Exception( $"engine_newPayloadV{newPayloadVersion} unexpected RPC error code {errorResponse.Error?.Code}: {errorResponse.Error?.Message}"); continue; }

claude · 2026-04-28T15:11:00Z

+        return bag.OrderBy(x => x.index).SelectMany(x => x.results).ToList();
+    }
+
+    private static List<BlockchainTest> ParseBlockchainTestFile(string file, Regex? filterRegex)


[Medium] Dead code — ParseBlockchainTestFile is never called

This method is defined but has no callers. It also adds null entries to the returned list (lines 238, 251), which would cause NullReferenceException if the code were ever wired up. Remove it to keep the file clean.

nethtest: add --stateTest, --engineTest, --jsonout, --workers flags

a5d2ab4

spencer-tb force-pushed the feat/evm-enginetest branch from 4e22df4 to a5d2ab4 Compare April 4, 2026 11:59

This was referenced Apr 4, 2026

feat: consume direct support for all EL clients with engine fixture runners ethereum/execution-specs#2650

Open

evmtool: add engine-test subcommand and parallel workers for test runners besu-eth/besu#10184

Draft

test runner: add test-case level parallelism for engine/block/state t…

42f9aa3

…ests Parse all fixture files upfront and flatten to individual test cases before distributing to workers, instead of parallelizing by file. Nethermind full suite: 2m48s with 8 workers (was 2m58s file-level).

spencer-tb mentioned this pull request Apr 5, 2026

cmd/evm: add enginetest command and parallel workers for test runners erigontech/erigon#20315

Closed

spencer-tb added 5 commits April 5, 2026 19:40

test runner: fix statetest JSON schema — remove name suffix, add erro…

7b72dfc

…r field, hide timeInMs

test runner: use full fixture key as test name

31b80d3

test runner: add EIP-3155 trace output to stderr for statetest

e9abae3

revert: restore test-case parallelism and byte-based loading

c4575f0

fix: restore blocktest RunTestsAsync path, add --jsonout to all formats

21a13a1

spencer-tb mentioned this pull request Apr 5, 2026

feat(validate): new validate command for direct client binary testing ethereum/execution-specs#2622

Draft

4 tasks

spencer-tb added 2 commits April 6, 2026 00:04

fix: OverridableReleaseSpec.Name delegates to wrapped spec instead of…

afcb9de

… hardcoded string

kamilchodola requested review from dmitriy-b and stdevMac April 28, 2026 08:57

claude Bot reviewed Apr 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

nethtest: add --engineTest, --stateTest, --jsonout, --workers flags#11035

nethtest: add --engineTest, --stateTest, --jsonout, --workers flags#11035
spencer-tb wants to merge 10 commits into
NethermindEth:masterfrom
spencer-tb:feat/evm-enginetest

spencer-tb commented Apr 4, 2026 •

edited

Loading

Uh oh!

kamilchodola commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 •

edited

Loading

Uh oh!

claude Bot Apr 28, 2026

Uh oh!

claude Bot Apr 28, 2026

Uh oh!

claude Bot Apr 28, 2026

Uh oh!

claude Bot Apr 28, 2026

Uh oh!

claude Bot Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	var runner = new BlockchainTestsRunner(source, filter, item.file.Contains("chain") ? chainId : chainId, trace: false, traceMemory, traceStack, jsonOutput: true, suppressOutput: true);
	var runner = new BlockchainTestsRunner(source, filter, chainId, trace: false, traceMemory, traceStack, jsonOutput: true, suppressOutput: true);

	Console.Error.Write($",\"error\":\"{entry.Error}\"");
	Console.Error.Write($",\"error\":{System.Text.Json.JsonSerializer.Serialize(entry.Error)}");

Conversation

spencer-tb commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

nethtest --engineTest

--workers flag

Other changes

Benchmarks

Hive parity

Usage

Uh oh!

kamilchodola commented Apr 28, 2026

Uh oh!

claude Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review

Summary

High

H1 — Thread-safety race on DifficultyCalculator.Wrapped

H2 — Null dereference when loadedTest is neither FailedToLoadTest nor BlockchainTest

H3 — Broad silent-continue masks real RPC errors in RunNewPayloads

Medium

M1 — Dead code / copy-paste in parallel block-test path

M2 — Exit code 0 on usage error

M3 — ParseBlockchainTestFile is dead code

M4 — int.Parse on untrusted fixture data without error handling

M5 — Pervasive var usage violates coding style

Low

L1 — Suboptimal as + null check pattern

L2 — EIP-3155 error value embedded in JSON without encoding

L3 — Filter regex compiled per test in BlockchainTestsRunner

L4 — _testsSource! null-forgiving in RunTestsAsync

Other observations

Uh oh!

claude Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

spencer-tb commented Apr 4, 2026 •

edited

Loading

`nethtest --engineTest`

`--workers` flag

claude Bot commented Apr 28, 2026 •

edited

Loading

H1 — Thread-safety race on `DifficultyCalculator.Wrapped`

H2 — Null dereference when `loadedTest` is neither `FailedToLoadTest` nor `BlockchainTest`

H3 — Broad silent-continue masks real RPC errors in `RunNewPayloads`

M3 — `ParseBlockchainTestFile` is dead code

M4 — `int.Parse` on untrusted fixture data without error handling

M5 — Pervasive `var` usage violates coding style

L1 — Suboptimal `as` + null check pattern

L3 — Filter regex compiled per test in `BlockchainTestsRunner`

L4 — `_testsSource!` null-forgiving in `RunTestsAsync`