fix: temporary workaround for gnark-crypto AVX-512 innerProdVec SIGSEGV #3130
Closed
gusiri wants to merge 1 commit into
The innerProdVec assembly in gnark-crypto uses VPMULUDQ.BCST at byte offset 28 of each 32-byte element, performing an 8-byte load that reads 4 bytes past the element boundary. On the last element this overreads the allocation, causing SIGSEGV when page-aligned (e.g. n=262144).
- ScalarProd: copy receiver with extra capacity when cap==len
- ParBatchInvert: allocate result with cap=len+1
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 201b28b.
This was referenced May 18, 2026

This PR
Temporary prover-side workaround for an upstream bug in gnark-crypto's AVX-512 assembly that causes SIGSEGV during aggregation proof generation.
On the caller side, we ensure cap > len on vectors before they reach InnerProduct, so the 4-byte overread lands in valid slack memory. This is a temporary workaround until the bug is fixed upstream.
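A minimal sketch of the caller-side padding idea, not the actual patch: the `element` type and the `withSlack` helper are hypothetical stand-ins introduced here for illustration (the real changes live in ScalarProd and ParBatchInvert and operate on gnark-crypto's field elements).

```go
package main

import "fmt"

// element stands in for a 32-byte field element (4 x uint64).
// It is an illustrative type, not gnark-crypto's fr.Element.
type element [4]uint64

// withSlack is a hypothetical helper: it returns v unchanged when it already
// has spare capacity, otherwise a copy whose backing array has room for one
// extra element, so a short read past the last element stays in-bounds.
func withSlack(v []element) []element {
	if cap(v) > len(v) {
		return v
	}
	padded := make([]element, len(v), len(v)+1)
	copy(padded, v)
	return padded
}

func main() {
	v := make([]element, 8) // make with a single size gives cap == len
	w := withSlack(v)
	fmt.Println(len(v), cap(v)) // 8 8
	fmt.Println(len(w), cap(w)) // 8 9
}
```

The copy costs one extra allocation per affected call, which seems an acceptable trade for a temporary workaround; vectors that already have spare capacity are passed through untouched.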
Bug: AVX-512 innerProdVec reads 4 bytes past allocation boundary → SIGSEGV
Summary
fr.Vector.InnerProduct on amd64 with AVX-512 reads 4 bytes past the last element of the input slices. This causes SIGSEGV when the allocation ends at a page boundary.
Environment
gnark-crypto v0.20.2-0.20260402204920-39238e584b99
Go 1.24.x, Linux amd64 with AVX-512
Affected code
field/asm/element_4w/element_4w_amd64.s, lines ~735–758 (innerProdVec loop body):
MAC(28(R13), ...) expands to VPMULUDQ.BCST 28(R13), Z4, Z2, which loads 8 bytes from R13+28, i.e. bytes [28..35]. Each element is 32 bytes ([0..31]), so bytes [32..35] are past the element boundary.
For elements in the middle of the array, this harmlessly reads the first 4 bytes of the next element. For the last element, those 4 bytes are past the end of the allocation.
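The offset arithmetic above can be checked with a short sketch. The constants come from the description (32-byte elements, an 8-byte load at offset 28); the function name `overreadBytes` is made up for this illustration.

```go
package main

import "fmt"

const (
	elemSize   = 32 // bytes per element, as described above
	loadOffset = 28 // offset of the broadcast load within an element
	loadWidth  = 8  // bytes read by the load
)

// overreadBytes returns how many bytes the load on element i (0-based)
// reads past the end of an n-element array.
func overreadBytes(i, n int) int {
	end := i*elemSize + loadOffset + loadWidth // one past the last byte read
	if over := end - n*elemSize; over > 0 {
		return over
	}
	return 0
}

func main() {
	n := 4
	for i := 0; i < n; i++ {
		fmt.Printf("element %d: reads %d bytes past the array\n", i, overreadBytes(i, n))
	}
}
```

Only the last element reports a non-zero value (4 bytes); every earlier load spills into the next element instead.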
Calling code (no capacity guarantee)
vector_amd64.go:
Raw slice element pointers are passed directly. make([]Element, n) can return cap == len, with no slack bytes after the last element.
Crash conditions
The crash requires the 4-byte overread to cross a virtual memory page boundary. This happens when:
- n × 32 is a multiple of the page size (4096 bytes), i.e. n is a multiple of 128
- The allocation base is page-aligned (common for large mmap'd allocations)
In our case: n = 262144, allocation = 8 MiB = exactly 2048 pages.
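As a quick check of the numbers above, the following sketch tests whether an n-element vector with a page-aligned base ends exactly on a page boundary. The 4096-byte page size is the one stated above; the helper name `endsOnPageBoundary` is an assumption made for this example.

```go
package main

import "fmt"

const (
	pageSize = 4096 // bytes, as stated above
	elemSize = 32   // bytes per element
)

// endsOnPageBoundary reports whether an n-element vector whose base address
// is page-aligned ends exactly at a page boundary.
func endsOnPageBoundary(n int) bool {
	return (n*elemSize)%pageSize == 0
}

func main() {
	n := 262144
	fmt.Println(n*elemSize, (n*elemSize)/pageSize, endsOnPageBoundary(n)) // 8388608 2048 true
	fmt.Println(endsOnPageBoundary(100))                                  // false
}
```

Ending on a page boundary is necessary but not sufficient for the crash: the following page must also be unmapped or inaccessible.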
Crash signature
Why tests don't catch it
gnark-crypto's test vectors are small. Go's allocator provides slack capacity for small allocations, so the overread lands in valid memory. The bug only manifests with large, page-aligned allocations.
Note
High Risk
High risk because it changes core proving/compilation concurrency (new on-the-fly limitless prover path, parallel module build/compile, and ProverRuntime locking semantics) and adds low-level memory workarounds around field-vector operations.
Overview
Adds a new in-memory “on-the-fly” limitless prover (ProveOnTheFly) that skips serialized disk assets, overlaps compilation with bootstrapper proving, pipelines GL/LPP proving into hierarchical conglomeration, and releases compiled circuits early via a usage tracker.
Removes the bespoke JSONL perf_log instrumentation and makes proving concurrency configurable via env vars (e.g. LIMITLESS_SUBPROVER_JOBS), while also refactoring conglomeration/GL/LPP entrypoints to drop the perf logger plumbing.
Implements a temporary crash workaround for a gnark-crypto AVX-512 InnerProduct overread by ensuring vectors have cap > len (padding in ScalarProd and ParBatchInvert).
Speeds up and parallelizes several hot paths: parallel module build/segment compilation with optional debug-module creation, background precompilation of conglomeration, cached Plonk-in-wizard constraint counts, parallelized Vortex column extraction/Merkle proofs, reduced allocations in quotient evaluation, and cached VK columns in the gnark verifier. Also switches wizard.ProverRuntime from Mutex to RWMutex and reduces time spent holding the lock during AssignColumn.
Updates dependencies (gnark-crypto, go-corset, x/sync, x/sys) and tweaks the mainnet limitless config paths.