
fix: temporary workaround for gnark-crypto AVX-512 innerProdVec SIGSEGV#3130

Closed
gusiri wants to merge 1 commit into prod-base from bugfix/gnark-crypto-avx512-innerproduct-overread

Contributor

@gusiri gusiri commented May 18, 2026

This PR
Temporary prover-side workaround for an upstream bug in gnark-crypto's AVX-512 assembly that causes SIGSEGV during aggregation proof generation.
On the caller side, we ensure cap > len on vectors before they reach InnerProduct, so the 4-byte overread lands in valid slack memory. This is a temporary workaround until the upstream fix.
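A minimal sketch of the caller-side idea, not the PR's actual diff: `Element` below is a local stand-in for `fr.Element` (four 64-bit limbs, 32 bytes), and `ensureSlack` is a hypothetical helper name. It copies a vector only when it has no spare capacity, so the bytes just past the last element belong to the same allocation.

```go
package main

import "fmt"

// Element is a local stand-in for fr.Element (4 x 64-bit limbs = 32 bytes).
// It is an assumption made to keep this sketch self-contained.
type Element [4]uint64

// ensureSlack (hypothetical helper) returns v unchanged when it already has
// spare capacity; otherwise it returns a copy with cap = len+1, so one full
// element of slack follows the last element.
func ensureSlack(v []Element) []Element {
	if len(v) == 0 || cap(v) > len(v) {
		return v
	}
	padded := make([]Element, len(v), len(v)+1)
	copy(padded, v)
	return padded
}

func main() {
	v := make([]Element, 128) // cap == len: no slack element
	w := ensureSlack(v)
	fmt.Println(cap(v) > len(v), cap(w) > len(w)) // false true
}
```

The copy costs one extra allocation per call on the cap == len path, which is why allocating with spare capacity up front is preferable where the caller owns the allocation.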


Bug: AVX-512 innerProdVec reads 4 bytes past allocation boundary → SIGSEGV

Summary
fr.Vector.InnerProduct on amd64 with AVX-512 reads 4 bytes past the last element of the input slices. This causes SIGSEGV when the allocation ends at a page boundary.

Environment
gnark-crypto v0.20.2-0.20260402204920-39238e584b99
Go 1.24.x, Linux amd64 with AVX-512

Affected code
field/asm/element_4w/element_4w_amd64.s, lines ~735–758 (innerProdVec loop body):

#define MAC(in0, in1, in2) \
    VPMULUDQ.BCST in0, Z4, Z2  \
    ...

    MAC(0(R13), Z16, Z24)
    MAC(4(R13), Z17, Z25)
    ...
    MAC(24(R13), Z22, Z30)
    MAC(28(R13), Z23, Z31)   // ← 8-byte load at offset 28 of a 32-byte element
    ADDQ $32, R13

MAC(28(R13), ...) expands to VPMULUDQ.BCST 28(R13), Z4, Z2, which loads 8 bytes from R13+28, i.e. bytes [28..35]. Each element is 32 bytes ([0..31]), so bytes [32..35] are past the element boundary.

For elements in the middle of the array, this harmlessly reads the first 4 bytes of the next element. For the last element, those 4 bytes are past the end of the allocation.
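The byte arithmetic above can be checked with a short sketch. The constants mirror the figures quoted in this description (32-byte elements, 8-byte broadcast loads at 4-byte strides); `overreadBytes` is a hypothetical name introduced for illustration only.

```go
package main

import "fmt"

const (
	elemSize  = 32 // bytes per field element, as stated above
	loadWidth = 8  // bytes read by each VPMULUDQ.BCST memory operand
)

// overreadBytes reports how many bytes of a load starting at the given
// offset fall past the end of a 32-byte element.
func overreadBytes(offset int) int {
	if end := offset + loadWidth; end > elemSize {
		return end - elemSize
	}
	return 0
}

func main() {
	// Offsets used by the MAC invocations: 0, 4, ..., 28.
	for off := 0; off <= 28; off += 4 {
		fmt.Printf("offset %2d: overread %d bytes\n", off, overreadBytes(off))
	}
}
```

Only the load at offset 28 reports a non-zero overread (4 bytes); the load at offset 24 ends exactly on the element boundary.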

Calling code (no capacity guarantee)
vector_amd64.go:

func (vector *Vector) InnerProduct(other Vector) (res Element) {
    ...
    innerProdVec(&res[0], &(*vector)[0], &other[0], uint64(len(*vector)))
    return
}

Raw slice element pointers are passed directly. make([]Element, n) returns a slice with cap == len, so nothing guarantees any slack bytes after the last element.

Crash conditions
The crash requires the 4-byte overread to cross a virtual memory page boundary. This happens when:

  • n × 32 is a multiple of the page size (4096 bytes), i.e. n is a multiple of 128
  • The allocation base is page-aligned (common for large mmap'd allocations)

In our case: n = 262144, allocation = 8 MiB = exactly 2048 pages.

base            = 0xeb73c00000
last element    = base + (262144-1) × 32 = 0xeb743fffe0
MAC(28) loads   = 0xeb743ffffc .. 0xeb74400003  ← crosses page at 0xeb74400000
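The address arithmetic can be reproduced with the sketch below. The helper names (`lastLoadRange`, `crossesPage`) are hypothetical, and the constants assume the 32-byte element size and 4096-byte page size used in this description.

```go
package main

import "fmt"

const (
	elemSize       = 32   // bytes per element
	pageSize       = 4096 // page size assumed in this description
	lastLoadOffset = 28   // offset of the final MAC load within an element
	loadWidth      = 8    // bytes read by that load
)

// lastLoadRange returns the first and last byte addresses touched by the
// offset-28 load on the last element of an n-element vector at base.
func lastLoadRange(base, n uint64) (first, last uint64) {
	lastElem := base + (n-1)*elemSize
	first = lastElem + lastLoadOffset
	last = first + loadWidth - 1
	return first, last
}

// crossesPage reports whether the two addresses lie on different pages.
func crossesPage(first, last uint64) bool {
	return first/pageSize != last/pageSize
}

func main() {
	first, last := lastLoadRange(0xeb73c00000, 262144)
	fmt.Printf("%#x %#x %t\n", first, last, crossesPage(first, last))
	// prints: 0xeb743ffffc 0xeb74400003 true
}
```

With the same page-aligned base and n = 100, the load stays inside a single page, which matches the condition that n must be a multiple of 128.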

Crash signature

unexpected fault address 0xeb74400000
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x2 addr=0xeb74400000 pc=0x...]

goroutine ... [running]:
runtime: ...
github.com/consensys/gnark-crypto/ecc/bls12-377/fr.innerProdVec(...)
github.com/consensys/gnark-crypto/ecc/bls12-377/fr.(*Vector).InnerProduct(...)

Why tests don't catch it
gnark-crypto's test vectors are small. For small allocations the overread typically lands in memory that is still mapped (allocator size-class rounding or a neighbouring heap object), so nothing faults. The bug only manifests with large, page-aligned allocations.

Checklist

  • I wrote new tests for my new core changes.
  • I have successfully run tests, the style checker, and the build against my new changes locally.
  • If this change is deployed to any environment (including Devnet), E2E test coverage exists or is included in this
    PR.
  • I have informed the team of any breaking changes if there are any.

Note

High Risk
High risk because it changes core proving/compilation concurrency (new on-the-fly limitless prover path, parallel module build/compile, and ProverRuntime locking semantics) and adds low-level memory workarounds around field-vector operations.

Overview
Adds a new in-memory “on-the-fly” limitless prover (ProveOnTheFly) that skips serialized disk assets, overlaps compilation with bootstrapper proving, pipelines GL/LPP proving into hierarchical conglomeration, and releases compiled circuits early via a usage tracker.

Removes the bespoke JSONL perf_log instrumentation and makes proving concurrency configurable via env vars (e.g. LIMITLESS_SUBPROVER_JOBS), while also refactoring conglomeration/GL/LPP entrypoints to drop the perf logger plumbing.

Implements a temporary crash workaround for a gnark-crypto AVX-512 InnerProduct overread by ensuring vectors have cap > len (padding in ScalarProd and ParBatchInvert).

Speeds up and parallelizes several hot paths: parallel module build/segment compilation with optional debug-module creation, background precompilation of conglomeration, cached Plonk-in-wizard constraint counts, parallelized Vortex column extraction/Merkle proofs, reduced allocations in quotient evaluation, and cached VK columns in the gnark verifier. Also switches wizard.ProverRuntime from Mutex to RWMutex and reduces time spent holding the lock during AssignColumn.

Updates dependencies (gnark-crypto, go-corset, x/sync, x/sys) and tweaks the mainnet limitless config paths.

Reviewed by Cursor Bugbot for commit 201b28b. Bugbot is set up for automated code reviews on this repo.

The innerProdVec assembly in gnark-crypto uses VPMULUDQ.BCST at byte
offset 28 of each 32-byte element, performing an 8-byte load that reads
4 bytes past the element boundary. On the last element this overreads
the allocation, causing SIGSEGV when page-aligned (e.g. n=262144).

- ScalarProd: copy receiver with extra capacity when cap==len
- ParBatchInvert: allocate result with cap=len+1
Contributor

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread prover/config/config-mainnet-limitless.toml