
Add code size cache to the kvt backend #4214

Draft
bhartnett wants to merge 8 commits into master from kvt-codesize-cache

Conversation

@bhartnett
Contributor

@bhartnett bhartnett commented May 6, 2026

Geth has a similar code size cache, which holds a million values in an LRU. The kvt RocksDB backend currently has no caching of its own and the RocksDB row cache is disabled, so this should in theory provide a speedup.
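As a rough illustration of the idea (a Python sketch, not the code in this PR): an LRU that maps a code hash to the code length, so repeated size lookups skip the backend read. The names `CodeSizeCache` and `load_code` are hypothetical, and the default capacity of one million entries simply mirrors the Geth figure mentioned above.

```python
from collections import OrderedDict
from typing import Callable


class CodeSizeCache:
    """LRU map from code hash to code length (hypothetical names)."""

    def __init__(self, capacity: int = 1_000_000) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._entries = OrderedDict()  # code_hash -> size, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, code_hash: bytes, load_code: Callable[[bytes], bytes]) -> int:
        """Return the code size, calling load_code only on a miss."""
        size = self._entries.get(code_hash)
        if size is not None:  # a cached size of 0 is still a hit
            self._entries.move_to_end(code_hash)  # mark as most recently used
            self.hits += 1
            return size
        self.misses += 1
        size = len(load_code(code_hash))  # assumed backend read
        self._entries[code_hash] = size
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return size
```

Each entry holds only a hash and an integer, which is why a large entry count stays cheap compared with caching the code itself.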

@bhartnett
Contributor Author

bhartnett commented May 6, 2026

Ran a block import of 200000 blocks starting from around block 23 million:

master: elapsed=12m48s897ms

codesize branch: elapsed=12m22s962ms

That is not much of an improvement (about 26 seconds, or roughly 3%), presumably because code is already cached in the ledger. For the forkchain block import, which doesn't reuse the caches between blocks, the improvement would likely be larger.

@bhartnett bhartnett requested a review from arnetheduck May 6, 2026 15:21
@bhartnett
Contributor Author

@arnetheduck Do you have any thoughts on this PR?

We could either go with this code size cache or just rely on the code cache to return the code sizes.

I had a look at what the other EL clients do for caching code and it looks like only Geth caches code sizes separately. Most of the other ELs just cache code using a variable weighted LRU so that larger code has more weight than smaller code.

Either way, we will also need to do something about the current code cache in the ledger, which doesn't get reused across blocks because the forked chain module creates a new ledger when processing each block.

Perhaps if we use a variable weighted LRU then we can safely fit more code in the cache and then there isn't so much need for a separate code size cache.
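To make the weighted option concrete, here is a minimal Python sketch of a variable-weight LRU, assuming the weight of an entry is its code length in bytes and the cache is bounded by a total byte budget rather than an entry count. `WeightedLruCache` and its methods are hypothetical names and are not taken from any of the clients mentioned.

```python
from collections import OrderedDict


class WeightedLruCache:
    """LRU bounded by total weight instead of entry count (illustrative)."""

    def __init__(self, max_weight: int) -> None:
        self._max_weight = max_weight
        self._entries = OrderedDict()  # key -> (value, weight), oldest first
        self._total_weight = 0

    @property
    def total_weight(self) -> int:
        return self._total_weight

    def __len__(self) -> int:
        return len(self._entries)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return entry[0]

    def put(self, key, value, weight: int) -> bool:
        """Insert or replace an entry; returns False if it can never fit."""
        old = self._entries.pop(key, None)
        if old is not None:
            self._total_weight -= old[1]
        if weight > self._max_weight:
            return False  # larger than the whole budget, so not cached
        self._entries[key] = (value, weight)
        self._total_weight += weight
        # Evict from the cold end until the budget holds again. The new entry
        # sits at the hot end and fits on its own, so it is never evicted here.
        while self._total_weight > self._max_weight:
            _, (_, evicted_weight) = self._entries.popitem(last=False)
            self._total_weight -= evicted_weight
        return True
```

With a scheme like this, one large contract displaces several small ones, so the memory bound holds regardless of the mix of code sizes.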

@arnetheduck
Member

> For the forkchain block import

Sounds to me like an opportunity to extend the use of the code cache. A notable source of CPU usage is actually the code scanning done for jump analysis, and in the future we might want to cache other "analysis" done on the bytecode (aka JIT optimizations). Broadly, we want linear forked chain and import performance to be roughly the same (minus state root verification).
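For context, the jump analysis in question is a linear scan that records every `JUMPDEST` byte that is not part of `PUSH` immediate data. The Python sketch below shows that scan plus a memo keyed by code hash, which is the kind of derived data a longer-lived code cache could hold. `AnalysisCache` is a hypothetical name, and the memo is unbounded purely to keep the example short.

```python
JUMPDEST = 0x5B
PUSH1, PUSH32 = 0x60, 0x7F


def valid_jump_destinations(code: bytes) -> frozenset:
    """Return the offsets of JUMPDEST opcodes that are not PUSH data."""
    dests = set()
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == JUMPDEST:
            dests.add(pc)
        elif PUSH1 <= op <= PUSH32:
            pc += op - PUSH1 + 1  # skip the 1..32 immediate bytes
        pc += 1
    return frozenset(dests)


class AnalysisCache:
    """Hypothetical memo of jump analysis results, keyed by code hash."""

    def __init__(self) -> None:
        self._by_hash = {}
        self.scans = 0  # number of full bytecode scans performed

    def jump_destinations(self, code_hash: bytes, code: bytes) -> frozenset:
        cached = self._by_hash.get(code_hash)
        if cached is None:
            cached = valid_jump_destinations(code)
            self._by_hash[code_hash] = cached
            self.scans += 1
        return cached
```

If the memo is discarded with the ledger after every block, the scan is repeated for every contract touched again in the next block.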

> weighted cache

I guess the risk here is that you can damage the code cache with code size requests: a size lookup that pulls a whole contract into the cache can evict code that is actually being executed. The more pre-computation we perform on the code, the greater this risk (AFAIR such requests were mispriced at some early point in the chain). That said, I've thought about introducing weighted caches elsewhere (leaf vs branch in the MPT, in particular), so it's certainly a track worth investigating from a perf point of view.

2 participants