[SPARK-56918][CORE] Add ManagedConsumer SPI for shrinkable external storage memory #55953
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,70 @@ | ||
| /* | ||
| * Licensed to the Apache Software Foundation (ASF) under one or more | ||
| * contributor license agreements. See the NOTICE file distributed with | ||
| * this work for additional information regarding copyright ownership. | ||
| * The ASF licenses this file to You under the Apache License, Version 2.0 | ||
| * (the "License"); you may not use this file except in compliance with | ||
| * the License. You may obtain a copy of the License at | ||
| * | ||
| * http://www.apache.org/licenses/LICENSE-2.0 | ||
| * | ||
| * Unless required by applicable law or agreed to in writing, software | ||
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ | ||
|
|
||
| package org.apache.spark.memory | ||
|
|
||
| /** | ||
| * Storage-memory counterpart of [[org.apache.spark.memory.MemoryConsumer]]: holds bytes | ||
| * acquired via [[MemoryManager.acquireStorageMemory]] and synchronously releases them on | ||
| * Spark's request. Typical implementor: a native off-heap cache (e.g. Velox AsyncDataCache | ||
| * via Gluten) sharing `spark.memory.offHeap.size` with Spark's MemoryStore. | ||
| * | ||
| * == Contract == | ||
| * - [[name]] is the registry key (JVM-unique; ON_HEAP and OFF_HEAP share one namespace). | ||
| * Use the SAME instance for register / acquire / unregister so identity-based | ||
| * self-exclusion works during shrink rounds. | ||
| * - A component that also implements [[UnmanagedMemoryConsumer]] MUST NOT report the same | ||
| * bytes through both APIs -- they would be subtracted twice from `effectiveMaxMemory`. | ||
| * - `MemoryManager.shrinkExternal` owns storage-pool accounting: it deducts exactly | ||
| * `shrink`'s return value from the pool. Implementations MUST NOT call | ||
| * [[MemoryManager.releaseStorageMemory]] from inside [[shrink]]. | ||
| * - [[shrink]] runs inside the `MemoryManager` monitor; it MUST NOT cycle back into | ||
| * `MemoryStore.{putBytes, remove, evictBlocksToFreeSpace}` (lock-order cycle on | ||
| * `MemoryStore.entries`) and SHOULD return within | ||
| * `spark.memory.managedConsumer.shrinkWarnThresholdMs` (default 100ms) to avoid | ||
| * blocking other acquisitions. | ||
| * - [[shrink]] MUST be synchronous (claimed bytes reclaimable on return). Over-release | ||
| * and partial release are fine; negative return is a contract violation. Exceptions | ||
| * are caught and treated as 0-byte release. | ||
| */ | ||
| trait ManagedConsumer { | ||
|
|
||
| /** | ||
| * Registry key and log identifier; MUST be non-empty and JVM-unique. Defaults to | ||
| * `getClass.getSimpleName`; override for anonymous classes (where the default is "") | ||
| * or when multiple instances of the same class may coexist. | ||
| */ | ||
| def name: String = getClass.getSimpleName | ||
|
|
||
| /** Memory mode this consumer manages; [[shrink]] is only invoked when modes match. */ | ||
| def memoryMode: MemoryMode = MemoryMode.OFF_HEAP | ||
|
|
||
| /** | ||
| * Cheap snapshot of bytes currently releasable via [[shrink]]; used to skip empty | ||
| * consumers and order candidates largest-first. Non-negative; 0 means nothing to | ||
| * release right now. | ||
| */ | ||
| def getShrinkableMemoryBytes: Long | ||
|
|
||
| /** | ||
| * Synchronously release approximately `numBytes`. See class scaladoc for the full | ||
| * contract. | ||
| * | ||
| * @return actual bytes released, >= 0. Framework deducts this value from the storage pool. | ||
| */ | ||
| def shrink(numBytes: Long): Long | ||
| } | ||
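A minimal sketch of an implementor that follows the contract above, offered as an illustration rather than as code from this PR. `ManagedConsumerSketch` is a local stand-in for the trait in the diff (it omits `memoryMode` so the snippet compiles without Spark on the classpath), and `ToyChunkCache` is a hypothetical cache invented for this example.

```scala
object ManagedConsumerExample {
  // Local stand-in mirroring the trait in this diff (assumption: memoryMode is
  // left out so the sketch does not need Spark on the classpath).
  trait ManagedConsumerSketch {
    def name: String = getClass.getSimpleName
    def getShrinkableMemoryBytes: Long
    def shrink(numBytes: Long): Long
  }

  // Hypothetical cache that holds fixed-size chunks and can drop them on request.
  final class ToyChunkCache(chunkBytes: Long, initialChunks: Int)
      extends ManagedConsumerSketch {
    private var chunks: Int = initialChunks

    // Explicit, stable registry key instead of the class-name default.
    override def name: String = "toy-chunk-cache"

    // Cheap snapshot: no I/O, only a multiplication under the cache's own lock.
    override def getShrinkableMemoryBytes: Long = synchronized { chunks * chunkBytes }

    // Synchronous release: frees whole chunks, so it may over-release (rounding up)
    // or release only part of the request (cache runs empty). Never negative, and
    // it does not call back into the memory manager.
    override def shrink(numBytes: Long): Long = synchronized {
      if (numBytes <= 0L) {
        0L
      } else {
        val wanted = (numBytes + chunkBytes - 1) / chunkBytes // ceiling division
        val freed = math.min(wanted, chunks.toLong).toInt
        chunks -= freed
        freed * chunkBytes
      }
    }
  }
}
```

With 4-byte chunks, a request for 5 bytes frees two chunks and reports 8 bytes, which is the "over-release is fine" case from the contract; the framework would deduct the reported value, not the requested one.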
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -70,6 +70,16 @@ private[memory] class StorageMemoryPool( | |
| acquireMemory(blockId, numBytes, numBytesToFree) | ||
| } | ||
|
|
||
| /** | ||
| * Acquire `numBytes` for a [[ManagedConsumer]]: external bytes that never enter | ||
| * [[memoryStore]]'s `entries`, falling back to LRU eviction for any deficit. Caller is | ||
| * responsible for [[MemoryManager.shrinkExternal]] BEFORE this call; self-exclusion is | ||
| * handled in [[MemoryManager.acquireStorageMemory(self:ManagedConsumer,*]]. | ||
| */ | ||
| def acquireMemoryForManagedConsumer(numBytes: Long): Boolean = lock.synchronized { | ||
| acquireMemoryInternal(None, numBytes, math.max(0L, numBytes - memoryFree)) | ||
| } | ||
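The deficit rule in `acquireMemoryForManagedConsumer` can be illustrated with a toy pool. This is a sketch under stated assumptions, not Spark's implementation: `ToyStoragePool` and its `evictable` counter are hypothetical stand-ins for `StorageMemoryPool` and for the bytes `MemoryStore` could free through LRU eviction.

```scala
object ToyStoragePoolExample {
  // Hypothetical stand-in for StorageMemoryPool. `evictable` models how many of the
  // used bytes belong to cached blocks that LRU eviction could drop.
  final class ToyStoragePool(poolSize: Long, initialUsed: Long, initialEvictable: Long) {
    private var used: Long = initialUsed
    private var evictable: Long = initialEvictable

    def memoryUsed: Long = synchronized { used }
    def memoryFree: Long = synchronized { poolSize - used }

    // Models block eviction: frees at most `evictable` bytes.
    private def evict(numBytesToFree: Long): Unit = {
      val freed = math.min(numBytesToFree, evictable)
      evictable -= freed
      used -= freed
    }

    // Mirrors the shape of the diff: ask eviction only for the deficit,
    // then grant the request if enough memory is free afterwards.
    def acquireForManagedConsumer(numBytes: Long): Boolean = synchronized {
      require(numBytes >= 0L)
      val deficit = math.max(0L, numBytes - memoryFree)
      if (deficit > 0L) evict(deficit)
      val enoughMemory = numBytes <= memoryFree
      if (enoughMemory) used += numBytes
      enoughMemory
    }
  }
}
```

In a 100-byte pool with 90 bytes used, a 30-byte request has a deficit of 20 bytes; only that deficit is evicted, not the full request.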
|
Member
Does the API have to be bridged with the storage memory pool? IIUC, Gluten needs a global memory area in the UMM that is not accounted to particular tasks. Would it be simpler to start from the unmanaged memory API added in #51778?
Contributor
Author
Thanks for the look! Two points:
1. UMC is purely pull-mode:
2. The storage pool is the master ledger, so the API must bridge into it.

On "global, not per-task": agreed, that's what this SPI delivers: WDYT?
||
|
|
||
| /** | ||
| * Acquire N bytes of storage memory for the given block, evicting existing ones if necessary. | ||
| * | ||
|
|
@@ -82,15 +92,21 @@ private[memory] class StorageMemoryPool( | |
| blockId: BlockId, | ||
| numBytesToAcquire: Long, | ||
| numBytesToFree: Long): Boolean = lock.synchronized { | ||
| acquireMemoryInternal(Some(blockId), numBytesToAcquire, numBytesToFree) | ||
| } | ||
|
|
||
| private def acquireMemoryInternal( | ||
| blockId: Option[BlockId], | ||
| numBytesToAcquire: Long, | ||
| numBytesToFree: Long): Boolean = { | ||
| assert(numBytesToAcquire >= 0) | ||
| assert(numBytesToFree >= 0) | ||
| assert(memoryUsed <= poolSize) | ||
| if (numBytesToFree > 0) { | ||
| - memoryStore.evictBlocksToFreeSpace(Some(blockId), numBytesToFree, memoryMode) | ||
| + memoryStore.evictBlocksToFreeSpace(blockId, numBytesToFree, memoryMode) | ||
| } | ||
| - // NOTE: If the memory store evicts blocks, then those evictions will synchronously call | ||
| - // back into this StorageMemoryPool in order to free memory. Therefore, these variables | ||
| - // should have been updated. | ||
| + // NOTE: If the memory store evicts blocks, those evictions synchronously call back | ||
| + // into this StorageMemoryPool to free memory, so _memoryUsed is already updated. | ||
| val enoughMemory = numBytesToAcquire <= memoryFree | ||
| if (enoughMemory) { | ||
| _memoryUsed += numBytesToAcquire | ||
|
|
||
Is the trait name clear? The existing `MemoryConsumer` is already managed by Spark.

Good point: "managed vs unmanaged" is a weak axis here (everything in Spark is managed in some sense). Better to name on the differentiating capability:
- `MemoryConsumer` (Java): `spill()`
- `UnmanagedMemoryConsumer`: `getMemBytesUsed`
- this SPI: `shrink()`

Proposal: rename to `ShrinkableMemoryConsumer`. It matches the SPI's defining verb (`shrink()`) and forms a clean three-way distinction from the two existing traits, with no overlap with either. WDYT?

One more piece of evidence: "shrinkable" is already a first-class concept in the API surface. `getShrinkableMemoryBytes` is the cheap snapshot the framework uses to skip consumers with nothing to give back. A consumer that always returns `0` is effectively un-shrinkable. So the trait name `ShrinkableMemoryConsumer` just makes explicit what the method names already imply.
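
To make the role of the snapshot concrete, here is a rough sketch of a shrink round as the scaladoc describes it: skip consumers reporting 0 bytes, try the largest first, exclude the requester by identity, and treat an exception as a 0-byte release. `shrinkRound`, `Shrinkable`, and `FixedConsumer` are hypothetical names; the real `MemoryManager.shrinkExternal` is not shown in this diff and may differ. Clamping a negative return to 0 is an assumption of this sketch, since the contract only calls it a violation.

```scala
import scala.util.control.NonFatal

object ShrinkRoundExample {
  // Local stand-in for the SPI under discussion (hypothetical, Spark-free).
  trait Shrinkable {
    def name: String
    def getShrinkableMemoryBytes: Long
    def shrink(numBytes: Long): Long
  }

  // Hypothetical shrink round; returns the total bytes released by other consumers.
  def shrinkRound(consumers: Seq[Shrinkable], requester: Shrinkable, needed: Long): Long = {
    val candidates = consumers
      .filter(c => !(c eq requester))            // identity-based self-exclusion
      .map(c => (c, c.getShrinkableMemoryBytes)) // one cheap snapshot per consumer
      .filter(_._2 > 0L)                         // skip consumers with nothing to give
      .sortBy(p => -p._2)                        // largest first
      .map(_._1)

    var released = 0L
    val it = candidates.iterator
    while (released < needed && it.hasNext) {
      val c = it.next()
      // An exception counts as a 0-byte release.
      val got = try c.shrink(needed - released) catch { case NonFatal(_) => 0L }
      // Assumption: a negative return (contract violation) is clamped to 0.
      released += math.max(0L, got)
    }
    released
  }

  // Hypothetical consumer used only to exercise the round.
  final class FixedConsumer(val name: String, private var held: Long, failing: Boolean = false)
      extends Shrinkable {
    def getShrinkableMemoryBytes: Long = held
    def shrink(numBytes: Long): Long = {
      if (failing) throw new IllegalStateException(s"$name cannot shrink")
      val freed = math.min(numBytes, held)
      held -= freed
      freed
    }
  }
}
```

A consumer that always reports `0` never reaches the `shrink` call in this sketch, which is the "effectively un-shrinkable" case from the comment above.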