Add durable execution Step + Wait end-to-end #2360
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution SDK: a workflow can run StepAsync and WaitAsync against a real Lambda, with replay-aware checkpointing wired through to the AWS service.

Public API surface introduced:
- DurableFunction.WrapAsync: entry point that handles the durable execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with retry strategy, semantics, and serializer hooks
- IRetryStrategy + ExponentialRetryStrategy + retry decision factories
- ICheckpointSerializer + DefaultJsonCheckpointSerializer
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException

Internals:
- DurableExecutionHandler: Task.WhenAny race between user code and the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState: replay-aware operation lookup and pending checkpoint buffer
- OperationIdGenerator: deterministic, replay-stable IDs
- TerminationManager: TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient: wraps AWSSDK.Lambda's checkpoint and state APIs

Tests:
- 86 unit tests covering enums, exceptions, models, configs, retry, ID generation, termination, execution state, the handler race, the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly, LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails

Out of scope (follow-up PRs):
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync, WaitForConditionAsync (interface intentionally does not declare them)
- DurableLogger replay-suppression
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint

stack-info: PR: #2360, branch: GarrettBeatty/stack/2
55db890 to 92e2428
9d78744 to 075b5e6
92e2428 to edd8c5f
edd8c5f to fe03624
fe03624 to 322fa09
322fa09 to 983c9aa
    {
        public Task<DurableExecutionInvocationOutput> FunctionHandler(
            DurableExecutionInvocationInput invocationInput, ILambdaContext context)
            => DurableFunction.WrapAsync<OrderEvent, OrderResult>(
Needed to add another overload here for Native AOT.
normj left a comment
I think we need to revisit the JSON serialization, and whether to have a separate ICheckpointSerializer instead of reusing the ILambdaSerializer, but I'm good with that being a separate PR.
        Updates = pendingOperations is List<SdkOperationUpdate> list ? list : pendingOperations.ToList()
    };

    var response = await _lambdaClient.CheckpointDurableExecutionAsync(request, cancellationToken);
Since the SDK call can fail for a variety of reasons, such as permissions, I wonder if we should wrap the SDK call in a try/catch and wrap the SDK's exception in a DurableExecutionException. That way you can give an error message with the context of what the SDK was being used for, along with the SDK exception message. I don't want users to see an SDK stack trace and exception without any context of what it was being used for.
        Marker = marker
    };

    var response = await _lambdaClient.GetDurableExecutionStateAsync(request, cancellationToken);
Since the SDK call can fail for a variety of reasons, such as permissions, I wonder if we should wrap the SDK call in a try/catch and wrap the SDK's exception in a DurableExecutionException. That way you can give an error message with the context of what the SDK was being used for, along with the SDK exception message. I don't want users to see an SDK stack trace and exception without any context of what it was being used for.
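A minimal sketch of the wrapping being suggested here, assuming nothing about the real client. `ServiceCalls.WithContextAsync` is a hypothetical helper, the `DurableExecutionException` below is a local stand-in for the SDK type of the same name, and the ARN is made up for the demo. The idea is only that the caller sees which durable-execution call failed, with the original exception preserved as `InnerException`.

```csharp
using System;
using System.Threading.Tasks;

// Demo: a failing call surfaces as DurableExecutionException with the cause preserved.
try
{
    await ServiceCalls.WithContextAsync<string>(
        "CheckpointDurableExecution",
        "arn:aws:lambda:us-east-1:123456789012:function:demo", // made-up ARN
        () => throw new UnauthorizedAccessException("not authorized"));
}
catch (DurableExecutionException ex)
{
    Console.WriteLine(ex.Message);
    Console.WriteLine(ex.InnerException?.GetType().Name);
}

// Local stand-in for the SDK's exception type (assumed shape: message + inner exception).
public class DurableExecutionException : Exception
{
    public DurableExecutionException(string message, Exception inner) : base(message, inner) { }
}

// Hypothetical helper, not part of the PR.
public static class ServiceCalls
{
    public static async Task<T> WithContextAsync<T>(
        string operation, string executionArn, Func<Task<T>> call)
    {
        try
        {
            return await call();
        }
        catch (OperationCanceledException)
        {
            // Cancellation is not a service failure, so let it propagate unchanged.
            throw;
        }
        catch (Exception ex)
        {
            // Add durable-execution context and keep the original exception for classification.
            throw new DurableExecutionException(
                $"{operation} failed for durable execution '{executionArn}': {ex.Message}", ex);
        }
    }
}
```

Keeping the original exception as `InnerException` means retry or terminal-error classification can still inspect the underlying error type.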
    /// <remarks>
    /// Replay tracking mirrors the Python / Java / JavaScript reference SDKs:
    /// <list type="bullet">
    /// <item>At construction the workflow is "replaying" iff any user-replayable
This should say "if and only if".
    /// </summary>
    public string NextId()
    {
        var counter = ++_counter;
I wasn't sure whether threading is a concern here once you get to the parallel steps. If so, then maybe this line should be:
    var counter = System.Threading.Interlocked.Increment(ref _counter);
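To illustrate the concern, here is a small self-contained sketch (`CounterSketch` is a hypothetical class, not the PR's `OperationIdGenerator`). With `Interlocked.Increment`, concurrent callers never lose an increment, whereas a plain `++_counter` can. Note that atomicity alone does not make the order in which parallel branches receive IDs deterministic; that would still need a separate design decision.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var counter = new CounterSketch();

// 8 workers x 10,000 increments each, all racing on the same counter.
Parallel.For(0, 8, _ =>
{
    for (var i = 0; i < 10_000; i++)
    {
        counter.Next();
    }
});

Console.WriteLine(counter.Current); // 80000: no increments were lost

// Hypothetical stand-in for the counter inside an ID generator.
public sealed class CounterSketch
{
    private int _counter;

    // Atomic read-modify-write; returns the incremented value.
    public int Next() => Interlocked.Increment(ref _counter);

    public int Current => Volatile.Read(ref _counter);
}
```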
        return ToHex(hash);
    }

    private static string ToHex(ReadOnlySpan<byte> bytes)
Since you have the SDK, you could potentially use the SDK's ToHex method, Amazon.Util.AWSSDKUtils.ToHex().
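For context on what this code path is doing, the following is a rough sketch of a counter-plus-hash ID generator using only the base class library. The hash input (the decimal counter as UTF-8) and the lowercase hex output are assumptions made for illustration; the PR's actual input format is not shown in this thread. `IdSketch` is a hypothetical name.

```csharp
using System;
using System.Globalization;
using System.Security.Cryptography;
using System.Text;

var firstRun = new IdSketch();
var replay = new IdSketch();

// The same call position yields the same ID on every run.
Console.WriteLine(firstRun.NextId() == replay.NextId()); // True
Console.WriteLine(firstRun.NextId() == replay.NextId()); // True

// Different call positions yield different IDs.
Console.WriteLine(new IdSketch().NextId() == firstRun.NextId()); // False

public sealed class IdSketch
{
    private int _counter;

    public string NextId()
    {
        var counter = ++_counter;

        // Assumed input format: the counter rendered as invariant-culture text.
        var input = Encoding.UTF8.GetBytes(counter.ToString(CultureInfo.InvariantCulture));
        var hash = SHA256.HashData(input);

        // Convert.ToHexString (available since .NET 5) is used here as a BCL-only hex encoder.
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```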
    {
        Id = OperationId,
        Type = OperationTypes.Step,
        Action = "FAIL",
Any reason you can't use the enum OperationAction.FAIL instead of the string literal?
    {
        Id = OperationId,
        Type = OperationTypes.Wait,
        Action = "START",
Implements the minimum viable slice of the Amazon.Lambda.DurableExecution SDK: a workflow can run StepAsync and WaitAsync against a real Lambda, with replay-aware checkpointing wired through to the AWS service.

Public API surface introduced:
- DurableFunction.WrapAsync: entry point that handles the durable execution envelope (input hydration, output construction, status mapping)
- IDurableContext.StepAsync / WaitAsync (4 Step overloads, 1 Wait)
- StepConfig with serializer hook (retry deferred to follow-up PR)
- ICheckpointSerializer interface
- [DurableExecution] attribute (recognized by future source generator)
- DurableExecutionException base + StepException

Internals:
- DurableExecutionHandler: Task.WhenAny race between user code and the suspension signal, returning Succeeded/Failed/Pending
- ExecutionState: replay-aware operation lookup and pending checkpoint buffer
- OperationIdGenerator: deterministic, replay-stable IDs
- TerminationManager: TaskCompletionSource-based suspension trigger
- LambdaDurableServiceClient: wraps AWSSDK.Lambda's checkpoint and state APIs

Tests:
- 86 unit tests covering enums, exceptions, models, configs, ID generation, termination, execution state, the handler race, the context (Step + Wait paths), and the WrapAsync entry point
- 8 end-to-end integration tests deploying real Lambdas via Docker on the provided.al2023 runtime: StepWaitStep, MultipleSteps, WaitOnly, LongerWait, ReplayDeterminism, RetrySucceeds, RetryExhausts, StepFails

Out of scope (follow-up PRs):
- IRetryStrategy, ExponentialRetryStrategy, retry decision factories
- DefaultJsonCheckpointSerializer
- DurableLogger replay-suppression (currently returns NullLogger)
- Callbacks, InvokeAsync, ParallelAsync, MapAsync, RunInChildContextAsync, WaitForConditionAsync (interface intentionally does not declare them)
- Annotations source-generator integration
- DurableTestRunner / Amazon.Lambda.DurableExecution.Testing package
- dotnet new lambda.DurableFunction blueprint

stack-info: PR: #2360, branch: GarrettBeatty/stack/2
remove
update
update
update
update
Match the Python / Java / JavaScript reference SDKs' replay-mode model: the workflow is "replaying" iff it has not yet revisited every checkpointed completed user-replayable operation. A single global flag flipped on the first fresh op (the prior model) misclassified workflow-body code that runs before the first step and would not generalize to Map/Parallel/Callback later.

ExecutionState changes:
- Replace `Mode`/`ExecutionMode`/`EnterExecutionMode()` with `IsReplaying` + `TrackReplay(operationId)`.
- Initial replay decision: any non-EXECUTION op present means we're replaying. The service always sends an EXECUTION-type op carrying the input payload; that's bookkeeping, not user history, so it does not count toward replay (matches Python execution.py:258, Java ExecutionManager:81, JS execution-context.ts:62).
- TrackReplay flips IsReplaying false once every checkpointed terminal-status non-EXECUTION op has been visited. Terminal set matches Python's: SUCCEEDED, FAILED, CANCELLED, STOPPED.

Operation changes:
- DurableOperation.ExecuteAsync calls TrackReplay(OperationId) at the top, so every operation participates in visit accounting without each subclass needing to remember.
- StepOperation/WaitOperation drop their manual EnterExecutionMode calls.

Tests:
- ExecutionStateTests rewritten around IsReplaying/TrackReplay, including pinning regressions: only-EXECUTION-op ⇒ NotReplaying, all-visited ⇒ flips out of replay, PENDING ops do not block transition, idempotency.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
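The replay rule in this commit message can be sketched roughly as follows. This is an illustration under assumptions, not the PR's `ExecutionState`: `ReplayStateSketch` is a hypothetical class, operations are modeled as plain `(Type, Status)` string tuples, and the choice to flip out of replay on the first `TrackReplay` call when no terminal ops exist is this sketch's own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var state = new ReplayStateSketch(new Dictionary<string, (string Type, string Status)>
{
    ["exec"] = ("EXECUTION", "STARTED"),  // bookkeeping op carrying the input payload
    ["step-1"] = ("STEP", "SUCCEEDED"),   // terminal: must be revisited
    ["wait-1"] = ("WAIT", "PENDING"),     // non-terminal: does not block the transition
});

Console.WriteLine(state.IsReplaying); // True: a non-EXECUTION op was checkpointed
state.TrackReplay("step-1");
Console.WriteLine(state.IsReplaying); // False: every terminal op has been visited

public sealed class ReplayStateSketch
{
    private static readonly HashSet<string> TerminalStatuses =
        new() { "SUCCEEDED", "FAILED", "CANCELLED", "STOPPED" };

    private readonly HashSet<string> _unvisitedTerminalOps;

    public bool IsReplaying { get; private set; }

    public ReplayStateSketch(IReadOnlyDictionary<string, (string Type, string Status)> checkpointed)
    {
        // EXECUTION ops are ignored: they are not user history.
        var userOps = checkpointed.Where(kv => kv.Value.Type != "EXECUTION").ToList();
        IsReplaying = userOps.Count > 0;

        _unvisitedTerminalOps = userOps
            .Where(kv => TerminalStatuses.Contains(kv.Value.Status))
            .Select(kv => kv.Key)
            .ToHashSet();
    }

    // Idempotent: visiting the same operation twice has no further effect.
    public void TrackReplay(string operationId)
    {
        _unvisitedTerminalOps.Remove(operationId);
        if (_unvisitedTerminalOps.Count == 0)
        {
            IsReplaying = false;
        }
    }
}
```

Tracking the unvisited set makes each `TrackReplay` call a constant-time set operation rather than a rescan of all checkpointed operations.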
…Serializer

DurableExecution now reads the registered ILambdaSerializer from the per-invocation ILambdaContext (added in the prior PR) for both step-result checkpointing and workflow input/output. AOT-safety is now determined entirely by which serializer the user registers with LambdaBootstrapBuilder.Create; there is no longer a forked path between reflection-based and AOT-safe APIs.

Removed:
- ICheckpointSerializer<T> + SerializationContext record
- ReflectionJsonCheckpointSerializer<T>
- The four JsonSerializerContext-taking overloads of DurableFunction.WrapAsync
- The IDurableContext.StepAsync overload that took ICheckpointSerializer<T>
- All [RequiresUnreferencedCode]/[RequiresDynamicCode] attributes and their related [UnconditionalSuppressMessage] shims

Net result: 8 WrapAsync overloads → 4, 3 StepAsync overloads → 2, zero trim attributes in the public API. The AOT smoke test continues to publish with zero IL2026/IL3050 warnings.
- Wrap LambdaDurableServiceClient SDK calls in DurableExecutionException with
durable-execution context (which call, which ARN). User logs no longer show
bare AWSSDK stack traces. Update IsTerminalCheckpointError to unwrap the
inner AmazonServiceException for classification.
- Move public-API files out of Models/, Config/, Exceptions/ into the project
root so folder layout matches the Amazon.Lambda.DurableExecution namespace.
- Replace string action literals ("SUCCEED", "FAIL", "START") with the
Amazon.Lambda.OperationAction enum constants.
- Replace hand-rolled ToHex with Amazon.Util.AWSSDKUtils.ToHex. Drop the
netstandard2.0 SHA-256 fallback now that DurableExecution targets net8+.
- Spell "iff" as "if and only if" in ExecutionState replay-mode docs.
Tests updated for the new wrapping shape: terminal classification asserts on
DurableExecutionException with the inner SDK exception preserved; transient
and hydration paths assert ThrowsAsync<DurableExecutionException> with
InnerException set to the original AmazonServiceException.
Well-architected first slice of the durable execution SDK. The separation of concerns (DurableFunction entry point → DurableExecutionHandler race → per-operation classes → Important1.
Consider precomputing the count at private int _remainingReplayOps;
// in LoadFromCheckpoint:
_remainingReplayOps = _operations.Values
.Count(op => op.Type != OperationTypes.Execution && IsTerminalStatus(op.Status));
// in TrackReplay:
if (_visitedOperations.Add(operationId)
&& _operations.TryGetValue(operationId, out var op)
&& op.Type != OperationTypes.Execution
&& IsTerminalStatus(op.Status))
{
if (--_remainingReplayOps <= 0)
_isReplaying = false;
}
If the service envelope arrives without an EXECUTION-type operation (malformed input, wire issue), the workflow receives null/default input with no signal. This would surface
The state-pagination loop (while (!string.IsNullOrEmpty(nextMarker))) and DrainAsync don't observe any cancellation token. If the Lambda runtime signals shutdown mid-hydration,
This converter lives in the Amazon.Lambda.DurableExecution namespace (not Internal) and is public sealed. It seems like an implementation detail that shouldn't be on the public Nits
Questions
This will be in a follow-up PR. I will address your other comments though.
I don't think this matters too much; I'm going to keep this as-is for now.
- ExecutionState.TrackReplay is now O(1) amortized via a remaining-ops counter populated at LoadFromCheckpoint time.
- ExtractUserPayload throws DurableExecutionException on a malformed envelope (missing EXECUTION op) instead of silently returning default!.
- UpperSnakeCaseEnumConverter is now internal and lives in the ...DurableExecution.Internal namespace.
@philasmar addressed your comments
    }
    ```

    For **NativeAOT** deployments, pass a `JsonSerializerContext` so the SDK can serialize/deserialize your input and output types without reflection:
Can you update the design doc, now that for Native AOT we are relying on the user to properly configure ILambdaSerializer?
    /// <summary>
    /// Execute a step with automatic checkpointing. The step result is serialized
    /// to a checkpoint using the <see cref="ILambdaSerializer"/> registered on
    /// <see cref="ILambdaContext.Serializer"/> (typically configured via
Don't say "(typically configured via LambdaBootstrapBuilder.Create(handler, serializer)", because most users use the class library programming model, which relies on the assembly attribute.
    var serializer = LambdaContext.Serializer
        ?? throw new InvalidOperationException(
            "No ILambdaSerializer is registered on ILambdaContext.Serializer. " +
            "Register a serializer via LambdaBootstrapBuilder.Create(handler, serializer) " +
You need to rework the wording here, because most users use the class library programming model, which relies on the assembly attribute.
    var serializer = lambdaContext.Serializer
        ?? throw new InvalidOperationException(
            "No ILambdaSerializer is registered on ILambdaContext.Serializer. " +
            "Register a serializer via LambdaBootstrapBuilder.Create(handler, serializer) " +
You need to rework the wording here, because most users use the class library programming model, which relies on the assembly attribute.
    /// </summary>
    [JsonPropertyName("InitialExecutionState")]
    [JsonInclude]
    internal InitialExecutionState? InitialExecutionState { get; set; }
I don't think internal properties will get set during serialization using a JsonSerializerContext when that JsonSerializerContext is defined in a different assembly than the type with the internal properties. You should double-check my thinking, but we might need to handle this differently to support source-generator-based serialization.
I just checked this and you are right.
philasmar left a comment
Approved, assuming you address Norm's comments.
Adds LambdaBootstrapBuilder.Create(Func<Stream, ILambdaContext, Task<Stream>>, ILambdaSerializer) and the matching HandlerWrapper.GetHandlerWrapper overload so stream-stream handlers can expose the supplied serializer via ILambdaContext.Serializer. Enables frameworks that own envelope (de)serialization end-to-end but still need to delegate inner-payload (de)serialization to a user-supplied serializer.
#2216
What

This is the first real end-to-end slice of the `Amazon.Lambda.DurableExecution` SDK. After this PR, you can write a workflow that calls `StepAsync` and `WaitAsync`, deploy it as a Lambda, and have it run against the actual durable-execution service with replay-aware checkpointing.

Public API introduced:

- `DurableFunction.WrapAsync`: … `IAmazonLambda` client.
- `IDurableContext`: `StepAsync` (typed-result and void overloads), `WaitAsync`, `LambdaContext`, `Logger`, `ExecutionContext`.
- `StepConfig`: … `RetryStrategy` and `Semantics` get wired in #2363.
- `DurableExecutionException`

Step input/output and workflow envelope serialization both go through the `ILambdaSerializer` registered on `ILambdaContext.Serializer` (added in #2378)

Why

Durable execution lets a Lambda function suspend and resume across invocations by checkpointing each side-effect to the service. This PR lays down the minimum needed to build everything that comes after: retries, callbacks, parallelism, the Annotations integration, and the test runner package all build on the `Step + Wait` primitives and the replay machinery here.

I kept the scope narrow on purpose. Anything that does not block real-Lambda execution is pushed to follow-up PRs (see Out of scope) so this stays reviewable.
How

The runtime runs a `Task.WhenAny` race in `DurableExecutionHandler` between the user's workflow task and a suspension signal. Every `StepAsync`/`WaitAsync` call goes to a per-operation class (`StepOperation`/`WaitOperation`) that checks `ExecutionState` (built from operations the service replayed) before deciding what to do:

- … `SUCCEEDED`/`FAILED` record from a prior invocation. Return the cached result (or rethrow) without re-running user code.
- … `SUCCEED`/`FAIL` checkpoint, return.
- … `TerminationManager.SuspendAndAwait` to win the `WhenAny` race, returning `Pending` so the service re-invokes us when the timer fires.

Replay determinism comes from `OperationIdGenerator`, which produces stable IDs from the workflow's call sequence so the same step always lands on the same record across invocations.

Serialization is unified on `ILambdaContext.Serializer`. Both `WrapAsync` (workflow input/output envelope) and `StepOperation` (step result checkpointing) read the registered `ILambdaSerializer` off the per-invocation context and use it for all JSON conversion. AOT-safety is determined entirely by which serializer the caller registers with `LambdaBootstrapBuilder.Create`: register `SourceGeneratorLambdaJsonSerializer<TContext>` for AOT/trimmed deployments, or `DefaultLambdaJsonSerializer` for reflection-based scenarios.

`StepConfig` is intentionally empty in this PR; it's the configuration carrier for `RetryStrategy` (#2363) and future per-step knobs.

Checkpoint flushing goes through `CheckpointBatcher`, a `Channel`-based queue with a single background worker. Each `EnqueueAsync` returns a `Task` that completes when the worker has flushed the containing batch to the service. `WrapAsyncCore` sets up the batcher, threads it into `DurableContext`, and `await`s `DrainAsync()` before returning to Lambda. Defaults match the Java SDK (`MaxBatchOperations = 200`, `MaxBatchBytes = 750 KB`, `FlushInterval = 0`, meaning flush as soon as the queue drains). There's a `TODO` for the async-flush overload that Map/Parallel/Child Context will eventually need.

Key files:

- `DurableFunction.cs`: envelope + replay-state hydration + batcher lifecycle; pulls `ILambdaSerializer` from `ILambdaContext.Serializer`
- `DurableContext.cs`: facade; constructs the right per-operation class
- `Internal/DurableOperation.cs`: abstract base with `StartAsync`/`ReplayAsync` template methods
- `Internal/StepOperation.cs` / `Internal/WaitOperation.cs`: per-op replay logic; `StepOperation` serializes step results via the registered `ILambdaSerializer`
- `Internal/CheckpointBatcher.cs` / `Internal/CheckpointBatcherConfig.cs`: checkpoint queue + worker
- `Internal/ExecutionState.cs`: operation lookup + replay-mode flag
- `Internal/OperationIdGenerator.cs`: deterministic IDs
- `Internal/TerminationManager.cs`: suspension trigger
- `Internal/DurableExecutionHandler.cs`: the `WhenAny` race
- `Services/LambdaDurableServiceClient.cs`: service client wrapper

Testing

Unit tests in `Amazon.Lambda.DurableExecution.Tests` cover: enums, exceptions, models, `OperationIdGenerator`, `TerminationManager`, `ExecutionState`, the handler race, both `Step` and `Wait` paths through `DurableContext`, `DurableFunction.WrapAsync`, and `CheckpointBatcher` (enqueue/flush, batching within window, overflow splitting, error propagation, drain, dispose, token updates, concurrency).

End-to-end integration tests in `Amazon.Lambda.DurableExecution.IntegrationTests` build each test workflow into a Docker container, deploy it as a real Lambda on `provided.al2023`, and run against the durable-execution service:

- `StepWaitStep`: basic Step → Wait → Step sequence
- `MultipleSteps`: several sequential steps
- `WaitOnly`: wait without any steps
- `LongerWait`: wait that spans multiple invocations
- `ReplayDeterminism`: verifies stable operation IDs across replays
- `StepFails`: verifies a failed step surfaces correctly

The AOT smoke test publishes the sample workflow with the source-generator serializer and produces zero `IL2026`/`IL3050` warnings.

Out of scope (follow-up PRs)

- `IRetryStrategy`, `ExponentialRetryStrategy`, retry decision factories, `StepConfig.RetryStrategy` / `StepConfig.Semantics` (Adds retry support to the Amazon.Lambda.DurableExecution #2363)
- `StepException` and the per-step failure exception type
- `DurableLogger` replay-suppression (currently returns `NullLogger`)
- Callbacks, `InvokeAsync`, `ParallelAsync`, `MapAsync`, `RunInChildContextAsync`, `WaitForConditionAsync`: the interface intentionally does not declare these yet
- … `[DurableExecution]` marker attribute)
- `DurableTestRunner` / `Amazon.Lambda.DurableExecution.Testing` package
- `dotnet new lambda.DurableFunction` blueprint
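
For readers new to the design, the `Task.WhenAny` race described under How can be sketched in a few lines. This is a simplified illustration, not the PR's `DurableExecutionHandler`: `RaceSketch`, `Outcome`, and the idea of handing the workflow a bare `TaskCompletionSource` are all assumptions made for the example.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// A workflow that finishes before anything asks to suspend.
var finished = await RaceSketch.RunAsync(_ => Task.FromResult("done"));
Console.WriteLine(finished.Status); // Succeeded

// A workflow that requests suspension (as a wait would) and then never
// completes within this invocation.
var suspended = await RaceSketch.RunAsync(async suspend =>
{
    suspend.TrySetResult();
    await Task.Delay(Timeout.Infinite);
    return "not reached in this invocation";
});
Console.WriteLine(suspended.Status); // Pending

public enum Outcome { Succeeded, Failed, Pending }

public static class RaceSketch
{
    public static async Task<(Outcome Status, string? Result)> RunAsync(
        Func<TaskCompletionSource, Task<string>> workflow)
    {
        // Stand-in for the suspension trigger: completing it means "stop and
        // let the service re-invoke later".
        var suspension = new TaskCompletionSource(
            TaskCreationOptions.RunContinuationsAsynchronously);

        var userTask = workflow(suspension);
        var winner = await Task.WhenAny(userTask, suspension.Task);

        if (winner == suspension.Task)
        {
            return (Outcome.Pending, null);
        }

        try
        {
            return (Outcome.Succeeded, await userTask);
        }
        catch (Exception ex)
        {
            return (Outcome.Failed, ex.Message);
        }
    }
}
```

The user task is deliberately abandoned when suspension wins; in this model the next invocation replays from checkpoints rather than resuming the old task.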