feat(controller): emit events for MCP handshake failures#186
Conversation
Emit Warning events when the MCP handshake fails (deduplicated by condition message) and when handshake retries are exhausted. Part of kubernetes-sigs#109.
✅ Deploy Preview for mcp-lifecycle-operator ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
|
Hi @ibm-adarsh. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Tip We noticed you've done this a few times! Consider joining the org to skip this step and gain Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Extract reconcileHandshakeEventsAndRetryCount to keep Reconcile under complexity 30; use range over int in handshake test.
|
Notes (expected behavior)
|
|
Hi @matzew Can I get ok-to-test label and any suggestion here, please? |
| if r.Recorder == nil { | ||
| return | ||
| } | ||
| r.Recorder.Eventf(mcpServer, nil, corev1.EventTypeWarning, ReasonMCPEndpointUnavailable, eventActionMCPHandshakeFailed, "%s", message) |
There was a problem hiding this comment.
in the emitMCPHandshakeRetriesExhausted the name of the mcp server is used - perhaps we can do here too for more consistent approach?
Address review feedback on kubernetes-sigs#186 and align event messages with emitServerReady and emitMCPHandshakeRetriesExhausted.
| return | ||
| } | ||
| r.Recorder.Eventf(mcpServer, nil, corev1.EventTypeNormal, ReasonValid, eventActionConfigurationAccepted, "%s", "MCPServer configuration is valid; Accepted=True") | ||
| r.Recorder.Eventf(mcpServer, nil, corev1.EventTypeNormal, ReasonValid, eventActionConfigurationAccepted, |
There was a problem hiding this comment.
To maintain the consistency across all events that we are emitting.
|
@ibm-adarsh I think my last PR caused conflicts here (splitting up the Would you mind fixing those + pulling the handshake logic you have into the new `mcpserver_controller_handshake.go file) ? |
Resolve conflict with kubernetes-sigs#190 by keeping the refactored controller layout and restoring MCP handshake failed/retries-exhausted events plus MCPServer name in configuration event messages.
There was a problem hiding this comment.
Pull request overview
This PR extends the MCPServer controller’s Kubernetes event coverage by emitting Warning events when the MCP handshake fails and when handshake retries are exhausted, with deduplication to avoid event spam.
Changes:
- Add Warning events for MCP handshake failures and for “retries exhausted” when the max retry threshold is reached.
- Refactor handshake retry counting + event emission into a helper (
reconcileHandshakeEventsAndRetryCount) and add a dedupe helper (duplicateHandshakeUnavailable). - Update and extend unit/envtest coverage around handshake event emission and include MCPServer name in configuration accepted event assertions.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| internal/controller/mcpserver_controller.go | Adds handshake-related event actions and emitters; factors retry/event logic into a helper call; adjusts configuration event messages. |
| internal/controller/mcpserver_controller_test.go | Updates assertions to reflect the updated event message content (includes MCPServer name). |
| internal/controller/mcpserver_controller_handshake.go | Implements reconcileHandshakeEventsAndRetryCount to emit handshake events and compute retry count. |
| internal/controller/mcpserver_controller_handshake_test.go | Adds tests for handshake-failed event deduping and retries-exhausted event emission. |
| internal/controller/mcpserver_controller_conditions.go | Adds duplicateHandshakeUnavailable helper for deduping handshake-failed events. |
| internal/controller/mcpserver_controller_conditions_test.go | Adds unit test coverage for duplicateHandshakeUnavailable. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
DONE! |
Extend the MCPHandshakeFailed dedupe test to cover both no-duplicate when the error is unchanged and a new event when the message changes, matching the test name and Copilot review feedback on kubernetes-sigs#186.
| case ev := <-fr.Events: | ||
| collected = append(collected, ev) | ||
| default: | ||
| goto check |
There was a problem hiding this comment.
To get rid of this goto, we can use something like this:
func drainEvents(ch <-chan string) []string {
var events []string
for {
select {
case ev := <-ch:
events = append(events, ev)
default:
return events
}
}
}
Then the test becomes:
Eventually(func(g Gomega) {
collected = drainEvents(fr.Events)
exhausted := 0
for _, ev := range collected {
if strings.Contains(ev, "retries exhausted") {
exhausted++
}
}
g.Expect(exhausted).To(Equal(1))
}).Should(Succeed())Alternative is using a labeled break, which is also not preferred.
|
@ibm-adarsh I have a single comment above, which I think is blocking for this PR. |
Address aliok review on kubernetes-sigs#186 by draining fake recorder events via a shared helper instead of goto in the retries-exhausted assertion.
Rename local slice to avoid shadowing k8s.io/client-go/tools/events.
Done, Please take a look once you have sometime. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ibm-adarsh, matzew The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Summary
Part of #109 (Kubernetes event coverage).
Adds two Warning events for the MCP handshake lifecycle:
MCPEndpointUnavailableMCPEndpointUnavailablehandshakeRetryCountreachesmaxMCPHandshakeRetries(10) and the controller stops requeuingBuilds on #118 (configuration events) and #184 (server ready event).
Changes
duplicateHandshakeUnavailable()helper and unit/envtest coverage.Test plan
go test ./internal/controller/...— all specs passValid+Availableevents; no retries-exhausted eventManual verification