konnectivity-client: add reusable gRPC tunnel #823

Open
ipochi wants to merge 5 commits into kubernetes-sigs:master from kinvolk:imran/reusable-tunnel-grpc

ipochi (Contributor) commented Apr 28, 2026

Add ReusableTunnel and CreateGRPCTunnel so callers can reuse one gRPC ClientConn and open one Proxy stream per DialContext, instead of creating a new gRPC connection per proxied dial.

Changes

  • Add ReusableTunnel interface with explicit Close / Done drain semantics.
  • Add reusableGrpcTunnel implementation backed by one shared gRPC ClientConn.
  • Add DialFailureStreamSetup so callers can distinguish shared-transport stream setup failures from backend dial failures.
  • Refactor existing single-use tunnel setup through an internal helper.
  • Add reusable tunnel tests covering reuse, close/drain behavior, failure classification, races, and leaks.

Notes

This does not change the konnectivity wire protocol, server, agent, existing Tunnel interface, or single-use tunnel behavior. Reconnect and rebuild policies remain caller-owned.

ipochi added 5 commits April 28, 2026 02:20
This is a pure refactor with no behavioral change. Factor Proxy()
stream setup and serve() goroutine launch out of
CreateSingleUseGrpcTunnelWithContext into an internal
newSingleStreamTunnel helper that takes a ProxyServiceClient and a
closeFn.

The helper invokes closeFn from the serve() goroutine on exit via the
clientConn interface instead of unconditionally calling grpcConn.Close().
For the existing single-use path, closeFn is c.Close, preserving today's
behavior: tunnelCtx still bounds both Proxy() establishment and the
serve() goroutine, and the *grpc.ClientConn is still closed when serve()
returns.

This separation prepares for a future reusable tunnel API that will share
one *grpc.ClientConn across many per-dial streams. In that path, closeFn
will cancel only the per-dial stream context rather than closing the
shared conn. No such caller is added in this commit.

Add a closerFunc adapter so a func() error satisfies the existing
clientConn interface without introducing a new abstraction.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
Add an exported DialFailureReason for failures opening a new Proxy
stream on a reusable tunnel's shared gRPC connection. This is distinct
from DialFailureEndpoint, which describes backend dial failures, and
DialFailureContext, which describes caller-side cancellation.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
Add a purely additive ReusableTunnel type. ReusableTunnel composes
Tunnel with Close, describing a tunnel whose DialContext may be called
many times and whose underlying *grpc.ClientConn lifetime matches the
tunnel's.

The reusable tunnel contract is:
- each returned net.Conn is independent
- Done closes only after Close has been called and all in-flight
  per-dial child streams have drained
- Done does not fire on remote unreachability alone; callers detect
  transport failure via DialContext errors and decide whether to Close
  and rebuild
- Close blocks until drain; concurrent Close calls all observe the same
  post-condition, with only the first caller returning any
  ClientConn.Close error

The existing Tunnel interface and CreateSingleUseGrpcTunnel* functions
are unchanged. No implementation is added in this commit; the concrete
reusableGrpcTunnel and CreateGRPCTunnel constructor follow separately.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
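
One way to satisfy the Close and Done contract above is sketched here with a mutex-guarded admission check, a WaitGroup for drain, and sync.Once for idempotency. drainTunnel and its methods are hypothetical names; this shows the semantics under those assumptions and is not the PR's code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// drainTunnel is a hypothetical model of the Close/Done contract.
type drainTunnel struct {
	mu        sync.Mutex
	closed    bool
	children  sync.WaitGroup
	once      sync.Once
	ctx       context.Context
	cancel    context.CancelFunc
	done      chan struct{}
	closeConn func() error
}

func newDrainTunnel(closeConn func() error) *drainTunnel {
	ctx, cancel := context.WithCancel(context.Background())
	return &drainTunnel{ctx: ctx, cancel: cancel, done: make(chan struct{}), closeConn: closeConn}
}

// admit registers one child stream unless the tunnel is already closing.
func (t *drainTunnel) admit() (ctx context.Context, release func(), ok bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.closed {
		return nil, nil, false
	}
	t.children.Add(1)
	return t.ctx, t.children.Done, true
}

func (t *drainTunnel) Done() <-chan struct{} { return t.done }

// Close blocks until drain. sync.Once makes later callers wait for the first
// teardown to finish; only the first caller returns the conn's Close error.
func (t *drainTunnel) Close() error {
	var err error
	t.once.Do(func() {
		t.mu.Lock()
		t.closed = true // stop admitting new children
		t.mu.Unlock()
		t.cancel()        // ask children to exit
		t.children.Wait() // drain
		err = t.closeConn()
		close(t.done)
	})
	return err
}

func main() {
	t := newDrainTunnel(func() error { return errors.New("conn close failed") })
	ctx, release, _ := t.admit()
	go func() {
		<-ctx.Done() // a child exits once Close cancels its context
		release()
	}()
	errs := make(chan error, 2)
	for i := 0; i < 2; i++ {
		go func() { errs <- t.Close() }()
	}
	first, second := <-errs, <-errs
	<-t.Done()
	fmt.Println((first == nil) != (second == nil)) // exactly one caller sees the error
}
```
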
Add the concrete reusable tunnel implementation backing ReusableTunnel.
CreateGRPCTunnel dials the proxy and returns a reusableGrpcTunnel that
owns the underlying gRPC ClientConn for the tunnel's lifetime.

Each DialContext call opens one Proxy stream over the shared connection.
The implementation validates up front that the protocol is tcp,
synchronizes Close with child stream admission under a mutex, races
stream establishment against the request context, returns typed failures
for stream setup and closed-tunnel cases, and cancels per-dial stream
contexts on inner dial errors so child serve goroutines can drain.

Close is blocking-idempotent: the first caller cancels all child stream
contexts, waits for them to drain, closes the underlying ClientConn, and
fires Done. Concurrent callers wait for the same teardown to complete and
return nil.

Add an internal newReusableGrpcTunnel constructor so tests can inject a
fake ProxyServiceClient and clientConn without dialing real gRPC.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
Add reusable_tunnel_test.go covering reusableGrpcTunnel and
CreateGRPCTunnel via the existing fake-stream harness.

A new fakeProxyClient implements client.ProxyServiceClient with hooks
for preempting stream creation, customizing the next-allocated
proxyServer, or substituting a fully custom ProxyService_ProxyClient.
A closeTrackingConn implements clientConn for asserting close counts and
propagating close errors. Purpose-built fake streams exercise lifecycle
paths the existing pipe-backed harness cannot deterministically pin.

Coverage targets the documented reusable tunnel contracts: shared
transport reuse, per-dial isolation, requestCtx-bounded establishment,
late-arriving inner tunnel tracking, unsupported-protocol short-circuit,
inner.DialContext failure cleanup, dial-after-close typing, concurrent
Close drain semantics, Close error attribution, Close-vs-DialContext
stress, Done not firing on remote disappearance alone, typed
DialFailureStreamSetup versus DialFailureEndpoint, and goleak smoke
coverage.

Signed-off-by: Imran Pochi <imranpochi@microsoft.com>
k8s-ci-robot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ipochi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot requested a review from elmiko April 28, 2026 05:10
k8s-ci-robot added the cncf-cla: yes label Apr 28, 2026
k8s-ci-robot requested a review from tallclair April 28, 2026 05:10
k8s-ci-robot added the approved and size/XXL labels Apr 28, 2026
cheftako (Contributor) commented May 1, 2026

@ipochi I fixed a bunch of problems on CI. I think if you rebase on the latest this should go better.

Labels

  • approved (Indicates a PR has been approved by an approver from all required OWNERS files.)
  • cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.)
  • size/XXL (Denotes a PR that changes 1000+ lines, ignoring generated files.)
