Open
Changes from 5 commits
194 changes: 194 additions & 0 deletions docs/proposals/0069-TelemetryPolicy.md
@@ -0,0 +1,194 @@
Date: 9th February 2026<br/>
Authors: gkhom<br/>
Status: draft<br/>

# TelemetryPolicy
A Kubernetes API for Gateway/Mesh Observability

## Summary
This proposal introduces the `TelemetryPolicy`, a direct policy attachment designed to configure observability signals (metrics, logs, traces)
for Gateway API resources (via `Gateway` attachment) and Service Mesh resources (via `namespace` attachment).
Comment on lines +9 to +10
Contributor

I think it's OK to call it a "Direct Policy" while Gateway and Namespace as supported target kinds are for two completely disjoint use cases – ingress and mesh.

By definition, it's only direct when:

A single kind supported in spec.targetRefs.kind

Author

Would it be more accurate to call it "inherited policy"?


This K8s API standardizes how users enable and configure telemetry across different data plane implementations, replacing vendor-specific CRDs
Contributor

I am (acting as) a naive reader, and I was immediately curious what some examples of these vendor-specific CRDs are. This also ties to and might clarify the below mention of "Observability lock-in".

Author

Examples of such CRDs are:

  • Istio's Telemetry CRD
  • Envoy Gateway's EnvoyProxy and EnvoyGateway CRDs
  • Kong's MeshMetrics/MeshTrace/MeshAccessLog
  • Kuadrant's TelemetryPolicy

I intend to write a section that compares such existing APIs and the proposed TelemetryPolicy in the eventual proposal.

Contributor

Seeing as there's a mix of examples here, will the scope cover one resource for all of the signals (metrics, logs, traces) vs. separate ones? Are there tradeoffs to consider here?

Author

I'm leaning towards one resource for all. The argument for splitting them might be that different personas are involved in configuring the different aspects of observability. In practice, I think that the persona that configures metrics, likely also configures tracing and access logs. So to avoid complicating the API with three additional resources, it seems worthwhile to put all of it in a single resource.

with a unified, portable spec.
Contributor

Would you see implementations reconciling the TelemetryPolicy and reading the bits that are relevant to their components? So multiple controllers read the CR and take actions to enable telemetry across the components they are controlling?

Author

It is indeed possible to distribute the responsibility across multiple controllers, it's up to the implementation. In most cases that I'm familiar with a single controller/control plane programs all three observability features (metrics, traces, logs).

Member

possible but a little challenging, what are the cases we see multiple impls reconcile the same thing?


# Context
## The Fragmentation of Observability
In the current Kubernetes landscape, the “Who, What, Where, and How Long” of network traffic is answered differently depending on the underlying
proxy technology. While the Gateway API specification has unified how traffic is routed via `HTTPRoute` and `Gateway`, it has deferred the standardization
of how that traffic is observed.
This deferral has led to "Observability Lock-in". Platform Engineering teams are forced to learn and manage distinct APIs for each environment.
A standardized `TelemetryPolicy` is necessary to decouple the intent of observability from the implementation. Without such standardization it is
difficult for platform owners to:

1. Enforce consistent auditing standards across different infrastructure providers.
2. Support emerging workloads like AI Agents, which require specialized metrics (e.g., token usage, model latency) and detailed audit logs for tool-use verification.
3. Manage “Mesh” and “Gateway” observability with a single unified API.

## The Emergence of Agentic Networking
Member

do we really need a callout for this here, given that in the rest of the document is just specified that there is a need for a Telemetry API standardization, regardless of being Agentic Networking or not?


The most pressing driver for this proposal is the shift in traffic patterns introduced by agentic workloads. We are moving from a deterministic Service-to-Service
paradigm to a non-deterministic Agent-to-Tool and Agent-to-Agent paradigm.

In an Agentic Mesh:
* **Entities are Autonomous**: An AI Agent (Pod) decides entirely on its own to call a Tool (Service).
* **Cost is Volatile**: Usage is measured in tokens, not just requests. A single HTTP 200 OK could cost $0.01 or $10.00 depending on the prompt and model used.
* **Context is King**: Debugging requires knowing the semantic context: Which Model? Which Prompt? Which tool?

Existing telemetry policies are unaware of the emerging Generative AI semantic conventions. They see an opaque TCP stream or HTTP POST. Without a standardized API to
configure the extraction and export of these attributes, the “Agentic Mesh” will remain a black box, increasing governance and cost control challenges.

## Design Objectives

To address these challenges, the `TelemetryPolicy` proposal targets four core objectives:

1. **Standardization**: A single API for Gateway and Mesh to configure Access Logging, Metrics generation, and Tracing propagation.
2. **GEP-713 Compliance**: Support `targetRef` attachment to `Gateway` and `Namespace`. The latter covers Mesh use-cases.
3. **Agentic Support**: Enable the capture of OpenTelemetry GenAI Semantic Conventions and support the requirements of PR #33.
Contributor

In the spirit of standardization and not reinventing the wheel, I wanted to mention that the llm-d community is already moving on tracing + OTel + GenAI Semantic Conventions. In particular, Sally O'Malley from Red Hat proposed and did a POC for distributed tracing in llm-d. [aside: I learned about this work from Sally on another community call for our kagenti project]

This may be applicable here for a few reasons:

  • We are keen on integrating OTel and GenAI semantic conventions, too
  • One of our objectives is a single API for Gateways and Meshes, and Sally's POC has already landed some changes to support tracing to the Gateway API Inference Extension (GAIE) components like the endpoint pickers (proposal comment, GAIE PR).

While llm-d is focused on distributed LLM inferencing regardless of source (i.e., user chat -> LLM vs agent -> LLM), I think it's worth considering any lessons they may have already encountered and API definitions that could overlap with our case, at the very least at the Gateway level. I'd be willing to evangelize our thinking to Sally to get her thoughts, but more importantly curious on our interest level.

Author

It would certainly be valuable to get some of their insights and experiences. The proposal seems to cover configuration through environment variables, have they defined CRDs as well?

Contributor

No, I did not see any CRD definitions. I'll keep this thread in mind as the definitions become more concrete.

4. **Protocol Agnostic**: Support OpenTelemetry as the primary data model while allowing vendor-specific extensions.

## The TelemetryPolicy Specification

We propose the `TelemetryPolicy` as a direct policy attachment in the `gateway.networking.k8s.io` API group.

### Resource Structure

The following is an example that demonstrates the structure of the `TelemetryPolicy`.

```yaml
Member

can we also have the status specification please?

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: TelemetryPolicy
metadata:
  name: standard-telemetry
  namespace: prod-ns
spec:
  # GEP-713 Attachment
  targetRef:
Contributor

Looking at the GEP-713 description, will we allow multiple attachments (targetRefs)? Perhaps with the "what namespace targets?" comment below it'd be good to understand the precedence for multiple policy (gateway vs. namespace policy) resolution more clearly. Are they to be non-overlapping?

similarly I assume policy status will be included?

Author

The intent is indeed to allow multiple attachments. A single TelemetryPolicy can be used to configure for multiple namespaces and/or gateways. I will fix the mistake in the example to state targetRefs.

Regarding multiple policies, I was thinking that only a single TelemetryPolicy is allowed to target a specific resource. A TelemetryPolicy that targets a namespace or Gateway that is already targeted by another TelemetryPolicy should be rejected.

Regarding status, it will be included. Does it need to be mentioned explicitly in the spec?

Contributor

I personally was not sure if status was going to be unique enough to call out here, but seems @guicassolato also called out the status stanza here: #69 (comment) so including it explicitly will likely avoid further confusion

    group: gateway.networking.k8s.io
    kind: Gateway
    name: my-gateway

  # 1. Tracing Configuration
  tracing:
    provider:
      type: OTLP # or implementation-specific
Member

Is type a specific go type? would be useful to define something like

type TracingProviderType string

const (
  // OTLPTracingProvider is used to ....

  OTLPTracingProvider TracingProviderType = "OTLP"
)

And then an implementation specific would probably be a vendor-prefixed thing like <foo.io/some-provider-type-name>

Would probably be useful to have the go types as part of this PR like other GEPs in Gateway API

Author

I've added Go types in the "Detailed resource description" section. I'm not sure that we need to include specific implementation providers as part of the API spec.

I cannot come up with any good reason to support anything but OTLP for tracing...

      endpoint: "otel-collector.monitoring.svc:4317"
Member

is this any url basically?

Author

Yes, in this example it's the URL of the OTLP endpoint.

    samplingRate:

I think you need to explain how sampling of traces work. Is this respecting the existing "sampling" decisions? Is this for requests without an existing context?

Author

Added a brief explanation. This is the base sampling rate across all requests. The optional parentBasedSampling config allows for a distinct sampling rate specifically for requests that are already part of a trace.

      percent: 5
Member

would benefit from a go struct. We had loads of discussion on how to do percentages in gateway api. I could not find the long thread, I think it was in a meeting. This is the best thing I could find kubernetes-sigs/gateway-api#3178 but maybe @robscott has something more concrete

Author

Added Go structs in the "Detailed resource description" section.

    parentBasedSampling:
      enabled: true
      samplingRate:
        percent: 50
    context:
      - W3C
      - B3
Member

will likely benefit from comments

Author

Added comments in the Go struct spec. Should we add it here as well?

Please stop encouraging the use of ancient tracing headers. Just use OTLP w3c context and ignore everything else.

    customSpanAttributes:
      - attributeName: "env"
        literalValue: "production"
Member

nit, whats the rationale behind literalValue vs just value?

Author

It is to make it explicit that this is a static value for the attribute. In the future, we could also consider dynamic attributes that are derived at runtime.


  # 2. Metrics Configuration
Contributor

I think it's questionable how portable dynamic configuration of metrics is. I've only seen Envoy do this, and even then it's historically been extremely bug-ridden. It's very confusing for users what the semantics are, or should be, to add or remove labels from metrics. Or even adding a metric -- imagine I have a dashboard like request_count/error_count but I added error_count after request_count so it's all out of whack...

  metrics:
    enable: true
    provider:
      type: Prometheus
Member

same comment on go struct and types

IMO for metrics, just use OTLP. It's 2026... It's interchangeable with prometheus and OTLP is here to stay. I'd drop provider to avoid any kind of vendor interference here.

    overrides:
      - name: "request_count"
Member

would be nice to have an example of a full metric name

Author

Updated the example and added comment in the Go structs.

        type: Counter
        dimensions: # Custom labels/dimensions
Member

this part of the API is slightly confusing and seems somehow like a Prometheus rewrite rule. Is this something supported by OTEL, or some assumption that the Gateway implementation will do the rewrites/overrides here?

          - key: "model_id"
            fromHeader: "x-model-id" # Crucial for Agentic workloads
Contributor

Any other possible sources in mind? E.g. fromMetadata?

Author

Indeed, fromMetadata is an example. Potentially an advanced fromPayload might be worth considering in the future.


  # 3. Access Logging
  accessLogs:
    enable: true
    format: JSON
    matches: # Conditional logging
      - path: "/api/v1/sensitive"
    fields: # Configure specific fields to include
      - "start_time"
      - "response_code"
      - "x-token-usage"
Comment on lines +101 to +104
Contributor

is there any definition of what these fields mean?

For a concrete example, say I want to log the MCP task name (I chose this since its not in https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/). Can I do it? what do I put as the field if I want to?

```

### Policy Attachment

Following [GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/), the `TelemetryPolicy` supports the following attachments:

1. **Gateway (Instance Scope)**: Configures the telemetry for a specific `Gateway`.
2. **Namespace (Mesh Scope)**: Configures the telemetry for all mesh proxies (sidecar proxy / node proxy / etc.) in that namespace.
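
The attachment rules can be made concrete with a small conflict check. The sketch below assumes the behaviour floated in the review discussion: at most one `TelemetryPolicy` may target a given `Gateway` or `Namespace`, and a policy that targets an already claimed resource is rejected. The names `TargetRef`, `Policy`, and `findConflicts` are hypothetical and not part of the proposal, and the ordering of policies (for example by creation timestamp) is left to the caller.

```go
package main

import "fmt"

// TargetRef is a hypothetical, simplified stand-in for a GEP-713 target reference.
type TargetRef struct {
	Kind      string // "Gateway" or "Namespace"
	Namespace string // namespace of the target; left empty for Namespace targets
	Name      string
}

// Policy is a hypothetical, trimmed-down TelemetryPolicy that carries only
// what the conflict check needs.
type Policy struct {
	Name       string
	TargetRefs []TargetRef
}

// findConflicts returns the names of policies that target a resource already
// claimed by an earlier policy in the slice. A rejected policy claims nothing.
func findConflicts(policies []Policy) []string {
	owner := map[TargetRef]string{}
	var rejected []string
	for _, p := range policies {
		conflict := false
		for _, ref := range p.TargetRefs {
			if prev, ok := owner[ref]; ok && prev != p.Name {
				conflict = true
				break
			}
		}
		if conflict {
			rejected = append(rejected, p.Name)
			continue
		}
		for _, ref := range p.TargetRefs {
			owner[ref] = p.Name
		}
	}
	return rejected
}

func main() {
	gw := TargetRef{Kind: "Gateway", Namespace: "prod-ns", Name: "my-gateway"}
	ns := TargetRef{Kind: "Namespace", Name: "prod-ns"}
	policies := []Policy{
		{Name: "standard-telemetry", TargetRefs: []TargetRef{gw}},
		{Name: "mesh-telemetry", TargetRefs: []TargetRef{ns}},
		{Name: "duplicate-telemetry", TargetRefs: []TargetRef{gw}},
	}
	fmt.Println(findConflicts(policies))
}
```

In this sketch a `Gateway` target and a `Namespace` target never collide, which matches the reading that namespace attachment covers mesh proxies only. If the final design chooses merge semantics instead of rejection, the check would need to change accordingly.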
Member

What Namespace targets? Are Gateways excluded from targeting (note that some Gateways are in cluster and some outside the cluster).

Would be good to have some user journeys that we want to allow here with the configuration.

FWIW - we went through a similar exercise in Gateway API for AuthZ and here is the conclusion that landed (note that this is still experimental though). Here it is for more context -- https://github.com/kubernetes-sigs/gateway-api/pull/3891/changes#diff-6886a6f78647100500384beb636df7b6487717be6d9f8366f50d8a0bd3581927R196-R238

@guicassolato has tons of experience in this as well, as it was initially proposed as part of GEP-713.

Author

My current thinking is that targeting a namespace implies a "mesh" target i.e., it would include all proxies in the namespace except gateways. I think it's reasonable to assume that different observability configurations may be desired/applicable to Gateway use-cases (north/south) compared to Mesh use-cases (east/west). That's why I think it's better to avoid making namespace a target that captures both.

Contributor

Defining what namespace targeting means and even if that definition points to mesh use case only is fine. But it has to be more than implied IMO. It has to be by design and well specified/documented, so all implementations of the API will commit to the same meaning and behaviour.

(I believe that's what @gkhom has in mind, but good to spell it out, I think.)

On a side note, if namespace targeting is for the mesh use case, have you considered the Mesh kind (and avoid any possible confusion altogether)?

Contributor

Although implied when calling it "direct policy", I think it may be useful to add a note about the merge semantics, which I imagine will be the None one. I.e., describe the behaviour of what happens when 2 policy resources of this kind target the same object (same gateway or same namespace).


#### Alternatives Considered

##### GatewayClass

Targeting `GatewayClass` would set the default telemetry configurations for all Gateways of a specific class. While this would provide a powerful mechanism, the challenge is that `GatewayClass` is a cluster-scoped entity whereas `TelemetryPolicy` is namespace-scoped. Allowing a namespace-scoped resource to influence the behavior of an entire cluster introduces significant operational and security risks. We would also need to define the semantics in the presence of multiple `TelemetryPolicy` resources that target the same `GatewayClass`. This is out of scope for this proposal.

##### Route

Future iterations could support attachment directly to routes (e.g., `HTTPRoute`). This will allow specific telemetry configuration for critical paths or specific API endpoints. To maintain API simplicity in the initial proposal, this is deferred to a future proposal.

##### Workload

We evaluated the ability to target specific workloads directly using pod label selectors. This would allow for precise application of telemetry settings to specific groups of pods (e.g., forcing debug logging on a specific deployment). However, we are prioritizing namespace-level attachment for mesh use-cases to align with existing Gateway API patterns.

##### Service

Attachment to a `Service` is deferred because a `Service` resource primarily defines the "exposure" or inbound side of a workload. It is not intuitive for a policy attached to an inbound definition to configure telemetry for both inbound and outbound traffic. Additionally, since multiple Services can select the same Pod, resolving precedence or merging strategies when different `TelemetryPolicy` resources target those different Services introduces significant complexity.

### Detailed Resource Description

| Field Name | Type | Description |
| --------------------------------- | ------------ | ----------- |
| spec.targetRef | Object | *Required.* Identifies the target resource (Gateway or Namespace) to which this policy attaches, per GEP-713. |
| spec.tracing | Object | Configuration for distributed tracing options. |
| spec.tracing.provider | Object | Specifies the tracing backend. Includes type (e.g., "OTLP") and endpoint (e.g., collector URL). |
| spec.tracing.samplingRate | Fraction | The base sampling probability for traces. |
| spec.tracing.parentBasedSampling | Object | Configures whether to respect the sampling decision of the parent span, with an optional fallback sampling rate. |
| spec.tracing.context | List<String> | Specifies the context propagation formats to use (e.g., W3C, B3, Jaeger). |
| spec.tracing.customAttributes | List<Object> | Allows appending custom tags/attributes to spans. Supports literal values (e.g., env: production). |
| spec.metrics | Object | Configuration for metric generation and exports. |
| spec.metrics.enable | Boolean | Global switch to enable or disable metric generation. |
| spec.metrics.provider | Object | Specifies the metrics backend (e.g., Prometheus). |
| spec.metrics.overrides | List<Object> | List of configurations to customize specific metric families (e.g., request_count). |
| spec.metrics.overrides.dimensions | List<Object> | Defines custom dimensions (labels). Can extract values from headers (e.g., x-model-id) for Agentic telemetry. |
| spec.accessLogs | Object | Configuration for access log generation. |
| spec.accessLogs.enable | Boolean | Global switch to enable or disable access logging. |
| spec.accessLogs.format | String | The format of the logs (e.g., JSON, Text). |
| spec.accessLogs.matches | List<Object> | Conditions for logging, allowing filtering to specific paths (e.g., /api/v1/sensitive) or events. |
| spec.accessLogs.fields | List<String> | A list of specific fields or headers to include in the logs (e.g., x-token-usage, start_time). |
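
As a rough illustration of how the tracing rows of this table could map onto Go types, the sketch below follows the typed-constant pattern suggested in review. It is not the proposal's actual type definition: the struct names, the JSON tags (inferred from the YAML example), the `validate` method, and the rule that implementation-specific provider types contain a domain prefix are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// TracingProviderType names a tracing backend. "OTLP" is the only value the
// proposal mentions; implementation-specific values are assumed here to be
// domain-prefixed (e.g. "example.com/my-tracer").
type TracingProviderType string

// OTLPTracingProvider selects an OTLP-compatible collector endpoint.
const OTLPTracingProvider TracingProviderType = "OTLP"

// Fraction is a hypothetical percentage wrapper mirroring `percent: 5`.
type Fraction struct {
	Percent int `json:"percent"`
}

// TracingProvider mirrors spec.tracing.provider.
type TracingProvider struct {
	Type     TracingProviderType `json:"type"`
	Endpoint string              `json:"endpoint"`
}

// ParentBasedSampling mirrors spec.tracing.parentBasedSampling.
type ParentBasedSampling struct {
	Enabled      bool      `json:"enabled"`
	SamplingRate *Fraction `json:"samplingRate,omitempty"`
}

// Tracing mirrors a subset of spec.tracing.
type Tracing struct {
	Provider            TracingProvider      `json:"provider"`
	SamplingRate        *Fraction            `json:"samplingRate,omitempty"`
	ParentBasedSampling *ParentBasedSampling `json:"parentBasedSampling,omitempty"`
	Context             []string             `json:"context,omitempty"`
}

// validate applies the checks this sketch assumes: a known or domain-prefixed
// provider type, and a base sampling percentage within [0, 100].
func (t Tracing) validate() error {
	if t.Provider.Type != OTLPTracingProvider && !strings.Contains(string(t.Provider.Type), "/") {
		return fmt.Errorf("unknown provider type %q", t.Provider.Type)
	}
	if t.SamplingRate != nil && (t.SamplingRate.Percent < 0 || t.SamplingRate.Percent > 100) {
		return fmt.Errorf("samplingRate.percent must be in [0, 100], got %d", t.SamplingRate.Percent)
	}
	return nil
}

func main() {
	raw := `{"provider":{"type":"OTLP","endpoint":"otel-collector.monitoring.svc:4317"},"samplingRate":{"percent":5},"context":["W3C","B3"]}`
	var t Tracing
	if err := json.Unmarshal([]byte(raw), &t); err != nil {
		panic(err)
	}
	fmt.Println(t.Provider.Type, t.SamplingRate.Percent, t.validate())
}
```

In a real API these checks would more likely be expressed as kubebuilder validation markers or CEL rules on the CRD; the method form is used here only to keep the sketch self-contained.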

### Alignment with Requirements

#### Agentic Telemetry

* **Token Counting**: The `metrics.overrides` and `accessLogs.fields` sections allow extracting the values from headers (e.g., `x-usage-input-tokens`, `x-usage-output-tokens`) or request/response bodies (if supported by the data plane) into telemetry.
* **Tool Use Auditing**: By attaching a `TelemetryPolicy` to a `Gateway` serving LLM traffic, operators can enforce 100% access logging for specific routes (e.g., `/tool/execute`) to create an immutable audit trail of agent actions.
* **Latency Tracking**: Latency histograms can be configured to track "Time to First Token" (TTFT) if exposed by the backend protocol.
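
To illustrate the token-counting item, here is a minimal sketch of what a data plane might do once the policy tells it which headers carry usage. It assumes the backend reports usage in the `x-usage-input-tokens` and `x-usage-output-tokens` headers used as examples above; the `TokenUsage` type and its `record` method are hypothetical, and silently ignoring malformed values is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
)

// TokenUsage accumulates token counts reported via response headers.
type TokenUsage struct {
	Input  int64
	Output int64
}

// record adds the counts found in h. Missing, malformed, or negative values
// are ignored (an assumption of this sketch).
func (u *TokenUsage) record(h http.Header) {
	if n, err := strconv.ParseInt(h.Get("x-usage-input-tokens"), 10, 64); err == nil && n >= 0 {
		u.Input += n
	}
	if n, err := strconv.ParseInt(h.Get("x-usage-output-tokens"), 10, 64); err == nil && n >= 0 {
		u.Output += n
	}
}

func main() {
	var usage TokenUsage

	first := http.Header{}
	first.Set("x-usage-input-tokens", "120")
	first.Set("x-usage-output-tokens", "480")
	usage.record(first)

	second := http.Header{}
	second.Set("x-usage-input-tokens", "30")
	usage.record(second)

	fmt.Println(usage.Input, usage.Output)
}
```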

#### Tracing

* **Sampling**: Supports probabilistic and parent-based sampling.
* **Propagation**: Explicitly configures propagation formats (W3C TraceContext by default; optionally B3, Jaeger, etc.).
* **Customization**: Allows appending custom tags/attributes to spans.
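
The interaction between the base sampling rate and parent-based sampling can be sketched as follows. This follows one reading of the proposal, in which `samplingRate` applies to all requests and `parentBasedSampling.samplingRate` applies instead when a request already carries trace context. `SamplingConfig` and `shouldSample` are hypothetical names, and the modulo on the trace ID is a simplification of how ratio-based samplers derive a stable decision.

```go
package main

import "fmt"

// SamplingConfig is a hypothetical flattening of spec.tracing.samplingRate and
// spec.tracing.parentBasedSampling.
type SamplingConfig struct {
	BasePercent        uint64 // rate for requests without trace context
	ParentBasedEnabled bool
	ParentPercent      uint64 // rate for requests that already belong to a trace
}

// shouldSample decides deterministically from the trace ID, so repeated
// evaluations for the same trace at the same rate agree.
func shouldSample(cfg SamplingConfig, hasParent bool, traceID uint64) bool {
	percent := cfg.BasePercent
	if hasParent && cfg.ParentBasedEnabled {
		percent = cfg.ParentPercent
	}
	return traceID%100 < percent
}

func main() {
	cfg := SamplingConfig{BasePercent: 5, ParentBasedEnabled: true, ParentPercent: 50}
	sampledRoot, sampledChild := 0, 0
	for id := uint64(0); id < 1000; id++ {
		if shouldSample(cfg, false, id) {
			sampledRoot++
		}
		if shouldSample(cfg, true, id) {
			sampledChild++
		}
	}
	fmt.Println(sampledRoot, sampledChild)
}
```

Note that this sketch does not honour the sampled flag of an incoming trace context. Whether the policy should respect an upstream decision or re-sample is exactly the question raised in review and is left open here.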

#### Metrics

* **Granularity**: Users can enable/disable specific metric families.
* **Dimensions**: The API supports "overrides" (similar to [OpenTelemetry Views](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view)) where users can add or remove dimensions (labels/attributes) to reduce cardinality or increase visibility.
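
A header-derived dimension such as `model_id` could be resolved per request roughly as shown below. The `Dimension` type and `labelsFor` function are hypothetical, and substituting `"unknown"` for an absent header is an assumption made so that every series keeps the same label set; the proposal does not specify this behaviour.

```go
package main

import (
	"fmt"
	"net/http"
)

// Dimension mirrors one entry of spec.metrics.overrides[].dimensions.
type Dimension struct {
	Key        string // label name on the metric
	FromHeader string // request header that supplies the label value
}

// labelsFor builds the label set for a single request. Absent headers yield
// the value "unknown" (an assumption of this sketch).
func labelsFor(dims []Dimension, h http.Header) map[string]string {
	labels := make(map[string]string, len(dims))
	for _, d := range dims {
		v := h.Get(d.FromHeader) // Get matches header names case-insensitively
		if v == "" {
			v = "unknown"
		}
		labels[d.Key] = v
	}
	return labels
}

func main() {
	dims := []Dimension{{Key: "model_id", FromHeader: "x-model-id"}}
	h := http.Header{}
	h.Set("x-model-id", "example-model")
	fmt.Println(labelsFor(dims, h))
}
```

Because header values are controlled by the caller, an implementation would probably also want to bound the number of distinct values per label to keep cardinality in check.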

#### Logging

* **Flexible Formatting**: Supports both JSON and text formats for compatibility with standard log aggregation stacks.
* **Smart Filtering**: Reduces noise and cost via CEL-based filtering, allowing logs to be generated only for specific events (e.g., 5xx errors, high latency, or critical paths).
* **Custom Attributes**: Enables the extraction of specific headers and proxy metadata into log entries.
* **Sinks**: Defaults to standard container logging (stdout) with extensibility for OTLP or external ports.
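
The filtering and field-selection behaviour can be sketched as below. Since the exact matching semantics are not yet specified, this example treats each `matches[].path` as a path prefix and does not model CEL expressions; `AccessLogConfig` and `render` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// AccessLogConfig is a hypothetical subset of spec.accessLogs.
type AccessLogConfig struct {
	Enable       bool
	PathPrefixes []string // stands in for matches[].path; empty means "log everything"
	Fields       []string // attribute names to copy into the log entry
}

// render returns the JSON log line for a request, or "" when logging is
// disabled or the request path matches none of the configured prefixes.
func render(cfg AccessLogConfig, path string, attrs map[string]string) string {
	if !cfg.Enable {
		return ""
	}
	matched := len(cfg.PathPrefixes) == 0
	for _, p := range cfg.PathPrefixes {
		if strings.HasPrefix(path, p) {
			matched = true
			break
		}
	}
	if !matched {
		return ""
	}
	entry := make(map[string]string, len(cfg.Fields))
	for _, f := range cfg.Fields {
		if v, ok := attrs[f]; ok {
			entry[f] = v
		}
	}
	line, err := json.Marshal(entry) // map keys are emitted in sorted order
	if err != nil {
		return ""
	}
	return string(line)
}

func main() {
	cfg := AccessLogConfig{
		Enable:       true,
		PathPrefixes: []string{"/api/v1/sensitive"},
		Fields:       []string{"start_time", "response_code", "x-token-usage"},
	}
	attrs := map[string]string{"response_code": "200", "x-token-usage": "1532", "user_agent": "agent/1.0"}
	fmt.Println(render(cfg, "/api/v1/sensitive/export", attrs))
	fmt.Println(render(cfg, "/healthz", attrs) == "")
}
```

Attributes that are not listed in `Fields` (here `user_agent`) are dropped, and listed fields with no value for the request are omitted rather than emitted as empty strings.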

## Comparison with Prior Art
Contributor

N.B that all of these are wrappers around Envoy

Contributor

Good point.
Though the Kuadrant TelemetryPolicy referenced is an API to supplement metrics coming from an internal component (limitador) that envoy filters to, rather than envoy itself.

Member

@howardjohn - can you offer other arts that are not relying on Envoy? I recall seeing a comment here or in slack welcoming help/contributions to the prior art section more


| | **Istio** | **Envoy Gateway** | **Kong** | **Kuadrant** | **GKE / Inference Gateway** | **TelemetryPolicy**<br />(this proposal) |
| --- | --- | --- | --- | --- | --- | --- |
| Primary API | [Telemetry CRD](https://istio.io/latest/docs/reference/config/telemetry/) | | | | | |
| Policy Model | | | | | | |
| Metrics & Logs | | | | | | |
| Portability | | | | | | |
| AI/LLM Support | | | | | | |