TelemetryPolicy proposal#69

Open
gkhom wants to merge 21 commits into
kubernetes-sigs:mainfrom
gkhom:main

@gkhom gkhom commented Feb 9, 2026

What type of PR is this?

/kind documentation

What this PR does / why we need it:

This PR contains a proposal for a new TelemetryPolicy API. This K8s API aims to standardize how users enable and configure telemetry (metrics, logs, traces) across different data plane implementations, replacing vendor-specific CRDs with a unified, portable spec.

Which issue(s) this PR fixes:

Fixes #

Does this PR introduce a user-facing change?:

* TelemetryPolicy specification

gkhom added 2 commits February 9, 2026 14:58
This change includes context, a problem description, and design objectives for a TelemetryPolicy proposal. If the community agrees on this context, I will follow up with the actual API specification.
@k8s-ci-robot k8s-ci-robot added the kind/documentation Categorizes issue or PR as related to documentation. label Feb 9, 2026
netlify Bot commented Feb 9, 2026

Deploy Preview for kube-agentic-networking ready!

Name Link
🔨 Latest commit 8600d0b
🔍 Latest deploy log https://app.netlify.com/projects/kube-agentic-networking/deploys/69cdd554ee5388000880923d
😎 Deploy Preview https://deploy-preview-69--kube-agentic-networking.netlify.app

linux-foundation-easycla Bot commented Feb 9, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot
Contributor

Welcome @gkhom!

It looks like this is your first PR to kubernetes-sigs/kube-agentic-networking 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kube-agentic-networking has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 9, 2026
@k8s-ci-robot
Contributor

Hi @gkhom. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 9, 2026
@haiyanmeng
Contributor

CLA Not Signed

@gkhom, can you fix this?

@haiyanmeng
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Feb 10, 2026
@LiorLieberman
Member

/easycla

@LiorLieberman
Member

/check-cla

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Feb 10, 2026
Contributor

@rubambiza rubambiza left a comment

@gkhom Thanks for kicking this off. I left some clarifying questions and proposed a minor change. The bigger part of the review surfaces work already proposed or done in the llm-d community on tracing, and asks whether it applies to our objectives.

This proposal introduces the `TelemetryPolicy`, a direct policy attachment designed to configure observability signals (metrics, logs, traces)
for Gateway API resources (via `Gateway` attachment) and Service Mesh resources (via `namespace` attachment).

This K8s API standardizes how users enable and configure telemetry across different data plane implementations, replacing vendor-specific CRDs
Contributor

I am (acting as) a naive reader, and I was immediately curious what some examples of these vendor-specific CRDs are. This also ties to and might clarify the below mention of "Observability lock-in".

Author

Examples of such CRDs are:

  • Istio's Telemetry CRD
  • Envoy Gateway's EnvoyProxy and EnvoyGateway CRDs
  • Kong's MeshMetrics/MeshTrace/MeshAccessLog
  • Kuadrant's TelemetryPolicy

I intend to write a section that compares such existing APIs and the proposed TelemetryPolicy in the eventual proposal.

Contributor

Seeing as there's a mix of examples here, will the scope cover one resource for all of the signals (metrics, logs, traces) vs. separate ones? Are there tradeoffs to consider here?

Author

I'm leaning towards one resource for all. The argument for splitting them might be that different personas are involved in configuring the different aspects of observability. In practice, I think the persona that configures metrics likely also configures tracing and access logs. So, to avoid complicating the API with three additional resources, it seems worthwhile to put all of it in a single resource.

Comment thread docs/proposals/0069-TelemetryPolicy.md Outdated
* **Cost is Volatile**: Usage is measured in tokens, not just requests. A single HTTP 200 OK could cost $0.01 or $10.00 depending on the prompt and model used.
* **Context is King**: Debugging requires knowing the semantic context: Which Model? Which Prompt? Which tool?

Existing telemetry policies are unaware of the Generative AI semantic conventions. They see an opaque TCP stream or HTTP POST. Without a standardized API to
Contributor

In line with the header, may I suggest adding "unaware of the emerging Generative AI semantic conventions"?

Author

Good point, will update.


1. **Standardization**: A single API for Gateway and Mesh to configure Access Logging, Metrics generation, and Tracing propagation.
2. **GEP-713 Compliance**: Support `targetRef` attachment to `Gateway` and `Namespace`. The latter covers Mesh use-cases.
3. **Agentic Support**: Enable the capture of OpenTelemetry GenAI Semantic Conventions and support the requirements of PR #33.
Contributor

In the spirit of standardization and not reinventing the wheel, I wanted to mention that the llm-d community is already moving on tracing + OTel + GenAI Semantic Conventions. In particular, Sally O'Malley from Red Hat proposed and did a POC for distributed tracing in llm-d. [aside: I learned about this work from Sally on another community call for our kagenti project]

This may be applicable here for a few reasons:

  • We are keen on integrating OTel and GenAI semantic conventions, too
  • One of our objectives is a single API for Gateways and Meshes, and Sally's POC has already landed some changes to support tracing to the Gateway API Inference Extension (GAIE) components like the endpoint pickers (proposal comment, GAIE PR).

While llm-d is focused on distributed LLM inferencing regardless of source (i.e., user chat -> LLM vs agent -> LLM), I think it's worth considering any lessons they may have already learned and any API definitions that could overlap with our case, at the very least at the Gateway level. I'd be willing to evangelize our thinking to Sally to get her thoughts but, more importantly, I'm curious about our interest level.

Author

It would certainly be valuable to get some of their insights and experiences. The proposal seems to cover configuration through environment variables; have they defined CRDs as well?

Contributor

No, I did not see any CRD definitions. I'll keep this thread in mind as the definitions become more concrete.

for Gateway API resources (via `Gateway` attachment) and Service Mesh resources (via `namespace` attachment).

This K8s API standardizes how users enable and configure telemetry across different data plane implementations, replacing vendor-specific CRDs
with a unified, portable spec.
Contributor

Would you see implementations reconciling the TelemetryPolicy and reading the bits that are relevant to their components? So multiple controllers read the CR and take actions to enable telemetry across the components they are controlling?

Author

It is indeed possible to distribute the responsibility across multiple controllers; that's up to the implementation. In most cases that I'm familiar with, a single controller/control plane programs all three observability features (metrics, traces, logs).

Member

Possible, but a little challenging. In what cases do we see multiple implementations reconcile the same thing?

@david-martin
Contributor

I'm generally in favour of this proposal.
/approve

Perhaps more relevant when it comes to the specification, it would be good to know more about the current 'state of the art' in this space.

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 17, 2026
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Feb 17, 2026
Comment on lines +99 to +100
matches: # Conditional logging
  - cel: "response.code >= 500" # CEL-based filtering for errors
Contributor

If we only want CEL, it's a bit awkward to have a list. But it may make sense if we have non-CEL matchers.
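
To make the list-versus-single-expression trade-off concrete, here is a minimal Go sketch of a `matches` entry modelled as a one-of, so that a list only earns its keep once a second matcher kind exists. `LogMatch`, `StatusRange`, and the validation rule are hypothetical and are not part of the proposal.

```go
package main

import (
	"errors"
	"fmt"
)

// StatusRange is a hypothetical non-CEL matcher, added only for illustration.
type StatusRange struct{ Min, Max int }

// LogMatch is a one-of: exactly one member may be set per list entry.
type LogMatch struct {
	CEL    *string
	Status *StatusRange
}

// Validate rejects entries that set zero members or more than one.
func (m LogMatch) Validate() error {
	set := 0
	if m.CEL != nil {
		set++
	}
	if m.Status != nil {
		set++
	}
	if set != 1 {
		return errors.New("exactly one of cel or status must be set")
	}
	return nil
}

func main() {
	expr := "response.code >= 500"
	matches := []LogMatch{
		{CEL: &expr},
		{Status: &StatusRange{Min: 500, Max: 599}},
		{}, // invalid: nothing set
	}
	for i, m := range matches {
		fmt.Println(i, m.Validate())
	}
}
```

If CEL stays the only matcher, a single string field would express the same thing with less structure.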

Comment on lines +101 to +104
fields: # Configure specific fields to include
  - "start_time"
  - "response_code"
  - "x-token-usage"
Contributor

Is there any definition of what these fields mean?

For a concrete example, say I want to log the MCP task name (I chose this since it's not in https://opentelemetry.io/docs/specs/semconv/gen-ai/mcp/). Can I do it? What do I put as the field if I want to?

Comment on lines +220 to +223
type Dimension struct {
	Key        string `json:"key"`
	FromHeader string `json:"fromHeader,omitempty"`
}
Contributor

The 3 APIs all have the same property of "add a K/V" pair but do so in 3 different ways. Does it make sense? Should we be more consistent in them?

It seems odd that:

  • tracing: literal only
  • metrics: header only
  • log: a name only without a value
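
One way to picture the consistency being asked for is a single key/value type shared by all three signals, with the value source expressed as a one-of. The Go sketch below is only an illustration: `Attribute`, `Literal`, `Resolve`, and `strPtr` are hypothetical names, and only `FromHeader` echoes a field from the quoted `Dimension` struct.

```go
package main

import "fmt"

// Attribute is a hypothetical shared key/value shape for traces, metrics and logs.
// Exactly one of Literal or FromHeader is expected to be set.
type Attribute struct {
	Key        string
	Literal    *string
	FromHeader *string
}

// Resolve returns the value for one request and whether a value was found.
// headers is assumed to be keyed by lower-case header name.
func (a Attribute) Resolve(headers map[string]string) (string, bool) {
	switch {
	case a.Literal != nil:
		return *a.Literal, true
	case a.FromHeader != nil:
		v, ok := headers[*a.FromHeader]
		return v, ok
	default:
		return "", false
	}
}

func strPtr(s string) *string { return &s }

func main() {
	headers := map[string]string{"x-token-usage": "128"}
	attrs := []Attribute{
		{Key: "environment", Literal: strPtr("prod")},
		{Key: "token_usage", FromHeader: strPtr("x-token-usage")},
	}
	for _, a := range attrs {
		v, ok := a.Resolve(headers)
		fmt.Println(a.Key, v, ok)
	}
}
```

With a shape like this, tracing, metrics, and access logs could each accept a list of the same type instead of three different conventions.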

@rikatz
Member

rikatz commented Apr 8, 2026

Can we move this discussion to Gateway API as a provisional GEP first, and then experimental? I think a lot of the discussion here is happening in a context that is important not only for agentic, but for the whole Gateway API ecosystem, and I wouldn't like to receive this proposal as "we discussed and approved it on agentic and now this needs to be implemented this way in Gateway API".

Thanks!

@kflynn

kflynn commented Apr 9, 2026

Seconding @rikatz's comment – this seems applicable to much more than just the agentic world, and I'd love to get eyes on it from Gateway API. Thanks!! 🙂

@LiorLieberman
Member

@rikatz @kflynn have you had a chance to also review the API?

There has been some really good iteration here. That it probably belongs in Gateway API is not in question and has been discussed multiple times.

Can you both review here, so that we have all the comments in one place and can move it to Gateway for the last round? Alternatively, if either of you knows a way to move the proposal to Gateway with all the comments and history, that would also be great.

I would like to make sure agentic use cases are represented. For this group they are the main motivation, and it goes beyond "telemetry is useful to standardize for generic cases as well" -- for agentic it's an increasingly important feature, and we need to iterate fast here and make sure the agentic cases are meaningfully represented.

@rikatz
Member

rikatz commented Apr 14, 2026

No, I didn't, mostly because I didn't know it existed and only heard about it recently at EGADS.

I will, but again, I will be strongly against this getting merged into Gateway API if it is merged into this repo first. I think it is a different audience, and the initial part of the GEP already says it is in the wrong place. Having all of these discussions here will mean they are lost from the official GEP process.

A Kubernetes API for Gateway/Mesh Observability

2. Support emerging workloads like AI Agents, which require specialized metrics (e.g., token usage, model latency) and detailed audit logs for tool-use verification.
3. Manage “Mesh” and “Gateway” observability with a single unified API.

## The Emergence of Agentic Networking
Member

Do we really need a callout for this here, given that the rest of the document just says there is a need for Telemetry API standardization, whether for Agentic Networking or not?


## The TelemetryPolicy Specification

We propose the `TelemetryPolicy` as a direct policy attachment in the `agentic.networking.k8s.io` API group. See [GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/#classes-of-policies) for more information on Direct attachment.
Member

I think this API is too generic to belong to this specific group, maybe start with x-k8s.io?

accessLogs:
  enabled: true
  matches: # Conditional logging
    - cel: "response.code >= 500" # CEL-based filtering for errors
Member

One thing I am missing on this proposal is what is required and what is extended. Given this may become a Gateway API specification, we should at least define what is expected to be a core feature or not for this API.

E.g., the CEL matching here may not be implementable by all shippers.


## The TelemetryPolicy Specification

We propose the `TelemetryPolicy` as a direct policy attachment in the `agentic.networking.k8s.io` API group. See [GEP-713](https://gateway-api.sigs.k8s.io/geps/gep-713/#classes-of-policies) for more information on Direct attachment.
Member

Also, WHY a policy attachment? Why are you choosing this approach instead of inline Gateway configuration? Can you be more explicit about the rationale behind this decision?

@markfisher1

No explicit requirement for:

  • Policy decision trace (matched rule + context used)
  • Delegation + tool invocation lineage
  • Cross-agent correlation IDs

Recommend first-class “authorization trace graph” for debugging + compliance.


The following is an example that demonstrates the structure of the `TelemetryPolicy`.

```yaml
Member

can we also have the status specification please?

overrides:
  - name: "gateway.networking.k8s.io/http/request_count"
    type: Counter
    dimensions: # Custom labels/dimensions
Member

This part of the API is slightly confusing and looks somewhat like a Prometheus rewrite rule. Is this something supported by OTel, or is the assumption that the Gateway implementation will do the rewrites/overrides here?


type TelemetryPolicySpec struct {
	// Identifies the target resources (Gateway or Namespace) to which this policy attaches (GEP-713).
	TargetRefs []NamespacedPolicyTargetReference `json:"targetRefs"`
Member

Can I attach the same policy to both a Gateway and a namespace? What happens? Is there a precedence? What would the status look like in the case of a namespace attachment?

type TracingConfig struct {

	// Global switch to enable or disable tracing.
	Enabled bool `json:"enabled"`
Member

Do not use bools in APIs. They are a problem if you ever need something more than true/false (e.g., a new Enabled semantic that means "partialEnable").

Please consider a different way to state whether this config is enabled (e.g., whether tracingProvider is nil or not).

Member

(this comment applies to any usage of bool on this API)
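
The nil-or-not suggestion can be sketched as follows, assuming a hypothetical `Provider` pointer stands in for the `tracingProvider` mentioned in the comment. The field layout and the `TracingEnabled` helper are illustrative only and are not taken from the proposal.

```go
package main

import "fmt"

// TracingProvider is a hypothetical stand-in for whatever identifies the trace sink.
type TracingProvider struct{ Endpoint string }

// TracingConfig has no Enabled bool: tracing is on exactly when Provider is set.
type TracingConfig struct {
	Provider *TracingProvider
}

// TracingEnabled derives the on/off state from the presence of a provider.
func TracingEnabled(c *TracingConfig) bool {
	return c != nil && c.Provider != nil
}

func main() {
	off := &TracingConfig{}
	on := &TracingConfig{Provider: &TracingProvider{Endpoint: "otel-collector:4317"}}
	fmt.Println(TracingEnabled(nil), TracingEnabled(off), TracingEnabled(on))
}
```

Deriving the state from presence leaves room to add further provider options later without redefining what a boolean means.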

//
// +required
// +listType=atomic
// +kubebuilder:validation:MaxItems=16
Member

so the targetRef is also limited to 16?

@youngnick

Folks, I think it's very important to be clear. This is your subproject, and you can do what you want. But if you want to be able to take this upstream to Gateway API, you must use the Gateway API GEP process, including the Provisional step.

We will not be accepting a full API without full justification, using the full process. So if you design and implement an API without using the Gateway API process, you're risking that this will never end up upstream. And none of us wants that, this is a useful thing to have configured.

However, this reads like an attempt to force the Gateway API maintainers to do what you want by bringing a completed API and implementations to us, and then saying "why are you being so difficult". Maybe it is, maybe it isn't, but that's what it looks like.

We have very good reasons to be wary of adding Policy objects to Gateway API. They are very difficult to get right, and I have huge concerns with using the same Policy object for two very different targets. This Policy is also clearly useful to other usecases aside from Agentic Networking, and so must be designed for those use cases as well.

@LiorLieberman
Member

The conversation is getting heated without a reason. @youngnick what's the concrete concern? That the template is not the Gateway template verbatim?

There are good comments from Ricardo about some missing sections - like why policy and a few others. And we are in agreement that we move this proposal to Gateway (was also discussed in EGADS iiuc).

I am ooo so was slow to respond on gh but chatted with Ricardo on slack this morning and confirmed that as well.

Regarding different layers @youngnick, I agree with you. It does look like starting with Gateway is more simple and more useful, at least for the time being. But happy to hear your thoughts as well.

Agentic Net has concrete needs (and will continue to have them). A lot of the work here will intersect with Gateway; some APIs will intersect more and some less. It's OK to have the "is Gateway the right home for this API" discussions, and that's part of the process.

Anyway, I promised to move this to Gateway by EOW.

@youngnick

I've given this a better read, and here are my thoughts, more specifically:

tl;dr This Policy, as written, is not suitable for upstreaming into Gateway API.

It solves a problem that exists for most if not all Gateway API users, but builds the agentic parts into the core API. I'd much, much rather see a solution that allows for configuration of a generic telemetry config across HTTP requests (that is, requests where the Gateway implementation has access to an unencrypted HTTP stream), with an extension point (that can be pre-filled with agentic extensions, sure).

I'd also question, again, why it needs to be a Policy. The document assumes that the only way to do this is with Policy, which is true when the code lives in this repo, under this subproject, but is not the case upstream in Gateway API. If we want to configure all the HTTP traffic passing through a Gateway, why not add a top-level struct on the Gateway to do that? We've already done that recently for TLS, and telemetry is a core enough requirement that it makes sense. Maybe it's better to have the sink parameters and other definitely-global config at the Gateway level, and extend the fiddlier bits with something else.

But, that conversation is not really possible when we're not having it in the Gateway API process. So, I was a little forceful before, but I am telling you all that what will definitely not happen is building something here, and then bringing that thing upstream to Gateway and expecting it to get merged unchanged.

I also agree that if you want to target Mesh use cases, then the Mesh resource is probably the right thing to target - but again, that is a thing where a Policy is not required, because once the proposal is in upstream Gateway API, we can inline the fields inside the Mesh resource.

I would also strongly recommend that if you must have a Policy, and are going to allow this Policy to attach to multiple kinds of target, you carefully consider interaction and conflict management. What happens when there are two Policies targeting the same thing? What happens when a Gateway is in a namespace that is managed by a mesh? Do both configs take effect? If not, which one wins? @guicassolato mentions a merge strategy of None, which implies the latter, and which also means that you should use first-object-created conflict resolution, as upstream Gateway API does.
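
For readers unfamiliar with first-object-created conflict resolution, the Go sketch below picks a single winner among policies that target the same object. It assumes the oldest creation timestamp wins and that ties are broken alphabetically by `namespace/name`. The `Policy` struct and the `at` helper are hypothetical and carry only the metadata needed for the comparison.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Policy holds only the metadata needed to pick a winner; the shape is hypothetical.
type Policy struct {
	Namespace, Name string
	Created         time.Time
}

// at builds a creation timestamp from Unix seconds (helper for the example).
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// Winner returns the policy that takes effect when several target the same object:
// the oldest one, with ties broken alphabetically by namespace/name.
func Winner(ps []Policy) (Policy, bool) {
	if len(ps) == 0 {
		return Policy{}, false
	}
	sorted := append([]Policy(nil), ps...) // copy so the caller's slice is not reordered
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if !a.Created.Equal(b.Created) {
			return a.Created.Before(b.Created)
		}
		return a.Namespace+"/"+a.Name < b.Namespace+"/"+b.Name
	})
	return sorted[0], true
}

func main() {
	ps := []Policy{
		{Namespace: "team-b", Name: "telemetry", Created: at(200)},
		{Namespace: "team-a", Name: "telemetry", Created: at(100)},
	}
	w, _ := Winner(ps)
	fmt.Println(w.Namespace + "/" + w.Name)
}
```

A real controller would also report the losing policies as conflicted in their status; that part is omitted here.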

I also looked over @rikatz's API review, and agree with his comments.

In particular, bool in Kubernetes APIs is a bad idea, not only because of the concern he lists about extensibility, but also because of the annoying handling of truth-y values in YAML. It's way better to use a mode enum than a modeEnabled bool, in almost all cases.
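
As an illustration of the mode-enum alternative, a string-typed enum might look like the Go sketch below. `TracingMode`, its two values, and the choice to treat an empty value as `Disabled` are all assumptions made for the example, not part of the proposal.

```go
package main

import "fmt"

// TracingMode is a hypothetical enum replacing an Enabled bool.
type TracingMode string

const (
	TracingModeDisabled TracingMode = "Disabled"
	TracingModeEnabled  TracingMode = "Enabled"
)

// ParseTracingMode accepts only the exact enum strings.
// Treating an empty value as Disabled is an assumption made for this sketch.
func ParseTracingMode(s string) (TracingMode, error) {
	switch TracingMode(s) {
	case "":
		return TracingModeDisabled, nil
	case TracingModeDisabled, TracingModeEnabled:
		return TracingMode(s), nil
	default:
		return "", fmt.Errorf("unsupported tracing mode %q", s)
	}
}

func main() {
	for _, in := range []string{"Enabled", "", "yes"} {
		m, err := ParseTracingMode(in)
		fmt.Printf("%q -> %q, err=%v\n", in, m, err)
	}
}
```

A new value can later be added as one more constant, and truth-y strings such as `yes` are rejected rather than silently coerced.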

@kflynn

kflynn commented Apr 16, 2026

I have also reread and broadly agree with @youngnick and @rikatz. I'm not going to say much about specific details, because to me those are wildly overwhelmed by the fact that GEP-713 policy is very much not my first choice for how to approach this -- and in my mind, that immediately brings us back around to questions about upstream Gateway API and its processes.

Many of y'all participate in Gateway API pretty actively, but for the benefit of those who don't: if you want to add something to Gateway API - and my sense is that many or most of y'all would like this to be part of Gateway API itself - you start with a Provisional GEP. Provisional GEPs aren't about the details of the API, they're about the big questions: what problems are we trying to solve? why are they important to solve? for whom are we designing the solutions? These are critical, and they're rooted in explorations of user stories rather than API design -- but questions like "should we use GEP-713 policy for this?" are wrapped up in those questions, too.

So the first thing I'm saying here is that, given the apparent consensus around moving into Gateway API, a Provisional GEP feels like a better next step than anything else. The other thing is that without the Provisional GEP, there isn't a way into Gateway API. If someone were to present an API design, we would need to back up to the Provisional step where everything is back on the table anyway: though many of y'all may already have the answers to what, why, and who in your heads, the Gateway API folks don't, and will need them explained.

Additionally, as for why GEP-713 isn't my first choice... GEP-713 is really a way of coping with the fact that we can't extend core Kubernetes resources like Service. When we're dealing with resources we do control - like Gateways or Routes - GEP-713 should never be the first thing to reach for. Policies are deceptively easy to create, but they introduce really nasty questions (as @youngnick already mentioned) and every last one we accept as a part of Gateway API itself comes with the burden of supporting it - with all its issues - forever. 🙁 That's why you're seeing pushback there from me, and from others.

@david-martin
Contributor

For avoidance of doubt, a GEP is/will be opened in gateway-api for this.

/remove-lgtm
/remove-approve
/hold

Keeping open for any remaining agentic specific discussion to conclude before closing.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 21, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gkhom
Once this PR has been reviewed and has the lgtm label, please assign liorlieberman for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot removed the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 21, 2026
@sjberman

sjberman commented Apr 24, 2026

Wanted to chime in since I don't see NGINX considered yet. We currently support OTEL tracing through an ObservabilityPolicy. This enables tracing for a Route.

We also have a global nginx CRD where a user would set their exporter URL for the Gateway. This design was originally so a user didn't have to set this value on every single policy, and instead just set it globally. This CRD also allows for setting global span attributes for the Gateway.

Here is our tracing document on how we set everything up.

We also have considerations for future enhancements to our own module.

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. kind/documentation Categorizes issue or PR as related to documentation. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
