[processor/genainormalizer] Fix OpenInference flattened message normalization#48442

Closed
Jwrede wants to merge 1 commit into open-telemetry:main from Jwrede:fix/genainormalizer-openinference-messages

@Jwrede Jwrede commented May 17, 2026

Summary

Fixes #48421

OpenInference represents messages as flattened, indexed span attributes (llm.input_messages.0.message.role, llm.input_messages.0.message.content, etc.), but the processor only performed exact key matching against llm.input_messages, which never appears as a literal attribute key. As a result, message normalization was silently a no-op.
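
To make the mismatch concrete, here is a minimal sketch of why an exact-key lookup never fires on flattened attributes while a prefix check does. It uses a plain map[string]string in place of the collector's attribute map, and the helper names hasExactKey and hasIndexedKeys are hypothetical, not taken from the processor.

```go
package main

import (
	"fmt"
	"strings"
)

// hasExactKey reports whether attrs contains key as a literal attribute name.
// This mirrors the exact-match lookup described above (hypothetical helper).
func hasExactKey(attrs map[string]string, key string) bool {
	_, ok := attrs[key]
	return ok
}

// hasIndexedKeys reports whether any attribute name starts with prefix + ".",
// which is how the flattened OpenInference keys are shaped (hypothetical helper).
func hasIndexedKeys(attrs map[string]string, prefix string) bool {
	for k := range attrs {
		if strings.HasPrefix(k, prefix+".") {
			return true
		}
	}
	return false
}

func main() {
	// Attribute names follow the format quoted in the summary.
	attrs := map[string]string{
		"llm.input_messages.0.message.role":    "user",
		"llm.input_messages.0.message.content": "Hello",
	}
	fmt.Println(hasExactKey(attrs, "llm.input_messages"))    // false
	fmt.Println(hasIndexedKeys(attrs, "llm.input_messages")) // true
}
```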

Changes:

  • Add message reconstruction that collects flattened llm.{input,output}_messages.N.message.* attributes and rebuilds them into GenAI semconv JSON (gen_ai.input.messages / gen_ai.output.messages)
  • Support text content, tool call requests, and tool call responses
  • Remove broken exact-match entries from the lookup table
  • Add attributeAggregator interface to the processor for multi-attribute-to-one transformations
  • Add unit tests and an end-to-end test matching the real-world format from the issue
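
The reconstruction step can be sketched roughly as follows. This is not the PR's code: it works on a plain map[string]string rather than the collector's attribute map, handles only role and text content (tool calls are left out), and the rebuildMessages function and its types are hypothetical. The output shape (a role plus a list of parts with type "text") is an assumption about the GenAI semconv message schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// part and message model the assumed GenAI semconv JSON shape.
type part struct {
	Type    string `json:"type"`
	Content string `json:"content"`
}

type message struct {
	Role  string `json:"role"`
	Parts []part `json:"parts"`
}

// rebuildMessages collects prefix.N.message.{role,content} attributes and
// returns them as a JSON array ordered by the index N.
func rebuildMessages(attrs map[string]string, prefix string) (string, error) {
	type fields struct{ role, content string }
	byIndex := map[int]*fields{}
	for key, val := range attrs {
		rest, ok := strings.CutPrefix(key, prefix+".")
		if !ok {
			continue
		}
		// rest looks like "0.message.role"; split out the index and field name.
		idxStr, field, ok := strings.Cut(rest, ".message.")
		if !ok {
			continue
		}
		idx, err := strconv.Atoi(idxStr)
		if err != nil || idx < 0 {
			continue
		}
		if byIndex[idx] == nil {
			byIndex[idx] = &fields{}
		}
		switch field {
		case "role":
			byIndex[idx].role = val
		case "content":
			byIndex[idx].content = val
		}
	}
	// Map iteration order is random, so sort the indexes explicitly.
	indexes := make([]int, 0, len(byIndex))
	for i := range byIndex {
		indexes = append(indexes, i)
	}
	sort.Ints(indexes)
	msgs := make([]message, 0, len(indexes))
	for _, i := range indexes {
		m := message{Role: byIndex[i].role, Parts: []part{}}
		if c := byIndex[i].content; c != "" {
			m.Parts = append(m.Parts, part{Type: "text", Content: c})
		}
		msgs = append(msgs, m)
	}
	out, err := json.Marshal(msgs)
	return string(out), err
}

func main() {
	attrs := map[string]string{
		"llm.input_messages.1.message.role":    "user",
		"llm.input_messages.1.message.content": "Hello",
		"llm.input_messages.0.message.role":    "system",
		"llm.input_messages.0.message.content": "Be brief.",
	}
	out, err := rebuildMessages(attrs, "llm.input_messages")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

In a processor, the returned string would be written to gen_ai.input.messages (or gen_ai.output.messages for the llm.output_messages prefix). Whether the source attributes are then removed or an existing target is overwritten depends on the configured behavior, which this sketch does not model.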

Test plan

  • Unit tests for message reconstruction (simple messages, tool calls, overwrite/remove behavior)
  • End-to-end test through createTracesProcessor with the exact attribute format from the issue
  • All existing tests pass unchanged
  • go vet ./... clean

…lization

OpenInference emits messages as flattened indexed span attributes
(e.g. llm.input_messages.0.message.role) but the processor only
performed exact key matching against "llm.input_messages", which
never appeared as a literal attribute key.

This adds message reconstruction that collects the flattened attributes,
rebuilds the message structure, and emits gen_ai.input.messages /
gen_ai.output.messages as JSON strings following the GenAI semconv
message schema.

Fixes open-telemetry#48421
@Jwrede Jwrede requested review from a team and TylerHelmuth as code owners May 17, 2026 09:02
@github-actions
Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

  • Read our Contributing Guidelines.
  • Sign the CLA if you haven't already.
  • First-time contributors should have at most one PR not marked as draft until their first PR is merged.
  • If your change isn't one of our priority components, reviews may take more time.
  • Give reviewers at least a few days before pinging them for feedback.
  • If you need help or struggle to move your PR forward, raise the topic on #otel-collector-dev or a Collector SIG meeting.

@TylerHelmuth
Member

@Jwrede the genainormalizer is still being donated to Contrib (see #46069). We are opening issues as we add the component, but we are not ready for those issues to be worked. The component will likely continue to undergo serious change as it is being donated and we aren't ready to accept other PRs for it yet. I'm going to close this for now, but look for the issue to be workable once the on-hold label is removed.

@Jwrede
Author

Jwrede commented May 18, 2026

Thanks for the context @TylerHelmuth -- makes sense. I will hold off until the donation is complete and the component stabilizes.


Development

Successfully merging this pull request may close these issues.

[processor/genainormalizer] OpenInference input/output message are not normalized due to flattened indexed attribute format

3 participants