Common sql comment extractor oracle #48404

Open
tharun0064 wants to merge 8 commits into open-telemetry:main from newrelic-forks:common-extractor-package-oracle

Conversation

@tharun0064

Description

This change enhances the Oracle DB receiver by adding SQL comment extraction support for APM correlation. The
receiver can now extract key-value pairs from leading SQL block comments (e.g., /* traceparent=... */) and include
them as telemetry attributes, enabling correlation between database queries and distributed traces.

What Changed

New Configuration Option: allowed_comment_keys

  • Available for both top_query_collection and query_sample_collection
  • Specifies which comment keys to extract from SQL queries
  • Secure by default: if empty or not specified, no comments are extracted

New Attribute: query.comments

  • Added to both db.server.top_query and db.server.query_sample events
  • Contains comma-separated key=value pairs extracted from SQL comments
  • Only includes keys specified in allowed_comment_keys configuration

New Package: internal/common/sqlcomments

  • Reusable SQL comment extraction logic for APM correlation
  • Can be used by other database receivers in the future
  • Implements secure parsing with explicit allowlist filtering

Use Cases

Distributed Tracing Correlation

  • Applications can embed trace context in SQL comments (e.g., /* traceparent=00-trace-id-span-id-01 */)
  • The receiver extracts these identifiers and includes them as attributes
  • Enables linking database performance metrics to specific application traces
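
As a rough illustration of what a downstream consumer could do with an extracted traceparent value, the sketch below splits it into its W3C Trace Context fields (version, trace ID, parent span ID, flags). The type and function names are hypothetical and are not part of this PR. The sketch only checks field count, length, and lowercase hex; it does not reject all-zero IDs or handle future versions that may carry extra fields.

```go
package main

import (
	"fmt"
	"strings"
)

// traceParent holds the four fields of a W3C traceparent header value.
// Hypothetical type for illustration; not part of the PR.
type traceParent struct {
	Version string
	TraceID string
	SpanID  string
	Flags   string
}

// isLowerHex reports whether s consists only of the characters 0-9 and a-f.
func isLowerHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

// parseTraceParent splits a value such as
// "00-<32 hex trace id>-<16 hex span id>-<2 hex flags>" into its fields.
// It returns false when the value does not have that shape.
func parseTraceParent(v string) (traceParent, bool) {
	parts := strings.Split(v, "-")
	if len(parts) != 4 {
		return traceParent{}, false
	}
	wantLen := []int{2, 32, 16, 2}
	for i, p := range parts {
		if len(p) != wantLen[i] || !isLowerHex(p) {
			return traceParent{}, false
		}
	}
	return traceParent{Version: parts[0], TraceID: parts[1], SpanID: parts[2], Flags: parts[3]}, true
}

func main() {
	tp, ok := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Println(ok, tp.TraceID, tp.SpanID)
}
```

The placeholder value used in the bullet above (00-trace-id-span-id-01) would be rejected by this check, since it is illustrative text rather than a well-formed traceparent.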

Application Context Propagation

  • Pass custom application metadata through SQL comments
  • Examples: request IDs, user IDs, feature flags, environment tags
  • Correlate query performance with application-level context

Multi-tenant Performance Analysis

  • Identify queries by tenant/customer ID embedded in comments
  • Analyze performance patterns per tenant
  • Detect noisy neighbor issues in multi-tenant databases

Deployment and Feature Tracking

  • Tag queries with deployment version or feature flags
  • Track performance impact of new releases
  • Identify queries from specific application versions

A/B Testing and Experimentation

  • Correlate query performance with experiment variants
  • Measure database impact of feature experiments
  • Analyze performance across cohorts

Configuration Example

  receivers:
    oracledb:
      endpoint: localhost:1521
      service: ORCL
      username: otel
      password: ${ORACLE_PASSWORD}

      top_query_collection:
        enabled: true
        top_query_count: 10
        allowed_comment_keys:
          - traceparent
          - trace_id
          - request_id

      query_sample_collection:
        enabled: true
        allowed_comment_keys:
          - traceparent
          - tenant_id

Security Considerations

  • Explicit Allowlist: Only keys specified in allowed_comment_keys are extracted
  • No Extraction by Default: If allowed_comment_keys is empty/unset, no comment extraction occurs
  • Leading Comments Only: Only extracts from leading /* */ block comments, not inline or trailing comments
  • No Value Validation: Values are extracted as-is; downstream processors should validate/sanitize
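
The rules above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the code in internal/common/sqlcomments: the function name extractLeadingComments is hypothetical, pairs inside a comment are assumed to be comma-separated key=value items, and values that themselves contain commas are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// extractLeadingComments is a hypothetical helper, not the PR's actual code.
// It reads only the leading /* ... */ block comments of sql and returns the
// key=value pairs whose key appears in allowedKeys, joined by commas.
func extractLeadingComments(sql string, allowedKeys []string) string {
	if len(allowedKeys) == 0 {
		return "" // no extraction by default: an empty allowlist yields nothing
	}
	allowed := make(map[string]struct{}, len(allowedKeys))
	for _, k := range allowedKeys {
		allowed[k] = struct{}{}
	}

	var pairs []string
	rest := strings.TrimSpace(sql)
	for strings.HasPrefix(rest, "/*") {
		// Search for the terminator after the opening "/*" so that a
		// degenerate input such as "/*/" is not treated as a full comment.
		end := strings.Index(rest[2:], "*/")
		if end < 0 {
			break // unterminated comment: stop rather than guess
		}
		body := rest[2 : 2+end]
		for _, item := range strings.Split(body, ",") {
			key, value, ok := strings.Cut(strings.TrimSpace(item), "=")
			if !ok {
				continue // malformed item without "=" is skipped
			}
			key = strings.TrimSpace(key)
			if _, found := allowed[key]; found {
				// Values are passed through as-is, without validation.
				pairs = append(pairs, key+"="+strings.TrimSpace(value))
			}
		}
		rest = strings.TrimSpace(rest[2+end+2:])
	}
	return strings.Join(pairs, ",")
}

func main() {
	sql := "/* key1=val1,key2=val2,key3=val3 */ SELECT * FROM exampleTable"
	fmt.Println(extractLeadingComments(sql, []string{"key1", "key3"}))
}
```

Because the loop only runs while the remaining text starts with /*, inline and trailing comments are never inspected, which matches the "Leading Comments Only" rule.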

Link to tracking issue

Fixes #48338

Testing

Unit Tests

  • SQL comment extraction logic (internal/common/sqlcomments/extractor_test.go)
    • Multiple leading comments
    • Inline and trailing comments (should be ignored)
    • Key filtering with allowlists
    • Malformed comment handling
    • Special characters in keys/values
    • Empty input edge cases
  • Receiver integration with comment extraction
    • Comments extracted for top query collection
    • Comments extracted for query sample collection
    • Empty allowlist returns empty string
    • Integration with existing log record creation

Integration Tests

  • Golden file tests updated with query.comments attribute
    • testdata/expectedQueryTextAndPlanQuery.yaml
    • testdata/expectedSamplesFile.yaml
  • Mock data includes realistic comment formats
  • Metadata generation tests validate new attribute

Manual Validation

  • Tested with Oracle database and real SQL comments
  • Verified extraction works with W3C traceparent format
  • Confirmed secure by default behavior (empty config → no extraction)
  • Validated comment filtering respects allowlist configuration

Test Coverage

  • internal/common/sqlcomments/extractor.go: 100% coverage
  • receiver/oracledbreceiver/scraper.go: integration tested

Documentation

  • Updated receiver/oracledbreceiver/documentation.md
    • Added query.comments attribute description to events table
    • Documented attribute type and format
  • Updated configuration schema (config.schema.yaml)
    • Added allowed_comment_keys to top_query_collection
    • Added allowed_comment_keys to query_sample_collection
    • Included descriptions and array type specifications
  • Updated metadata definition (metadata.yaml)
    • Added query.comments attribute with full description
    • Included attribute in both event types
  • Changelog entry created (.chloggen/oracledb-sql-comment-extraction.yaml)

  Add support for extracting and filtering SQL query comments to enable correlation
  between database monitoring and APM traces. This feature allows users to configure
  allowlists of comment keys that should be extracted from SQL queries and included
  as telemetry attributes.

  Changes:
  - Add internal/common/sqlcomments package for extracting key-value pairs from
    leading SQL block comments
  - Add AllowedCommentKeys configuration to both TopQueryCollection and QuerySample
    in oracledbreceiver config
  - Extract and filter comments in both top query collection and query sampling
  - Add query.comments attribute to db.server.query_sample and db.server.top_query
    events in metadata
  - Add unit tests for comment extraction functionality

  The implementation is secure by default - only comments with keys explicitly listed
  in allowed_comment_keys configuration will be extracted. Empty allowlist returns
  empty string.

  This enables correlation with APM tools that inject metadata into SQL comments.

@github-actions bot added the "first-time contributor" label (PRs made by new contributors) on May 15, 2026
@github-actions
Contributor

Welcome, contributor! Thank you for your contribution to opentelemetry-collector-contrib.

Important reminders:

  • Read our Contributing Guidelines.
  • Sign the CLA if you haven't already.
  • First-time contributors should have at most one PR not marked as draft until their first PR is merged.
  • If your change isn't one of our priority components, reviews may take more time.
  • Give reviewers at least a few days before pinging them for feedback.
  • If you need help or struggle to move your PR forward, raise the topic on #otel-collector-dev or a Collector SIG meeting.

Comment on lines +184 to +185

  query.comments:
    description: Filtered SQL query comments extracted from leading block comments. Contains comma-separated key=value pairs for keys specified in allowed_comment_keys configuration. Used for correlation with APM traces.
Contributor

I would much rather see this as a templated attribute with a base name of db.query.comment.<Key>, where the value is simply the value that corresponds to that key.

Author

@tharun0064 tharun0064 May 15, 2026

Hi @thompson-tomo, I have thought about this approach, but all attributes must be defined in metadata.yaml, and the key names are dynamic, so they can't be predefined there.
WDYT?

Contributor

I believe you need to concatenate the attribute base name and the key in your receiver.

Author

Yes, I understand the approach, and it would definitely make these values more queryable in observability backends.

What I am trying to say, though, is the following.

For example, if a user has this SQL query:

  /* key1=val1,key2=val2,key3=val3 */
  SELECT * FROM exampleTable

And configures:

  allowed_comment_keys:
    - key1
    - key3

We would extract and send key1=val1,key3=val3 in the query.comments attribute. Since we can't know in advance which keys users will configure, we can't pre-define them as individual attributes in metadata.yaml.

The Collector's metadata generation system (mdatagen) requires all attribute names to be predefined in metadata.yaml, which doesn't support truly dynamic attribute names.

What are your thoughts? If you feel strongly about the templated attributes approach, I'm happy to explore some approaches which might add complexity to work around the mdatagen generated code.

Contributor

I understand, but I don't think you need to define each key in mdatagen; rather, just define the base key as opt-in.

One of the big issues with concatenating key-value pairs is that the order is now important and usability is poor.

Note that templated attributes are heavily used in the k8s attributes (https://opentelemetry.io/docs/specs/semconv/registry/attributes/k8s/), hence it should be a supported option.

}

// Join filtered pairs with comma (no space)
return strings.Join(filteredPairs, ",")
Contributor

Suggested change
  - return strings.Join(filteredPairs, ",")
  + return filteredPairs

Author

Could you explain the suggested change in more detail, please?
The SQL comments are designed to flow as comma-separated key=value pairs in the query.comments attribute.

Contributor

It goes hand in hand with the other comment.
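
For readers following this thread, the templated-attribute idea could look roughly like the sketch below: the filtered pairs are kept as a slice rather than joined, and each one becomes its own attribute under the db.query.comment. base name suggested in the review. The function name commentAttributes and the use of a plain map are assumptions for illustration. How the receiver would actually attach these attributes to a log record, and how mdatagen would model the base key, are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// commentAttrPrefix is the base name proposed in the review thread.
const commentAttrPrefix = "db.query.comment."

// commentAttributes is a hypothetical helper. It turns already-filtered
// "key=value" pairs into one attribute per key, so the order of the pairs
// no longer matters to whoever queries them.
func commentAttributes(filteredPairs []string) map[string]string {
	attrs := make(map[string]string, len(filteredPairs))
	for _, pair := range filteredPairs {
		key, value, ok := strings.Cut(pair, "=")
		if !ok || key == "" {
			continue // skip anything that is not a key=value pair
		}
		attrs[commentAttrPrefix+key] = value
	}
	return attrs
}

func main() {
	attrs := commentAttributes([]string{"key1=val1", "key3=val3"})
	fmt.Println(attrs["db.query.comment.key1"], attrs["db.query.comment.key3"])
}
```

With this shape, a backend can filter on a single key directly instead of parsing a comma-separated string.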


Development

Successfully merging this pull request may close these issues.

Query comments extractor

3 participants