
[pkg/ottl] The Substring function corrupts multibyte UTF-8 strings (byte based slicing) #48436

@Nagato-Yuzuru


Component(s)

pkg/ottl

What happened?

Description

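Substring slices its target by byte offsets, not by characters (runes). ASCII inputs happen to work because every ASCII character is exactly one byte. For multibyte UTF-8 inputs (such as CJK characters or emoji), the result is either invalid UTF-8, when the slice cuts through the middle of a rune, or a string with fewer characters than the user asked for.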

Steps to Reproduce

Use v0.151.0 (the bug likely exists in all earlier versions as well).

Reproducible on https://ottl.run/ with the config and payload below.

payload:

{
  "resourceLogs": [{
    "resource": {},
    "scopeLogs": [{
      "scope": {},
      "logRecords": [{
        "timeUnixNano": "1700000000000000000",
        "body": {"stringValue": "test"},
        "attributes": [
          {"key": "greeting", "value": {"stringValue": "日本語"}}
        ]
      }]
    }]
  }]
}

config:

transform:
  log_statements:
    - context: log
      statements:
        - set(attributes["first_char"], Substring(attributes["greeting"], 0, 1))
        - set(attributes["three"], Substring(attributes["greeting"], 0, 3))

Expected Result

{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "timeUnixNano": "1700000000000000000",
              "body": {
                "stringValue": "test"
              },
              "attributes": [
                {
                  "key": "greeting",
                  "value": {
                    "stringValue": "日本語"
                  }
                },
                {
                  "key": "first_char",
                  "value": {
                    "stringValue": "日"
                  }
                },
                {
                  "key": "three",
                  "value": {
                    "stringValue": "日本語"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
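The expected values above correspond to counting `start` and `length` in runes rather than bytes. The following is a minimal Go sketch of such character-based slicing, assuming an error on out-of-range arguments. `runeSubstring` and its error messages are hypothetical and not the upstream OTTL function or an agreed fix, so the real Substring's bounds handling may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// runeSubstring is a hypothetical character-based variant of Substring:
// start and length count Unicode code points (runes), not bytes.
// Note: converting to []rune replaces invalid UTF-8 bytes with U+FFFD.
func runeSubstring(s string, start, length int) (string, error) {
	if start < 0 || length < 0 {
		return "", errors.New("start and length must be non-negative")
	}
	runes := []rune(s)
	if start+length > len(runes) {
		return "", errors.New("start+length exceeds the number of characters")
	}
	return string(runes[start : start+length]), nil
}

func main() {
	// Mirrors the two statements in the config: lengths 1 and 3 from offset 0.
	for _, n := range []int{1, 3} {
		out, err := runeSubstring("日本語", 0, n)
		fmt.Printf("length %d -> %q, err=%v\n", n, out, err)
	}
}
```

Rune-based slicing fixes the CJK case, but it can still split multi-rune grapheme clusters such as flag emoji or ZWJ sequences; whether "character" should mean rune or grapheme is a design choice for maintainers.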

Actual Result

{
  "resourceLogs": [
    {
      "resource": {},
      "scopeLogs": [
        {
          "scope": {},
          "logRecords": [
            {
              "timeUnixNano": "1700000000000000000",
              "body": {
                "stringValue": "test"
              },
              "attributes": [
                {
                  "key": "greeting",
                  "value": {
                    "stringValue": "日本語"
                  }
                },
                {
                  "key": "first_char",
                  "value": {
                    "stringValue": ""
                  }
                },
                {
                  "key": "three",
                  "value": {
                    "stringValue": "日"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

Collector version

v0.151.0

Environment information

OS: alpine:3.22.4

OpenTelemetry Collector configuration

Log output

Additional context

No response



Labels: bug (Something isn't working), needs triage (New item requiring triage), pkg/ottl
