Component(s)
pkg/ottl
What happened?
Description
Substring slices its target by bytes, not by characters. ASCII inputs happen to work because every character is a single byte. Multibyte UTF-8 inputs (such as CJK or emoji) either produce invalid UTF-8 when the slice cuts mid-rune, or return a string with fewer characters than the user asked for.
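For illustration, here is a minimal sketch of what a character-based variant could look like in Go. The name `runeSubstring` is hypothetical and this is not the pkg/ottl implementation. It counts Unicode code points (runes), and it clamps out-of-range arguments purely to keep the example short; how the real function should treat invalid ranges is left open here.

```go
package main

import "fmt"

// runeSubstring is a hypothetical helper, not part of pkg/ottl.
// start and length are counted in runes (Unicode code points), not bytes.
// Clamping out-of-range values is an assumption made for this sketch.
func runeSubstring(s string, start, length int) string {
	r := []rune(s) // decode the UTF-8 string into code points
	if start < 0 || length < 0 || start > len(r) {
		return ""
	}
	end := start + length
	if end > len(r) {
		end = len(r)
	}
	return string(r[start:end])
}

func main() {
	s := "日本語"
	fmt.Println(runeSubstring(s, 0, 1)) // 日
	fmt.Println(runeSubstring(s, 0, 3)) // 日本語
}
```

Note that runes are code points, not user-perceived characters: an emoji built from several code points (for example with a zero-width joiner) would still be split by this approach.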
Steps to Reproduce
Use v0.151.0 (the bug likely exists in all earlier versions as well).
Reproducible on https://ottl.run/ with the config and payload below.
payload:
{
"resourceLogs": [{
"resource": {},
"scopeLogs": [{
"scope": {},
"logRecords": [{
"timeUnixNano": "1700000000000000000",
"body": {"stringValue": "test"},
"attributes": [
{"key": "greeting", "value": {"stringValue": "日本語"}}
]
}]
}]
}]
}
config:
transform:
  log_statements:
    - context: log
      statements:
        - set(attributes["first_char"], Substring(attributes["greeting"], 0, 1))
        - set(attributes["three"], Substring(attributes["greeting"], 0, 3))
Expected Result
{
"resourceLogs": [
{
"resource": {},
"scopeLogs": [
{
"scope": {},
"logRecords": [
{
"timeUnixNano": "1700000000000000000",
"body": {
"stringValue": "test"
},
"attributes": [
{
"key": "greeting",
"value": {
"stringValue": "日本語"
}
},
{
"key": "first_char",
"value": {
"stringValue": "日"
}
},
{
"key": "three",
"value": {
"stringValue": "日本語"
}
}
]
}
]
}
]
}
]
}
Actual Result
{
"resourceLogs": [
{
"resource": {},
"scopeLogs": [
{
"scope": {},
"logRecords": [
{
"timeUnixNano": "1700000000000000000",
"body": {
"stringValue": "test"
},
"attributes": [
{
"key": "greeting",
"value": {
"stringValue": "日本語"
}
},
{
"key": "first_char",
"value": {
"stringValue": "�"
}
},
{
"key": "three",
"value": {
"stringValue": "日"
}
}
]
}
]
}
]
}
]
}
Collector version
v0.151.0
Environment information
OS: alpine:3.22.4
OpenTelemetry Collector configuration
Log output
Additional context
No response