*: add keyspace observability labels #68404

Open
zeminzhou wants to merge 8 commits into pingcap:release-nextgen-202603 from zeminzhou:cluster-information2

@zeminzhou (Contributor) commented May 15, 2026

What problem does this PR solve?

Issue Number: close #68405

Problem Summary:

NextGen deployments need keyspace identity labels for metrics. Starter mode also needs configured keyspace metadata values to be resolved at TiDB startup and attached to metrics, slow logs, and statement summary logs.

What changed and how does it work?

  • Add keyspace observability config mapping from keyspace metadata keys to metric labels, slow log fields, and statement log fields.
  • Resolve keyspace metadata during TiDB startup for NextGen TiKV deployments.
  • For NextGen non-Starter mode, add keyspace ID and keyspace name to metrics labels.
  • For NextGen Starter mode, additionally attach configured keyspace metadata values to metrics, slow logs, and statement summary logs.
  • Keep BR metrics registration behavior compatible with existing keyspace ID handling.
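
The mapping described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `fieldMapping`, `observabilityConfig`, `resolvedValues`, `resolveMapping`, and `resolve` are hypothetical names, and skipping a metadata key that is absent is an assumption (the actual implementation may reject it instead).

```go
package main

import "fmt"

// fieldMapping maps a keyspace metadata key to the name it should appear
// under in one sink. Hypothetical type, not taken from the PR.
type fieldMapping map[string]string

// observabilityConfig holds one mapping per sink (names are assumptions).
type observabilityConfig struct {
	MetricLabels  fieldMapping
	SlowLogFields fieldMapping
	StmtLogFields fieldMapping
}

// resolvedValues holds output name -> resolved metadata value per sink.
type resolvedValues struct {
	MetricLabels  map[string]string
	SlowLogFields map[string]string
	StmtLogFields map[string]string
}

// resolveMapping looks up every mapped metadata key and returns
// output name -> value. Keys missing from meta are skipped (assumption).
func resolveMapping(m fieldMapping, meta map[string]string) map[string]string {
	out := make(map[string]string, len(m))
	for metaKey, name := range m {
		if v, ok := meta[metaKey]; ok {
			out[name] = v
		}
	}
	return out
}

// resolve applies the three mappings to the same keyspace metadata.
func resolve(cfg observabilityConfig, meta map[string]string) resolvedValues {
	return resolvedValues{
		MetricLabels:  resolveMapping(cfg.MetricLabels, meta),
		SlowLogFields: resolveMapping(cfg.SlowLogFields, meta),
		StmtLogFields: resolveMapping(cfg.StmtLogFields, meta),
	}
}

func main() {
	cfg := observabilityConfig{
		MetricLabels:  fieldMapping{"tenant": "tenant_id"},
		SlowLogFields: fieldMapping{"tenant": "Tenant_id"},
		StmtLogFields: fieldMapping{"project": "project_id"},
	}
	// Metadata as it might be returned for a keyspace; values are made up.
	meta := map[string]string{"tenant": "t-1"}
	got := resolve(cfg, meta)
	fmt.Println(got.MetricLabels, got.SlowLogFields, got.StmtLogFields)
}
```

Resolving once at startup and storing plain `map[string]string` values keeps the metrics and logging paths free of any PD lookups.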

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Add keyspace observability configuration for NextGen deployments.

Summary by CodeRabbit

  • New Features

    • Keyspace observability: map keyspace metadata to Prometheus metric labels, slow-query fields, and statement-log fields; applied at startup and during metrics init.
  • Refactor

    • Centralized metrics/label handling and statement-log JSON shaping; TiKV driver PD options extracted to helper.
  • Documentation

    • Added commented example configuration showing keyspace-observability mappings.
  • Tests

    • Expanded unit tests for resolution, merging, slow-log and statement-log serialization.
  • Chores

    • Updated build/test targets and sharding for new tests.

@ti-chi-bot (bot) added the release-note and size/XXL labels May 15, 2026
@tiprow (bot) commented May 15, 2026

Hi @zeminzhou. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@coderabbitai (bot) commented May 15, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds keyspace observability: new config schema and resolution, startup PD metadata fetch for NextGen, updates global config with resolved metric labels and optional log fields, integrates resolved values into metrics registration and slow/statement log serialization, and adds tests and BUILD updates.

Changes

Keyspace Observability

| Layer / File(s) | Summary |
|---|---|
| Configuration schema and validation: pkg/config/config.go, pkg/config/config_test.go, pkg/config/config.toml.example, pkg/config/config.toml.nextgen.example | Defines KeyspaceObservability types, validation rules, resolution into KeyspaceObservabilityValues, clone/getter helpers, and tests for valid/invalid cases. |
| Keyspace observability startup initialization: cmd/tidb-server/main.go, cmd/tidb-server/main_test.go, cmd/tidb-server/BUILD.bazel | Calls prepareKeyspaceObservability() during startup (NextGen + TiKV + keyspace name), constructs a PD client with TLS/timeout, retries PD fetches, resolves metadata into metric labels and optional slow/stmt-log fields, and updates global config. Includes tests for Starter/Non-Starter and classic-mode skip. |
| Metric registration and PD client options: pkg/util/metricsutil/common.go, pkg/metrics/common/wrapper.go, pkg/util/metricsutil/common_test.go, pkg/util/metricsutil/BUILD.bazel, pkg/store/driver/tikv_driver.go, pkg/store/driver/config_test.go | Refactors registerMetrics() to clone existing const labels, merge resolved keyspace metric labels, and apply them before initializing metrics. Adds SetConstLabelsFromMap, updates TiKV driver to inject metric labels into PD client options, and adds tests and BUILD updates. |
| Slow log and statement-summary field integration: pkg/sessionctx/variable/slow_log.go, pkg/sessionctx/variable/tests/session_test.go, pkg/util/stmtsummary/v2/logger.go, pkg/util/stmtsummary/v2/record_test.go, pkg/util/stmtsummary/v2/BUILD.bazel | SlowLogFormat now appends configured slow-log key/value fields. Statement-summary logging uses marshalStmtRecord to optionally reshape JSON payloads by inserting configured stmt-log fields resolved from keyspace metadata. Tests assert emitted fields and serialized JSON. |
| BUILD/test shard adjustments: pkg/config/BUILD.bazel, cmd/tidb-server/BUILD.bazel, pkg/util/metricsutil/BUILD.bazel, pkg/util/stmtsummary/v2/BUILD.bazel | Increases some shard_count values and updates test deps to include //pkg/config and //pkg/metrics/common where new tests require them. |

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Sequence Diagram

sequenceDiagram
  participant TiDBMain
  participant PDClient
  participant PD
  participant Config
  participant MetricsCommon
  TiDBMain->>PDClient: create PD client (TLS, timeout)
  TiDBMain->>PD: fetch KeyspaceMeta via PDClient
  PD->>TiDBMain: KeyspaceMeta
  TiDBMain->>Config: prepareKeyspaceObservabilityWithKeyspaceMeta -> UpdateGlobal
  Config->>MetricsCommon: GetKeyspaceObservabilityMetricLabels()
  MetricsCommon->>MetricsCommon: SetConstLabelsFromMap(merged) and initMetrics()

Possibly related PRs

  • pingcap/tidb#67030: Modifies pkg/sessionctx/variable/slow_log.go and is related to slow-log formatting paths.

Suggested labels

ok-to-test, type/cherry-pick-for-release-nextgen-202603

Suggested reviewers

  • joechenrh
  • GMHDBJD
  • D3Hunter
  • yudongusa

"I hopped through PD to find the keys,
stitched labels softly in the midnight breeze,
metrics wear their keyspace name,
logs now hold the fields I claim,
a rabbit cheers for observability." 🐇

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.63% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The PR title '*: add keyspace observability labels' directly reflects the main change, which is adding keyspace observability label support throughout the codebase for NextGen deployments. |
| Description check | ✅ Passed | The PR description fully addresses the required template sections with Issue Number, Problem Summary, What Changed, comprehensive test coverage confirmation, and release note. It clearly documents the feature and its scope. |
| Linked Issues check | ✅ Passed | The PR successfully implements all requirements from #68405: configurable keyspace observability mapping (config.go, config_test.go), metadata resolution at startup (main.go), keyspace ID/name labels for NextGen non-Starter (metricsutil), slow/stmt log field attachment for Starter (slow_log.go, logger.go), and BR compatibility (metricsutil). |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with keyspace observability feature requirements: configuration system, startup resolution, metric label integration, logging integration, and test coverage. No unrelated modifications detected. |


@coderabbitai (bot) left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/tidb-server/main_test.go`:
- Around line 202-213: The test
TestSetupKeyspaceObservabilityForStartSkipsClassic should be gated to run only
in classic mode to avoid environment-dependent failures: at the start of
TestSetupKeyspaceObservabilityForStartSkipsClassic, call kerneltype.IsClassic()
and if it returns false call t.Skip with a brief reason so the test only
exercises the classic short-circuit path; this change is minimal and keeps the
rest of the test (the config.Update, prepareKeyspaceObservability call, and
assertions) unchanged.

In `@cmd/tidb-server/main.go`:
- Around line 1227-1229: The current copy
(maps.Copy(resolvedValues.MetricLabels, configuredValues.MetricLabels)) allows
user-configured labels to overwrite reserved labels like "keyspace_id" and
"keyspace_name"; change the logic so reserved keys are preserved by filtering
out those keys from configuredValues.MetricLabels before copying (or by copying
then restoring original reserved values from resolvedValues.MetricLabels).
Locate the block around copiedConfig.KeyspaceObservabilityValues.Clone(),
configuredValues.MetricLabels and maps.Copy and implement a filter that excludes
"keyspace_id" and "keyspace_name" (or ensures resolvedValues' reserved entries
are not overwritten) when populating resolvedValues.MetricLabels.

In `@pkg/sessionctx/variable/tests/session_test.go`:
- Around line 388-401: The test currently calls restore := config.RestoreFunc()
but only invokes restore() at the end, which can leave global config mutated if
an earlier assertion fails; after obtaining restore (restore :=
config.RestoreFunc()) call defer restore() or t.Cleanup(restore) immediately so
the global config is always restored even on test failures—ensure this is done
before calling config.UpdateGlobal or require.NoError(t,
conf.ResolveKeyspaceObservability(...)) so the cleanup guarantees restoring the
original config.

In `@pkg/util/metricsutil/common.go`:
- Around line 92-94: GetConstLabels() may return a nil map so cloning and
writing into it can panic; before modifying or copying, ensure you initialize a
non-nil map. In the block that builds labels for defaultKeyspaceLabel (around
metricsutil/common.go where you call maps.Clone(metricscommon.GetConstLabels())
and then set labels[defaultKeyspaceLabel] = fmt.Sprint(keyspaceMeta.GetId())),
replace the direct clone with creating a new map when Clone returns nil (or
always allocate a new map and then maps.Copy into it) and then call
metricscommon.SetConstLabelsFromMap(labels). Do the same for the other
occurrence that uses maps.Copy(): ensure the destination map is non-nil
(allocate make(map[string]string)) before calling maps.Copy or writing keys.
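
The main.go and metricsutil/common.go findings above can be illustrated together. In Go, `maps.Clone` of a nil map returns nil and writing to a nil map panics, so the sketch always allocates the destination. It also skips reserved keys from the user-configured labels. `mergeMetricLabels` and `reservedLabels` are hypothetical names, not code from the PR.

```go
package main

import (
	"fmt"
	"maps"
)

// reservedLabels lists the identity labels that configured mappings must
// not override, per the review comment on cmd/tidb-server/main.go.
var reservedLabels = map[string]struct{}{
	"keyspace_id":   {},
	"keyspace_name": {},
}

// mergeMetricLabels builds a fresh label map from the existing const labels,
// the resolved identity labels, and the user-configured labels. Any input may
// be nil: the destination is always allocated, so no write hits a nil map.
func mergeMetricLabels(existing, identity, configured map[string]string) map[string]string {
	out := make(map[string]string, len(existing)+len(identity)+len(configured))
	maps.Copy(out, existing) // copying from a nil source is a no-op
	maps.Copy(out, identity)
	for k, v := range configured {
		if _, reserved := reservedLabels[k]; reserved {
			continue // keep the resolved keyspace_id / keyspace_name
		}
		out[k] = v
	}
	return out
}

func main() {
	identity := map[string]string{"keyspace_id": "42", "keyspace_name": "ks1"}
	configured := map[string]string{"keyspace_id": "spoofed", "tenant_id": "t-1"}

	// existing is nil here, mirroring a GetConstLabels() that returns nil.
	labels := mergeMetricLabels(nil, identity, configured)
	fmt.Println(labels["keyspace_id"], labels["tenant_id"]) // prints "42 t-1"
}
```

Allocating a new map on every merge also leaves the caller's maps untouched, which matters when the existing const labels are shared global state.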

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 40faa2a3-a280-4b6e-a6c1-73fb04b01cb5

📥 Commits

Reviewing files that changed from the base of the PR and between 56ea7a2 and ce59bc2.

📒 Files selected for processing (17)
  • cmd/tidb-server/BUILD.bazel
  • cmd/tidb-server/main.go
  • cmd/tidb-server/main_test.go
  • pkg/config/BUILD.bazel
  • pkg/config/config.go
  • pkg/config/config.toml.example
  • pkg/config/config.toml.nextgen.example
  • pkg/config/config_test.go
  • pkg/metrics/common/wrapper.go
  • pkg/sessionctx/variable/slow_log.go
  • pkg/sessionctx/variable/tests/session_test.go
  • pkg/util/metricsutil/BUILD.bazel
  • pkg/util/metricsutil/common.go
  • pkg/util/metricsutil/common_test.go
  • pkg/util/stmtsummary/v2/BUILD.bazel
  • pkg/util/stmtsummary/v2/logger.go
  • pkg/util/stmtsummary/v2/record_test.go

Comment thread cmd/tidb-server/main_test.go
Comment thread cmd/tidb-server/main.go
Comment thread pkg/sessionctx/variable/tests/session_test.go
Comment thread pkg/util/metricsutil/common.go Outdated
@codecov (bot) commented May 15, 2026

Codecov Report

❌ Patch coverage is 87.56219% with 25 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release-nextgen-202603@c79551e). Learn more about missing BASE report.

Additional details and impacted files
@@                     Coverage Diff                     @@
##             release-nextgen-202603     #68404   +/-   ##
===========================================================
  Coverage                          ?   77.5998%           
===========================================================
  Files                             ?       1963           
  Lines                             ?     544613           
  Branches                          ?          0           
===========================================================
  Hits                              ?     422619           
  Misses                            ?     121142           
  Partials                          ?        852           
| Flag | Coverage Δ |
|---|---|
| unit | 76.2162% <87.5621%> (?) |

Flags with carried forward coverage won't be shown.

| Components | Coverage Δ |
|---|---|
| dumpling | 61.5065% <0.0000%> (?) |
| parser | ∅ <0.0000%> (?) |
| br | 61.0028% <0.0000%> (?) |

@ti-chi-bot (bot) commented May 18, 2026

@ChangRui-Ryan: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Comment thread cmd/tidb-server/main.go
@ChangRui-Ryan (Contributor) commented

/retest

@tiprow (bot) commented May 20, 2026

@ChangRui-Ryan: PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test.

In response to this:

/retest

@yibin87 (Contributor) left a comment

LGTM

@ti-chi-bot (bot) added the needs-1-more-lgtm label May 21, 2026
@ti-chi-bot (bot) commented May 21, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-05-21 10:01:02.872317589 +0000 UTC m=+31193.513174392: ☑️ agreed by yibin87.

@ti-chi-bot (bot) commented May 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ChangRui-Ryan, yibin87
Once this PR has been reviewed and has the lgtm label, please assign nolouch, terry1purcell for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Labels

needs-1-more-lgtm, release-note, size/XXL

4 participants