feat: add v1-registry-sync dry-run tool for syncing V2 data into V1 entries #467
Adds a new ecosystem-automation/v1-registry-sync Python package that reads the latest V2 registry snapshot and generates a report showing which stability, display_name, and description values would be written into matching V1 entries under opentelemetry.io/data/registry/. The tool runs in dry-run mode only for now and outputs a JSON or YAML report. It selects the most stable signal level across all signals for each component (stable > beta > alpha > development > deprecated > unmaintained) and omits null fields from the output. 18 unit tests cover the reader and reporter modules. Closes open-telemetry#465
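The "most stable signal level" rule in the description above amounts to a priority lookup. The sketch below is a minimal illustration, assuming each component's per-signal stability is available as plain strings such as `"beta"`; the name `most_stable` and the input shape are hypothetical and not taken from the package's `models` or `reader` modules.

```python
from typing import Iterable, Optional

# Priority order from the PR description, most stable first.
STABILITY_PRIORITY = ["stable", "beta", "alpha", "development", "deprecated", "unmaintained"]


def most_stable(levels: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most stable of a component's per-signal stability levels.

    Levels outside STABILITY_PRIORITY (and empty values) are ignored;
    None is returned when no recognized level is present.
    """
    known = [lvl.lower() for lvl in levels if lvl and lvl.lower() in STABILITY_PRIORITY]
    if not known:
        return None
    # A lower index in the priority list means "more stable".
    return min(known, key=STABILITY_PRIORITY.index)


# Hypothetical per-signal levels for a single component.
print(most_stable({"traces": "beta", "metrics": "alpha", "logs": "development"}.values()))  # beta
```

Ignoring unrecognized levels and lower-casing input are assumptions made for the sketch; the real tool may treat unknown values differently.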
✅ Deploy Preview for otel-ecosystem-explorer ready!
Pull request overview
Adds a new v1-registry-sync Python workspace package that reads the latest V2 collector registry snapshot and emits a dry-run JSON/YAML report of the stability, display_name, and description values that would be synced into the matching V1 entries on opentelemetry.io. No writes to V1 are performed in this PR.
Changes:
- New `v1-registry-sync` workspace package with `models`, `reader`, `reporter`, and `main` modules; per-component "most stable" stability is selected via a fixed priority list (stable > beta > alpha > development > deprecated > unmaintained).
- CLI entry point `v1-registry-sync` with `--inventory-dir`, `--distribution`, `--output`, and `--format` flags.
- 18 unit tests for reader and reporter using a synthetic V2 registry layout.
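A CLI with the four flags listed above could be wired up with `argparse` roughly as follows. This is a hypothetical sketch: only the flag names come from the PR; the help text, the `json` default for `--format`, and the `build_parser`/`parse_args` helpers are assumptions, not the contents of `main.py`.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser using the flag names listed in the PR overview.

    Help text and defaults are illustrative, not taken from main.py.
    """
    parser = argparse.ArgumentParser(
        prog="v1-registry-sync",
        description="Dry-run report of V2 registry values that would be synced into V1 entries.",
    )
    parser.add_argument("--inventory-dir", help="Root directory of the V2 registry snapshot.")
    parser.add_argument("--distribution", help="Distribution to read, for example 'contrib'.")
    parser.add_argument("--output", help="Report file path; write to stdout when omitted.")
    parser.add_argument("--format", choices=["json", "yaml"], default="json", help="Report format.")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # argparse maps "--inventory-dir" to the attribute name "inventory_dir".
    return build_parser().parse_args(argv)


args = parse_args(["--distribution", "contrib", "--format", "yaml"])
print(args.distribution, args.format, args.output)  # contrib yaml None
```

Restricting `--format` with `choices` makes argparse reject anything other than JSON or YAML before any registry files are read.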
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| pyproject.toml | Registers the new v1-registry-sync package in the uv workspace. |
| ecosystem-automation/v1-registry-sync/pyproject.toml | Project metadata, deps (PyYAML, semantic-version), and CLI script entry. |
| `ecosystem-automation/v1-registry-sync/src/v1_registry_sync/__init__.py` | Apache header for the package module. |
| ecosystem-automation/v1-registry-sync/src/v1_registry_sync/models.py | ComponentSyncData / V1SyncReport dataclasses and stability priority list. |
| ecosystem-automation/v1-registry-sync/src/v1_registry_sync/reader.py | Locates latest v{version} dir per distribution and parses component YAMLs. |
| ecosystem-automation/v1-registry-sync/src/v1_registry_sync/reporter.py | Writes the report to a stream as JSON or YAML. |
| ecosystem-automation/v1-registry-sync/src/v1_registry_sync/main.py | argparse CLI entry point with logging. |
| `ecosystem-automation/v1-registry-sync/tests/__init__.py` | Apache header for the tests package. |
| ecosystem-automation/v1-registry-sync/tests/test_reader.py | Tests for stability priority, latest-version selection, and parsing. |
| ecosystem-automation/v1-registry-sync/tests/test_reporter.py | Tests for JSON/YAML report serialization and proposed-changes filtering. |
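The reporter's job (serialize a report to a stream as JSON or YAML, omitting null fields) can be sketched with `dataclasses.asdict` plus a recursive filter. The field names on `ComponentSyncData` and `V1SyncReport` below are assumptions for illustration; only the class names and the PyYAML dependency come from the PR.

```python
import io
import json
from dataclasses import asdict, dataclass, field
from typing import Any, List, Optional, TextIO


@dataclass
class ComponentSyncData:
    # Illustrative fields only; the real dataclass in models.py may differ.
    name: str
    component_type: str
    stability: Optional[str] = None
    display_name: Optional[str] = None
    description: Optional[str] = None


@dataclass
class V1SyncReport:
    distribution: str
    version: str
    components: List[ComponentSyncData] = field(default_factory=list)


def _drop_none(value: Any) -> Any:
    """Recursively drop dict keys whose value is None."""
    if isinstance(value, dict):
        return {k: _drop_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [_drop_none(v) for v in value]
    return value


def write_report(report: V1SyncReport, stream: TextIO, fmt: str = "json") -> None:
    data = _drop_none(asdict(report))
    if fmt == "json":
        json.dump(data, stream, indent=2)
        stream.write("\n")
    elif fmt == "yaml":
        import yaml  # PyYAML, imported lazily so the JSON path has no extra dependency

        yaml.safe_dump(data, stream, sort_keys=False)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")


def render_report(report: V1SyncReport, fmt: str = "json") -> str:
    buffer = io.StringIO()
    write_report(report, buffer, fmt)
    return buffer.getvalue()


report = V1SyncReport("contrib", "0.151.0", [ComponentSyncData("fooreceiver", "receiver", stability="beta")])
print(render_report(report))
```

Writing to a `TextIO` stream rather than a path lets the same function target stdout, a file, or an in-memory buffer in tests.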
Hi @Rama542, a few things worth checking before this lands.

1. …
2. The report does not say which V1 entry would be updated. Each entry reads as … It could be worth adding a …
3. Could reuse …

Picking up … Smaller notes: …
- Remove `stability` from `proposed_v1_changes`: the V1 schema declares `additionalProperties: false` and has no `stability` field, so the validator would reject any entry containing it.
- Remove `title`/`display_name` from `proposed_v1_changes`: a handful of V1 titles carry more information than the V2 `display_name` (e.g. `otelarrowexporter`), so limiting the initial sync to `description` avoids losing fidelity.
- Add `target_v1_file` and `v1_entry_exists` to each report entry so the dry-run output is directly actionable.
- Replace the local `_find_latest_version` and `_parse_component_file` helpers with `InventoryManager` from `collector-watcher`, the same pattern used by `explorer-db-builder` and `configuration-watcher`. This also fixes a latent issue where the old helper could pick up SNAPSHOT directories, since it sorted all version directories including pre-releases.
- Add a `--v1-registry-dir` CLI argument to enable `v1_entry_exists` checks against a local clone of opentelemetry.io/data/registry.
- Add a `README.md` to match the sibling watcher packages.
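The SNAPSHOT issue above comes down to picking the newest release directory while skipping pre-releases. The sketch below only illustrates that filtering idea; it is not `InventoryManager`'s API, and it swaps in a plain regex for the `semantic-version` dependency. It assumes release directories are named `v{version}` (as the file table says) and that anything with a suffix, such as a hypothetical `v0.152.0-SNAPSHOT`, is a pre-release.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Tuple

# Assumed naming: release dirs look like "v0.151.0"; anything with a suffix
# (for example a hypothetical "v0.152.0-SNAPSHOT") is treated as a pre-release.
_RELEASE_DIR = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def _release_key(name: str) -> Optional[Tuple[int, int, int]]:
    match = _RELEASE_DIR.fullmatch(name)
    if match is None:
        return None  # pre-release or unrelated directory
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def latest_release(names: Iterable[str]) -> Optional[str]:
    """Return the highest plain release name, skipping pre-releases."""
    candidates = [(key, name) for name in names if (key := _release_key(name)) is not None]
    if not candidates:
        return None
    # Compare numeric tuples, so v0.151.0 sorts above v0.99.0.
    return max(candidates)[1]


def latest_release_dir(distribution_dir: Path) -> Optional[Path]:
    names = [p.name for p in distribution_dir.iterdir() if p.is_dir()]
    latest = latest_release(names)
    return distribution_dir / latest if latest else None


print(latest_release(["v0.99.0", "v0.151.0", "v0.152.0-SNAPSHOT"]))  # v0.151.0
```

Comparing numeric tuples instead of directory-name strings also avoids the lexicographic trap where `"v0.99.0"` would sort above `"v0.151.0"`.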
Six changes look good as applied! One follow-up on … In my opinion, the cleanest fix would be to join on the Go module path, which both registries already carry: …
Building a … Including … One related note: the existing …
The previous target_v1_file used f"collector-{name}.yml" but actual V1
files follow collector-{component_type}-{slug}.yml, so v1_entry_exists
was returning false for nearly every component.
The fix builds a dict[go_module_path -> v1_filename] at startup by
reading the package.name field from each V1 file. Each V2 component's
expected module path is constructed as:
github.com/open-telemetry/opentelemetry-collector-contrib/{type}/{name}
Matching on the module path is consistent across both registries and
avoids naming-convention guesswork. Across 249 contrib components in
v0.151.0, 244 match this way; the 5 that do not (azurefunctionsreceiver,
googlesecopsexporter, drainprocessor, spanpruningprocessor,
datadogconnector) are genuinely missing from V1, not matcher bugs.
expected_go_module_path is also included on every report row so misses
are easy to triage. The test fixture now uses realistic V1 file names
(collector-receiver-fooreceiver.yml) instead of the old wrong convention.
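The module-path matching described in this commit message can be sketched as a plain dictionary join. This is a minimal sketch, assuming each V1 file is a YAML document whose `package.name` holds the Go module path (as stated above) and that all components live in the contrib repository; `expected_module_path`, `build_module_index`, `load_v1_docs`, and `report_row` are hypothetical names, not the tool's own code. Reading real files uses PyYAML's `yaml.safe_load`.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional

CONTRIB_MODULE_PREFIX = "github.com/open-telemetry/opentelemetry-collector-contrib"


def expected_module_path(component_type: str, name: str) -> str:
    # {prefix}/{type}/{name}, as described in the commit message.
    return f"{CONTRIB_MODULE_PREFIX}/{component_type}/{name}"


def build_module_index(v1_docs: Mapping[str, Optional[dict]]) -> Dict[str, str]:
    """Map each V1 entry's package.name to its filename."""
    index: Dict[str, str] = {}
    for filename, doc in v1_docs.items():
        package = (doc or {}).get("package")
        module = package.get("name") if isinstance(package, dict) else None
        if module:
            index[module] = filename
    return index


def load_v1_docs(v1_registry_dir: Path) -> Dict[str, Optional[dict]]:
    import yaml  # PyYAML; only needed when reading real V1 files

    return {
        path.name: yaml.safe_load(path.read_text(encoding="utf-8"))
        for path in sorted(v1_registry_dir.glob("*.yml"))
    }


def report_row(component_type: str, name: str, index: Mapping[str, str]) -> dict:
    module = expected_module_path(component_type, name)
    target = index.get(module)
    return {
        "name": name,
        "expected_go_module_path": module,
        "target_v1_file": target,
        "v1_entry_exists": target is not None,
    }


index = build_module_index({
    "collector-receiver-fooreceiver.yml": {
        "package": {"name": expected_module_path("receiver", "fooreceiver")}
    },
})
print(report_row("receiver", "fooreceiver", index)["v1_entry_exists"])  # True
print(report_row("processor", "drainprocessor", index)["v1_entry_exists"])  # False
```

Building the index once at startup turns each component lookup into a single dictionary access, and keeping `expected_go_module_path` on every row shows exactly which key failed to match.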
Good catch, thank you. The naming-convention approach was broken because actual V1 files follow collector-{component_type}-{slug}.yml, not collector-{name}.yml, so v1_entry_exists was returning false for almost everything. I switched to matching on the Go module path instead. At startup the tool now reads every .yml file in --v1-registry-dir and builds a {package.name -> filename} index. Each V2 component's expected module path is constructed as github.com/open-telemetry/opentelemetry-collector-contrib/{component_type}/{name} and looked up in the index to find the actual V1 file. expected_go_module_path is included on every row so misses are easy to triage. I also updated the test fixture to use a realistic V1 filename (collector-receiver-fooreceiver.yml) so the test covers the actual matching logic rather than the old, wrong convention.
lucacavenaghi97 left a comment
The Go module path matching reads as I'd expect, and all the follow-up points from both rounds are now addressed. Approving.
Worth merging main into the branch before this lands. It's a few commits behind.
Summary
This PR adds a new ecosystem-automation/v1-registry-sync Python package as a
first step toward issue #465.
The tool reads the latest V2 registry snapshot from ecosystem-registry/collector/
and generates a report showing exactly which stability, display_name, and
description values would be written into matching V1 entries under
opentelemetry.io/data/registry/. For stability, it selects the most stable signal
level across all signals per component (stable > beta > alpha > development >
deprecated > unmaintained).
This version runs in dry-run mode only. The actual write step to opentelemetry.io
would require coordination with the opentelemetry.io maintainers on the V1 schema
and PR workflow, and is left for a follow-up.
What was added
How to run
From the repo root:
```shell
uv run v1-registry-sync --distribution contrib --format json
```
Closes #465