feat: add Tinybird pipes for organization page #4128
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.
Please add a Jira issue key to your PR title.
…page pipes Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Pull request overview
Adds new Tinybird datasources and pipes to power the /organization/{orgId} frontend page, including nightly precomputed COPY datasources for cheaper request-time lookups and several request-time endpoints scoped by orgId.
Changes:
- Introduces nightly COPY pipes + ReplacingMergeTree datasources for org KPIs and org→project metrics.
- Adds request-time pipes for org profile, KPIs, projects list, activity timeseries, contributor timeseries, and contributor leaderboard.
- Routes org activity-based queries through activityRelations_deduplicated_cleaned_bucket_union.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| services/libs/tinybird/pipes/org_page_projects.pipe | Request-time pipe to list projects for an org from precomputed copy DS. |
| services/libs/tinybird/pipes/org_page_projects_copy_pipe.pipe | Nightly COPY pipe to build org→project metrics datasource. |
| services/libs/tinybird/pipes/org_page_profile.pipe | Request-time pipe to fetch org profile + website/domain. |
| services/libs/tinybird/pipes/org_page_kpis.pipe | Request-time pipe to fetch KPI values + trends from precomputed copy DS. |
| services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe | Nightly COPY pipe to build org KPI datasource. |
| services/libs/tinybird/pipes/org_page_contributors.pipe | Request-time contributors leaderboard for an org (count + paginated data). |
| services/libs/tinybird/pipes/org_page_contributors_timeseries.pipe | Request-time contributor-count timeseries by granularity. |
| services/libs/tinybird/pipes/org_page_activities_timeseries.pipe | Request-time activity-count timeseries by granularity. |
| services/libs/tinybird/datasources/org_page_projects_copy_ds.datasource | New ReplacingMergeTree datasource for org→project metrics. |
| services/libs/tinybird/datasources/org_page_kpis_copy_ds.datasource | New ReplacingMergeTree datasource for org-level KPIs. |
Comments suppressed due to low confidence (1)
services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe:74
- In org_page_kpis_final, the FULL OUTER JOINs are all joined on c.organizationId. This prevents metrics from being merged into a single row when an org is missing from current_contributors but present in previous_contributors / maintainer_roles / critical_projects (it can yield split rows per org and wrong coalesced values). Consider joining with USING (organizationId), joining on coalesce(c.organizationId, p.organizationId, m.organizationId, cp.organizationId), or building a derived distinct-orgs base set and then LEFT JOINing each metric node.
FROM org_page_kpis_current_contributors c
FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
FULL OUTER JOIN org_page_kpis_maintainer_roles m ON c.organizationId = m.organizationId
FULL OUTER JOIN org_page_kpis_critical_projects cp ON c.organizationId = cp.organizationId
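The last option in the comment above (a distinct-orgs base set with LEFT JOINs) could look roughly like the sketch below. It is an illustration, not the change made in this PR: the node and column names are the ones visible in this PR's diff, the subquery alias o is hypothetical, and it assumes each metric node returns at most one row per organizationId.

```sql
-- Sketch only. Node/column names come from the PR diff; the alias "o" is hypothetical.
-- Assumes every metric node yields at most one row per organizationId.
SELECT
    o.organizationId AS organizationId,
    coalesce(c.activeContributors, 0) AS activeContributors,
    coalesce(p.activeContributorsPrevious, 0) AS activeContributorsPrevious,
    coalesce(m.maintainerRoles, 0) AS maintainerRoles,
    coalesce(cp.criticalProjects, 0) AS criticalProjects,
    now() AS computedAt
FROM
(
    -- Every organization that appears in at least one metric node
    SELECT organizationId FROM org_page_kpis_current_contributors
    UNION DISTINCT
    SELECT organizationId FROM org_page_kpis_previous_contributors
    UNION DISTINCT
    SELECT organizationId FROM org_page_kpis_maintainer_roles
    UNION DISTINCT
    SELECT organizationId FROM org_page_kpis_critical_projects
) AS o
LEFT JOIN org_page_kpis_current_contributors AS c ON c.organizationId = o.organizationId
LEFT JOIN org_page_kpis_previous_contributors AS p ON p.organizationId = o.organizationId
LEFT JOIN org_page_kpis_maintainer_roles AS m ON m.organizationId = o.organizationId
LEFT JOIN org_page_kpis_critical_projects AS cp ON cp.organizationId = o.organizationId
WHERE o.organizationId != ''
```

Because every join key is the base set's organizationId, an org that is absent from current_contributors still ends up in exactly one output row. The trade-off is that each metric node is referenced twice, which may add work to the nightly job.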

FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
FULL OUTER JOIN org_page_kpis_maintainer_roles m ON c.organizationId = m.organizationId
FULL OUTER JOIN org_page_kpis_critical_projects cp ON c.organizationId = cp.organizationId
WHERE organizationId != ''

- INTERVAL 1 DAY
)
END AS endDate,
count() AS activityCount

uniqExact(af.memberId) AS contributorCount
FROM numbers(1000) numbers

SELECT count(distinct memberId) as count
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
    organizationId = {{ String(orgId, '') }}

AND platform = 'website'
AND type = 'primary'
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.
Comments suppressed due to low confidence (1)
services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe:80
- These new nightly COPY schedules (01:15 and 01:30 UTC) overlap with existing scheduled COPY pipes (e.g. multiple leaderboards jobs already run at 01:15 and 01:30). Running additional cross-segment scans at the same minute can increase Tinybird load/cost and raise the risk of timeouts. Consider staggering these schedules (or coordinating them with existing jobs) to smooth peak load.
TYPE COPY
TARGET_DATASOURCE org_page_kpis_copy_ds
COPY_MODE replace
COPY_SCHEDULE 15 1 * * *

coalesce(
    c.organizationId, p.organizationId, m.organizationId, cp.organizationId
) AS organizationId,
coalesce(c.activeContributors, 0) AS activeContributors,
coalesce(p.activeContributorsPrevious, 0) AS activeContributorsPrevious,
coalesce(m.maintainerRoles, 0) AS maintainerRoles,
coalesce(cp.criticalProjects, 0) AS criticalProjects,
now() AS computedAt
FROM org_page_kpis_current_contributors c
FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
FULL OUTER JOIN org_page_kpis_maintainer_roles m ON c.organizationId = m.organizationId
FULL OUTER JOIN org_page_kpis_critical_projects cp ON c.organizationId = cp.organizationId

organizationId = {{ String(orgId, '') }}
{% if defined(startDate) %} AND timestamp >= {{ DateTime(startDate) }} {% end %}
{% if defined(endDate) %} AND timestamp < {{ DateTime(endDate) }} {% end %}

DESCRIPTION >
    Activity timeseries for a given organization, bucketed by year (all-time).

TAGS "Organization page"

NODE org_page_activities_timeseries_data
SQL >
    %
    SELECT
        toStartOfYear(timestamp) AS startDate,
        toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
        count() AS activityCount
    FROM activityRelations_deduplicated_cleaned_bucket_union
    WHERE
        organizationId = {{ String(orgId, '', description="Organization ID", required=True) }}
        AND timestamp >= '2005-01-01'
    GROUP BY startDate, endDate
    ORDER BY startDate

FULL OUTER JOIN
    org_page_kpis_critical_projects cp
    ON coalesce(c.organizationId, p.organizationId, m.organizationId) = cp.organizationId
WHERE organizationId != ''

contributorCount,
maintainersCount,
totalContributors,
orgContributors,

SELECT
    organizationId,
    toStartOfYear(timestamp) AS startDate,
    toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
    count() AS activityCount,
    now() AS computedAt
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE organizationId != '' AND timestamp >= '2005-01-01'

SELECT
    organizationId,
    toStartOfYear(timestamp) AS startDate,
    toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
    uniq(memberId) AS contributorCount,
    now() AS computedAt
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE organizationId != '' AND timestamp >= '2005-01-01'

b.industry,
b.description,
w.website,
domain(w.website) AS domain
domain() returns empty string for non-URL input
Medium Severity
The ClickHouse domain() function expects a full URL with protocol (e.g., https://example.com/path) and returns empty string for plain domain names. The value from organizationIdentities with type = 'primary-domain' likely stores a bare domain (e.g., example.com) not a URL, so domain(w.website) would always produce an empty string instead of the expected domain value.
Reviewed by Cursor Bugbot for commit fc1a7d1.
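Whether domain() copes with a bare host name is quick to confirm before changing the pipe. A minimal probe, assuming a ClickHouse-compatible SQL console (for example the Tinybird playground); the literal values and column aliases are made up for illustration:

```sql
-- Illustrative literals only: compare a full URL with a bare host name.
SELECT
    domain('https://example.com/path') AS from_full_url,
    domain('example.com') AS from_bare_host
```

If from_bare_host comes back empty, the concern above applies and the pipe would need to return w.website directly (or prepend a scheme) when the stored value is already a bare domain. If it comes back as example.com, the current expression already behaves as intended.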
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
…d-pipes Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>

FROM members_sorted AS m
ANY INNER JOIN org_page_contributors_activity_aggregates agg ON agg.memberId = m.id
LEFT JOIN member_roles mr ON mr.memberId = m.id
WHERE m.id IN (SELECT memberId FROM org_page_contributors_activity_aggregates)

NODE org_page_contributors_activity_aggregates
SQL >
    %
    {% if Boolean(count, false) %}
        SELECT count(distinct memberId)
        FROM activityRelations_deduplicated_cleaned_bucket_union

totalPrsOpened,
orgPrsOpened,
technicalScore
FROM org_page_projects_copy_ds FINAL

NODE org_page_contributors_timeseries_data
SQL >
    SELECT startDate, endDate, contributorCount
    FROM org_page_contributors_timeseries_copy_ds FINAL

NODE org_page_activities_timeseries_data
SQL >
    SELECT startDate, endDate, activityCount
    FROM org_page_activities_timeseries_copy_ds FINAL

a.contributorCount,
now() AS computedAt
FROM org_page_projects_org_segment_activity a
LEFT JOIN insights_projects_populated_ds p ON a.segmentId = p.segmentId

DESCRIPTION >
    Nightly copy pipe that precomputes yearly unique contributor counts per organization for the org page.
    Writes one row per (organizationId, startDate) into org_page_contributors_timeseries_copy_ds.

TAGS "Organization page"

TYPE COPY
TARGET_DATASOURCE org_page_projects_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *
Two heavy copy pipes scheduled at identical time
Low Severity
org_page_projects_copy_pipe and org_page_activities_timeseries_copy_pipe both use COPY_SCHEDULE 30 1 * * *. Both perform heavy full scans of activityRelations_deduplicated_cleaned_bucket_union (a 10-bucket UNION ALL). The other two new copy pipes are staggered at 01:15 and 01:45, suggesting the intent was to spread load, but these two ended up at the same time.
Reviewed by Cursor Bugbot for commit b387268.

WHERE
    organizationId = (SELECT id FROM org_slug_lookup)
    {% if defined(startDate) %} AND timestamp >= {{ DateTime(startDate) }} {% end %}
    {% if defined(endDate) %} AND timestamp < {{ DateTime(endDate) }} {% end %}
Count path duplicates expensive cross-segment table scan
Medium Severity
When count=true, the org_page_contributors_leaderboard node independently queries activityRelations_deduplicated_cleaned_bucket_union for the same count already computed (but unused) in org_page_contributors_activity_aggregates. Unlike contributors_leaderboard.pipe which references the project-scoped activities_filtered, this pipe scans the full unscoped cross-segment union (10-bucket UNION ALL) twice per request-time count call.
Reviewed by Cursor Bugbot for commit b387268.

SELECT id, displayName, logo, size AS employeeCount, industry, headline AS description
FROM organizations FINAL
WHERE id = (SELECT id FROM org_slug_lookup)

b.industry,
b.description,
w.website,
domain(w.website) AS domain,
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
Reviewed by Cursor Bugbot for commit a4d60ca.

TYPE COPY
TARGET_DATASOURCE org_page_activities_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *
Two heavy copy pipes share the same schedule
Low Severity
org_page_activities_timeseries_copy_pipe and org_page_projects_copy_pipe both use COPY_SCHEDULE 30 1 * * *, meaning they run simultaneously at 01:30 UTC. The other two copy pipes are intentionally staggered (01:15 and 01:45), and the PR architecture notes mention offset scheduling to avoid resource contention. The projects copy pipe is especially heavy (multiple full scans with complex joins), so running another copy pipe concurrently is likely unintentional. The activities timeseries pipe likely belongs at a different offset (e.g., 00 2 * * * or similar).
Reviewed by Cursor Bugbot for commit a4d60ca.
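For reference, applying the offset suggested in the comment above would change only the schedule line of org_page_activities_timeseries_copy_pipe. This is a sketch of that one suggestion, not a setting confirmed anywhere in this PR; whether 02:00 UTC collides with other scheduled jobs in the workspace would still need checking.

```
TYPE COPY
TARGET_DATASOURCE org_page_activities_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 0 2 * * *
```

With this value the four new copy pipes would run at 01:15, 01:30, 01:45, and 02:00 UTC respectively, instead of two of them sharing 01:30.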


Summary
- … org_page_kpis_copy_ds, org_page_projects_copy_ds) rebuilt nightly for cheap request-time lookups
- … orgId

These pipes back the new /organization/{orgId} frontend page on the feat/org-page branch. Frontend handlers currently return mock data and will be wired to these pipes once they are deployed and seeded.

Architecture notes
- … org_dash_metric_copy_pipe) to avoid expensive cross-segment scans at request time
- … activityRelations_deduplicated_cleaned_bucket_union (cross-segment, not the routing pipe)

Test plan
- … org_page_kpis_copy_pipe and org_page_projects_copy_pipe manually ("Run now" in Tinybird UI) after deploy to seed the datasources
- … org_page_kpis_copy_ds (≈ #orgs) and org_page_projects_copy_ds (≈ #orgs × avg #projects)
- … org_page_kpis and org_page_projects pipes via Tinybird API with a known orgId and confirm correct output shapes
- … org_page_activities_timeseries returns filled buckets for granularity=monthly

🤖 Generated with Claude Code
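The test plan appears to compare row counts in the two copy datasources against rough expectations. A minimal sketch of that check, assuming it is run as two separate queries in a Tinybird SQL console after the copy pipes have been seeded; the column aliases are hypothetical:

```sql
-- Query 1: expected to be roughly the number of organizations (≈ #orgs).
SELECT count() AS kpi_rows
FROM org_page_kpis_copy_ds FINAL
```

```sql
-- Query 2: expected to be roughly #orgs × average #projects per org.
SELECT count() AS project_rows
FROM org_page_projects_copy_ds FINAL
```

FINAL is used here because the request-time pipes in this PR read the copy datasources the same way, so the counts reflect what those endpoints would see.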
Note
Medium Risk
Adds multiple new Tinybird COPY jobs and request-time pipes that will run on schedules and power a new org page, so failures or inefficient queries could impact data freshness and Tinybird workload.
Overview
Adds a new Organization page Tinybird dataset: four new ReplacingMergeTree copy datasources plus nightly COPY pipes to precompute org KPIs, per-project metrics (including a computed technicalScore), and yearly activity/contributor timeseries for cheap request-time lookups.

Introduces request-time pipes (org_page_profile, org_page_kpis, org_page_projects, org_page_activities_timeseries, org_page_contributors_timeseries, org_page_contributors) that resolve orgSlug to organizationId and return the org profile, KPI trends, projects table, timeseries, and contributor leaderboard.

Updates organizations_leaderboard.pipe to include slug in results and to join against organizations_populated_slug instead of organizations.

Reviewed by Cursor Bugbot for commit a4d60ca. Bugbot is set up for automated code reviews on this repo.