feat: add Tinybird pipes for organization page #4128

Open
gaspergrom wants to merge 14 commits into main from feat/org-page-tinybird-pipes

Contributor

@gaspergrom gaspergrom commented May 18, 2026

Summary

  • Adds 2 precomputed copy datasources (org_page_kpis_copy_ds, org_page_projects_copy_ds) rebuilt nightly for cheap request-time lookups
  • Adds 2 nightly COPY pipes to aggregate org-level KPIs and per-org/per-project metrics across all segments
  • Adds 6 request-time pipes: profile, KPIs, projects table, activity timeseries, contributor timeseries, contributors leaderboard — all scoped by orgId

These pipes back the new /organization/{orgId} frontend page on the feat/org-page branch. Frontend handlers currently return mock data and will be wired to these pipes once they are deployed and seeded.

Architecture notes

  • KPIs and projects use precomputed copy datasources (pattern from org_dash_metric_copy_pipe) to avoid expensive cross-segment scans at request time
  • All activity queries go through activityRelations_deduplicated_cleaned_bucket_union (cross-segment, not the routing pipe)
  • Copy pipes run at 01:15 and 01:30 UTC, offset from the existing 00:50 UTC copy job
  • Membership tier and technical influence score are intentionally omitted — data not available yet; will be added in a follow-up

Test plan

  • Trigger org_page_kpis_copy_pipe and org_page_projects_copy_pipe manually ("Run now" in Tinybird UI) after deploy to seed the datasources
  • Verify row counts in org_page_kpis_copy_ds (≈ #orgs) and org_page_projects_copy_ds (≈ #orgs × avg #projects)
  • Hit org_page_kpis and org_page_projects pipes via Tinybird API with a known orgId and confirm correct output shapes
  • Verify org_page_activities_timeseries returns filled buckets for granularity=monthly
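
The last two checks in the test plan can be scripted. The following is a minimal Python sketch, assuming the usual Tinybird endpoint form `/v0/pipes/<pipe>.json`. The host constant, the token value, and the helper names `pipe_url` and `missing_months` are placeholders introduced here for illustration, and `startDate` is assumed to be returned as an ISO date string.

```python
from datetime import date
from urllib.parse import urlencode

# Assumption: the API host differs per Tinybird region; replace as needed.
TINYBIRD_HOST = "https://api.tinybird.co"


def pipe_url(pipe: str, token: str, **params: str) -> str:
    """Build the request URL for a published Tinybird pipe endpoint."""
    query = urlencode({"token": token, **params})
    return f"{TINYBIRD_HOST}/v0/pipes/{pipe}.json?{query}"


def missing_months(start_dates: list[str]) -> list[str]:
    """Return month starts (YYYY-MM-01) absent between the first and last bucket."""
    if not start_dates:
        return []
    # Keep only the date part so "2024-01-01 00:00:00" is accepted as well.
    seen = {date.fromisoformat(s[:10]).replace(day=1) for s in start_dates}
    current, last = min(seen), max(seen)
    gaps = []
    while current <= last:
        if current not in seen:
            gaps.append(current.isoformat())
        # Advance to the first day of the next month.
        carry, month_index = divmod(current.month, 12)
        current = date(current.year + carry, month_index + 1, 1)
    return gaps
```

After fetching `pipe_url("org_page_activities_timeseries", token, orgId=..., granularity="monthly")` with any HTTP client, pass the `startDate` values from the response's `data` rows to `missing_months`. An empty result means the buckets are filled.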

🤖 Generated with Claude Code


Note

Medium Risk
Adds multiple new Tinybird COPY jobs and request-time pipes that will run on schedules and power a new org page, so failures or inefficient queries could impact data freshness and Tinybird workload.

Overview
Adds a new Organization page Tinybird dataset: four new ReplacingMergeTree copy datasources plus nightly COPY pipes to precompute org KPIs, per-project metrics (including a computed technicalScore), and yearly activity/contributor timeseries for cheap request-time lookups.

Introduces request-time pipes (org_page_profile, org_page_kpis, org_page_projects, org_page_activities_timeseries, org_page_contributors_timeseries, org_page_contributors) that resolve orgSlug to organizationId and return the org profile, KPI trends, projects table, timeseries, and contributor leaderboard.

Updates organizations_leaderboard.pipe to include slug in results and to join against organizations_populated_slug instead of organizations.

Reviewed by Cursor Bugbot for commit a4d60ca. Bugbot is set up for automated code reviews on this repo.

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Copilot AI review requested due to automatic review settings May 18, 2026 15:50
@github-actions
Contributor

⚠️ Jira Issue Key Missing

Your PR title doesn't contain a Jira issue key. Consider adding it for better traceability.

Example:

  • feat: add user authentication (CM-123)
  • feat: add user authentication (IN-123)

Projects:

  • CM: Community Data Platform
  • IN: Insights

Please add a Jira issue key to your PR title.

…page pipes

Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

Adds new Tinybird datasources and pipes to power the /organization/{orgId} frontend page, including nightly precomputed COPY datasources for cheaper request-time lookups and several request-time endpoints scoped by orgId.

Changes:

  • Introduces nightly COPY pipes + ReplacingMergeTree datasources for org KPIs and org→project metrics.
  • Adds request-time pipes for org profile, KPIs, projects list, activity timeseries, contributor timeseries, and contributor leaderboard.
  • Routes org activity-based queries through activityRelations_deduplicated_cleaned_bucket_union.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

File | Description
services/libs/tinybird/pipes/org_page_projects.pipe | Request-time pipe to list projects for an org from precomputed copy DS.
services/libs/tinybird/pipes/org_page_projects_copy_pipe.pipe | Nightly COPY pipe to build org→project metrics datasource.
services/libs/tinybird/pipes/org_page_profile.pipe | Request-time pipe to fetch org profile + website/domain.
services/libs/tinybird/pipes/org_page_kpis.pipe | Request-time pipe to fetch KPI values + trends from precomputed copy DS.
services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe | Nightly COPY pipe to build org KPI datasource.
services/libs/tinybird/pipes/org_page_contributors.pipe | Request-time contributors leaderboard for an org (count + paginated data).
services/libs/tinybird/pipes/org_page_contributors_timeseries.pipe | Request-time contributor-count timeseries by granularity.
services/libs/tinybird/pipes/org_page_activities_timeseries.pipe | Request-time activity-count timeseries by granularity.
services/libs/tinybird/datasources/org_page_projects_copy_ds.datasource | New ReplacingMergeTree datasource for org→project metrics.
services/libs/tinybird/datasources/org_page_kpis_copy_ds.datasource | New ReplacingMergeTree datasource for org-level KPIs.

Comments suppressed due to low confidence (1)

services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe:74

  • In org_page_kpis_final, the FULL OUTER JOINs are all joined on c.organizationId. This prevents metrics from being merged into a single row when an org is missing from current_contributors but present in previous_contributors/maintainer_roles/critical_projects (it can yield split rows per org and wrong coalesced values). Consider joining using USING (organizationId) or joining on coalesce(c.organizationId, p.organizationId, m.organizationId, cp.organizationId) / a derived distinct-orgs base set, then LEFT JOIN each metric node.
    FROM org_page_kpis_current_contributors c
    FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
    FULL OUTER JOIN org_page_kpis_maintainer_roles m ON c.organizationId = m.organizationId
    FULL OUTER JOIN org_page_kpis_critical_projects cp ON c.organizationId = cp.organizationId
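
The split-row failure mode described in this comment can be reproduced with a small model. This is a Python sketch, not ClickHouse SQL: each metric node is reduced to a set of organization IDs, and `None` stands in for the unmatched side of a join (ClickHouse fills type defaults instead unless `join_use_nulls` is enabled). The function names and sample data are hypothetical.

```python
from typing import Optional

Row = dict[str, Optional[str]]


def chained_full_outer(c: set[str], p: set[str], m: set[str]) -> list[Row]:
    """Model FULL OUTER JOINs of p and m that both match on c's key only."""
    # c FULL OUTER JOIN p ON c.id = p.id
    rows: list[Row] = [{"c": o, "p": o if o in p else None, "m": None} for o in sorted(c)]
    rows += [{"c": None, "p": o, "m": None} for o in sorted(p - c)]
    # ... FULL OUTER JOIN m ON c.id = m.id (rows where c is missing can never match)
    matched = set()
    for row in rows:
        if row["c"] is not None and row["c"] in m:
            row["m"] = row["c"]
            matched.add(row["c"])
    rows += [{"c": None, "p": None, "m": o} for o in sorted(m - matched)]
    return rows


def base_set_left_join(c: set[str], p: set[str], m: set[str]) -> list[Row]:
    """Model a distinct-orgs base set with one LEFT JOIN per metric node."""
    return [
        {"c": o if o in c else None, "p": o if o in p else None, "m": o if o in m else None}
        for o in sorted(c | p | m)
    ]


def rows_per_org(rows: list[Row]) -> dict[str, int]:
    """Count output rows per organization after coalescing the key columns."""
    counts: dict[str, int] = {}
    for row in rows:
        org = row["c"] or row["p"] or row["m"]
        counts[org] = counts.get(org, 0) + 1
    return counts
```

With `c = {"org-a"}`, `p = {"org-a", "org-b"}`, and `m = {"org-b"}`, the chained variant emits two rows for `org-b`, while the base-set variant emits exactly one row per organization.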

FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
FULL OUTER JOIN org_page_kpis_maintainer_roles m ON c.organizationId = m.organizationId
FULL OUTER JOIN org_page_kpis_critical_projects cp ON c.organizationId = cp.organizationId
WHERE organizationId != ''
- INTERVAL 1 DAY
)
END AS endDate,
count() AS activityCount
Comment on lines +68 to +69
uniqExact(af.memberId) AS contributorCount
FROM numbers(1000) numbers
SELECT count(distinct memberId) as count
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId = {{ String(orgId, '') }}
Comment on lines +29 to +30
AND platform = 'website'
AND type = 'primary'
Comment thread services/libs/tinybird/pipes/org_page_activities_timeseries.pipe Outdated
Comment thread services/libs/tinybird/pipes/org_page_contributors_timeseries.pipe Outdated
Comment thread services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
@gaspergrom gaspergrom requested review from epipav and joanagmaia May 18, 2026 17:12
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Copilot AI review requested due to automatic review settings May 18, 2026 17:16
@gaspergrom gaspergrom removed the request for review from epipav May 18, 2026 17:16
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Comments suppressed due to low confidence (1)

services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe:80

  • These new nightly COPY schedules (01:15 and 01:30 UTC) overlap with existing scheduled COPY pipes (e.g. multiple leaderboard jobs already run at 01:15 and 01:30). Running additional cross-segment scans in the same minute can increase Tinybird load and cost and raise the risk of timeouts. Consider staggering these schedules (or coordinating them with existing jobs) to smooth peak load.
TYPE COPY
TARGET_DATASOURCE org_page_kpis_copy_ds
COPY_MODE replace
COPY_SCHEDULE 15 1 * * *

Comment on lines +21 to +22
AND platform = 'website'
AND type = 'primary'
Comment on lines +63 to +74
coalesce(
c.organizationId, p.organizationId, m.organizationId, cp.organizationId
) AS organizationId,
coalesce(c.activeContributors, 0) AS activeContributors,
coalesce(p.activeContributorsPrevious, 0) AS activeContributorsPrevious,
coalesce(m.maintainerRoles, 0) AS maintainerRoles,
coalesce(cp.criticalProjects, 0) AS criticalProjects,
now() AS computedAt
FROM org_page_kpis_current_contributors c
FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
FULL OUTER JOIN org_page_kpis_maintainer_roles m ON c.organizationId = m.organizationId
FULL OUTER JOIN org_page_kpis_critical_projects cp ON c.organizationId = cp.organizationId
Comment on lines +50 to +52
organizationId = {{ String(orgId, '') }}
{% if defined(startDate) %} AND timestamp >= {{ DateTime(startDate) }} {% end %}
{% if defined(endDate) %} AND timestamp < {{ DateTime(endDate) }} {% end %}
Comment on lines +1 to +18
DESCRIPTION >
Activity timeseries for a given organization, bucketed by year (all-time).

TAGS "Organization page"

NODE org_page_activities_timeseries_data
SQL >
%
SELECT
toStartOfYear(timestamp) AS startDate,
toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
count() AS activityCount
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId = {{ String(orgId, '', description="Organization ID", required=True) }}
AND timestamp >= '2005-01-01'
GROUP BY startDate, endDate
ORDER BY startDate
TYPE COPY
TARGET_DATASOURCE org_page_projects_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Signed-off-by: Gašper Grom <gasper.grom@gmail.com>
Copilot AI review requested due to automatic review settings May 19, 2026 16:04
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 7 comments.

TYPE COPY
TARGET_DATASOURCE org_page_projects_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *
Comment on lines +1 to +3
DESCRIPTION >
Activity timeseries for a given organization, bucketed by year (all-time).

FULL OUTER JOIN
org_page_kpis_critical_projects cp
ON coalesce(c.organizationId, p.organizationId, m.organizationId) = cp.organizationId
WHERE organizationId != ''
b.industry,
b.description,
w.website,
domain(w.website) AS domain
Comment on lines +15 to +18
contributorCount,
maintainersCount,
totalContributors,
orgContributors,
Comment on lines +9 to +16
SELECT
organizationId,
toStartOfYear(timestamp) AS startDate,
toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
count() AS activityCount,
now() AS computedAt
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE organizationId != '' AND timestamp >= '2005-01-01'
Comment on lines +9 to +16
SELECT
organizationId,
toStartOfYear(timestamp) AS startDate,
toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
uniq(memberId) AS contributorCount,
now() AS computedAt
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE organizationId != '' AND timestamp >= '2005-01-01'
b.industry,
b.description,
w.website,
domain(w.website) AS domain

domain() returns empty string for non-URL input

Medium Severity

The ClickHouse domain() function expects a full URL with protocol (e.g., https://example.com/path) and returns an empty string for plain domain names. The value from organizationIdentities with type = 'primary-domain' likely stores a bare domain (e.g., example.com), not a URL, so domain(w.website) would always produce an empty string instead of the expected domain value.
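
If the stored values do turn out to be bare domains, one defensive option is to normalize the value before extracting the host. The sketch below illustrates that idea in Python with `urllib.parse`, which is not ClickHouse's parser, so the exact behavior of `domain()` should still be confirmed against the deployed ClickHouse version. `extract_domain` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def extract_domain(value: str) -> str:
    """Return the host for a full URL or a bare domain; '' if none is found."""
    value = value.strip()
    if not value:
        return ""
    # Without a scheme separator, urlsplit treats the whole value as a path,
    # so prefix "//" to mark the start of the network location.
    if "://" not in value:
        value = "//" + value
    return urlsplit(value).hostname or ""
```

The same fallback could be expressed in the pipe by using the stored value directly whenever host extraction yields an empty string.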

Reviewed by Cursor Bugbot for commit fc1a7d1.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
…d-pipes

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings May 20, 2026 14:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 12 comments.

FULL OUTER JOIN
org_page_kpis_critical_projects cp
ON coalesce(c.organizationId, p.organizationId, m.organizationId) = cp.organizationId
WHERE organizationId != ''
TYPE COPY
TARGET_DATASOURCE org_page_activities_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *
Comment on lines +1 to +4
DESCRIPTION >
Activity timeseries for a given organization, bucketed by year (all-time).

TAGS "Organization page"
FROM members_sorted AS m ANY
INNER JOIN org_page_contributors_activity_aggregates agg ON agg.memberId = m.id
LEFT JOIN member_roles mr ON mr.memberId = m.id
WHERE m.id IN (SELECT memberId FROM org_page_contributors_activity_aggregates)
Comment on lines +13 to +18
NODE org_page_contributors_activity_aggregates
SQL >
%
{% if Boolean(count, false) %}
SELECT count(distinct memberId)
FROM activityRelations_deduplicated_cleaned_bucket_union
totalPrsOpened,
orgPrsOpened,
technicalScore
FROM org_page_projects_copy_ds FINAL
NODE org_page_contributors_timeseries_data
SQL >
SELECT startDate, endDate, contributorCount
FROM org_page_contributors_timeseries_copy_ds FINAL
NODE org_page_activities_timeseries_data
SQL >
SELECT startDate, endDate, activityCount
FROM org_page_activities_timeseries_copy_ds FINAL
a.contributorCount,
now() AS computedAt
FROM org_page_projects_org_segment_activity a
LEFT JOIN insights_projects_populated_ds p ON a.segmentId = p.segmentId
Comment on lines +1 to +5
DESCRIPTION >
Nightly copy pipe that precomputes yearly unique contributor counts per organization for the org page.
Writes one row per (organizationId, startDate) into org_page_contributors_timeseries_copy_ds.

TAGS "Organization page"
TYPE COPY
TARGET_DATASOURCE org_page_projects_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *

Two heavy copy pipes scheduled at identical time

Low Severity

org_page_projects_copy_pipe and org_page_activities_timeseries_copy_pipe both use COPY_SCHEDULE 30 1 * * *. Both perform heavy full scans of activityRelations_deduplicated_cleaned_bucket_union (a 10-bucket UNION ALL). The other two new copy pipes are staggered at 01:15 and 01:45, suggesting the intent was to spread load, but these two ended up at the same time.

Reviewed by Cursor Bugbot for commit b387268.

WHERE
organizationId = (SELECT id FROM org_slug_lookup)
{% if defined(startDate) %} AND timestamp >= {{ DateTime(startDate) }} {% end %}
{% if defined(endDate) %} AND timestamp < {{ DateTime(endDate) }} {% end %}

Count path duplicates expensive cross-segment table scan

Medium Severity

When count=true, the org_page_contributors_leaderboard node independently queries activityRelations_deduplicated_cleaned_bucket_union for the same count already computed (but unused) in org_page_contributors_activity_aggregates. Unlike contributors_leaderboard.pipe which references the project-scoped activities_filtered, this pipe scans the full unscoped cross-segment union (10-bucket UNION ALL) twice per request-time count call.

Reviewed by Cursor Bugbot for commit b387268.

Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>
Copilot AI review requested due to automatic review settings May 20, 2026 15:49
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 15 out of 15 changed files in this pull request and generated 5 comments.

Comment on lines +15 to +17
SELECT id, displayName, logo, size AS employeeCount, industry, headline AS description
FROM organizations FINAL
WHERE id = (SELECT id FROM org_slug_lookup)
FROM members_sorted AS m ANY
INNER JOIN org_page_contributors_activity_aggregates agg ON agg.memberId = m.id
LEFT JOIN member_roles mr ON mr.memberId = m.id
WHERE m.id IN (SELECT memberId FROM org_page_contributors_activity_aggregates)
TYPE COPY
TARGET_DATASOURCE org_page_activities_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *
FULL OUTER JOIN
org_page_kpis_critical_projects cp
ON coalesce(c.organizationId, p.organizationId, m.organizationId) = cp.organizationId
WHERE organizationId != ''
b.industry,
b.description,
w.website,
domain(w.website) AS domain,
Signed-off-by: Joana Maia <jmaia@contractor.linuxfoundation.org>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit a4d60ca.

TYPE COPY
TARGET_DATASOURCE org_page_activities_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *

Two heavy copy pipes share the same schedule

Low Severity

org_page_activities_timeseries_copy_pipe and org_page_projects_copy_pipe both use COPY_SCHEDULE 30 1 * * *, meaning they run simultaneously at 01:30 UTC. The other two copy pipes are intentionally staggered (01:15 and 01:45), and the PR architecture notes mention offset scheduling to avoid resource contention. The projects copy pipe is especially heavy (multiple full scans with complex joins), so running another copy pipe concurrently is likely unintentional. The activities timeseries pipe likely belongs at a different offset (e.g., 00 2 * * * or similar).
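
A collision like this can be caught before deploy with a small check over the copy schedules. The Python sketch below uses the schedules reported in this thread. The name `org_page_contributors_timeseries_copy_pipe` and its 01:45 slot are inferred from the review comments rather than quoted from the diff, and `colliding_schedules` is a hypothetical helper.

```python
from collections import defaultdict

# Schedules as reported in this review thread. The contributors timeseries
# entry (pipe name and 01:45 slot) is an inference, not quoted from the diff.
COPY_SCHEDULES = {
    "org_page_kpis_copy_pipe": "15 1 * * *",
    "org_page_projects_copy_pipe": "30 1 * * *",
    "org_page_activities_timeseries_copy_pipe": "30 1 * * *",
    "org_page_contributors_timeseries_copy_pipe": "45 1 * * *",
}


def colliding_schedules(schedules: dict[str, str]) -> dict[str, list[str]]:
    """Group pipes by cron expression and keep expressions shared by 2+ pipes."""
    by_cron: dict[str, list[str]] = defaultdict(list)
    for pipe, cron in schedules.items():
        # Normalize whitespace so "30  1 * * *" and "30 1 * * *" compare equal.
        by_cron[" ".join(cron.split())].append(pipe)
    return {cron: sorted(pipes) for cron, pipes in by_cron.items() if len(pipes) > 1}
```

This only flags identical cron expressions. It does not account for a long-running copy job still executing when the next slot starts, so the offsets still need to be sized against observed run times.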

Reviewed by Cursor Bugbot for commit a4d60ca.

3 participants