Changes from 11 commits
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
DESCRIPTION >
Precomputed yearly activity counts per organization for the org page. Rebuilt nightly by org_page_activities_timeseries_copy_pipe.
One row per (organizationId, startDate). Used by org_page_activities_timeseries.pipe for cheap request-time lookups.

SCHEMA >
`organizationId` String,
`startDate` Date,
`endDate` Date,
`activityCount` UInt64,
`computedAt` DateTime

ENGINE ReplacingMergeTree
ENGINE_SORTING_KEY organizationId, startDate
ENGINE_VER computedAt
@@ -0,0 +1,14 @@
DESCRIPTION >
Precomputed yearly unique contributor counts per organization for the org page. Rebuilt nightly by org_page_contributors_timeseries_copy_pipe.
One row per (organizationId, startDate). Used by org_page_contributors_timeseries.pipe for cheap request-time lookups.

SCHEMA >
`organizationId` String,
`startDate` Date,
`endDate` Date,
`contributorCount` UInt64,
`computedAt` DateTime

ENGINE ReplacingMergeTree
ENGINE_SORTING_KEY organizationId, startDate
ENGINE_VER computedAt
@@ -0,0 +1,15 @@
DESCRIPTION >
Precomputed organization-level KPIs for the org page. Rebuilt nightly by org_page_kpis_copy_pipe.
One row per organizationId. Used by org_page_kpis.pipe for cheap request-time lookups.

SCHEMA >
`organizationId` String,
`activeContributors` UInt64,
`activeContributorsPrevious` UInt64,
`maintainerRoles` UInt64,
`criticalProjects` UInt64,
`computedAt` DateTime

ENGINE ReplacingMergeTree
ENGINE_SORTING_KEY organizationId
ENGINE_VER computedAt
@@ -0,0 +1,25 @@
DESCRIPTION >
Precomputed per-org per-project metrics for the org page. Rebuilt nightly by org_page_projects_copy_pipe.
One row per (organizationId, segmentId). Used by org_page_projects.pipe.

SCHEMA >
`organizationId` String,
`segmentId` String,
`projectSlug` String,
`projectName` String,
`projectLogo` String,
`activityCount` UInt64,
`contributorCount` UInt64,
`maintainersCount` UInt64,
`totalContributors` UInt64,
`orgContributors` UInt64,
`totalCommits` UInt64,
`orgCommits` UInt64,
`totalPrsOpened` UInt64,
`orgPrsOpened` UInt64,
`technicalScore` Float64,
`computedAt` DateTime

ENGINE ReplacingMergeTree
ENGINE_SORTING_KEY organizationId, segmentId
ENGINE_VER computedAt
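
The datasources above all declare ReplacingMergeTree with ENGINE_VER computedAt, which means that rows sharing a sorting key collapse to the row with the greatest computedAt once parts merge, or at read time when a query uses FINAL. The following is a minimal Python sketch of that collapse, for illustration only: `collapse_latest` and the sample rows are hypothetical, and the sketch ignores when merges actually happen. Since the copy pipes use COPY_MODE replace, duplicate keys should be rare in practice; the sketch only shows what the engine settings mean.

```python
from datetime import datetime


def collapse_latest(rows, key_fields, version_field="computedAt"):
    """Keep one row per sorting-key value: the one with the greatest version."""
    latest = {}
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        current = latest.get(key)
        # ">=" lets a later row win a version tie.
        if current is None or row[version_field] >= current[version_field]:
            latest[key] = row
    return list(latest.values())


# Hypothetical rows left behind by two nightly runs for the same org and year.
sample_rows = [
    {"organizationId": "org-1", "startDate": "2024-01-01", "activityCount": 10,
     "computedAt": datetime(2025, 1, 1, 1, 30)},
    {"organizationId": "org-1", "startDate": "2024-01-01", "activityCount": 12,
     "computedAt": datetime(2025, 1, 2, 1, 30)},
    {"organizationId": "org-2", "startDate": "2024-01-01", "activityCount": 7,
     "computedAt": datetime(2025, 1, 2, 1, 30)},
]

collapsed = collapse_latest(sample_rows, ("organizationId", "startDate"))
```

Here `collapsed` holds one row per (organizationId, startDate), and the org-1 row is the one written by the later run.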
17 changes: 17 additions & 0 deletions services/libs/tinybird/pipes/org_page_activities_timeseries.pipe
@@ -0,0 +1,17 @@
DESCRIPTION >
Activity timeseries for a given organization, bucketed by year (all-time).

TAGS "Organization page"

NODE org_slug_lookup
SQL >
%
SELECT id FROM organizations_populated_slug
WHERE slug = {{ String(orgSlug, '', description="Organization slug", required=True) }}

NODE org_page_activities_timeseries_data
SQL >
SELECT startDate, endDate, activityCount
FROM org_page_activities_timeseries_copy_ds FINAL
WHERE organizationId = (SELECT id FROM org_slug_lookup)
ORDER BY startDate
@@ -0,0 +1,22 @@
DESCRIPTION >
Nightly copy pipe that precomputes yearly activity counts per organization for the org page.
Writes one row per (organizationId, startDate) into org_page_activities_timeseries_copy_ds.

TAGS "Organization page"

NODE org_page_activities_timeseries_copy_pipe_data
SQL >
SELECT
organizationId,
toStartOfYear(timestamp) AS startDate,
toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
count() AS activityCount,
now() AS computedAt
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE organizationId != '' AND timestamp >= '2005-01-01'
GROUP BY organizationId, startDate, endDate

TYPE COPY
TARGET_DATASOURCE org_page_activities_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 30 1 * * *

Two heavy copy pipes share the same schedule

Low Severity

org_page_activities_timeseries_copy_pipe and org_page_projects_copy_pipe both use COPY_SCHEDULE 30 1 * * *, meaning they run simultaneously at 01:30 UTC. The other two copy pipes are intentionally staggered (01:15 and 01:45), and the PR architecture notes mention offset scheduling to avoid resource contention. The projects copy pipe is especially heavy (multiple full scans with complex joins), so running another copy pipe concurrently is likely unintentional. The activities timeseries pipe likely belongs at a different offset (e.g., 00 2 * * * or similar).
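
The collision described in this comment could be caught mechanically. Below is a rough Python sketch, assuming the four schedules are exactly as quoted in this PR and in the comment above; `find_schedule_collisions` is a hypothetical helper, not part of the repository. It only detects identical cron expressions, not runs whose durations overlap.

```python
from collections import defaultdict

# Schedules as quoted in this PR and in the review comment above.
COPY_SCHEDULES = {
    "org_page_kpis_copy_pipe": "15 1 * * *",
    "org_page_activities_timeseries_copy_pipe": "30 1 * * *",
    "org_page_projects_copy_pipe": "30 1 * * *",
    "org_page_contributors_timeseries_copy_pipe": "45 1 * * *",
}


def find_schedule_collisions(schedules):
    """Return {cron expression: [pipes]} for expressions used more than once."""
    by_expression = defaultdict(list)
    for pipe, expression in schedules.items():
        # Collapse whitespace so "30  1 * * *" and "30 1 * * *" compare equal.
        normalized = " ".join(expression.split())
        by_expression[normalized].append(pipe)
    return {
        expression: sorted(pipes)
        for expression, pipes in by_expression.items()
        if len(pipes) > 1
    }


collisions = find_schedule_collisions(COPY_SCHEDULES)
```

With the schedules listed, `collisions` contains a single entry for `30 1 * * *`, naming the activities timeseries and projects copy pipes.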


Reviewed by Cursor Bugbot for commit a4d60ca.

73 changes: 73 additions & 0 deletions services/libs/tinybird/pipes/org_page_contributors.pipe
@@ -0,0 +1,73 @@
DESCRIPTION >
Top contributors for a given organization leaderboard.
Returns members sorted by contribution count within the specified date range.

TAGS "Organization page"

NODE org_slug_lookup
SQL >
%
SELECT id FROM organizations_populated_slug
WHERE slug = {{ String(orgSlug, '', description="Organization slug", required=True) }}

NODE org_page_contributors_activity_aggregates
SQL >
%
{% if Boolean(count, false) %}
SELECT count(distinct memberId)
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId = (SELECT id FROM org_slug_lookup)
{% if defined(startDate) %}
AND timestamp
>= {{ DateTime(startDate, description="Filter activity timestamp after") }}
{% end %}
{% if defined(endDate) %}
AND timestamp < {{ DateTime(endDate, description="Filter activity timestamp before") }}
{% end %}
{% else %}
SELECT
memberId,
count() as "contributionCount",
ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) as "contributionPercentage"
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId = (SELECT id FROM org_slug_lookup)
{% if defined(startDate) %}
AND timestamp
>= {{ DateTime(startDate, description="Filter activity timestamp after") }}
{% end %}
{% if defined(endDate) %}
AND timestamp < {{ DateTime(endDate, description="Filter activity timestamp before") }}
{% end %}
GROUP BY memberId
ORDER BY contributionCount DESC, memberId DESC
LIMIT {{ Int32(limit, 10) }}
OFFSET {{ Int32(offset, 0) }}
{% end %}

NODE org_page_contributors_leaderboard
SQL >
%
{% if Boolean(count, false) %}
SELECT count(distinct memberId) as count
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId = (SELECT id FROM org_slug_lookup)
{% if defined(startDate) %} AND timestamp >= {{ DateTime(startDate) }} {% end %}
{% if defined(endDate) %} AND timestamp < {{ DateTime(endDate) }} {% end %}

Count path duplicates expensive cross-segment table scan

Medium Severity

When count=true, the org_page_contributors_leaderboard node independently queries activityRelations_deduplicated_cleaned_bucket_union for the same count already computed (but unused) in org_page_contributors_activity_aggregates. Unlike contributors_leaderboard.pipe which references the project-scoped activities_filtered, this pipe scans the full unscoped cross-segment union (10-bucket UNION ALL) twice per request-time count call.


Reviewed by Cursor Bugbot for commit b387268.

{% else %}
SELECT
m.id,
m.avatar,
m.displayName,
m.githubHandleArray,
agg.contributionCount,
agg.contributionPercentage,
mr.roles
FROM members_sorted AS m ANY
INNER JOIN org_page_contributors_activity_aggregates agg ON agg.memberId = m.id
LEFT JOIN member_roles mr ON mr.memberId = m.id
WHERE m.id IN (SELECT memberId FROM org_page_contributors_activity_aggregates)
ORDER BY agg.contributionCount DESC
{% end %}
17 changes: 17 additions & 0 deletions services/libs/tinybird/pipes/org_page_contributors_timeseries.pipe
@@ -0,0 +1,17 @@
DESCRIPTION >
Contributor count timeseries for a given organization, bucketed by year (all-time).

TAGS "Organization page"

NODE org_slug_lookup
SQL >
%
SELECT id FROM organizations_populated_slug
WHERE slug = {{ String(orgSlug, '', description="Organization slug", required=True) }}

NODE org_page_contributors_timeseries_data
SQL >
SELECT startDate, endDate, contributorCount
FROM org_page_contributors_timeseries_copy_ds FINAL
WHERE organizationId = (SELECT id FROM org_slug_lookup)
ORDER BY startDate
@@ -0,0 +1,22 @@
DESCRIPTION >
Nightly copy pipe that precomputes yearly unique contributor counts per organization for the org page.
Writes one row per (organizationId, startDate) into org_page_contributors_timeseries_copy_ds.

TAGS "Organization page"

NODE org_page_contributors_timeseries_copy_pipe_data
SQL >
SELECT
organizationId,
toStartOfYear(timestamp) AS startDate,
toDate(toStartOfYear(timestamp) + INTERVAL 1 YEAR - INTERVAL 1 DAY) AS endDate,
uniq(memberId) AS contributorCount,
now() AS computedAt
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE organizationId != '' AND timestamp >= '2005-01-01'
GROUP BY organizationId, startDate, endDate

TYPE COPY
TARGET_DATASOURCE org_page_contributors_timeseries_copy_ds
COPY_MODE replace
COPY_SCHEDULE 45 1 * * *
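
Both timeseries copy pipes bucket rows by calendar year: startDate is the first day of the year containing the timestamp, and endDate is that date plus one year minus one day, i.e. 31 December of the same year. A small Python sketch of the same bucketing, as an illustration only (`year_bucket` is a hypothetical name, and time zones are ignored):

```python
from datetime import date, datetime


def year_bucket(timestamp: datetime) -> tuple[date, date]:
    """Return (startDate, endDate) of the calendar year containing timestamp."""
    start = date(timestamp.year, 1, 1)
    # Start of year + 1 year - 1 day is always 31 December of the same year.
    end = date(timestamp.year, 12, 31)
    return start, end


bucket = year_bucket(datetime(2024, 2, 29, 13, 0))
```

Every activity in a given year maps to the same (startDate, endDate) pair, which is why grouping by both columns yields one row per organization and year.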
33 changes: 33 additions & 0 deletions services/libs/tinybird/pipes/org_page_kpis.pipe
@@ -0,0 +1,33 @@
DESCRIPTION >
Returns KPIs for a given organization from the precomputed org_page_kpis_copy_ds.
Includes trend calculations comparing current to previous 365-day period.

TAGS "Organization page"

NODE org_slug_lookup
SQL >
%
SELECT id FROM organizations_populated_slug
WHERE slug = {{ String(orgSlug, '', description="Organization slug", required=True) }}

NODE org_page_kpis_main
SQL >
SELECT
activeContributors,
if(
activeContributorsPrevious = 0,
0,
round(
(toInt64(activeContributors) - toInt64(activeContributorsPrevious))
/ activeContributorsPrevious
* 100,
1
)
) AS activeContributorsTrend,
toInt64(activeContributors)
- toInt64(activeContributorsPrevious) AS activeContributorsTrendAbsolute,
activeContributorsPrevious AS activeContributorsTrendPrevious,
maintainerRoles,
criticalProjects
FROM org_page_kpis_copy_ds FINAL
WHERE organizationId = (SELECT id FROM org_slug_lookup)
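
The trend columns in org_page_kpis_main reduce to simple arithmetic on the two precomputed counts. The Python sketch below mirrors that expression under the assumption that both inputs are non-negative integers; `contributors_trend` is a hypothetical helper, and Python's `round` may differ from ClickHouse's `round` on exact half-way values.

```python
def contributors_trend(current: int, previous: int) -> dict:
    """Mirror the trend columns computed from the two contributor counts."""
    absolute = current - previous
    # A previous count of zero yields a 0 trend rather than a division by zero.
    percent = 0 if previous == 0 else round(absolute / previous * 100, 1)
    return {
        "activeContributors": current,
        "activeContributorsTrend": percent,
        "activeContributorsTrendAbsolute": absolute,
        "activeContributorsTrendPrevious": previous,
    }


growth = contributors_trend(120, 100)
decline = contributors_trend(80, 100)
no_baseline = contributors_trend(5, 0)
```

Note that an organization with no activity in the previous window reports a trend of 0 even though its absolute change is positive, so consumers may want to read the two fields together.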
84 changes: 84 additions & 0 deletions services/libs/tinybird/pipes/org_page_kpis_copy_pipe.pipe
@@ -0,0 +1,84 @@
DESCRIPTION >
Nightly copy pipe that precomputes org-level KPIs for the org page.
Writes one row per organizationId into org_page_kpis_copy_ds.

TAGS "Organization page"

NODE org_page_kpis_current_contributors
DESCRIPTION >
Active contributors per org in the last 365 days

SQL >
SELECT organizationId, uniq(memberId) AS activeContributors
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId != ''
AND timestamp >= toStartOfDay(now() - toIntervalDay(365))
AND timestamp < toStartOfDay(now() + toIntervalDay(1))
GROUP BY organizationId

NODE org_page_kpis_previous_contributors
DESCRIPTION >
Active contributors per org in the prior 365-day window (for trend calc)

SQL >
SELECT organizationId, uniq(memberId) AS activeContributorsPrevious
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId != ''
AND timestamp >= toStartOfDay(now() - toIntervalDay(730))
AND timestamp < toStartOfDay(now() - toIntervalDay(365))
GROUP BY organizationId

NODE org_page_kpis_maintainer_roles
DESCRIPTION >
Count of active maintainer role assignments per org

SQL >
SELECT organizationId, uniq((memberId, insightsProjectId)) AS maintainerRoles
FROM maintainers_roles_copy_ds
WHERE role = 'maintainer' AND toYear(endDate) <= 1970 AND organizationId != ''
GROUP BY organizationId

NODE org_page_kpis_critical_projects
DESCRIPTION >
Count of distinct projects (segmentIds) an org contributed to in the last 365 days.
Serves as the "critical projects" placeholder until a real criticality filter is added.

SQL >
SELECT organizationId, uniq(segmentId) AS criticalProjects
FROM activityRelations_deduplicated_cleaned_bucket_union
WHERE
organizationId != ''
AND timestamp >= toStartOfDay(now() - toIntervalDay(365))
AND timestamp < toStartOfDay(now() + toIntervalDay(1))
GROUP BY organizationId

NODE org_page_kpis_final
DESCRIPTION >
Join all nodes into one row per org

SQL >
SELECT
coalesce(
c.organizationId, p.organizationId, m.organizationId, cp.organizationId
) AS organizationId,
coalesce(c.activeContributors, 0) AS activeContributors,
coalesce(p.activeContributorsPrevious, 0) AS activeContributorsPrevious,
coalesce(m.maintainerRoles, 0) AS maintainerRoles,
coalesce(cp.criticalProjects, 0) AS criticalProjects,
now() AS computedAt
FROM org_page_kpis_current_contributors c
FULL OUTER JOIN org_page_kpis_previous_contributors p ON c.organizationId = p.organizationId
FULL OUTER JOIN
org_page_kpis_maintainer_roles m
ON coalesce(c.organizationId, p.organizationId) = m.organizationId
FULL OUTER JOIN
org_page_kpis_critical_projects cp
ON coalesce(c.organizationId, p.organizationId, m.organizationId) = cp.organizationId
WHERE organizationId != ''

TYPE COPY
TARGET_DATASOURCE org_page_kpis_copy_ds
COPY_MODE replace
COPY_SCHEDULE 15 1 * * *
40 changes: 40 additions & 0 deletions services/libs/tinybird/pipes/org_page_profile.pipe
@@ -0,0 +1,40 @@
DESCRIPTION >
Organization profile for the org page. Returns one row for a given orgSlug.
Joins organizationIdentities for website and domain.

TAGS "Organization page"

NODE org_slug_lookup
SQL >
%
SELECT id FROM organizations_populated_slug
WHERE slug = {{ String(orgSlug, '', description="Organization slug", required=True) }}

NODE org_page_profile_base
SQL >
SELECT id, displayName, logo, employees AS employeeCount, industry, headline AS description
FROM organizations FINAL
WHERE id = (SELECT id FROM org_slug_lookup)

NODE org_page_profile_website
SQL >
SELECT organizationId, argMax(value, updatedAt) AS website
FROM organizationIdentities FINAL
WHERE
organizationId = (SELECT id FROM org_slug_lookup)
AND type = 'primary-domain'
GROUP BY organizationId

NODE org_page_profile_final
SQL >
SELECT
b.id,
b.displayName,
b.logo,
b.employeeCount,
b.industry,
b.description,
w.website,
domain(w.website) AS domain

domain() returns empty string for non-URL input

Medium Severity

The ClickHouse domain() function expects a full URL with protocol (e.g., https://example.com/path) and returns empty string for plain domain names. The value from organizationIdentities with type = 'primary-domain' likely stores a bare domain (e.g., example.com) not a URL, so domain(w.website) would always produce an empty string instead of the expected domain value.
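
One way to sidestep the concern is to normalize the stored value so that it parses the same way whether or not it carries a scheme. The Python sketch below illustrates that idea only; it is not a ClickHouse fix, `extract_host` is a hypothetical helper, and whether the stored values are really bare domains is the reviewer's assumption.

```python
from urllib.parse import urlsplit


def extract_host(value: str) -> str:
    """Return the host for either a full URL or a bare domain, else ''."""
    value = value.strip()
    if not value:
        return ""
    # urlsplit only fills in the host when a "//" authority marker is present,
    # so prepend one for bare inputs such as "example.com".
    if "://" not in value and not value.startswith("//"):
        value = "//" + value
    return urlsplit(value).hostname or ""


from_bare = extract_host("example.com")
from_url = extract_host("https://example.com/path")
```

Both calls resolve to the same host, so downstream code would not need to know which form was stored.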


Reviewed by Cursor Bugbot for commit fc1a7d1.

FROM org_page_profile_base b
LEFT JOIN org_page_profile_website w ON b.id = w.organizationId